This repository has been archived by the owner on Dec 8, 2021. It is now read-only.

optimize the performance of lightning #281

Merged: 35 commits, Mar 16, 2020

@july2993 (Contributor) commented Mar 10, 2020

What problem does this PR solve?

Optimize the performance of Lightning.

The main performance improvement comes from setting SetGCPercent to 500 by default, compared with the "Use pool for mutation" and "Reuse slice of record" changes. Disabling UpdateDeltaForTable when TxnCtx is nil should reduce the time spent in Encode by 14%, according to kennytm's earlier work in the comment.

What is changed and how it works?

  • Set SetGCPercent to 500 by default

Lightning allocates many transient objects while its heap is small,
so garbage collection runs too frequently and a lot of time is spent in the GC.

In a test loading the table order_line.csv of 14k TPCC,
the time needed for the "encode kv data and write" step dropped from 52m4s to 37m30s when
GOGC was changed from 100 to 500, and the total time needed to restore the table dropped by nearly 15m as well.
The cost is that Lightning's memory usage at runtime grows from about 200M to 700M, which is acceptable.

So we set the GC percentage to 500 by default, instead of 100, to reduce the GC frequency.

  • Use pool for mutation

see commit cce3ea6
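
The idea behind pooling mutations can be sketched as follows. This is an illustration under stated assumptions, not the contents of commit cce3ea6: `Mutation` and `KvPair` are hypothetical stand-ins for the real types, and `buildMutations` / `releaseMutations` are hypothetical helper names. The only library facility used is `sync.Pool` from the Go standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// Mutation is a hypothetical stand-in for the mutation type that is
// allocated once per key-value pair when writing rows.
type Mutation struct {
	Key   []byte
	Value []byte
}

// KvPair is a hypothetical stand-in for an encoded key-value pair.
type KvPair struct {
	Key []byte
	Val []byte
}

// mutationPool hands out reusable *Mutation values so that each batch
// does not have to allocate a fresh object per pair.
var mutationPool = sync.Pool{
	New: func() interface{} { return new(Mutation) },
}

// buildMutations fills one pooled Mutation per pair.
func buildMutations(pairs []KvPair) []*Mutation {
	mutations := make([]*Mutation, len(pairs))
	for i, p := range pairs {
		m := mutationPool.Get().(*Mutation)
		m.Key, m.Value = p.Key, p.Val
		mutations[i] = m
	}
	return mutations
}

// releaseMutations returns the objects to the pool. It must only be
// called once the batch is no longer referenced (for example, after it
// has been sent), because the objects may be handed out again.
func releaseMutations(mutations []*Mutation) {
	for i, m := range mutations {
		m.Key, m.Value = nil, nil // drop references so the data can be collected
		mutationPool.Put(m)
		mutations[i] = nil
	}
}

func main() {
	pairs := []KvPair{{Key: []byte("k1"), Val: []byte("v1")}}
	batch := buildMutations(pairs)
	fmt.Println(len(batch), string(batch[0].Key))
	releaseMutations(batch)
}
```

A pool only pays off when objects are returned promptly and reliably; `sync.Pool` may also drop idle objects at any GC cycle, so it reduces allocations rather than eliminating them.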

Check List

Tests

  • Unit test
  • Integration test

Side effects

Related changes

  • Need to cherry-pick to the release branch
  • Need to be included in the release note

XuHuaiyu and others added 30 commits March 10, 2020 17:34
…g/csv

Conflicts:
	lightning/mydump/csv_parser_generated.go
This is expected to avoid about 3.5% of alloc_objects
alloc_objects:
  Total:   773496750  773873722 (flat, cum)  7.18%
    177            .          .           	parser.fieldIndexes = parser.fieldIndexes[:0]
    178            .          .
    179            .          .           	isEmptyLine := true
...
    225    386621314  386621314           	str := string(parser.recordBuffer) // Convert to string once to batch allocations
    226    386875436  386875436           	dst := make([]string, len(parser.fieldIndexes))
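
The `make([]string, ...)` call at line 226 of the profile above runs once per record, which is the kind of allocation that reusing the record slice avoids. A minimal sketch of the technique follows; `recordParser` and its fields are hypothetical names modeled on the identifiers visible in the profile, not the actual parser from this repository. Go's standard `encoding/csv` Reader offers the same trade-off through its `ReuseRecord` field.

```go
package main

import "fmt"

// recordParser is a hypothetical parser that keeps the raw bytes of the
// current record plus the end offset of each field within those bytes.
type recordParser struct {
	recordBuffer []byte
	fieldIndexes []int
	fields       []string // backing storage reused across calls
}

// record splits the current buffer into fields, reusing p.fields when
// it is large enough instead of allocating a new slice per record.
// The returned slice is only valid until the next call to record.
func (p *recordParser) record() []string {
	str := string(p.recordBuffer) // convert once to batch allocations
	n := len(p.fieldIndexes)
	if cap(p.fields) < n {
		p.fields = make([]string, n)
	}
	dst := p.fields[:n]
	prev := 0
	for i, end := range p.fieldIndexes {
		dst[i] = str[prev:end]
		prev = end
	}
	return dst
}

func main() {
	p := &recordParser{
		recordBuffer: []byte("abcde"),
		fieldIndexes: []int{2, 5},
	}
	fmt.Println(p.record())
}
```

The cost of this design is aliasing: a caller that needs to keep a record beyond the next call has to copy it first.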
This takes most of the allocations in WriteRows:
    ROUTINE ======================== github.com/pingcap/tidb-lightning/lightning/backend.(*importer).WriteRows in /Users/huangjiahao/go/src/github.com/pingcap/tidb-lightning/lightning/backend/importer.go
     797370418  980241246 (flat, cum)  9.09% of Total
             .          .    155:   kvs := rows.(kvPairs)
    ...
    ...
             .          .    192:   for i, pair := range kvs {
     772641868  772641868    193:           mutations[i] = &kv.Mutation{
             .          .    194:                   Op:    kv.Mutation_Put,
             .          .    195:                   Key:   pair.Key,
             .          .    196:                   Value: pair.Val,
             .          .    197:           }
             .          .    198:   }
Lightning allocates many transient objects while its heap is small,
so garbage collection runs too frequently and a lot of time is spent in the GC.

In a test loading the table `order_line.csv` of 14k TPCC,
the time needed for the `encode kv data and write` step dropped from 52m4s to 37m30s when
GOGC was changed from 100 to 500, and the total time needed dropped by nearly 15m as well.
The cost is that Lightning's memory usage at runtime grows from about 200M to 700M, which is acceptable.

So we set the GC percentage to 500 by default, instead of 100, to reduce the GC frequency.
has been moved to the Importer part
For pingcap/tidb@495f8b7
disable UpdateDeltaForTable if TxnCtx is nil
@july2993 force-pushed the xhy/refine_encode branch from 97d8711 to 0181a3f, March 16, 2020 11:47
@july2993 changed the title from "refine encode" to "optimize the performance of lightning", Mar 16, 2020
@july2993 added the status/PTAL label (This PR is ready for review. Add this label back after committing new changes), Mar 16, 2020
@july2993 marked this pull request as ready for review, March 16, 2020 12:12
@july2993 (Contributor, Author) commented

/run-all-tests

@july2993 requested review from kennytm and 3pointer, March 16, 2020 12:47
cmd/tidb-lightning/main.go (outdated, resolved)
lightning/backend/importer.go (outdated, resolved)
lightning/backend/session.go (outdated, resolved)
lightning/restore/restore.go (resolved)
lightning/mydump/csv_parser.go (resolved)
lightning/restore/restore.go (outdated, resolved)
@july2993 requested a review from kennytm, March 16, 2020 14:01
@july2993 (Contributor, Author) commented

@kennytm PTAL

@kennytm (Collaborator) left a comment

LGTM

@kennytm added the status/LGT1 label (One reviewer already commented LGTM (LGTM1)) and removed the status/PTAL label, Mar 16, 2020

@3pointer (Contributor) left a comment

LGTM

@kennytm merged commit 98bc849 into master, Mar 16, 2020
@kennytm deleted the xhy/refine_encode branch, March 16, 2020 15:42
Labels
status/LGT1 One reviewer already commented LGTM (LGTM1)