executor: load data/batch insert improvement reducing memory allocation/free #11284
Could you provide some memory usage test results or performance test results?
/run-all-tests
I think it should work, we can try it on the IDC server.
Go uses managed memory, so looking at RSS at the OS level may be useless. Maybe take a look at Go's memstats: https://blog.xiaoba.me/2017/09/02/how-to-play-golang-with-gdb-and-python.html. But IMHO that is hard to observe too, so maybe look at another metric such as import speed? :D
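One way to observe allocation behaviour from inside the process, rather than through OS-level RSS, is to read the Go runtime's own counters around the code under test. The sketch below is only an illustration and is not taken from this PR: `heapDelta` and `sink` are hypothetical names, and the workload in `main` is a toy per-row allocation loop, not TiDB code.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is a package-level variable so the toy allocations below escape to the heap.
var sink [][]int

// heapDelta (hypothetical helper) runs fn and reports how many bytes and heap
// objects were allocated while it ran, using the runtime's cumulative counters.
func heapDelta(fn func()) (bytes, objects uint64) {
	var before, after runtime.MemStats
	runtime.GC() // settle the heap before taking the first snapshot
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	// TotalAlloc and Mallocs only ever grow, so the differences are safe.
	return after.TotalAlloc - before.TotalAlloc, after.Mallocs - before.Mallocs
}

func main() {
	bytes, objects := heapDelta(func() {
		// Toy workload: one allocation per row, as in a per-row buffer scheme.
		sink = make([][]int, 0, 1000)
		for i := 0; i < 1000; i++ {
			sink = append(sink, make([]int, 8))
		}
	})
	fmt.Printf("allocated %d bytes in %d objects\n", bytes, objects)
}
```

Comparing these two numbers between master and the PR branch for the same load would show allocation pressure directly, even when RSS looks unchanged.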
Loading 5 million rows (500w), tested on an IDC machine: speed and process memory use are almost the same (not very stable, sometimes even slower).
if uint64(cap(e.rows)) < limit {
e.rows = make([][]types.Datum, 0, limit)
for i := 0; uint64(i) < limit; i++ {
e.rows = append(e.rows, make([]types.Datum, len(e.Table.Cols())))
Maybe we only need to allocate limit - cap(e.rows) new rows of types.Datum here, if the limit can change dynamically.
Another optimization, which may or may not help, is to call make([]types.Datum, (limit-cap(e.rows))*len(e.Table.Cols())) once and then cut slices from that big array to append.
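A minimal sketch of that suggestion, under stated assumptions: `datum` is a hypothetical stand-in for `types.Datum`, `growRows` is an illustrative helper that does not exist in TiDB, and the row slice is assumed to be kept at full length, so `len(rows)` plays the role of `cap(e.rows)` in the comment above.

```go
package main

import "fmt"

// datum is a hypothetical stand-in for TiDB's types.Datum.
type datum struct {
	i int64
}

// growRows (illustrative) extends rows to limit entries of cols datums each.
// Only the missing rows are allocated, and all of them are carved out of one
// contiguous backing array instead of one allocation per row.
func growRows(rows [][]datum, limit, cols int) [][]datum {
	missing := limit - len(rows)
	if missing <= 0 {
		return rows // already large enough, nothing to allocate
	}
	backing := make([]datum, missing*cols) // single allocation for all new rows
	for i := 0; i < missing; i++ {
		// The three-index slice caps each row at cols, so an append on one
		// row cannot silently overwrite the start of the next row.
		rows = append(rows, backing[i*cols:(i+1)*cols:(i+1)*cols])
	}
	return rows
}

func main() {
	rows := growRows(nil, 4, 3)  // initial buffer: 4 rows
	rows = growRows(rows, 6, 3)  // limit raised: only 2 more rows allocated
	fmt.Println(len(rows), len(rows[0]), cap(rows[5])) // prints "6 3 3"
}
```

The trade-off is that the big backing array stays alive as long as any single row slice still references it.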
It seems trace does not support load data yet?
@lysu Currently init is called only once. Another PR, which makes the variable settable, will handle this.
Please squash your commit message before you file the PR.
@jackysp PTAL
On IDC, we can use ansible to start tikv as the storage engine. You could take a look at the tidb heap separately during the test. @lysu will help.
Some test results on the IDC TiDB cluster, loading 5 million (500w) rows with the sysbench table schema:
According to the memory profile output, there is not much difference between master and current PR.
Thanks @lysu for another memory usage optimization suggestion for the "getLine" function; memory usage is better:
executor/load_data.go
Outdated
@@ -16,6 +16,7 @@ package executor
import (
"context"
"fmt"
"github.com/pingcap/tidb/util/hack"
Please move it to the 3rd party libs.
executor/builder.go
Outdated
@@ -681,6 +681,9 @@ func (b *executorBuilder) buildLoadData(v *plannercore.LoadData) Executor {
},
}

var defaultLoadDataBatchCnt uint64 = 20000 //TODO this will be changed to variable in another pr
Comment should start with whitespace.
Rest LGTM
…ing copy times fix getLine function causing mem copy problem
/run-all-tests
LGTM
PTAL @lysu
/run-circleci-tests
Codecov Report
@@ Coverage Diff @@
## master #11284 +/- ##
================================================
- Coverage 81.8226% 81.2799% -0.5428%
================================================
Files 424 423 -1
Lines 92131 90256 -1875
================================================
- Hits 75384 73360 -2024
- Misses 11480 11600 +120
- Partials 5267 5296 +29 |
Codecov Report
@@ Coverage Diff @@
## master #11284 +/- ##
===========================================
Coverage 81.6377% 81.6377%
===========================================
Files 423 423
Lines 91318 91318
===========================================
Hits 74550 74550
Misses 11467 11467
Partials 5301 5301 |
LGTM
LGTM
What problem does this PR solve?
Please help check whether the row buffers are still used after a batch has been committed. This PR reuses that memory once each batch transaction commits. So far the unit tests pass.
What is changed and how it works?
Do not allocate memory for every row in each batch; use one memory buffer for all batches.
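The idea can be sketched roughly as follows. This is a simplified illustration under assumptions, not the code in this PR: `datum`, `rowBuffer`, `newRowBuffer`, and `load` are hypothetical names, each input line fills only the first column, and the batch size and column count are assumed to be at least 1.

```go
package main

import "fmt"

// datum is a hypothetical stand-in for TiDB's types.Datum.
type datum struct{ val string }

// rowBuffer (illustrative) holds rows allocated once and reused by every batch.
type rowBuffer struct {
	rows [][]datum
	used int // number of rows handed out in the current batch
}

func newRowBuffer(batchSize, cols int) *rowBuffer {
	rows := make([][]datum, batchSize)
	for i := range rows {
		rows[i] = make([]datum, cols)
	}
	return &rowBuffer{rows: rows}
}

// next returns the next free row, or nil when the current batch is full.
func (b *rowBuffer) next() []datum {
	if b.used == len(b.rows) {
		return nil
	}
	row := b.rows[b.used]
	b.used++
	return row
}

// reset frees all rows for reuse. It is only safe once the batch has been
// committed and nothing else still holds a reference to the rows.
func (b *rowBuffer) reset() { b.used = 0 }

// load fills rows from lines, calls commit once per full (or final partial)
// batch, and returns the number of batches committed.
func load(lines []string, buf *rowBuffer, commit func(rows [][]datum)) int {
	batches := 0
	flush := func() {
		if buf.used == 0 {
			return
		}
		commit(buf.rows[:buf.used])
		buf.reset() // reuse the same memory for the next batch
		batches++
	}
	for _, line := range lines {
		row := buf.next()
		if row == nil {
			flush()
			row = buf.next()
		}
		row[0] = datum{val: line}
	}
	flush()
	return batches
}

func main() {
	buf := newRowBuffer(2, 1)
	n := load([]string{"a", "b", "c", "d", "e"}, buf, func(rows [][]datum) {
		fmt.Println("commit", len(rows), "rows")
	})
	fmt.Println("batches:", n) // prints "batches: 3"
}
```

The safety condition in `reset` is exactly the question raised in the PR description: reuse is only correct if the commit path does not keep the row slices after it returns.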
Check List
Tests
Code changes
Side effects