Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: support spill intermediate data for unparalleled hash agg #25714

Merged
merged 18 commits into from
Jul 15, 2021

Conversation

wshwsh12
Copy link
Contributor

@wshwsh12 wshwsh12 commented Jun 23, 2021

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

Proposal: Design

What's Changed:
Based PR #25820, introduce soft limit.
Support spilling intermediate date for unparalleled hashAgg.

How it Works:

  1. When the memory usage is higher than the quota, switch the HashAgg executor to spill-mode.
  2. When HashAgg is in spill-mode, keep the tuple in the hashMap no longer growing.
    a. If the key exists in the Map, aggreagte the result.
    b. If the key doesn't exist in the Map, spill the data to disk.
  3. After all data have been processed, output the aggreagte result in the Map, clear the Map. Then read the spilling data from disk, repeat the Step1-Step3 until all data have been aggregated.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    Set AggregateConcurrency to 1 and run all correctness test in tidb repo.

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • Support spill intermediate data for unparalleled hash agg

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 23, 2021
@github-actions github-actions bot added the sig/execution SIG execution label Jun 23, 2021
@wshwsh12 wshwsh12 marked this pull request as ready for review June 24, 2021 03:34
@wshwsh12 wshwsh12 requested a review from a team as a code owner June 24, 2021 03:34
@wshwsh12 wshwsh12 requested review from XuHuaiyu and removed request for a team June 24, 2021 03:34
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 24, 2021
@wshwsh12
Copy link
Contributor Author

wshwsh12 commented Jun 28, 2021

Simple Benchmark

cpu: AMD Ryzen 7 3700X 8-Core Processor

Old Implement:
BenchmarkHashAggRows
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)-16         	      79	  23354686 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)-16        	       7	 150919829 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)-16       	       1	1315785257 ns/op

New Implement:
BenchmarkHashAggRows
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)-16         	      52	  20963917 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)-16        	       7	 145583208 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)-16       	       1	1327683063 ns/op

Memory usage test

Workload: tpch-sf=3g 1tidb 1tikv 1pd
Config: oom-action = "log",mem-quota-query = 1 << 30, and close copr-cache
User-veriables: tidb_hashagg_final_concurrency = 1; tidb_hashagg_partial_concurrency = 1;

Old Implement
[tidb]> desc analyze select avg(L_EXTENDEDPRICE) from lineitem group by L_PARTKEY   ,L_SUPPKEY , L_QUANTITY;
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                        | estRows     | actRows  | task      | access object  | execution info                                                                                                                                                                                                                                                                             | operator info                                                                                                                                                                                   | memory   | disk |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| HashAgg_9                 | 14397287.20 | 16776228 | root      |                | time:4m23.7s, loops:16385                                                                                                                                                                                                                                                                  | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:avg(Column#19, Column#20)->Column#18                                                                 | 3.14 GB  | N/A  |
| └─TableReader_10          | 14397287.20 | 17963641 | root      |                | time:3m40.2s, loops:41, cop_task: {num: 40, max: 32.5s, min: 1.55s, avg: 19.7s, p95: 31.7s, max_proc_keys: 613883, p95_proc_keys: 462536, tot_proc: 12m46.5s, tot_wait: 9.29s, rpc_num: 40, rpc_time: 13m6.7s, copr_cache: disabled}                                                       | data:HashAgg_5                                                                                                                                                                                  | 320.5 MB | N/A  |
|   └─HashAgg_5             | 14397287.20 | 17963641 | cop[tikv] |                | tikv_task:{proc max:19.4s, min:873ms, p80:18.1s, p95:19.3s, iters:17591, tasks:40}, scan_detail: {total_process_keys: 17996609, total_keys: 17996649, rocksdb: {delete_skipped_count: 0, key_skipped_count: 17996609, block: {cache_hit_count: 57170, read_count: 0, read_byte: 0 Bytes}}} | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:count(test.lineitem.l_extendedprice)->Column#19, funcs:sum(test.lineitem.l_extendedprice)->Column#20 | N/A      | N/A  |
|     └─TableFullScan_8     | 17996609.00 | 17996609 | cop[tikv] | table:lineitem | tikv_task:{proc max:10.9s, min:511ms, p80:10.1s, p95:10.7s, iters:17591, tasks:40}                                                                                                                                                                                                         | keep order:false, stats:pseudo                                                                                                                                                                  | N/A      | N/A  |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
4 rows in set (4 min 24.836 sec)


New Implement:
[tidb]> desc analyze select avg(L_EXTENDEDPRICE) from lineitem group by L_PARTKEY   ,L_SUPPKEY , L_QUANTITY;

+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| id                        | estRows     | actRows  | task      | access object  | execution info                                                                                                                                                                                                                                                                             | operator info                                                                                                                                                                                   | memory   | disk    |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| HashAgg_9                 | 14397287.20 | 16776228 | root      |                | time:4m8.2s, loops:16385                                                                                                                                                                                                                                                                   | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:avg(Column#19, Column#20)->Column#18                                                                 | 819.2 MB | 3.41 GB |
| └─TableReader_10          | 14397287.20 | 17963641 | root      |                | time:52.6s, loops:41, cop_task: {num: 40, max: 30.5s, min: 1.88s, avg: 22.7s, p95: 30.1s, max_proc_keys: 613883, p95_proc_keys: 462536, tot_proc: 14m39.4s, tot_wait: 18.7s, rpc_num: 40, rpc_time: 15m8.5s, copr_cache: disabled}                                                         | data:HashAgg_5                                                                                                                                                                                  | 137.3 MB | N/A     |
|   └─HashAgg_5             | 14397287.20 | 17963641 | cop[tikv] |                | tikv_task:{proc max:19.2s, min:1.27s, p80:17.5s, p95:18.9s, iters:17591, tasks:40}, scan_detail: {total_process_keys: 17996609, total_keys: 17996649, rocksdb: {delete_skipped_count: 0, key_skipped_count: 17996609, block: {cache_hit_count: 57170, read_count: 0, read_byte: 0 Bytes}}} | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:count(test.lineitem.l_extendedprice)->Column#19, funcs:sum(test.lineitem.l_extendedprice)->Column#20 | N/A      | N/A     |
|     └─TableFullScan_8     | 17996609.00 | 17996609 | cop[tikv] | table:lineitem | tikv_task:{proc max:10.9s, min:731ms, p80:9.91s, p95:10.7s, iters:17591, tasks:40}                                                                                                                                                                                                         | keep order:false, stats:pseudo                                                                                                                                                                  | N/A      | N/A     |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
4 rows in set (4 min 8.565 sec)

Memory usage in Grafana montior

Left is the old impelement, right is the new implement.
metrics

@wshwsh12 wshwsh12 mentioned this pull request Jul 2, 2021
5 tasks
@XuHuaiyu XuHuaiyu added the type/enhancement The issue or PR belongs to an enhancement. label Jul 8, 2021
executor/aggregate.go Outdated Show resolved Hide resolved
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 8, 2021
return e.spillAction
}

const maxSpillTimes = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment for this

Comment on lines 200 to 206
listInDisk *chunk.ListInDisk
lastChunkNum int
processIdx int
spillMode uint32
spillChunk *chunk.Chunk
spillAction *AggSpillDiskAction
childDrained bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need comments for these variables

return err
}
}
if e.spillAction != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless.. I remove the code now.

@@ -233,6 +243,15 @@ func (e *HashAggExec) Close() error {
if e.memTracker != nil {
e.memTracker.ReplaceBytesUsed(0)
}
if e.listInDisk != nil {
if err := e.listInDisk.Close(); err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we close the chilrenExec? This may cause leaks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

e.executed, e.childDrained = false, false
e.listInDisk = chunk.NewListInDisk(retTypes(e.children[0]))
e.spillChunk = newFirstChunk(e.children[0])
if e.ctx.GetSessionVars().TrackAggregateMemoryUsage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to check this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tidb doesn't track aggregate executor memory usgae, should we also try to spill hashAgg when exceeded?
In addition, oom-use-tmp-storage also should be check... I add the check now. PTAL again.

@@ -922,6 +974,17 @@ func (e *HashAggExec) execute(ctx context.Context) (err error) {
for j := 0; j < e.childResult.NumRows(); j++ {
groupKey := string(e.groupKeyBuffer[j]) // do memory copy here, because e.groupKeyBuffer may be reused.
if !e.groupSet.Exist(groupKey) {
if atomic.LoadUint32(&e.spillMode) == 1 && e.groupSet.Count() > 0 {
e.spillChunk.Append(e.childResult, j, j+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Chunk.sel to optimize this if-block?

  1. We can check e.groupSet.Exist(groupKey) and build the sel firstly, and then invoke e.spillChunk.Append based on the sel.
  2. Further, if len(sel) == len(e.childResult), we can invoke e.listInDisk.Add(e.childResult) directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2021
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 13, 2021
@wshwsh12 wshwsh12 requested a review from XuHuaiyu July 13, 2021 07:31

// spill unprocessed data when exceeded.
if len(sel) > 0 {
err = e.spillUnprocessedData(sel)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input argument sel is useless?

e.childResult.SetSel(sel)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.childResult.SetSel(sel) will let len(sel) == len(e.childResult) always true, and e.listInDisk.Add(e.childResult) directly. If there are only a few elements in sel, it maybe have performance issue.
I remove the logic e.listInDisk.Add(e.childResult) and always append to tmpChkForSpill, PTAL

executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
listInDisk *chunk.ListInDisk // listInDisk is the chunks to store row values for spilling data.
lastChunkNum int // lastChunkNum indicates the num of spilling chunk.
processIdx int // processIdx indicates the num of processed chunk in disk.
spillMode uint32 // spillMode means that no new groups are added to hash table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. isSpillModeSet?
  2. Add an explanation for what does 0 and 1 mean

executor/aggregate.go Outdated Show resolved Hide resolved
@wshwsh12 wshwsh12 requested a review from XuHuaiyu July 15, 2021 06:21
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
executor/aggregate.go Outdated Show resolved Hide resolved
Co-authored-by: HuaiyuXu <xuhuaiyu@pingcap.com>
Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • XuHuaiyu
  • mmyj

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 15, 2021
@XuHuaiyu
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: ffbcf52

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 15, 2021
@ti-chi-bot
Copy link
Member

@wshwsh12: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@wshwsh12
Copy link
Contributor Author

/run-unit-test

@ti-chi-bot ti-chi-bot merged commit 2412437 into pingcap:master Jul 15, 2021
@wshwsh12 wshwsh12 deleted the unparalleled-hash-agg branch January 29, 2022 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants