docs/design: Support Spilling Unparalleled HashAgg #25792
## Impacts & Risks

* Memory will still grow without increasing the number of new tuples in HashMap for distinct aggregate function.
I don't understand this. There seems to be a contradiction between `memory still grow` and `without increasing the number of new tuples`.
In fact, there is no contradiction. A distinct agg function has to record the values that have already appeared, and we use a set to hold this information. This set keeps growing during the aggregation process even when no new tuples are added to aggPartialResultMapper.
For example:

    type partialResult4CountDistinctInt struct {
        valSet set.Int64SetWithMemoryUsage
    }
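
To make the point concrete, here is a minimal sketch of the effect. It is not TiDB's implementation: `int64Set` is a plain map standing in for `set.Int64SetWithMemoryUsage`, and the `aggPartialResultMapper` type below is a hypothetical simplification that maps a group key to a single partial result. It only illustrates that feeding more rows into an existing group leaves the number of tuples in the map unchanged while the per-group set keeps growing.

```go
package main

// int64Set is an assumed stand-in for set.Int64SetWithMemoryUsage.
// The real type also tracks its memory usage; a plain map is enough here.
type int64Set map[int64]struct{}

// partialResult4CountDistinctInt mirrors the struct quoted in the comment.
type partialResult4CountDistinctInt struct {
	valSet int64Set
}

// aggPartialResultMapper is a simplified, hypothetical version of the
// executor's map: one group key maps to one COUNT(DISTINCT) partial result.
type aggPartialResultMapper map[string]*partialResult4CountDistinctInt

// update feeds one (groupKey, val) row into COUNT(DISTINCT val) GROUP BY groupKey.
// A new tuple is added to the map only when the group key is unseen;
// the per-group set grows whenever val is unseen within that group.
func (m aggPartialResultMapper) update(groupKey string, val int64) {
	pr, ok := m[groupKey]
	if !ok {
		pr = &partialResult4CountDistinctInt{valSet: int64Set{}}
		m[groupKey] = pr
	}
	pr.valSet[val] = struct{}{}
}

// distinctValues returns the total number of values held across all sets,
// which is a rough proxy for the memory the sets consume.
func (m aggPartialResultMapper) distinctValues() int {
	n := 0
	for _, pr := range m {
		n += len(pr.valSet)
	}
	return n
}
```

Calling `update("g1", v)` for 1000 different values of `v` leaves `len(m)` at 1 while `distinctValues()` reaches 1000, which is the growth the risk item describes.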
lgtm
# Proposal: Support Spilling Unparalleled HashAgg

- Author(s): [@wshwsh12](https://github.com/wshwsh12)
- Discussion PR: N/A
Suggested change: replace `- Discussion PR: N/A` with `- Discussion PR: https://github.com/pingcap/tidb/pull/25792`
* When the unparallel-agg exceeds the memory quota, this feature helps reduce memory usage and run the SQL successfully.
* When the parallel-agg exceeds the memory quota, the SQL was canceled before this feature. After the agg-concurrency args are set to 1, the SQL can run successfully.
* When the NDV of the data is low, a SQL statement that contains a distinct function was canceled before this feature. After the agg-concurrency args are set to 1, the SQL can run successfully.
* When the NDV of the data is high, a SQL statement that contains a distinct function was canceled before this feature. After the agg-concurrency args are set to 1, the SQL will still be canceled if there is insufficient memory.
Why do we need to set the concurrency-related args when there is an aggregation function with the keyword `distinct`?
Addressed
LGTM
/merge
This pull request has been accepted and is ready to merge. Commit hash: d78daac
What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
What is changed and how it works?
Proposal: xxx
What's Changed: Design docs.
How it Works:
Related changes
`pingcap/docs`/`pingcap/docs-cn`:
Release note