Parallelize converting hash tables to two-level after the hash agg build finishes #8956

Closed
gengliqi opened this issue Apr 17, 2024 · 0 comments · Fixed by #8957
Labels
type/enhancement The issue or PR belongs to an enhancement.

gengliqi commented Apr 17, 2024

Enhancement

By accident, I found that TPC-H Q21 sometimes runs 200ms faster on my server. I was curious about this because my server has almost no noise.
After some investigation, I found that the 200ms improvement comes from the first phase of hash agg. Digging a little deeper, I found the reason.

Hash agg decides whether to convert its hash table to two-level based on two thresholds. If fine-grained hash agg is not used (which is always the case for the first phase of hash agg) and at least one hash table has become two-level, every other hash table must also be converted to two-level before the merge process.
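To make the conversion concrete, here is a minimal sketch of what "converting to two-level" means. It is not the actual implementation: `SingleLevel`, `TwoLevel`, `TwoLevelThresholds`, `shouldConvert`, `bucketOf`, and the bucket count of 256 are all hypothetical names and values chosen for illustration, and I am assuming the two thresholds are a row count and a byte size.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <cassert>

// Hypothetical stand-ins for the aggregation hash tables (key -> partial count).
using SingleLevel = std::unordered_map<std::uint64_t, std::uint64_t>;
constexpr std::size_t NUM_BUCKETS = 256; // assumed bucket count
using TwoLevel = std::array<SingleLevel, NUM_BUCKETS>;

// Hypothetical thresholds; the real names and values are not given in this issue.
struct TwoLevelThresholds
{
    std::size_t max_rows;  // 0 disables the row check
    std::size_t max_bytes; // 0 disables the byte check
};

// Assumption: reaching either threshold triggers the conversion.
inline bool shouldConvert(std::size_t rows, std::size_t bytes, const TwoLevelThresholds & t)
{
    return (t.max_rows != 0 && rows >= t.max_rows) || (t.max_bytes != 0 && bytes >= t.max_bytes);
}

// Pick a bucket from the top 8 bits of a multiplicative hash of the key.
inline std::size_t bucketOf(std::uint64_t key)
{
    return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ULL) >> 56);
}

// Re-insert every entry into the bucket selected by its hash.
// The cost is linear in the table size, which is why doing this
// for many tables on one thread adds up.
inline TwoLevel convertToTwoLevel(SingleLevel && src)
{
    TwoLevel dst;
    for (const auto & kv : src)
        dst[bucketOf(kv.first)][kv.first] += kv.second;
    src.clear();
    return dst;
}
```

Once every table is split into the same buckets, bucket `i` of all tables can be merged independently of bucket `j`, which is what makes the merge itself parallelizable.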

When Q21 runs slowly, a small number of tasks with a large amount of data trigger the two-level conversion, but most tasks do not.
As a result, all the hash tables that are not yet two-level are converted to two-level in a single thread.

When Q21 runs 200ms faster, the data is distributed relatively evenly across tasks, and none of the hash agg build tasks trigger the conversion to two-level.

Since the data distribution is not easy to control, I think a more general way to solve this problem is to parallelize the conversion of hash tables to two-level after the hash agg build finishes.
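A rough sketch of the idea, under stated assumptions: this is not the change that fixed the issue, and `TaskTable`, `prepareForMerge`, `bucketOf`, the bucket count, and the use of plain `std::thread` workers are all hypothetical choices for illustration. Each worker claims the next unconverted table through an atomic counter, so no table is touched by more than one thread.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <unordered_map>
#include <vector>
#include <cassert>

using SingleLevel = std::unordered_map<std::uint64_t, std::uint64_t>;
constexpr std::size_t NUM_BUCKETS = 256; // assumed bucket count
using TwoLevel = std::array<SingleLevel, NUM_BUCKETS>;

// Hypothetical result of one hash agg build task.
struct TaskTable
{
    bool is_two_level = false;
    SingleLevel single; // used while is_two_level == false
    TwoLevel two;       // used once is_two_level == true
};

inline std::size_t bucketOf(std::uint64_t key)
{
    return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ULL) >> 56);
}

inline void convertToTwoLevel(TaskTable & t)
{
    if (t.is_two_level)
        return;
    for (const auto & kv : t.single)
        t.two[bucketOf(kv.first)][kv.first] += kv.second;
    t.single.clear();
    t.is_two_level = true;
}

// If any table is already two-level, convert the rest before merging,
// spreading the conversions over up to `num_threads` workers.
inline void prepareForMerge(std::vector<TaskTable> & tables, std::size_t num_threads)
{
    bool any_two_level = false;
    for (const auto & t : tables)
        any_two_level = any_two_level || t.is_two_level;
    if (!any_two_level)
        return; // all tables can be merged as they are

    if (num_threads == 0)
        num_threads = 1;
    std::atomic<std::size_t> next{0};
    auto worker = [&tables, &next] {
        // Each index is claimed by exactly one worker.
        for (std::size_t i = next.fetch_add(1); i < tables.size(); i = next.fetch_add(1))
            convertToTwoLevel(tables[i]);
    };
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < num_threads; ++i)
        threads.emplace_back(worker);
    for (auto & th : threads)
        th.join();
}
```

In the slow Q21 case described above, this would turn a serial pass over most of the tables into a pass whose wall time is roughly bounded by the largest unconverted table, given enough threads. A real implementation would more likely reuse the existing build threads or a thread pool than spawn new threads.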

@gengliqi gengliqi added the type/enhancement The issue or PR belongs to an enhancement. label Apr 17, 2024
ti-chi-bot bot pushed a commit that referenced this issue Apr 25, 2024