Calculate group_by in parallel #2701

Merged
merged 1 commit into from
Aug 4, 2024


texodus
Member

@texodus texodus commented Aug 3, 2024

This PR parallelizes column population during group_by tree calculation. The result is a substantial improvement in the wall-clock runtime of View() creation with group_by, which scales with the number of cores and View columns. For example, with an 8-thread pool and a dataset of 50k rows by 20 columns grouped by a string column, the wall-clock improvement is ~2x.
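To illustrate the general idea of per-column parallelism described above, here is a minimal sketch in Python. It is not the code from this PR: the function names (`build_groups`, `aggregate_column`, `group_by_parallel`), the use of a sum aggregate, and the dictionary-based layout are all assumptions made for illustration. The group membership is computed once, then each column is aggregated as an independent task on a thread pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_groups(keys):
    """Map each distinct group key to the row indices that belong to it."""
    groups = defaultdict(list)
    for row, key in enumerate(keys):
        groups[key].append(row)
    return dict(groups)


def aggregate_column(values, groups):
    """Sum one column per group; shares no mutable state with other columns."""
    return {key: sum(values[row] for row in rows) for key, rows in groups.items()}


def group_by_parallel(keys, columns, max_workers=8):
    """Hypothetical helper: aggregate every column, one pool task per column."""
    groups = build_groups(keys)  # computed once, read-only afterwards
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(aggregate_column, values, groups)
            for name, values in columns.items()
        }
        # Collect results in the original column order.
        return {name: future.result() for name, future in futures.items()}


keys = ["a", "b", "a", "c", "b"]
columns = {"x": [1, 2, 3, 4, 5], "y": [10, 20, 30, 40, 50]}
result = group_by_parallel(keys, columns)
```

Because each task only reads the shared group index and writes its own output, no locking is needed between columns. In CPython the global interpreter lock prevents pure-Python threads from running this work concurrently, so the sketch shows the task structure only and is not expected to reproduce the speedup reported here.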

Screenshot 2024-08-03 at 3 17 39 PM

@texodus texodus added the enhancement Feature requests or improvements label Aug 3, 2024
@texodus texodus force-pushed the sparse-tree-parallel branch 2 times, most recently from f608e12 to 6c53cb6 Compare August 3, 2024 23:55
Signed-off-by: Andrew Stein <steinlink@gmail.com>
@texodus texodus marked this pull request as ready for review August 4, 2024 02:32
@texodus texodus merged commit 012c403 into master Aug 4, 2024
8 checks passed
@texodus texodus deleted the sparse-tree-parallel branch August 4, 2024 02:32