Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use builders for GroupBy initialization and updates #2886

Merged
merged 29 commits into from
Sep 26, 2022

Conversation

lbooker42
Copy link
Contributor

@lbooker42 lbooker42 commented Sep 20, 2022

Similar to #2852, change GroupByChunkedOperator to use RowSetSequentialBuilder during the initial creation of the groupBy table.

Closes #2873

Zulu11.54+23-CA (build 11.0.14+9-LTS) on MacOS, M1                
Rows        Buckets    groupBy (ms)    groupBy_opt (ms)    speedup
100,000,000 100        1,754           1,485               15.3%
100,000,000 25,000     106,402         11,268              89.4%
100,000,000 1,000,000  32,800          27,687              15.6%
                
                
Zulu17.34+19-CA (build 17.0.3+7-LTS) on MacOS, M1                
Rows        Buckets    groupBy (ms)    groupBy_opt (ms)    speedup
100,000,000 100        1,518           1,485               2.1%
100,000,000 25,000     93,826          10,995              88.3%
100,000,000 1,000,000  3,313           22,397              48.3%

@lbooker42 lbooker42 added this to the Sept 2022 milestone Sep 20, 2022
@lbooker42 lbooker42 self-assigned this Sep 20, 2022
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As-written, this only speeds up the instantiation pass. For partition-by, since we already needed to keep track of adds, removes, and modifies each cycle, changing update processing was a no-brainer. Do you think it would be worthwhile to start using random builders in updating processing to track adds and removes in group-by? I suspect the trade-off to be worth it.

@rcaudy
Copy link
Member

rcaudy commented Sep 20, 2022

A thing I've just thought of for group-by and partition-by. If we're using initial groups and !preserveEmpty, we might be orphaning empty builders in the addedBuilders sources indefinitely, since we only do any work with the ones that make it into the initial row set. I'm not sure it's a big deal, but it's not optimal.

I think we could pass the max slot used into propagateInitialState to allow the code to address this.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is basically ready to merge, if nightlies look good. Minor nitpicks only.

@lbooker42 lbooker42 changed the title Use sequential builders for GroupBy initialization Use builders for GroupBy initialization and updates Sep 26, 2022
@lbooker42 lbooker42 merged commit a23b5e9 into deephaven:main Sep 26, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Sep 26, 2022
@lbooker42 lbooker42 deleted the lab-groupby-opt branch June 26, 2024 20:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve initialization performance of GroupBy operator
2 participants