-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add skipped_aggregation_rows
metric to aggregate operator
#11706
Conversation
skipped_aggregation_rows
to aggregate operator
a9a34db
to
ae6f4d6
Compare
ae6f4d6
to
223b959
Compare
@@ -611,6 +629,9 @@ impl Stream for GroupedHashAggregateStream { | |||
match ready!(self.input.poll_next_unpin(cx)) { | |||
Some(Ok(batch)) => { | |||
let _timer = elapsed_compute.timer(); | |||
if let Some(probe) = self.skip_aggregation_probe.as_mut() { | |||
probe.record_skipped(&batch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual call that records the metrics . The rest of the PR is comments and plumbing
options.skip_partial_aggregation_probe_rows_threshold; | ||
let probe_ratio_threshold = | ||
options.skip_partial_aggregation_probe_ratio_threshold; | ||
let skipped_aggregation_rows = MetricBuilder::new(&agg.metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that it should be better not to add this counter into all group bys (as another baseline metric) by default 👍
FYI @Dandandan would you have time to review this PR (to help performance profiling of the group by aggregation skipping code)? |
skipped_aggregation_rows
to aggregate operatorskipped_aggregation_rows
metric to aggregate operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @alamb
Thanks for the review @andygrove and @korowa |
Which issue does this PR close?
Rationale for this change
#11627 from @korowa adds a "partial aggregation skipping" mode to the hash aggregate exec that switches aggregate behavior dynamically at runtime,
It would be very nice to know if the path is being executed or not, and the way to do this in DataFusion is metrics.
What changes are included in this PR?
Add a "skipped_aggregation_rows" counter which records the number of rows
Are these changes tested?
I tested them manually
For example, this line shows
skipped_aggregation_rows=98293561
:Here is the entire output of testing with a query:
Are there any user-facing changes?
Another metric in a plan if partial aggregation is skipped