Support re-enabling partial aggregation adaptively #11361

lukasz-stec · 2022-03-08T08:42:48Z

With #11011, the partial step of the Trino aggregation operator can be switched off if, for the rows processed so far, it did not reduce the number of rows much (i.e. most of the input rows are distinct).
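A minimal sketch of the switch-off check described above. It assumes the operator tracks how many input rows the partial step has consumed and how many groups it has emitted. The class `PartialAggregationMonitor` and its parameters `minRows` and `uniqueRowsRatioThreshold` are hypothetical illustrations, not the code from #11011.

```java
/**
 * Hypothetical monitor (not Trino's implementation) that decides when to
 * switch off partial aggregation because it is not reducing the row count.
 */
final class PartialAggregationMonitor {
    private final long minRows;                    // rows to observe before deciding
    private final double uniqueRowsRatioThreshold; // disable if groups/rows exceeds this
    private long inputRows;
    private long outputGroups;
    private boolean disabled;

    PartialAggregationMonitor(long minRows, double uniqueRowsRatioThreshold) {
        this.minRows = minRows;
        this.uniqueRowsRatioThreshold = uniqueRowsRatioThreshold;
    }

    /** Called after each partial flush with the rows consumed and the groups produced. */
    void recordFlush(long rowsConsumed, long groupsProduced) {
        inputRows += rowsConsumed;
        outputGroups += groupsProduced;
        if (!disabled && inputRows >= minRows) {
            double uniqueRatio = (double) outputGroups / inputRows;
            // Most input rows were distinct, so the partial step barely reduced the data.
            disabled = uniqueRatio > uniqueRowsRatioThreshold;
        }
    }

    boolean isDisabled() {
        return disabled;
    }
}
```

Note that once `disabled` becomes true nothing ever sets it back to false; that one-way switch is the limitation this issue is about.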

If this happens, but the rows yet to be processed have a different distribution, i.e. a small number of distinct values, we want the partial aggregation step to be re-enabled.

One idea for implementing this is to calculate or estimate (e.g. using HyperLogLog) the number of distinct values in the split once in a while (possibly with exponential backoff for the window between calculations), and to enable partial aggregation again if the ratio of distinct values to input position count is low for the given split.
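A minimal sketch of the sampling idea above, assuming the operator can hand each row's group-by key hash to a helper while partial aggregation is off. `ReEnableDetector` and its parameters (`sampleSize`, `maxDistinctRatio`, `initialGap`, `maxGap`) are hypothetical. For brevity it counts distinct keys exactly with a `HashSet` over each sample window instead of using HyperLogLog, so its memory grows with `sampleSize` rather than staying fixed.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical detector (not Trino's API): periodically samples group-by key
 * hashes while partial aggregation is off and reports when a sample looks
 * reducible again. An exact HashSet stands in for HyperLogLog here.
 */
final class ReEnableDetector {
    private final int sampleSize;          // rows inspected per check
    private final double maxDistinctRatio; // re-enable if distinct/sampled <= this
    private final long initialGap;         // rows skipped before the first check
    private final long maxGap;             // upper bound for the exponential backoff
    private final Set<Long> distinctKeys = new HashSet<>();
    private long gap;
    private long rowsUntilSample;
    private int sampledRows;

    ReEnableDetector(int sampleSize, double maxDistinctRatio, long initialGap, long maxGap) {
        this.sampleSize = sampleSize;
        this.maxDistinctRatio = maxDistinctRatio;
        this.initialGap = initialGap;
        this.maxGap = maxGap;
        this.gap = initialGap;
        this.rowsUntilSample = initialGap;
    }

    /** Feeds one row's key hash; returns true when partial aggregation should be re-enabled. */
    boolean observe(long keyHash) {
        if (rowsUntilSample > 0) {
            rowsUntilSample--; // inside the gap between checks: no per-row work
            return false;
        }
        distinctKeys.add(keyHash);
        sampledRows++;
        if (sampledRows < sampleSize) {
            return false;
        }
        double distinctRatio = (double) distinctKeys.size() / sampledRows;
        distinctKeys.clear();
        sampledRows = 0;
        if (distinctRatio <= maxDistinctRatio) {
            // Few distinct values in this window: reset the backoff and re-enable.
            gap = initialGap;
            rowsUntilSample = gap;
            return true;
        }
        // Still mostly distinct: wait twice as long (up to maxGap) before the next check.
        gap = Math.min(gap * 2, maxGap);
        rowsUntilSample = gap;
        return false;
    }
}
```

With `sampleSize = 100`, `maxDistinctRatio = 0.5` and `initialGap = 1000`, a stream whose keys are all distinct for the first 5000 rows and then cycle through 10 values is flagged only at row 7299 (0-based), because by then the gap has doubled to 4000 rows. A larger `maxGap` makes the check cheaper on inputs that stay distinct, at the cost of reacting more slowly to a distribution change.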

Another idea, which may not be doable but which I will put here anyway: if we had per-split statistics on the number of distinct values per column (in a perfect world, also with the correlation between column stats), we could decide to enable or disable partial aggregation on a per-split basis.
The Parquet format supports distinct_count per column chunk and per page, but I suspect it is not populated in most real-life scenarios.
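A minimal sketch of such a per-split decision. It assumes per-split distinct counts for each group-by column are available (for example from Parquet distinct_count when a writer fills it in), treats them as exact, and ignores how NULLs are counted. `SplitStatsPolicy` and its thresholds `enableRatio` and `disableRatio` are hypothetical. Without correlation statistics the number of groups can only be bounded: it is at least the largest single-column distinct count and at most the product of the distinct counts, capped at the row count.

```java
import java.util.List;
import java.util.OptionalLong;

/**
 * Hypothetical per-split policy (not Trino's API): decides partial aggregation
 * from per-column distinct counts (NDVs) of the group-by columns.
 */
final class SplitStatsPolicy {
    enum Decision { ENABLE, DISABLE, ADAPTIVE }

    static Decision decide(long splitRowCount, List<OptionalLong> groupByColumnNdvs,
                           double enableRatio, double disableRatio) {
        if (splitRowCount <= 0) {
            return Decision.ADAPTIVE;
        }
        long lowerBound = 0; // groups >= the largest single-column NDV
        long upperBound = 1; // groups <= product of NDVs, capped at the row count
        for (OptionalLong ndv : groupByColumnNdvs) {
            if (!ndv.isPresent()) {
                return Decision.ADAPTIVE; // missing statistic: fall back to runtime adaptation
            }
            long value = ndv.getAsLong();
            lowerBound = Math.max(lowerBound, value);
            try {
                upperBound = Math.min(splitRowCount, Math.multiplyExact(upperBound, value));
            } catch (ArithmeticException overflow) {
                upperBound = splitRowCount;
            }
        }
        if ((double) upperBound / splitRowCount <= enableRatio) {
            return Decision.ENABLE;  // even the worst case reduces rows enough
        }
        if ((double) lowerBound / splitRowCount >= disableRatio) {
            return Decision.DISABLE; // even the best case leaves most rows distinct
        }
        return Decision.ADAPTIVE;    // bounds too loose; correlation stats could tighten them
    }
}
```

For example, with `enableRatio = 0.1` and `disableRatio = 0.9`, a 1000-row split with distinct counts 10 and 5 has at most 50 groups, so the sketch enables partial aggregation. With 100 and 5 the bounds are 100 to 500 groups, which is too wide to decide either way, so runtime adaptation remains the fallback. An empty column list (a global aggregation) yields a single group and is enabled.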


lukasz-stec changed the title ~~Support enabling partial aggregation adaptively~~ Support re-enabling partial aggregation adaptively Mar 8, 2022

lukasz-stec mentioned this issue Mar 8, 2022: Make partial aggregation adaptive #11011 (Merged)

sopel39 mentioned this issue Apr 25, 2023: Be more conservative when turning off partial aggregation #17143 (Merged)

sopel39 closed this as completed in #17143 Apr 25, 2023

sopel39 mentioned this issue Apr 25, 2023: Release notes for 415 #17135 (Closed)
