Memory is coupled to group by cardinality, even when the aggregate output is truncated by a limit clause
#7191
Comments
Perhaps we could add a redact group API to the new row accumulators, this would allow using them for this as well as for window functions
I struggled with this for a bit. Originally I rejected using I don't think we'd want to always evict groups, because we might not even need to add them in the first place if the value being aggregated is less/greater than the min/max of the priority queue, so it would be a no-op.
Also, I think usually this optimization would be applied for single terms
Yeah, I think you need both a priority queue to work out which groups to keep, along with a HashMap to work out which rows belong to which groups. I can't think of an obvious way to avoid this.
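To make the priority queue plus HashMap idea concrete, here is a minimal sketch, not DataFusion code. It assumes the query shape is `MAX(x)` per group, ordered descending with a limit of `k`. The function name `top_k_max` is hypothetical, and a `BTreeSet` stands in for the priority queue because it allows removing an arbitrary entry when a group's maximum changes. A row whose value does not beat the smallest retained maximum is skipped without allocating anything, so memory stays proportional to `k` rather than to group cardinality.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: top `k` groups by MAX(value), largest first.
/// Holds at most `k` groups in memory at any time.
fn top_k_max(rows: &[(&str, i64)], k: usize) -> Vec<(String, i64)> {
    if k == 0 {
        return Vec::new();
    }
    // group -> current maximum, only for groups that are currently retained
    let mut best: HashMap<String, i64> = HashMap::new();
    // ordered (maximum, group) pairs; the first element is the eviction candidate
    let mut queue: BTreeSet<(i64, String)> = BTreeSet::new();

    for &(group, value) in rows {
        // Retained group: update its entry only if the maximum improves.
        if let Some(current) = best.get_mut(group) {
            if value > *current {
                queue.remove(&(*current, group.to_string()));
                *current = value;
                queue.insert((value, group.to_string()));
            }
            continue;
        }
        if queue.len() == k {
            let (smallest, _) = queue.first().expect("queue holds k > 0 entries");
            if value <= *smallest {
                // No-op: this row cannot enter the top k (ties keep the incumbent).
                continue;
            }
            // Evict the group with the smallest maximum to make room.
            let (_, evicted) = queue.pop_first().expect("queue holds k > 0 entries");
            best.remove(&evicted);
        }
        best.insert(group.to_string(), value);
        queue.insert((value, group.to_string()));
    }
    queue.into_iter().rev().map(|(v, g)| (g, v)).collect()
}
```

Evicting is safe here only because `MAX` is monotone: every value an evicted group had seen was at or below the threshold, and the threshold never decreases. Aggregates such as `SUM` or `COUNT` do not have this property, so this sketch does not carry over to them.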
I was envisaging something like adding support to the Or something to that effect, just spitballing here. I really want to get Window functions using
Interesting... I thought we were going the other way, due to this comment.
By scalar, I am referring to the ones based around ScalarValue, i.e. https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.Accumulator.html
Is your feature request related to a problem or challenge?
Currently, there is only one Aggregation: GroupedHashAggregateStream. It does a lovely job, but it allocates memory for every unique group by value. For large datasets, this can cause OOM errors, even if the very next operation is a sort by max(x) limit y.
Describe the solution you'd like
I would like to add a GroupedAggregateStream based on a PriorityQueue of grouped values that can be used instead of GroupedHashAggregateStream under the specific conditions above, so that Top K queries work even on datasets with cardinality larger than available memory.
Describe alternatives you've considered
A more generalized implementation where we:
emitting rows in a stream as the aggregate for each group is computed
TopKExec node that is only responsible for doing the top K operation
Unfortunately, despite being more general, I'm told that this approach will still OOM in our case.
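For illustration only, the final step of that alternative (a node that does nothing but the top K operation) could look roughly like the sketch below. It assumes the upstream operator already emits one finished `(group, aggregate)` pair per group. The function name `top_k_stream` is hypothetical and is not the `TopKExec` API. A bounded min-heap keeps the `k` largest aggregates seen so far and drops the rest as they stream past.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: keep the `k` largest aggregates from a stream of
/// already-finished (group, aggregate) pairs.
fn top_k_stream<I>(aggregates: I, k: usize) -> Vec<(String, i64)>
where
    I: IntoIterator<Item = (String, i64)>,
{
    // `Reverse` turns the max-heap into a min-heap, so `pop` removes the
    // smallest retained aggregate.
    let mut heap: BinaryHeap<Reverse<(i64, String)>> = BinaryHeap::new();
    for (group, value) in aggregates {
        heap.push(Reverse((value, group)));
        if heap.len() > k {
            heap.pop();
        }
    }
    let mut out: Vec<(String, i64)> = heap
        .into_iter()
        .map(|Reverse((value, group))| (group, value))
        .collect();
    // Largest aggregate first; ties ordered by group name.
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

This step on its own uses memory proportional to `k`. It says nothing about the memory used by whatever produces the finished aggregates upstream.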
Additional context
Please see the following similar (but not same) tickets for related top K issues:
ROW_NUMBER < 5 / TopK #6899