[FEAT] Support for aggregation expressions that use multiple AggExprs #3296
CodSpeed Performance Report: merging #3296 will degrade performance by 22.71%.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3296 +/- ##
==========================================
+ Coverage 77.50% 77.57% +0.07%
==========================================
Files 666 667 +1
Lines 81335 81685 +350
==========================================
+ Hits 63041 63371 +330
- Misses 18294 18314 +20
/// Optimization rule for lifting expressions that can be done in a project out of an aggregation.
///
/// After a pass of this rule, the top level expressions in each aggregate should all be aliases or agg exprs.
I think it'd be helpful to add a bit more to the docstring here.
Can you add an example of the plan before and after the optimizer pass?
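A before/after example could look like the following minimal sketch. It assumes a toy `Expr` enum and the hypothetical helpers `lift` and `split_aggregate`, with generated column names such as `__agg0`; none of these are Daft's actual types or naming. Each aggregation nested inside a larger expression is pulled into the aggregate under a generated alias, and the surrounding arithmetic moves into a project on top, so every top-level aggregate expression ends up being an aliased agg expr.

```rust
use std::fmt;

/// Toy expression tree (hypothetical; not Daft's actual `Expr` type).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Col(String),
    Lit(f64),
    Agg(AggFn, Box<Expr>),
    Binary(char, Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum AggFn {
    Sum,
    Mean,
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Col(c) => write!(f, "col({c})"),
            Expr::Lit(v) => write!(f, "{v}"),
            Expr::Agg(AggFn::Sum, e) => write!(f, "sum({e})"),
            Expr::Agg(AggFn::Mean, e) => write!(f, "mean({e})"),
            Expr::Binary(op, l, r) => write!(f, "({l} {op} {r})"),
            Expr::Alias(e, name) => write!(f, "{e} AS {name}"),
        }
    }
}

// Small constructors to keep the example readable (hypothetical helpers).
fn col(name: &str) -> Expr { Expr::Col(name.to_string()) }
fn agg(f: AggFn, e: Expr) -> Expr { Expr::Agg(f, Box::new(e)) }
fn bin(op: char, l: Expr, r: Expr) -> Expr { Expr::Binary(op, Box::new(l), Box::new(r)) }
fn alias(e: Expr, name: &str) -> Expr { Expr::Alias(Box::new(e), name.to_string()) }

/// Replace each aggregation sub-expression with a reference to a generated
/// column, pushing the aliased aggregation into `aggs`.
fn lift(expr: &Expr, aggs: &mut Vec<Expr>) -> Expr {
    match expr {
        Expr::Agg(..) => {
            let name = format!("__agg{}", aggs.len());
            aggs.push(alias(expr.clone(), &name));
            col(&name)
        }
        Expr::Binary(op, l, r) => bin(*op, lift(l, aggs), lift(r, aggs)),
        Expr::Alias(e, name) => alias(lift(e, aggs), name),
        // Bare columns are assumed to be group-by keys, which survive the aggregate.
        Expr::Col(_) | Expr::Lit(_) => expr.clone(),
    }
}

/// Split an aggregation's outputs into (aggregate exprs, project exprs).
fn split_aggregate(outputs: &[Expr]) -> (Vec<Expr>, Vec<Expr>) {
    let mut aggs = Vec::new();
    let projection: Vec<Expr> = outputs.iter().map(|e| lift(e, &mut aggs)).collect();
    (aggs, projection)
}

fn main() {
    // Before: Aggregate[sum(a) + sum(b) AS total, mean(a) / 100 AS scaled]
    let outputs = vec![
        alias(bin('+', agg(AggFn::Sum, col("a")), agg(AggFn::Sum, col("b"))), "total"),
        alias(bin('/', agg(AggFn::Mean, col("a")), Expr::Lit(100.0)), "scaled"),
    ];
    // After: Project[projection] on top of Aggregate[aggs]
    let (aggs, projection) = split_aggregate(&outputs);
    for a in &aggs {
        println!("Aggregate: {a}");
    }
    for p in &projection {
        println!("Project:   {p}");
    }
}
```

Run as written, it prints the three aliased aggregations (`sum(col(a)) AS __agg0`, `sum(col(b)) AS __agg1`, `mean(col(a)) AS __agg2`) followed by the projections `(col(__agg0) + col(__agg1)) AS total` and `(col(__agg2) / 100) AS scaled`. Deduplicating identical aggregations is left out for brevity.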
@kevinzwang, I know this is just semantics, but this seems more like a rewrite than an optimization. I wonder if it's worth distinguishing between optimizing and rewriting.
To elaborate on how I see them as different: an optimization implies rewriting the query to make it more performant, whereas this rule seems to rewrite the query only so that it becomes executable. (I could be wrong here, though.)
Yeah, that's true. Technically, I have this rule because moving things out into a project enables other optimizations; however, the implementation of the aggregate translation assumes that this rule has been applied.
In general, should we have rewriting rules in the optimizer instead of in each of the ops themselves? In other words, is it okay for the logical plan to be in an invalid state until it has been optimized? I personally don't have strong opinions about this.
🤔 I don't like the idea of having invalid plans, as that wouldn't allow us to disable certain optimizations.
I know there's been some discussion with @samster25 about potentially introducing an 'unresolved' or 'dsl' plan that wouldn't be guaranteed to be semantically valid, only syntactically valid, but that's definitely out of scope for this PR.
Maybe just add a comment here saying that this is actually a 'rewrite' to make it into a valid plan, just so we don't lose track of it over time.
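To make "valid" concrete for this rule, here is a sketch of a check for the docstring's postcondition (top-level aggregate expressions are all aliases or agg exprs), reading "aliases" as aliases that wrap an agg expr. The `Expr` enum, `is_aliased_agg`, and `check_aggregate` are hypothetical illustrations, not Daft code. A check like this is one way to tell a plan that has not yet been rewritten apart from one that the aggregate translation can handle.

```rust
/// Toy expression tree (hypothetical; not Daft's actual `Expr` type).
#[allow(dead_code)]
#[derive(Debug)]
enum Expr {
    Col(String),
    Lit(f64),
    Agg(&'static str, Box<Expr>), // e.g. Agg("sum", col("a"))
    Binary(char, Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

/// True if `e` is an aggregation, possibly wrapped in one or more aliases.
fn is_aliased_agg(e: &Expr) -> bool {
    match e {
        Expr::Agg(..) => true,
        Expr::Alias(inner, _) => is_aliased_agg(inner),
        _ => false,
    }
}

/// Checks the rule's postcondition on an aggregate's expressions and
/// reports the first expression that violates it.
fn check_aggregate(aggs: &[Expr]) -> Result<(), String> {
    match aggs.iter().position(|e| !is_aliased_agg(e)) {
        Some(i) => {
            let bad = &aggs[i];
            Err(format!("aggregate expression {i} is not an (aliased) agg expr: {bad:?}"))
        }
        None => Ok(()),
    }
}

fn main() {
    let col_a = || Box::new(Expr::Col("a".into()));
    // Already lifted: sum(a) AS __agg0
    let lifted = Expr::Alias(Box::new(Expr::Agg("sum", col_a())), "__agg0".into());
    // Not lifted: mean(a) / 100 still has arithmetic at the top level
    let unlifted = Expr::Binary('/', Box::new(Expr::Agg("mean", col_a())), Box::new(Expr::Lit(100.0)));
    println!("{:?}", check_aggregate(&[lifted]));
    println!("{:?}", check_aggregate(&[unlifted]));
}
```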
I think the ExprResolver is a much better abstraction. Good work!
This enables expressions such as `sum("a") + sum("b")` or `mean("a") / 100` in aggregations. This PR enables Q8 and Q14 of TPC-H and is also necessary for Q17 and Q20 (which additionally require subquery support).
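As an illustration of why splitting such expressions is sound, TPC-H Q14 computes `promo_revenue = 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount))`: two AggExprs combined by ordinary arithmetic. The sketch below uses hypothetical names (`Row`, `Partial`, `aggregate`, `merge`, `project`) and models only the arithmetic, not Daft's execution. It computes the two sums per partition, merges them by addition, and only then applies the division as a projection.

```rust
/// One lineitem row joined with its part (hypothetical toy schema).
struct Row {
    p_type: &'static str,
    extendedprice: f64,
    discount: f64,
}

/// The two AggExprs inside Q14's promo_revenue, kept as mergeable partial sums.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Partial {
    promo: f64, // sum(case when p_type like 'PROMO%' then revenue else 0 end)
    total: f64, // sum(revenue)
}

/// Aggregation phase over one partition.
fn aggregate(rows: &[Row]) -> Partial {
    rows.iter().fold(Partial::default(), |acc, r| {
        let revenue = r.extendedprice * (1.0 - r.discount);
        let promo = if r.p_type.starts_with("PROMO") { revenue } else { 0.0 };
        Partial { promo: acc.promo + promo, total: acc.total + revenue }
    })
}

/// Partial sums from different partitions combine by addition.
fn merge(a: Partial, b: Partial) -> Partial {
    Partial { promo: a.promo + b.promo, total: a.total + b.total }
}

/// Projection phase: plain arithmetic over the finished aggregates.
fn project(p: Partial) -> f64 {
    100.0 * p.promo / p.total
}

fn main() {
    let p1 = [
        Row { p_type: "PROMO BRUSHED TIN", extendedprice: 200.0, discount: 0.5 },
        Row { p_type: "STANDARD POLISHED TIN", extendedprice: 100.0, discount: 0.0 },
    ];
    let p2 = [Row { p_type: "STANDARD ANODIZED COPPER", extendedprice: 300.0, discount: 0.0 }];

    let (a1, a2) = (aggregate(&p1), aggregate(&p2));
    // Merge the sums first, then divide: 100 * 100 / 500 = 20
    println!("promo_revenue = {}", project(merge(a1, a2)));
    // Averaging per-partition ratios instead gives (50 + 0) / 2 = 25, which is wrong.
    println!("naive average = {}", (project(a1) + project(a2)) / 2.0);
}
```

With this sample data the correct result is 20, while averaging per-partition ratios gives 25, which is why the division belongs in a projection that runs after the aggregation has finished.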