-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify AggregationPlanNode to consider not nullable columns that do not contain nulls #14342
Modify AggregationPlanNode to consider not nullable columns that do not contain nulls #14342
Conversation
cc @bziobrowski |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
case LITERAL: | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could potentially be the null
literal value too but looks like this doesn't matter for both the cases? The non-scan based aggregation operator is only chosen if the aggregation function input operand is an identifier and for fast filtered count, it shouldn't matter because COUNT(literal)
is treated the same as COUNT(*)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, looks like Postgres and MySQL return 0
for COUNT(null)
interestingly. We might want to do the same although I doubt it's a valid use case anyone cares about (and is also orthogonal to this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can transform this into:
return argument.getLiteral().isNull();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-scan based aggregation operator is only chosen if the aggregation function input operand is an identifier and for fast filtered count, it shouldn't matter because COUNT(literal) is treated the same as COUNT(*)
But that is the fast literal and non scan limitation and responsibility. This functions returns whether there are nulls or not. If in the future we change the requirements say non scan aggregations, we shouldn't need to modify this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes sense 👍
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14342 +/- ##
============================================
+ Coverage 61.75% 63.79% +2.04%
- Complexity 207 1556 +1349
============================================
Files 2436 2660 +224
Lines 133233 145916 +12683
Branches 20636 22339 +1703
============================================
+ Hits 82274 93090 +10816
- Misses 44911 45949 +1038
- Partials 6048 6877 +829
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
ea03f93
to
f0632f6
Compare
List<?> inputExpressions = aggregationFunction.getInputExpressions(); | ||
if (inputExpressions.isEmpty()) { | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if aggregation function has more than one input expression ?
btw, it'd be good to add some test(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if aggregation function has more than one input expression ?
I don't get it. Then all of them need to fulfill the conditions imposed here. This method returns false if and only if we are sure all inputs for all aggregation functions are not nullable. We cannot be sure in the case of functions and an aggregation function without inputs trivially fulfill this requirement.
btw, it'd be good to add some test(s).
This is actually being tested: https://app.codecov.io/gh/apache/pinot/pull/14342/blob/pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, there is a bug in the case it is a literal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is - shouldn't the function be checking all input expressions instead of only the first one ?
inputExpressions.get(0);
As for testing - is there a test asserting a plan similar to the one you mention in PR description ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! But that was on a different line 😆.
Formally, you are right. But then isFitForNonScanBasedPlan
is doing the same thing, so it has no side effect. Anyway, in order to decouple both methods, it would be better to check all inputs. I'm changing that.
This PR modifies AggregationPlanNode to improve performance when null handling is enabled for the query but the involved columns do not contain nulls for this specific segment.
For example, a query like:
Where
deviceOS
is nullable returned:While with this PR it returns: