Modify AggregationPlanNode to consider not nullable columns that do not contain nulls #14342

Merged

gortiz
Contributor

@gortiz gortiz commented Oct 31, 2024

This PR modifies AggregationPlanNode to improve performance when null handling is enabled for the query but the involved columns do not contain nulls for this specific segment.

For example, a query like:

set enableNullHandling=true;
set explainAskingServers=true;
explain plan for
select count(deviceOS) from userAttributes limit 10

where deviceOS is nullable, previously returned:

LogicalSort(fetch=[10])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    PinotLogicalAggregate(group=[{}], agg#0=[COUNT($0)])
      PinotLogicalExchange(distribution=[hash])
        LeafStageCombineOperator(table=[userAttributes])
          StreamingInstanceResponse
            CombineAggregate
              Aggregate(aggregations=[[count(deviceOS)]]) <-- see this
                Project(columns=[[deviceOS]])
                  DocIdSet(maxDocs=[40000])
                    FilterMatchEntireSegment(numDocs=[10000])

With this PR, the same query returns:

LogicalSort(fetch=[10])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    PinotLogicalAggregate(group=[{}], agg#0=[COUNT($0)])
      PinotLogicalExchange(distribution=[hash])
        LeafStageCombineOperator(table=[userAttributes])
          StreamingInstanceResponse
            CombineAggregate
              FastFilteredCount <-- see this
                FilterMatchEntireSegment(numDocs=[10000])
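
The planning decision illustrated by the two plans above can be sketched roughly as follows. This is a simplified, self-contained model and not the actual AggregationPlanNode code: `CountPlanSketch`, `SegmentColumn`, `PlanChoice`, and `choosePlan` are hypothetical names introduced only for illustration, and the sketch assumes (as the description implies) that the fast path was previously taken only when null handling was disabled. The real planner checks further conditions that are omitted here.

```java
import java.util.List;

final class CountPlanSketch {
  enum PlanChoice { FAST_FILTERED_COUNT, SCAN_BASED_AGGREGATE }

  /** Hypothetical per-segment view of a column: its name and whether this segment stores any nulls for it. */
  static final class SegmentColumn {
    final String name;
    final boolean hasNullsInSegment;

    SegmentColumn(String name, boolean hasNullsInSegment) {
      this.name = name;
      this.hasNullsInSegment = hasNullsInSegment;
    }
  }

  /** Picks the plan for a COUNT over the given columns in one segment. */
  static PlanChoice choosePlan(boolean nullHandlingEnabled, List<SegmentColumn> countedColumns) {
    if (!nullHandlingEnabled) {
      // Without null handling, COUNT(col) counts every matching row, so no scan is needed.
      return PlanChoice.FAST_FILTERED_COUNT;
    }
    for (SegmentColumn column : countedColumns) {
      if (column.hasNullsInSegment) {
        // Nulls are present in this segment: the column values must be inspected.
        return PlanChoice.SCAN_BASED_AGGREGATE;
      }
    }
    // Null handling is enabled, but this segment holds no nulls for the counted columns.
    return PlanChoice.FAST_FILTERED_COUNT;
  }
}
```

In this model, calling `choosePlan(true, ...)` with a `deviceOS` column whose `hasNullsInSegment` flag is false selects `FAST_FILTERED_COUNT`, which corresponds to the second plan above.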

@gortiz
Contributor Author

gortiz commented Oct 31, 2024

cc @bziobrowski

Collaborator

@yashmayya yashmayya left a comment

LGTM

Comment on lines 141 to 146
case LITERAL:
  return false;
Collaborator

This could potentially be the null literal too, but it looks like that doesn't matter in either case? The non-scan based aggregation operator is only chosen if the aggregation function input operand is an identifier and for fast filtered count, it shouldn't matter because COUNT(literal) is treated the same as COUNT(*).

Collaborator

Hm, looks like Postgres and MySQL return 0 for COUNT(null) interestingly. We might want to do the same although I doubt it's a valid use case anyone cares about (and is also orthogonal to this PR).
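
For context on the COUNT(null) remark, standard SQL semantics are that COUNT(*) counts rows while COUNT(expr) counts only the rows where expr is not null, so COUNT(null) is 0. A minimal sketch of that difference, using hypothetical helper names (`countStar`, `countExpr`) that are not part of Pinot:

```java
import java.util.List;
import java.util.Objects;

final class CountSemanticsSketch {
  /** COUNT(*): every row counts, regardless of its values. */
  static long countStar(List<?> rows) {
    return rows.size();
  }

  /** COUNT(expr): only rows where the expression evaluates to a non-null value count. */
  static long countExpr(List<?> evaluatedValues) {
    return evaluatedValues.stream().filter(Objects::nonNull).count();
  }
}
```

Evaluating the null literal over three rows yields three nulls, so `countExpr` returns 0 while `countStar` returns 3.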

Contributor Author

I guess we can transform this into:

          return argument.getLiteral().isNull();

Contributor Author

The non-scan based aggregation operator is only chosen if the aggregation function input operand is an identifier and for fast filtered count, it shouldn't matter because COUNT(literal) is treated the same as COUNT(*)

But that is a limitation and responsibility of the fast filtered count and non-scan paths. This function returns whether there are nulls or not. If in the future we change the requirements of, say, non-scan aggregations, we shouldn't need to modify this code.

Collaborator

Yeah, that makes sense 👍
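
The outcome of this thread, a null check that depends only on the argument itself and not on which operator is later chosen, might look roughly like the sketch below. `NullabilitySketch`, `Argument`, and `mayBeNull` are hypothetical stand-ins written for this example; they are not Pinot's real expression classes, and the per-column flag is assumed to come from segment metadata.

```java
final class NullabilitySketch {
  enum Type { IDENTIFIER, LITERAL, FUNCTION }

  /** Hypothetical stand-in for an aggregation argument. */
  static final class Argument {
    final Type type;
    final Object literalValue;    // only meaningful for LITERAL; null models the SQL NULL literal
    final boolean columnHasNulls; // only meaningful for IDENTIFIER

    private Argument(Type type, Object literalValue, boolean columnHasNulls) {
      this.type = type;
      this.literalValue = literalValue;
      this.columnHasNulls = columnHasNulls;
    }

    static Argument identifier(boolean columnHasNulls) {
      return new Argument(Type.IDENTIFIER, null, columnHasNulls);
    }

    static Argument literal(Object value) {
      return new Argument(Type.LITERAL, value, false);
    }

    static Argument function() {
      return new Argument(Type.FUNCTION, null, false);
    }
  }

  /** Returns true unless the argument is known to never evaluate to null. */
  static boolean mayBeNull(Argument argument) {
    switch (argument.type) {
      case IDENTIFIER:
        return argument.columnHasNulls;
      case LITERAL:
        // A literal is null only if it is the null literal itself.
        return argument.literalValue == null;
      case FUNCTION:
      default:
        // The result of a function cannot be proven non-null here, so stay conservative.
        return true;
    }
  }
}
```

Keeping the literal case accurate means the method stays correct even if the conditions for the non-scan or fast filtered count paths change later.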

@codecov-commenter

codecov-commenter commented Oct 31, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 63.79%. Comparing base (59551e4) to head (1472d9f).
Report is 1287 commits behind head on master.

Files with missing lines Patch % Lines
...rg/apache/pinot/core/plan/AggregationPlanNode.java 75.00% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14342      +/-   ##
============================================
+ Coverage     61.75%   63.79%   +2.04%     
- Complexity      207     1556    +1349     
============================================
  Files          2436     2660     +224     
  Lines        133233   145916   +12683     
  Branches      20636    22339    +1703     
============================================
+ Hits          82274    93090   +10816     
- Misses        44911    45949    +1038     
- Partials       6048     6877     +829     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.75% <75.00%> (+2.04%) ⬆️
java-21 63.69% <75.00%> (+2.06%) ⬆️
skip-bytebuffers-false 63.79% <75.00%> (+2.04%) ⬆️
skip-bytebuffers-true 63.66% <75.00%> (+35.94%) ⬆️
temurin 63.79% <75.00%> (+2.04%) ⬆️
unittests 63.79% <75.00%> (+2.04%) ⬆️
unittests1 55.52% <75.00%> (+8.62%) ⬆️
unittests2 34.14% <0.00%> (+6.41%) ⬆️

@gortiz gortiz force-pushed the better-null-handling-in-aggregation-plan-node branch from ea03f93 to f0632f6 Compare October 31, 2024 11:23
List<?> inputExpressions = aggregationFunction.getInputExpressions();
if (inputExpressions.isEmpty()) {
  continue;
}
Contributor

@bziobrowski bziobrowski Oct 31, 2024

What if aggregation function has more than one input expression ?
btw, it'd be good to add some test(s).

Contributor Author

@gortiz gortiz Nov 4, 2024

What if aggregation function has more than one input expression ?

I don't get it. Then all of them need to fulfill the conditions imposed here. This method returns false if and only if we are sure all inputs for all aggregation functions are not nullable. We cannot be sure in the case of functions, and an aggregation function without inputs trivially fulfills this requirement.

btw, it'd be good to add some test(s).

This is actually being tested: https://app.codecov.io/gh/apache/pinot/pull/14342/blob/pinot-core/src/main/java/org/apache/pinot/core/plan/AggregationPlanNode.java

Contributor Author

Oops, there is a bug in the case where it is a literal.

Contributor

@bziobrowski bziobrowski Nov 4, 2024

What I meant is: shouldn't the function be checking all input expressions instead of only the first one?

inputExpressions.get(0);

As for testing: is there a test asserting a plan similar to the one you mention in the PR description?

Contributor Author

Oh! But that was on a different line 😆.

Formally, you are right. But then isFitForNonScanBasedPlan is doing the same thing, so it has no side effect. Anyway, in order to decouple both methods, it would be better to check all inputs. I'm changing that.
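
Checking every input of every aggregation function, rather than only the first input, could be sketched as below. This is an illustration under simplifying assumptions, not the merged change: each aggregation function is modeled as a plain list of per-input "may be null" flags, and `AllInputsSketch` and `anyInputMayBeNull` are hypothetical names.

```java
import java.util.List;

final class AllInputsSketch {
  /**
   * Returns false only if every input of every aggregation function is known to be non-null.
   * A function without inputs (for example COUNT(*)) has nothing to check and is skipped.
   */
  static boolean anyInputMayBeNull(List<List<Boolean>> mayBeNullFlagsPerFunction) {
    for (List<Boolean> inputFlags : mayBeNullFlagsPerFunction) {
      for (boolean mayBeNull : inputFlags) {
        if (mayBeNull) {
          // One possibly-null input is enough to rule out the null-free fast path.
          return true;
        }
      }
    }
    return false;
  }
}
```

With this shape, a function whose second input may be null is detected, which a check limited to `inputExpressions.get(0)` would miss.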

@gortiz gortiz merged commit 4588d0b into apache:master Nov 8, 2024
19 of 21 checks passed
@gortiz gortiz deleted the better-null-handling-in-aggregation-plan-node branch November 8, 2024 14:39
davecromberge pushed a commit to davecromberge/pinot that referenced this pull request Nov 22, 2024