-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor function registry for multi-stage engine #13573
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13573 +/- ##
=============================================
- Coverage 61.75% 27.72% -34.03%
+ Complexity 207 198 -9
=============================================
Files 2436 2553 +117
Lines 133233 140470 +7237
Branches 20636 21851 +1215
=============================================
- Hits 82274 38949 -43325
- Misses 44911 98533 +53622
+ Partials 6048 2988 -3060
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
8b0f391
to
777503c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Jackie-Jiang, this is a really nice improvement with lots of cleanups! I had a few minor comments and questions to better my understanding of these areas.
pinot-common/src/main/java/org/apache/pinot/common/function/PinotScalarFunction.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/PinotScalarFunction.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/annotations/ScalarFunction.java
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java
Show resolved
Hide resolved
...rc/main/java/org/apache/pinot/core/operator/transform/function/TransformFunctionFactory.java
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/core/function/FunctionDefinitionRegistryTest.java
Outdated
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/core/function/FunctionDefinitionRegistryTest.java
Outdated
Show resolved
Hide resolved
return call; | ||
public boolean canReduce(AggregateCall call) { | ||
SqlKind kind = call.getAggregation().getKind(); | ||
return kind == SqlKind.SUM || kind == SqlKind.AVG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this customized rule? Which of the original Calcite rule's reductions don't work in Pinot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some javadoc to make it clear. Currently STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, COVAR_POP, COVAR_SAMP
breaks because of the original rule. Take a look at the changes in StatisticAggregates.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, makes sense now, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this rule applies in the leaf stage or also in the intermediate stage? How we also merge data from different workers in the not simpler form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this rule is applied, we don't have leaf stage concept yet (leaf stage is determined with PinotAggregateExchangeNodeInsertRule
).
We don't really need these rewrite because Pinot can directly handle SUM
and AVG
with proper null handling (the rule is needed for engine without proper null handling). I didn't directly remove the rule because that is out of the scope of this PR, and null handling support requires some more tweaks.
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java
Outdated
Show resolved
Hide resolved
c6f2303
to
5447423
Compare
Looks like this test needs to be updated with the new exception type and message due to the changes in |
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/LogicalFunctions.java
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/ArrayFunctions.java
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need more time to finish the first read and probably have a second one :D
@Deprecated | ||
@Nullable | ||
public static FunctionInfo getFunctionInfo(String name, int numArguments) { | ||
return lookupFunctionInfo(canonicalize(name), numArguments); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would even recommend to create a class called CannonicalName
that contains a String. Then use that class as input. We may have a static class that transforms Strings into CanonicalNames.
This will let us:
- Have type safe checks, so we Java don't let us call lookupFunctionInfo with non canonical names.
- We can cache the CanonicalNames, so we don't need to allocate.
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionRegistry.java
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/function/PinotScalarFunction.java
Outdated
Show resolved
Hide resolved
public FunctionInfo getFunctionInfo(int numArguments) { | ||
FunctionInfo functionInfo = _functionInfoMap.get(numArguments); | ||
return functionInfo != null ? functionInfo : _functionInfoMap.get(VAR_ARG_KEY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cool to have documented somewhere how functions can be registered. AFAIU that should be something like:
- Using annotated methods: Simpler and shorted but less expressive. For example, you cannot support polymorphism.
- Using annotated classes that implement PinotScalarFunction: More expressive.
By @Jackie-Jiang comment here it looks like there is a third way that consist on registering the function explicitly in PinotOperatorTable. But function won't be usable in V1, am I right?
pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java
Outdated
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/queries/TimestampQueriesTest.java
Outdated
Show resolved
Hide resolved
...r/src/main/java/org/apache/pinot/calcite/rel/rules/PinotAggregateExchangeNodeInsertRule.java
Show resolved
Hide resolved
return call; | ||
public boolean canReduce(AggregateCall call) { | ||
SqlKind kind = call.getAggregation().getKind(); | ||
return kind == SqlKind.SUM || kind == SqlKind.AVG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this rule applies in the leaf stage or also in the intermediate stage? How we also merge data from different workers in the not simpler form?
pinot-query-planner/src/main/java/org/apache/pinot/calcite/sql/fun/PinotOperatorTable.java
Show resolved
Hide resolved
pinot-query-planner/src/main/java/org/apache/pinot/calcite/sql/fun/PinotOperatorTable.java
Show resolved
Hide resolved
36f30a4
to
d8e9770
Compare
if (functionInfo == null) { | ||
if (FunctionRegistry.contains(canonicalName)) { | ||
throw new IllegalArgumentException( | ||
String.format("Unsupported function: %s with argument types: %s", functionName, | ||
Arrays.toString(argumentTypes))); | ||
} else { | ||
throw new IllegalArgumentException(String.format("Unsupported function: %s", functionName)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we should also include the variants that were not selected.
We could do that by adding a method in FunctionRegistry
that returns all FunctionInfo for a given name and then showing these options here.
I don't think it has to be added in this PR, but it is something that will be useful to debug problems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little bit tricky though because the matches could be done via the type inference. We can probably add some usage info within each PinotScalarFunction
which can be lookup up in the FunctionRegistry
. Added a TODO to follow up
"sql": "SELECT PERCENTILE_TDIGEST(float_col, 50), PERCENTILE_TDIGEST(double_col, 5), PERCENTILE_TDIGEST(int_col, 75), PERCENTILE_TDIGEST(long_col, 75) FROM {tbl}", | ||
"outputs": [[1.75, 1.0, 137, 137]] | ||
"outputs": [[1.75, 1.0, 137.75, 137.75]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are changing the return type, right? Couldn't this produce problems in production? Does it only happen with PERCETILE_TDIGEST or are other aggregations that can change their type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only PERCENTILE_TDIGEST
where we registered the wrong return type before. This is actually a bugfix.
d8e9770
to
96040f0
Compare
Here are the main changes:
PinotCatalog
is just a wrapper over table cache to resolve database name and extract table schema.PinotOperatorTable
PinotOperatorTable
:SqlStdOperatorTable