-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support aggregate functions in row pattern recognition #8738
Conversation
Example output from
|
51a7fb5
to
1302493
Compare
a60097c
to
e5ad1db
Compare
e5ad1db
to
c11e350
Compare
01d5018
to
5ecdc01
Compare
@mosabua could you please review the docs part? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a look at the docs and suggested minor changes. Great job. .. and sorry for being a bit late.
b7d0fd6
to
eedc5fc
Compare
@mosabua I addressed your comments. |
5b48467
to
ea1611f
Compare
core/trino-main/src/main/java/io/trino/operator/window/matcher/ArrayView.java
Outdated
Show resolved
Hide resolved
core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
ae16032
to
f6e5e94
Compare
2ffbc87
to
e4338f8
Compare
core/trino-main/src/main/java/io/trino/sql/planner/planprinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/rowpattern/AggregateArgumentsRewriter.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/rowpattern/AggregateArgumentsRewriter.java
Outdated
Show resolved
Hide resolved
...ain/src/main/java/io/trino/sql/planner/rowpattern/ExpressionAndValuePointersEquivalence.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/rowpattern/LogicalIndexExtractor.java
Outdated
Show resolved
Hide resolved
@JsonSubTypes.Type(value = ScalarValuePointer.class, name = "scalar"), | ||
@JsonSubTypes.Type(value = AggregationValuePointer.class, name = "aggregation"), | ||
}) | ||
public abstract class ValuePointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be an interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, but this way the ValuePointers didn't serialize.
...main/src/test/java/io/trino/sql/planner/assertions/PatternRecognitionExpressionRewriter.java
Show resolved
Hide resolved
...main/java/io/trino/sql/planner/iterative/rule/PushDownProjectionsFromPatternRecognition.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/window/pattern/ArgumentComputation.java
Outdated
Show resolved
Hide resolved
367f91f
to
2cf1fcf
Compare
core/trino-main/src/main/java/io/trino/operator/window/pattern/MatchAggregation.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/window/pattern/MeasureComputation.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/window/pattern/MeasureComputation.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/window/pattern/MeasureComputation.java
Outdated
Show resolved
Hide resolved
private int aggregated; | ||
|
||
// length of the prefix of the match where all applicable positions were identified, | ||
// and stored in `allPositions`. During the pattern matching phase it might exceed | ||
// the `aggregated` prefix due to `getAllPositions()` calls. | ||
private int evaluated; | ||
|
||
// all identified applicable positions for the current match until the `evaluated` position, starting from 0 | ||
// this list is updated: | ||
// - by `resolveNewPositions()` | ||
// - by `getAllPositions()` (only during the pattern matching phase) | ||
private final IntList allPositions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear what aggregated vs evaluated vs allPositions represent. I guess part of my confusion is due to not understanding how this class fits in the overall picture. Can you elaborate a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SetEvaluator
is part of the state of MatchAggregation
.
Its main purpose is to provide new applicable positions(*) to be aggregated by the MatchAggregation, since the previous call.
The additional purpose is to keep a collection of all applicable positions to be used by ThreadEquivalence.
Accordingly to the two purposes, two pointers are used to mark how much input was processed:
aggregated
indicates how much input was processed by the MatchAggregation,evaluated
indicates how much input was processed to find applicable positions.
evaluated
is never behind aggregated
, because if we run the aggregation on some portion of input, we must have resolved the applicable positions first.
evaluated
might be ahead of aggregated
, in the case when we resolved the positions, but didn't run the actual aggregation. That happens via the getAllPositions()
call, when ThreadEquivalence asks for the set of positions and compares them, but doesn't run the actual aggregation.
allPositions
are all the applicable positions up until theevaluated
pointer
(*) applicable positions are the positions with certain labels, e.g. for avg(A.x)
these will be the positions labeled with A
in the current match, wrt the running or final semantics.
core/trino-main/src/main/java/io/trino/operator/window/pattern/FeedablePagesWindowIndex.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/analyzer/ExpressionAnalyzer.java
Outdated
Show resolved
Hide resolved
core/trino-parser/src/main/java/io/trino/sql/tree/DereferenceExpression.java
Show resolved
Hide resolved
core/trino-parser/src/main/java/io/trino/sql/tree/DereferenceExpression.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/window/matcher/ThreadEquivalence.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/operator/window/pattern/MeasureComputation.java
Outdated
Show resolved
Hide resolved
@@ -175,6 +184,18 @@ public static Block compute( | |||
return work.getResult(); | |||
} | |||
|
|||
public static Block[] precomputeNulls(List<PhysicalValueAccessor> expectedLayout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't really need to know about PhysicalValueAccessor
s. It just cares about the types. Also, this doesn't seem like the right place to hold it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method doesn't really need to know about PhysicalValueAccessors. It just cares about the types
It cares about the types, but it also has to skip aggregations, which can be present in the list.
It all happens at construction time, and PhysicalValueAccessor
s are one of the constructor parameters, so I think we can pass just the accessors list instead of extracting the types in the constructor and passing them to the helper method.
this doesn't seem like the right place to hold it
Why? What is the right place?
12b7513
to
2ad80f3
Compare
2ad80f3
to
d70a37f
Compare
FunctionCall classifier = classifierCall.get(); | ||
if (!classifier.getArguments().isEmpty()) { | ||
IrLabel label = irLabel(((Identifier) getOnlyElement(classifier.getArguments())).getCanonicalValue()); | ||
Set<IrLabel> labels = subsets.get(label); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a subset refer to other subsets? If so, does this guarantee that all the subset references are expanded all the way down to the leaves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, according to the spec, a subset can only refer to primary row pattern variables.
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/ExpressionRewriteRuleSet.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/ExpressionRewriteRuleSet.java
Outdated
Show resolved
Hide resolved
private static class AggregationIndexes | ||
{ | ||
final boolean foundNoClassifierAggregations; | ||
final List<List<Integer>> noClassifierAggregations; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an explanation of what this list and each of the contained lists represents. The signature of the field doesn't make that clear enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is explained in a comment above the method classifyAggregations()
, and also mentioned in the class javadoc, and commented in the method equivalent()
where those structures are used.
|
||
private static class AggregationIndexes | ||
{ | ||
final boolean foundNoClassifierAggregations; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary. Why not just rely on noClassifierAggregations.isEmpty()
or something along those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noClassifierAggregations
as well as classifierAggregations
are List<List<>>.
They have an entry for every label, and this entry is a list of relevant aggregations.
So, this is not that easy to determine if the whole structure is empty. If I didn't keep the boolean, I would have to iterate over the top-level list to find out.
I added the boolean in answer to your comment: #8738 (comment) It allows to skip part of the computation at runtime.
} | ||
} | ||
} | ||
noClassifierAggregations.add(noClassifierAggregationIndexes.build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean if noClassifierAggregationIndexes
is empty? (i.e., none of the pointers hit the first branch). Same question for the other list below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that for a given label, in the label's defining condition, there are no aggregations of this kind:
- "no-classifier aggregations" is the kind of aggregations whose result does not depend on the actual labels, e.g.
max(A.price)
- "classifier aggregations" is the kind of aggregations whose result depends on the actual labels, e.g.
array_agg(CLASSIFIER(U))
, whereU
is a subset{A, B}
These two kinds of aggregations are explained in the comment above the method classifyAggregations
.
This might look like arbitrary partitioning, but it is based on the observation that for some aggregations it is enough to compare the sets of positions, while other aggregations require also comparing the matched labels.
The overall purpose is to optimize ThreadEquivalence
and make it more accurate by skipping unnecessary comparisons.
core/trino-main/src/main/java/io/trino/operator/window/pattern/PhysicalValueAccessor.java
Outdated
Show resolved
Hide resolved
*/ | ||
package io.trino.operator.window.pattern; | ||
|
||
public class MatchAggregationPointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this represent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MatchAggregationPointer
replaces AggregationValuePointer
after the optimization phase is finished.
This is an accessor to aggregation at execution time. At this time, all MatchAggregation
s are gathered in an array, and MatchAggregationPointer
contains an index to that array.
777d4df
to
656a27f
Compare
checkArgument(currentRow >= patternStart && currentRow < patternStart + matchedLabels.length(), "current row is out of bounds of the match"); | ||
checkState(aggregated <= evaluated && evaluated <= matchedLabels.length(), "SetEvaluator in inconsistent state"); | ||
|
||
IntList positions = new IntList(DEFAULT_CAPACITY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a subset of allPositions? Why not return an ArrayView over a portion of that, instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
positions
are relative to partition start, and allPositions
are relative to match start.
core/trino-main/src/main/java/io/trino/operator/window/pattern/MeasureComputation.java
Outdated
Show resolved
Hide resolved
656a27f
to
93f05e0
Compare
This rule identifies and pre-projects all eligible arguments of aggregations in pattern recognition expressions. The purpose of it is to limit the number of runtime-evaluated arguments.
After this change, the pattern recognition expressions are processed by rewrite-based optimizations such as SimplifyExpressions or Desugar... transformations. The change applies to: - top-level pattern recognition expressions - aggregation arguments in pattern recognition context
93f05e0
to
42c96aa
Compare
No description provided.