[feat](nereids) add merge aggregate rule #31811
Thank you for your contribution to Apache Doris.
run buildall
private Plan mergeTwoAggregate(Plan plan) {
    LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
    LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) outerAgg.child();
Suggested change:
- private Plan mergeTwoAggregate(Plan plan) {
-     LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
-     LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) outerAgg.child();
+ private Plan mergeTwoAggregate(LogicalAggregate<LogicalAggregate<Plan>> outerAgg) {
+     LogicalAggregate<Plan> innerAgg = outerAgg.child();
Map<ExprId, AggregateFunction> innerAggExprIdToAggFunc = innerAgg.getOutputExpressions().stream()
        .filter(expr -> (expr instanceof Alias) && (expr.child(0) instanceof AggregateFunction))
        .collect(Collectors.toMap(NamedExpression::getExprId, value -> (AggregateFunction) value.child(0)));
nit: add a mergeFunction to Collectors.toMap in case of duplicate keys
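To illustrate the nit above: the two-argument `Collectors.toMap` throws `IllegalStateException` when two elements map to the same key, while the three-argument overload resolves the collision with a merge function. The sketch below is a minimal illustration, not the code from this PR; the class name `ToMapMergeSketch`, the `index` helper, and the use of `SimpleEntry<Integer, String>` as a stand-in for (exprId, aggregate function) pairs are all assumptions made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapMergeSketch {
    // Hypothetical stand-in: each entry is (exprId, aggregate function name).
    static Map<Integer, String> index(List<SimpleEntry<Integer, String>> pairs) {
        return pairs.stream().collect(Collectors.toMap(
                SimpleEntry::getKey,
                SimpleEntry::getValue,
                // Merge function: keep the first value when a key repeats.
                // Without this argument a duplicate key throws IllegalStateException.
                (first, second) -> first));
    }

    public static void main(String[] args) {
        List<SimpleEntry<Integer, String>> pairs = Arrays.asList(
                new SimpleEntry<>(1, "sum"),
                new SimpleEntry<>(1, "count"), // duplicate key 1
                new SimpleEntry<>(2, "max"));
        System.out.println(index(pairs));
    }
}
```

Whether keeping the first or the last value is right depends on what a duplicate exprId would mean in the plan; the sketch only shows where the choice is made.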
private Plan mergeAggProjectAgg(Plan plan) {
    LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
    LogicalProject<Plan> project = (LogicalProject<Plan>) outerAgg.child();
    LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) project.child();
private Plan mergeAggProjectAgg(LogicalAggregate<LogicalProject<LogicalAggregate<Plan>>> outerAgg) {
LogicalProject<LogicalAggregate<Plan>> project = outerAgg.child();
LogicalAggregate<Plan> innerAgg = project.child();
if (innerFunc.isDistinct()) {
    return false;
}
Is an inner distinct OK if the outer group-by keys are exactly the same as the inner keys?
    return false;
}
// support sum(sum),min(min),max(max),sum(count),any_value(any_value)
if (!(outerFunc.getName().equals("sum") && innerFunc.getName().equals("count"))
Transforming sum(count()) to count() changes nullability when the outer aggregate is a scalar aggregate, so the final expression needs to be wrapped with the nullable() function to set its nullable property to true.
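The nullability concern comes from standard SQL semantics: over zero input rows a scalar SUM yields NULL, while a scalar COUNT yields 0. The sketch below models that difference in plain Java; it is only an illustration under those assumptions, and the class and method names (`SumCountNullabilitySketch`, `sum`, `count`) are hypothetical, not Doris APIs.

```java
import java.util.Collections;
import java.util.List;

public class SumCountNullabilitySketch {
    // SQL COUNT over zero rows is 0 and is never NULL.
    static long count(List<?> rows) {
        return rows.size();
    }

    // SQL SUM over zero rows is NULL, modelled here as a null Long.
    static Long sum(List<Long> values) {
        if (values.isEmpty()) {
            return null;
        }
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> emptyTable = Collections.emptyList();
        // The inner grouped count over an empty table produces no groups at all.
        List<Long> innerCounts = Collections.emptyList();

        Long beforeMerge = sum(innerCounts);  // outer scalar sum(count(...)): nullable
        long afterMerge = count(emptyTable);  // merged scalar count(...): not nullable

        System.out.println("before merge: " + beforeMerge);
        System.out.println("after merge:  " + afterMerge);
    }
}
```

Because the output slot of the original plan is nullable and the merged one is not, parent operators that captured the original type would see a mismatch, which is the reason given in the comment for wrapping the merged expression.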
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/MergeAggregate.java
sql "sync"

qt_maxMax_minMin_sumSum_sumCount """
Add some shape checks or unit tests to ensure the aggregate merge works well.
run buildall
TPC-H: Total hot run time: 37964 ms
TPC-DS: Total hot run time: 187125 ms
ClickBench: Total hot run time: 30.81 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/MergeAggregate.java
Introduced by #31811. SQL like this triggers the problem:

    select col1, col2 from (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2;

Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern.

Before Transformation:

    LogicalAggregate
    +-- LogicalProject
        +-- LogicalAggregate

After Transformation:

    LogicalProject
    +-- LogicalAggregate

Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.

Problem:
When building the project projections, the group by keys a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where several aliases might have the same slot.

Solution:
The new code fixes this issue by using the exprId of each original outer aggregate group by expression. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
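The difference between the two lookup strategies described above can be sketched in a few lines of Java. This is a simplified model, not the Doris implementation: `Alias` here is a hypothetical three-field class, slots are plain strings, and the buggy variant is assumed to overwrite on a repeated key, which is one plausible way a slot-keyed map loses an alias.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AliasLookupSketch {
    // Hypothetical stand-in for a projection `child AS name` with its own exprId.
    static final class Alias {
        final int exprId;
        final String child;
        final String name;

        Alias(int exprId, String child, String name) {
            this.exprId = exprId;
            this.child = child;
            this.name = name;
        }

        @Override
        public String toString() {
            return child + " AS " + name;
        }
    }

    // Old approach (assumed): key the map by the child slot.
    // Two aliases over the same slot collide and only the last one survives.
    static List<String> bySlot(List<Alias> projections, List<String> groupBySlots) {
        Map<String, Alias> slotToAlias = new HashMap<>();
        for (Alias alias : projections) {
            slotToAlias.put(alias.child, alias);
        }
        List<String> result = new ArrayList<>();
        for (String slot : groupBySlots) {
            result.add(slotToAlias.get(slot).toString());
        }
        return result;
    }

    // New approach: find the projection whose exprId matches the
    // original outer group-by expression.
    static List<String> byExprId(List<Alias> projections, List<Integer> outerGroupByExprIds) {
        List<String> result = new ArrayList<>();
        for (int exprId : outerGroupByExprIds) {
            for (Alias alias : projections) {
                if (alias.exprId == exprId) {
                    result.add(alias.toString());
                    break;
                }
            }
        }
        return result;
    }
}
```

With projections `a AS col1` (exprId 1) and `a AS col2` (exprId 2), `bySlot` returns `a AS col2` twice, whereas `byExprId` with ids 1 and 2 returns both aliases in order.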
…e members (#36145) This bug was introduced by #31811. innerAggExprIdToAggFunc was a member field of MergeAggregate, which was wrong: rules like MergeAggregate are single instances, so the same rule applied to different sub-plans shares that field and the applications affect each other. This PR changes innerAggExprIdToAggFunc to a local variable, which fixes the bug. No regression case was added because the problem does not reproduce reliably; it requires the same rule to be applied to multiple plans at the same time.
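The hazard of keeping per-plan state in a field of a shared rule instance can be shown with a small Java sketch. It is an assumption-laden model: the commit message only says that applications "affect each other", so the stale-entry failure shown here, and the names `LeakyRule`, `LocalRule`, and `canMerge`, are hypothetical. The sketch triggers the problem sequentially for determinism, while the real issue is described as needing concurrent application.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class RuleStateSketch {
    // Shared rule instance with a member map: entries from one plan leak into the next.
    static final class LeakyRule {
        private final Map<Integer, String> innerAggExprIdToAggFunc = new HashMap<>();

        boolean canMerge(Map<Integer, String> innerAggs, int outerRefExprId) {
            innerAggExprIdToAggFunc.putAll(innerAggs);
            return innerAggExprIdToAggFunc.containsKey(outerRefExprId);
        }
    }

    // Same logic with a local map: every application starts from a clean state.
    static final class LocalRule {
        boolean canMerge(Map<Integer, String> innerAggs, int outerRefExprId) {
            Map<Integer, String> innerAggExprIdToAggFunc = new HashMap<>(innerAggs);
            return innerAggExprIdToAggFunc.containsKey(outerRefExprId);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> planA = Collections.singletonMap(1, "sum");
        Map<Integer, String> planB = Collections.singletonMap(2, "max");

        LeakyRule leaky = new LeakyRule();
        leaky.canMerge(planA, 1);
        // exprId 1 belongs to plan A, yet the shared field still holds it.
        System.out.println("leaky: " + leaky.canMerge(planB, 1));

        LocalRule local = new LocalRule();
        local.canMerge(planA, 1);
        System.out.println("local: " + local.canMerge(planB, 1));
    }
}
```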
…on all (#41613) (#41909) Introduced by #31811 and #39450.

```sql
select count(1) from(select 3, 6 union all select 1, 3) t
```

Wrong LogicalUnion plan:

```
LogicalUnion( qualifier=ALL, outputs=[3#6], regularChildrenOutputs=[], constantExprsList=[[], []], hasPushedFilter=false
```

This SQL reports an error in explain because the LogicalUnion outputs contain a slot, yet the LogicalUnion has no children and an empty constantExprsList, which column pruning set incorrectly. This PR fixes it by handling the case where the required columns are empty: it keeps the min slot and the constant expressions corresponding to that slot.
Adjacent aggregates (agg-agg or agg-project-agg) can be merged into a single aggregate.
This PR adds an RBO rule to perform this transformation.