Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat](nereids) add merge aggregate rule #31811

Merged
merged 6 commits into from
Mar 13, 2024

Conversation

feiniaofeiafei
Copy link
Contributor

select 
  deptno c, 
  min(y), 
  max(z) z, 
  sum(r), 
  sum(m) n, 
  sum(x) sal 
from 
  (
    select 
      deptno, 
      ename, 
      sum(sal) x, 
      max(sal) z, 
      min(sal) y, 
      count(hiredate) m, 
      count(mgr) r 
    from 
      sales.emp 
    group by 
      deptno, 
      ename
  ) t 
group by 
  deptno;

aggregate can be merged to

select 
  deptno c, 
  min(sal), 
  max(sal) z, 
  count(mgr), 
  count(hiredate) n, 
  sum(sal) sal 
from 
     sales.emp 
group by 
  deptno;

this pr add a RBO rule to perform this transform

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@feiniaofeiafei
Copy link
Contributor Author

run buildall

@feiniaofeiafei
Copy link
Contributor Author

run buildall

Comment on lines 51 to 54
private Plan mergeTwoAggregate(Plan plan) {
LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) outerAgg.child();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private Plan mergeTwoAggregate(Plan plan) {
LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) outerAgg.child();
private Plan mergeTwoAggregate(LogicalAggregate<LogicalAggregate<Plan>> outerAgg) {
LogicalAggregate<Plan> innerAgg = outerAgg.child();


Map<ExprId, AggregateFunction> innerAggExprIdToAggFunc = innerAgg.getOutputExpressions().stream()
.filter(expr -> (expr instanceof Alias) && (expr.child(0) instanceof AggregateFunction))
.collect(Collectors.toMap(NamedExpression::getExprId, value -> (AggregateFunction) value.child(0)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add mergeFunction in case of duplicate key

Comment on lines 65 to 68
private Plan mergeAggProjectAgg(Plan plan) {
LogicalAggregate<Plan> outerAgg = (LogicalAggregate<Plan>) plan;
LogicalProject<Plan> project = (LogicalProject<Plan>) outerAgg.child();
LogicalAggregate<Plan> innerAgg = (LogicalAggregate<Plan>) project.child();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private Plan mergeAggProjectAgg(LogicalAggregate<LogicalProject<LogicalAggregate<Plan>>> outerAgg) {
    LogicalProject<LogicalAggregate<Plan>> project = outerAgg.child();
    LogicalAggregate<Plan> innerAgg = project.child();

Comment on lines 158 to 160
if (innerFunc.isDistinct()) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inner distinct is ok if outer group by keys are exactly same with inner keys?

return false;
}
// support sum(sum),min(min),max(max),sum(count),any_value(any_value)
if (!(outerFunc.getName().equals("sum") && innerFunc.getName().equals("count"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trans sum(count()) to count() lead to nullable changed if outer agg is scalar agg. so we need to wrap the final expression with nullable() function to change its nullable to true


sql "sync"

qt_maxMax_minMin_sumSum_sumCount """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some shape check or ut to ensure merge agg work well

@feiniaofeiafei
Copy link
Contributor Author

run buildall

@feiniaofeiafei
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 37964 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6fe78ae5a7bdee8a7780729e244659c8daf1037e, data reload: false

------ Round 1 ----------------------------------
q1	17678	4062	4039	4039
q2	2026	152	143	143
q3	10554	959	910	910
q4	4661	939	946	939
q5	7600	2845	2979	2845
q6	180	125	124	124
q7	1283	822	814	814
q8	9506	2040	2020	2020
q9	7304	6439	6483	6439
q10	8265	2594	2595	2594
q11	415	223	228	223
q12	751	324	321	321
q13	18721	3027	2958	2958
q14	284	263	258	258
q15	491	452	442	442
q16	471	408	395	395
q17	943	882	849	849
q18	6869	5936	5872	5872
q19	1572	1508	1516	1508
q20	550	280	280	280
q21	7611	3673	3736	3673
q22	821	342	318	318
Total cold run time: 108556 ms
Total hot run time: 37964 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4154	4133	4108	4108
q2	332	237	238	237
q3	3105	3085	3060	3060
q4	1897	1895	1904	1895
q5	5368	5361	5296	5296
q6	207	123	117	117
q7	2300	1881	1843	1843
q8	3214	3310	3274	3274
q9	8536	8511	8483	8483
q10	6163	3704	3712	3704
q11	527	445	442	442
q12	688	502	523	502
q13	12749	2800	2784	2784
q14	277	246	264	246
q15	477	443	435	435
q16	461	400	418	400
q17	1693	1665	1631	1631
q18	7826	7340	7203	7203
q19	3029	1590	1582	1582
q20	1907	1678	1732	1678
q21	4858	4732	4760	4732
q22	535	466	461	461
Total cold run time: 70303 ms
Total hot run time: 54113 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 187125 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6fe78ae5a7bdee8a7780729e244659c8daf1037e, data reload: false

query1	927	377	338	338
query2	7398	2057	2123	2057
query3	6707	211	218	211
query4	26888	21115	21081	21081
query5	4269	4158	481	481
query6	268	189	172	172
query7	4627	310	309	309
query8	238	178	177	177
query9	8511	2316	2296	2296
query10	447	238	263	238
query11	14933	14418	14264	14264
query12	142	97	115	97
query13	1649	446	437	437
query14	10624	10131	9914	9914
query15	273	210	192	192
query16	8010	281	276	276
query17	1705	612	572	572
query18	2087	294	287	287
query19	206	164	173	164
query20	101	88	89	88
query21	203	131	137	131
query22	4649	4495	4490	4490
query23	31764	30673	30696	30673
query24	10771	3090	3081	3081
query25	663	386	410	386
query26	1518	175	175	175
query27	3041	366	362	362
query28	6530	1917	1874	1874
query29	1025	622	624	622
query30	315	150	154	150
query31	934	754	767	754
query32	100	71	69	69
query33	747	278	271	271
query34	987	471	502	471
query35	850	656	644	644
query36	927	813	814	813
query37	148	76	77	76
query38	3254	3114	3106	3106
query39	1422	1352	1395	1352
query40	206	131	121	121
query41	59	56	52	52
query42	104	117	112	112
query43	440	395	417	395
query44	1097	716	708	708
query45	269	260	262	260
query46	1046	796	803	796
query47	1654	1580	1535	1535
query48	427	352	360	352
query49	1148	338	338	338
query50	786	398	390	390
query51	6738	6656	6697	6656
query52	117	98	95	95
query53	359	288	285	285
query54	306	264	267	264
query55	90	84	90	84
query56	263	246	237	237
query57	1072	1008	994	994
query58	254	223	223	223
query59	2505	2249	2279	2249
query60	273	278	259	259
query61	128	125	137	125
query62	666	434	451	434
query63	309	282	283	282
query64	4957	3279	3167	3167
query65	3037	3020	3059	3020
query66	843	331	328	328
query67	15110	14465	14423	14423
query68	12798	585	576	576
query69	726	398	385	385
query70	1351	1064	1102	1064
query71	612	270	285	270
query72	10157	2650	2493	2493
query73	2287	336	342	336
query74	7198	6840	6707	6707
query75	12137	8058	8014	8014
query76	8184	1160	1182	1160
query77	970	255	264	255
query78	10107	9467	9456	9456
query79	8870	522	539	522
query80	1175	399	403	399
query81	492	207	204	204
query82	345	202	202	202
query83	243	146	141	141
query84	282	79	76	76
query85	1179	334	329	329
query86	367	290	288	288
query87	3397	3186	3253	3186
query88	2964	2364	2365	2364
query89	537	382	385	382
query90	2474	179	178	178
query91	164	129	130	129
query92	68	48	49	48
query93	3346	546	533	533
query94	1544	196	194	194
query95	470	362	353	353
query96	594	269	264	264
query97	4510	4310	4353	4310
query98	230	211	216	211
query99	1136	761	767	761
Total cold run time: 315035 ms
Total hot run time: 187125 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6fe78ae5a7bdee8a7780729e244659c8daf1037e, data reload: false

query1	0.04	0.04	0.03
query2	0.06	0.03	0.02
query3	0.23	0.06	0.06
query4	1.67	0.10	0.10
query5	0.51	0.53	0.52
query6	1.23	0.69	0.68
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.58	0.52	0.51
query10	0.58	0.59	0.57
query11	0.13	0.10	0.10
query12	0.12	0.10	0.10
query13	0.58	0.57	0.56
query14	0.73	0.75	0.76
query15	0.83	0.80	0.79
query16	0.36	0.38	0.36
query17	0.96	0.97	0.96
query18	0.27	0.26	0.25
query19	1.79	1.77	1.71
query20	0.01	0.01	0.01
query21	15.42	0.63	0.66
query22	3.37	3.86	2.50
query23	17.30	1.04	0.90
query24	2.18	0.60	0.31
query25	0.26	0.08	0.05
query26	0.16	0.14	0.14
query27	0.02	0.03	0.03
query28	12.15	0.84	0.82
query29	12.57	3.26	3.53
query30	0.63	0.62	0.57
query31	2.78	0.33	0.35
query32	3.39	0.44	0.44
query33	2.91	2.91	2.85
query34	15.51	4.31	4.28
query35	4.30	4.30	4.27
query36	1.10	1.01	1.01
query37	0.06	0.05	0.05
query38	0.04	0.03	0.03
query39	0.02	0.02	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.02	0.03	0.03
Total cold run time: 105.21 s
Total hot run time: 30.81 s

@doris-robot
Copy link

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 6fe78ae5a7bdee8a7780729e244659c8daf1037e with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       16.2 seconds inserted 10000000 Rows, about 617K ops/s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 13, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

@starocean999 starocean999 merged commit f3fdb30 into apache:master Mar 13, 2024
27 of 31 checks passed
morrySnow pushed a commit that referenced this pull request May 27, 2024
introduced by #31811

sql like this:

    select col1, col2 from  (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;

Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:

LogicalAggregate
+-- LogicalPrject
    +-- LogicalAggregate

After Transformation:

LogicalProject
+-- LogicalAggregate

Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.

Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.

Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
yiguolei pushed a commit that referenced this pull request May 28, 2024
introduced by #31811

sql like this:

    select col1, col2 from  (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;

Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:

LogicalAggregate
+-- LogicalPrject
    +-- LogicalAggregate

After Transformation:

LogicalProject
+-- LogicalAggregate

Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.

Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.

Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
dataroaring pushed a commit that referenced this pull request May 28, 2024
introduced by #31811

sql like this:

    select col1, col2 from  (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;

Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:

LogicalAggregate
+-- LogicalPrject
    +-- LogicalAggregate

After Transformation:

LogicalProject
+-- LogicalAggregate

Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.

Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.

Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
morrySnow pushed a commit that referenced this pull request Jun 13, 2024
…e members (#36145)

This bug is induced by #31811.
The innerAggExprIdToAggFunc was a member of MergeAggregate, which was
wrong. Because rules like MergeAggregate are single instances, same
rules applied to different sub-plans will affect each other. This pr
changes innerAggExprIdToAggFunc to a local variable, fixes this bug.

No regression use case was added because it’s not a problem that will
definitely reoccur and requires the same rule to be applied to multiple
plans at the same time.
feiniaofeiafei added a commit to feiniaofeiafei/doris that referenced this pull request Jun 13, 2024
…e members (apache#36145)

This bug is induced by apache#31811.
The innerAggExprIdToAggFunc was a member of MergeAggregate, which was
wrong. Because rules like MergeAggregate are single instances, same
rules applied to different sub-plans will affect each other. This pr
changes innerAggExprIdToAggFunc to a local variable, fixes this bug.

No regression use case was added because it’s not a problem that will
definitely reoccur and requires the same rule to be applied to multiple
plans at the same time.
dataroaring pushed a commit that referenced this pull request Jun 13, 2024
…e members (#36145)

This bug is induced by #31811.
The innerAggExprIdToAggFunc was a member of MergeAggregate, which was
wrong. Because rules like MergeAggregate are single instances, same
rules applied to different sub-plans will affect each other. This pr
changes innerAggExprIdToAggFunc to a local variable, fixes this bug.

No regression use case was added because it’s not a problem that will
definitely reoccur and requires the same rule to be applied to multiple
plans at the same time.
924060929 pushed a commit that referenced this pull request Oct 24, 2024
…on all (#41613) (#41909)

introduce by #31811 and #39450
```sql
select count(1) from(select 3, 6 union all select 1, 3) t
```
wrong LogicalUnion plan:
```sql
LogicalUnion( qualifier=ALL, outputs=[3#6], regularChildrenOutputs=[], constantExprsList=[[], []], hasPushedFilter=false
```
this sql will report error in explain, because the logical union outputs has a slot, but the logical union has no child and has a empty constantExprList, which is wrong set in column prune.
this pr fixes it by consider when require columns is empty and keep the min slot and min slot corresponding const expressions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.4-merged reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants