[improvement](mtmv) Materialized view partition track supports date_trunc and optimize the fail reason #35562
run buildall
TPC-H: Total hot run time: 40704 ms
TPC-DS: Total hot run time: 170257 ms
ClickBench: Total hot run time: 30.02 s
run buildall
TPC-H: Total hot run time: 39956 ms
TPC-DS: Total hot run time: 171459 ms
ClickBench: Total hot run time: 30.74 s
b0f6bf8 to 706d12b
run buildall
TPC-H: Total hot run time: 39818 ms
TPC-DS: Total hot run time: 169663 ms
ClickBench: Total hot run time: 30.13 s
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/expression/ExpressionNormalization.java
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/literal/Interval.java
...rc/main/java/org/apache/doris/nereids/trees/plans/commands/info/MTMVPartitionDefinition.java
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/expression/rules/MergeDateTrunc.java
run buildall
TPC-H: Total hot run time: 42107 ms
TPC-DS: Total hot run time: 172696 ms
ClickBench: Total hot run time: 30.17 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
ClickBench: Total hot run time: 31.17 s
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES (
'replication_num' = '1'
)
AS
SELECT date_trunc(`k2`,'miniute') as month_alias, * FROM ${tableName};
SELECT date_trunc(`k2`,'miniute') as miniute_alias, * FROM ${tableName};
miniute==>minute?
… optimize the fail reason (#35562)
This depends on #34781.
1. Materialized view partition tracking supports date_trunc, and the failure reason is reported more clearly.
2. It supports creating a partitioned materialized view as follows; this MV is updated partition by partition, by day:
CREATE MATERIALIZED VIEW mv_6
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
PARTITION BY (date_trunc(date_alias, 'day'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT date_trunc(t1.L_SHIPDATE, 'hour') AS date_alias, t2.O_ORDERDATE, t1.L_QUANTITY, t2.O_ORDERSTATUS,
count(distinct case when t1.L_SUPPKEY > 0 then t2.O_ORDERSTATUS else null end) AS cnt_1
FROM (SELECT * FROM lineitem WHERE L_SHIPDATE IN ('2017-01-30')) t1
LEFT JOIN (SELECT * FROM orders WHERE O_ORDERDATE IN ('2017-01-30')) t2
ON t1.L_ORDERKEY = t2.O_ORDERKEY
GROUP BY t1.L_SHIPDATE, t2.O_ORDERDATE, t1.L_QUANTITY, t2.O_ORDERSTATUS;
…by (#36175)
This issue was introduced by #35562. With that PR, creating a partitioned materialized view as follows fails with the message "Unable to find a suitable base table for partitioning":
CREATE MATERIALIZED VIEW mvName
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
PARTITION BY (date_trunc(month_alias, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT date_trunc(`k2`,'day') AS month_alias, k3, count(*)
FROM tableName
GROUP BY date_trunc(`k2`,'day'), k3;
This PR supports creating a partitioned materialized view when `date_trunc` appears in the GROUP BY clause.
… rewrite by partition rolled up mv (#36414)
This issue was introduced by #35562. When a materialized view is a partition rolled-up MV (rolled up by date_trunc) and the base table adds a new partition, a query that is successfully rewritten to use the partitioned MV loses the data in the new partition. This PR fixes the problem.
For example, the MV definition is:
CREATE MATERIALIZED VIEW roll_up_mv
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
PARTITION BY (date_trunc(`col1`, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT date_trunc(`l_shipdate`, 'day') AS col1, l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) AS sum_total
FROM lineitem
LEFT JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
GROUP BY col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;
If you run the insert command:
INSERT INTO lineitem VALUES (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-11-21', '2023-11-21', '2023-11-21', 'a', 'b', 'yyyyyyyyy');
and then run the following query, the result does not include the 2023-11-21 partition data:
SELECT date_trunc(`l_shipdate`, 'day') AS col1, l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) AS sum_total
FROM lineitem
LEFT JOIN orders ON lineitem.l_orderkey = orders.o_orderkey AND l_shipdate = o_orderdate
GROUP BY col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;
…m both side of join (#40485)
This issue was introduced by #35562. If the partitioned MV is defined as follows:
CREATE MATERIALIZED VIEW mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
PARTITION BY (upgrade_day)
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT t1.upgrade_day, t2.batch_no, count(*)
FROM test2 t2 JOIN test1 t1 ON t1.upgrade_day = t2.upgrade_day
GROUP BY t1.upgrade_day, t2.batch_no;
the MV's related partition table should be `test1`, but it is currently `test2`. This PR fixes that.
Proposed changes
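This depends on #34781.
Materialized view partition tracking supports date_trunc, and the failure reason is reported more clearly.
It supports creating a partitioned materialized view with date_trunc in the PARTITION BY clause; with the 'day' unit, the MV is updated partition by partition, by day.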
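
For illustration, here is the example statement from this PR's commit message, reformatted for readability. It assumes that `lineitem` and `orders` tables with the referenced columns already exist and that `lineitem` is partitioned on `L_SHIPDATE`. The partitioning of `lineitem` is an assumption and is not stated in the PR. The view name `mv_6` and the literal date are taken from the commit message as-is.

```sql
-- Sketch based on the PR's commit message; lineitem and orders are assumed to exist.
CREATE MATERIALIZED VIEW mv_6
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
-- Partitions are tracked at day granularity, rolled up from the hour-truncated alias.
PARTITION BY (date_trunc(date_alias, 'day'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
SELECT
    date_trunc(t1.L_SHIPDATE, 'hour') AS date_alias,
    t2.O_ORDERDATE,
    t1.L_QUANTITY,
    t2.O_ORDERSTATUS,
    count(DISTINCT CASE WHEN t1.L_SUPPKEY > 0 THEN t2.O_ORDERSTATUS ELSE NULL END) AS cnt_1
FROM (SELECT * FROM lineitem WHERE L_SHIPDATE IN ('2017-01-30')) t1
LEFT JOIN (SELECT * FROM orders WHERE O_ORDERDATE IN ('2017-01-30')) t2
    ON t1.L_ORDERKEY = t2.O_ORDERKEY
GROUP BY
    t1.L_SHIPDATE,
    t2.O_ORDERDATE,
    t1.L_QUANTITY,
    t2.O_ORDERSTATUS;
```

The `PARTITION BY` expression wraps an alias that is itself a `date_trunc` of a base table column, which is the shape this change lets partition tracking handle.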