[opt](Nereids) Replace Slot in Each Data Trait Separately #36886
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
run buildall
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 39659 ms
TPC-H: Total hot run time: 39412 ms
run buildall
TPC-H: Total hot run time: 39673 ms
TPC-DS: Total hot run time: 173999 ms
ClickBench: Total hot run time: 30.18 s
this.slots = this.slots.stream()
        .map(s -> replaceMap.getOrDefault(s, s))
        .collect(Collectors.toSet());
Use a for loop for better performance. With a for loop you can also create the collection with the expected size up front, so it never has to be resized.
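A minimal sketch of what the reviewer suggests, with a generic element type standing in for Nereids' Slot; the class name SlotReplaceLoop and the method replaceAll are hypothetical. The stream pipeline becomes a plain for loop that fills a HashSet sized up front for the input, so the set is not rehashed while it is being filled.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SlotReplaceLoop {
    // Hypothetical helper: maps each element through replaceMap and keeps
    // elements that have no entry, like
    // slots.stream().map(s -> replaceMap.getOrDefault(s, s)).collect(Collectors.toSet())
    static <T> Set<T> replaceAll(Collection<T> slots, Map<T, T> replaceMap) {
        // Request enough capacity that slots.size() entries stay under the
        // default 0.75 load factor, so the set is not resized while filling.
        Set<T> result = new HashSet<>((int) (slots.size() / 0.75f) + 1);
        for (T s : slots) {
            result.add(replaceMap.getOrDefault(s, s));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> replaceMap = new HashMap<>();
        replaceMap.put("a#1", "a#5");
        Set<String> replaced = replaceAll(List.of("a#1", "b#2"), replaceMap);
        // Contains a#5 and b#2; HashSet iteration order is unspecified.
        System.out.println(replaced);
    }
}
```

On JDK 19 or later, HashSet.newHashSet(n) performs the same capacity calculation, and Guava's Sets.newHashSetWithExpectedSize(n) does so on older JDKs.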
Set<Slot> key = e.getKey().stream()
        .map(s -> replaceSlotMap.getOrDefault(s, s))
        .collect(Collectors.toSet());
Same here: use a for loop.
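The snippet above remaps the key set of each entry. A hedged sketch of the same loop-based rewrite applied to a whole dependency map, assuming (hypothetically) that entries map a determinant set of slots to a dependent set; the names DependencyKeyReplace, replaceSet and replaceDeps are illustrative, not from the PR.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DependencyKeyReplace {
    // Maps each element through replaceMap, keeping elements without an entry.
    static <T> Set<T> replaceSet(Set<T> input, Map<T, T> replaceMap) {
        Set<T> out = new HashSet<>((int) (input.size() / 0.75f) + 1);
        for (T t : input) {
            out.add(replaceMap.getOrDefault(t, t));
        }
        return out;
    }

    // Hypothetical dependency map: determinant set -> dependent set.
    // Both sides are rewritten with plain loops instead of streams.
    static <T> Map<Set<T>, Set<T>> replaceDeps(Map<Set<T>, Set<T>> deps, Map<T, T> replaceMap) {
        Map<Set<T>, Set<T>> out = new HashMap<>((int) (deps.size() / 0.75f) + 1);
        for (Map.Entry<Set<T>, Set<T>> e : deps.entrySet()) {
            Set<T> key = replaceSet(e.getKey(), replaceMap);
            Set<T> value = replaceSet(e.getValue(), replaceMap);
            // Two determinants can collapse onto the same set after replacement,
            // so merge their dependents instead of overwriting one of them.
            out.computeIfAbsent(key, k -> new HashSet<>()).addAll(value);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Set<String>, Set<String>> deps = new HashMap<>();
        deps.put(Set.of("a#1"), Set.of("b#2"));
        Map<String, String> replaceMap = new HashMap<>();
        replaceMap.put("a#1", "a#5");
        System.out.println(replaceDeps(deps, replaceMap)); // prints {[a#5]=[b#2]}
    }
}
```

Merging on key collisions is a design choice of this sketch: if X determines A and Y determines B, and both X and Y become K, then K determines both A and B.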
for (int i = 0; i < newOutputs.size(); i++) {
    replaceMap.put(originalOutputs.get(i), newOutputs.get(i));
for (int i = 0; i < children.size(); i++) {
    List<? extends Slot> originOutputs = this.regularChildrenOutputs.isEmpty()
Suggestion: check this.regularChildrenOutputs.size() <= i here instead of isEmpty().
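The reviewer's condition also guards against a list that is non-empty but has fewer entries than there are children, a case isEmpty() misses. A small sketch with hypothetical names (ChildOutputGuard, originOutputs, childOutputs) and a generic element type in place of Slot, assuming the surrounding code falls back to the child's own output when no regular output is recorded for index i:

```java
import java.util.List;

public class ChildOutputGuard {
    // Hypothetical stand-in for the per-child lookup: use the recorded regular
    // output for child i when one exists, otherwise the child's own output.
    static <T> List<T> originOutputs(List<List<T>> regularChildrenOutputs,
                                     List<List<T>> childOutputs, int i) {
        // size() <= i is true both when the list is empty and when it is too
        // short to hold index i, so regularChildrenOutputs.get(i) is only
        // reached when that index exists. isEmpty() covers only the first case.
        return regularChildrenOutputs.size() <= i
                ? childOutputs.get(i)
                : regularChildrenOutputs.get(i);
    }

    public static void main(String[] args) {
        List<List<String>> regular = List.of(List.of("r0"));
        List<List<String>> children = List.of(List.of("c0"), List.of("c1"));
        for (int i = 0; i < children.size(); i++) {
            // prints [r0], then [c1]
            System.out.println(originOutputs(regular, children, i));
        }
    }
}
```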
@@ -111,20 +111,29 @@ public LogicalIntersect withNewOutputs(List<NamedExpression> newOutputs) {
void replaceSlotInFuncDeps(DataTrait.Builder builder,
Could this be removed?
@@ -188,6 +188,6 @@ public void computeUniform(Builder builder) {
     for (int i = 0; i < output.size(); i++) {
         replaceMap.put(originalOutputs.get(i), output.get(i));
     }
-    builder.replace(replaceMap);
+    builder.replaceUniformBy(replaceMap);
Why not apply the same refactor as in LogicalIntersect? The same code appears in four functions here too.
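One way to read this suggestion, as a hedged sketch: build the original-to-new slot map in one shared helper, so the four compute methods differ only in which builder method they call (for example builder.replaceUniformBy(...), as in the hunk above). The class ReplaceMapHelper, the method buildReplaceMap and the size check are assumptions, not code from the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplaceMapHelper {
    // Hypothetical shared helper: pairs each original output slot with the
    // new output slot at the same position.
    static <T> Map<T, T> buildReplaceMap(List<? extends T> originalOutputs,
                                         List<? extends T> newOutputs) {
        // Positional pairing only makes sense if both lists line up.
        if (originalOutputs.size() != newOutputs.size()) {
            throw new IllegalArgumentException("output sizes differ: "
                    + originalOutputs.size() + " vs " + newOutputs.size());
        }
        Map<T, T> replaceMap = new HashMap<>((int) (originalOutputs.size() / 0.75f) + 1);
        for (int i = 0; i < originalOutputs.size(); i++) {
            replaceMap.put(originalOutputs.get(i), newOutputs.get(i));
        }
        return replaceMap;
    }

    public static void main(String[] args) {
        Map<String, String> m = buildReplaceMap(List.of("a#1", "b#2"), List.of("a#5", "b#6"));
        System.out.println(m.get("a#1") + " " + m.get("b#2")); // prints a#5 b#6
    }
}
```

Each compute method would then reduce to a single call such as builder.replaceUniformBy(buildReplaceMap(originalOutputs, output)).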
run buildall
Force-pushed from d2a3c2b to a078836.
run buildall
TPC-H: Total hot run time: 39564 ms
TPC-DS: Total hot run time: 170631 ms
ClickBench: Total hot run time: 30.72 s
run p0
PR approved by at least one committer and no changes requested.
Force-pushed from a078836 to ce8f99d.
run buildall
TPC-H: Total hot run time: 39353 ms
TPC-DS: Total hot run time: 174989 ms
ClickBench: Total hot run time: 30.49 s
PR approved by at least one committer and no changes requested.
…eliminate fail (#36888)

This depends on #36839 and #36886.

For example, a low-level materialized view has 5 group-by dimensions and the query also has 5 group-by dimensions, so they are equal. In this case, no aggregate is added on top of the MV when the query is rewritten using the materialized view. But if the query needs only 4 of its group-by dimensions and the remaining one can be eliminated, the query is changed to group by 4 dimensions. This causes an aggregate to be added on top of the MV, which later makes the high-level materialized view rewrite fail.

Solution: during aggregate rewrite by materialized view, we try to eliminate the MV's group-by dimensions based on the dimensions the query uses. If the elimination succeeds, the high-level rewrite can continue.

For example, the low-level MVs are defined as follows:

def join_mv_1 = """
select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1, sum(o_totalprice) as sum_total, max(o_totalprice) as max_total, min(o_totalprice) as min_total, count(*) as count_all, bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1, bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
from lineitem_1 inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
where lineitem_1.l_shipdate >= "2023-10-17"
group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
"""

def join_mv_2 = """
select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, t.agg1 as agg1, t.sum_total as agg3, t.max_total as agg4, t.min_total as agg5, t.count_all as agg6, cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
from ${mv_1} as t inner join partsupp_1 on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
where partsupp_1.ps_suppkey > 1
group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
"""

The high-level MV is defined as follows:

def join_mv_3 = """
select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
from ${mv_2} as t1 left join ${mv_2} as t2 on t1.l_orderkey = t2.l_orderkey
where t1.l_orderkey > 1
group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
"""

The following query can then hit mv3:

select t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6
from (
  select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, t.agg1 as agg1, t.sum_total as agg3, t.max_total as agg4, t.min_total as agg5, t.count_all as agg6, cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
  from (
    select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1, sum(o_totalprice) as sum_total, max(o_totalprice) as max_total, min(o_totalprice) as min_total, count(*) as count_all, bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1, bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
    from lineitem_1 inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
    where lineitem_1.l_shipdate >= "2023-10-17"
    group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
  ) as t
  inner join partsupp_1 on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
  where partsupp_1.ps_suppkey > 1
  group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
) as t1
left join (
  select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, t.agg1 as agg1, t.sum_total as agg3, t.max_total as agg4, t.min_total as agg5, t.count_all as agg6, cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
  from (
    select l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, cast(sum(IFNULL(o_orderkey, 0) * IFNULL(o_custkey, 0)) as decimal(28, 8)) as agg1, sum(o_totalprice) as sum_total, max(o_totalprice) as max_total, min(o_totalprice) as min_total, count(*) as count_all, bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1, bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
    from lineitem_1 inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
    where lineitem_1.l_shipdate >= "2023-10-17"
    group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey
  ) as t
  inner join partsupp_1 on t.l_partkey = partsupp_1.ps_partkey and t.l_suppkey = partsupp_1.ps_suppkey
  where partsupp_1.ps_suppkey > 1
  group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey, ps_suppkey, agg1, agg3, agg4, agg5, agg6
) as t2 on t1.l_orderkey = t2.l_orderkey
where t1.l_orderkey > 1
group by t1.l_orderkey, t2.l_partkey, t1.l_suppkey, t2.o_orderkey, t1.o_custkey, t2.ps_partkey, t1.ps_suppkey, t2.agg1, t1.agg2, t2.agg3, t1.agg4, t2.agg5, t1.agg6

---------
Co-authored-by: xiejiann <jianxie0@gmail.com>
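The commit message above relies on eliminating group-by dimensions. A common way to decide whether a dimension can be dropped is to check whether the remaining dimensions functionally determine it, using an attribute-closure computation. The sketch below shows that textbook technique with hypothetical dimension names and dependencies; it is an illustration under those assumptions, not the Doris implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GroupByElimination {
    // Attribute closure: keep adding dependents whose determinant set is
    // already covered, until nothing changes.
    static Set<String> closure(Set<String> attrs, Map<Set<String>, Set<String>> fds) {
        Set<String> result = new HashSet<>(attrs);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Set<String>, Set<String>> fd : fds.entrySet()) {
                if (result.containsAll(fd.getKey()) && result.addAll(fd.getValue())) {
                    changed = true;
                }
            }
        }
        return result;
    }

    // Greedily drop each group-by key that the other kept keys determine;
    // grouping by the remaining keys then yields the same groups.
    static List<String> eliminate(List<String> groupBy, Map<Set<String>, Set<String>> fds) {
        Set<String> kept = new LinkedHashSet<>(groupBy);
        for (String key : groupBy) {
            Set<String> others = new HashSet<>(kept);
            others.remove(key);
            if (closure(others, fds).contains(key)) {
                kept.remove(key);
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        Map<Set<String>, Set<String>> fds = new HashMap<>();
        fds.put(Set.of("a"), Set.of("b"));
        fds.put(Set.of("b"), Set.of("c"));
        System.out.println(eliminate(List.of("a", "b", "c"), fds)); // prints [a]
    }
}
```

The greedy order matters: with mutual dependencies (for example a join equality in both directions) a different visiting order can keep a different, equally valid key.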
Proposed changes
To avoid repeatedly replacing slots in every data trait, we split the replace function into four functions and replace the slots in each data trait separately.
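A simplified sketch of the idea, under loud assumptions: each trait is reduced to a plain set of slots, and apart from replaceUniformBy (which appears in the diff above) the method names are hypothetical. With one replace method per trait, a compute method such as computeUniform remaps only the trait it builds, instead of calling a single replace that touches all four traits every time.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TraitBuilderSketch<T> {
    // Simplification: every trait is modeled as a plain set of slots.
    private Set<T> uniform = new HashSet<>();
    private Set<T> unique = new HashSet<>();
    private Set<T> equalSet = new HashSet<>();
    private Set<T> funcDeps = new HashSet<>();

    private static <E> Set<E> remap(Set<E> in, Map<E, E> replaceMap) {
        Set<E> out = new HashSet<>((int) (in.size() / 0.75f) + 1);
        for (E e : in) {
            out.add(replaceMap.getOrDefault(e, e));
        }
        return out;
    }

    // One replace method per trait: code that only builds the uniform trait
    // pays only for remapping the uniform trait.
    void replaceUniformBy(Map<T, T> m) { uniform = remap(uniform, m); }
    void replaceUniqueBy(Map<T, T> m) { unique = remap(unique, m); }
    void replaceEqualSetBy(Map<T, T> m) { equalSet = remap(equalSet, m); }
    void replaceFuncDepsBy(Map<T, T> m) { funcDeps = remap(funcDeps, m); }

    // All-in-one variant for contrast: if each of four compute methods calls
    // this, every trait is remapped four times.
    void replace(Map<T, T> m) {
        replaceUniformBy(m);
        replaceUniqueBy(m);
        replaceEqualSetBy(m);
        replaceFuncDepsBy(m);
    }

    void addUniform(T slot) { uniform.add(slot); }
    void addUnique(T slot) { unique.add(slot); }
    Set<T> uniform() { return uniform; }
    Set<T> unique() { return unique; }

    public static void main(String[] args) {
        TraitBuilderSketch<String> b = new TraitBuilderSketch<>();
        b.addUniform("a#1");
        b.addUnique("a#1");
        b.replaceUniformBy(Map.of("a#1", "a#5"));
        System.out.println(b.uniform() + " " + b.unique()); // prints [a#5] [a#1]
    }
}
```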