fix(optimizer): Fix issues with join graph construction #3668
CodSpeed Performance Report
Merging #3668 will degrade performances by 30.82%
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3668 +/- ##
==========================================
- Coverage 78.06% 77.80% -0.27%
==========================================
Files 728 729 +1
Lines 89967 90557 +590
==========================================
+ Hits 70236 70458 +222
- Misses 19731 20099 +368
        break;
    }
    _ => {
        self.process_leaf_relation(plan);
Should this be plan or cur_node?
In this case it's intended to be plan because we just grab the node at the top of the linear chain. I guess process_leaf_relation is a misleading name. Changing the function name to add_relation and adding a comment explaining what's going on.
for (name, node, done) in &mut self.join_conds_to_resolve {
    if !*done && schema.has_field(name) {
        *node = plan.clone();
        let mut cur_node = plan;
I think it would be less bug-prone to have let old_plan = plan; and then you can modify plan freely instead of referring to cur_node.
    ending_node: &LogicalPlanRef,
) {
    let mut cur_node = starting_node;
    while cur_node != ending_node {
Does this do a full equality operation or just an Arc::ptr_eq check?
Ah, for some reason I keep forgetting that Arc dereferences the inner value when doing eq. Will change this to ptr_eq.
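To illustrate the difference discussed in this thread, here is a minimal sketch. `Node` is a hypothetical stand-in for a plan node, not the project's actual type. `==` on two `Arc` handles compares the values they point to, while `Arc::ptr_eq` only checks whether both handles share one allocation.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a plan node; not the project's real type.
#[derive(Debug, PartialEq)]
struct Node {
    name: String,
}

/// Structural comparison: `==` on `Arc<T>` forwards to `T`'s `PartialEq`,
/// so two separately built but equal nodes compare as equal.
fn same_value(a: &Arc<Node>, b: &Arc<Node>) -> bool {
    a == b
}

/// Identity comparison: true only when both handles point at the same allocation.
fn same_allocation(a: &Arc<Node>, b: &Arc<Node>) -> bool {
    Arc::ptr_eq(a, b)
}
```

For a loop that walks a chain until it reaches one specific node, the identity check is usually the intended one, and it avoids a potentially deep comparison of whole subtrees on every iteration.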
            // Continue to children.
            cur_node = input;
        }
        _ => unreachable!(),
Let's have a better message here
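As a sketch of what a more descriptive message could look like: `unreachable!` accepts a format string like `panic!` does. The `Op` enum, the function name, and the message wording below are all hypothetical, chosen only for illustration.

```rust
/// Hypothetical operator kinds; not the project's real plan type.
#[derive(Debug)]
enum Op {
    Project,
    Filter,
    Join,
}

/// Names an operator that is expected to sit inside a linear chain.
/// Any other operator indicates a logic error, so the panic says what was found.
fn chain_op_name(op: &Op) -> &'static str {
    match op {
        Op::Project => "project",
        Op::Filter => "filter",
        other => unreachable!(
            "expected only Project or Filter in a linear chain, got {:?}",
            other
        ),
    }
}
```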
We make three fixes to join graph construction. With these changes, join ordering using the naive left deep join orderer produces correct results for TPCH queries.
Push up projections and filters only if they sit in between joins
Instead of pushing up all Projections and Filters until we hit an unreorderable node, we go down each linear chain of (reorderable) Projections and Filters until we hit a Join node, then push up the Projects and Filters we encountered along the way. If we hit an unreorderable node first, we simply treat the operator that sits at the top of the current linear chain as the relation to pass into the join.
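The traversal described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this PR: the `Plan` enum, `ChainEnd`, `label`, and `walk_chain` are hypothetical names, and `Aggregate` stands in for any unreorderable operator.

```rust
use std::sync::Arc;

/// Toy logical plan; hypothetical and much simpler than the real one.
#[derive(Debug)]
enum Plan {
    Scan(String),
    Filter { predicate: String, input: Arc<Plan> },
    Project { exprs: Vec<String>, input: Arc<Plan> },
    Join { left: Arc<Plan>, right: Arc<Plan> },
    Aggregate { input: Arc<Plan> }, // stands in for an unreorderable node
}

/// Outcome of walking one linear chain of Projects and Filters.
#[derive(Debug, PartialEq)]
enum ChainEnd {
    /// A Join was reached: the operators seen on the way can be pushed up.
    Join { pushed_up: Vec<String> },
    /// No Join below: the operator at the top of the chain is the relation.
    Relation { top: String },
}

fn label(plan: &Plan) -> String {
    match plan {
        Plan::Scan(name) => format!("Scan({name})"),
        Plan::Filter { predicate, .. } => format!("Filter({predicate})"),
        Plan::Project { exprs, .. } => format!("Project({})", exprs.join(", ")),
        Plan::Join { .. } => "Join".to_string(),
        Plan::Aggregate { .. } => "Aggregate".to_string(),
    }
}

/// Follows Projects and Filters downward from `top` until something else is hit.
fn walk_chain(top: &Arc<Plan>) -> ChainEnd {
    let mut seen = Vec::new();
    let mut cur = top;
    loop {
        match &**cur {
            Plan::Filter { input, .. } | Plan::Project { input, .. } => {
                seen.push(label(cur));
                cur = input;
            }
            Plan::Join { .. } => return ChainEnd::Join { pushed_up: seen },
            // Scan, Aggregate, etc.: keep the whole chain together as one relation.
            _ => return ChainEnd::Relation { top: label(top) },
        }
    }
}
```

In this sketch, a Filter over a Project over a Scan yields `ChainEnd::Relation` naming the Filter, while a Filter directly above a Join yields `ChainEnd::Join` with that Filter listed for push-up.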
For example, consider this query tree:
In between InnerJoin(c=d) and Scan(c_prime) there are Filter and Project nodes. Since there is no join below InnerJoin(c=d), we take the Filter(c<5) operator as the relation to pass into the join (as opposed to using Scan(c_prime) and pushing up the Projects and Filters above it).
If a relation needs to rename one column, make sure the other columns are also selected
Previously, when a relation needed to have a column renamed, we didn't select the other columns in the relation, causing some columns to be dropped prematurely.
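A minimal sketch of the idea, assuming a projection can be modeled as a list of expression strings (the function name and the `AS` notation are hypothetical, not the project's API): every column is kept, and only the target column gets an alias.

```rust
/// Builds a projection list that renames `old` to `new` while passing
/// every other column through unchanged, so nothing is dropped.
fn rename_keep_all(columns: &[&str], old: &str, new: &str) -> Vec<String> {
    columns
        .iter()
        .map(|&col| {
            if col == old {
                format!("{old} AS {new}")
            } else {
                col.to_string()
            }
        })
        .collect()
}
```

The earlier behavior corresponds to emitting only the renamed expression, which is how the remaining columns ended up dropped.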
Rename columns for relations even if they are not involved in join conditions
Previously, if we encountered a projection (e.g. a_prime <- a), we would apply this projection directly above the source relation only if a_prime was involved in a join, e.g. Join(left_on="a_prime", ..). Now, we uniformly apply these projections regardless of whether the projection affects a join condition.