feat(optimizer): Add join reordering as an optimizer rule #3642
CodSpeed Performance Report: Merging #3642 will improve performance by 16.8%.
rule_batches.push(RuleBatch::new(
    vec![
        Box::new(ReorderJoins::new()),
        Box::new(EnrichWithStats::new()),
Aren't the stats already enriched by the previous batch?
Yeah, but join reordering creates new operators, so I re-enrich those new operators afterwards. Another possibility is to have operators enrich themselves with stats as they are created during join reordering.
@@ -90,7 +103,7 @@ pub struct Optimizer {

impl Optimizer {
    pub fn new(config: OptimizerConfig) -> Self {
I think it may be worthwhile to refactor this into an OptimizerBuilder, where we can call methods to add batches.
let builder = OptimizerBuilder::default()
    .join_reordering()
    .simplify_expressions();
let optimizer = builder.build();
Sounds good, done!
}

impl OptimizerBuilder {
    pub fn reorder_joins(&mut self) {
The typical builder pattern takes self by value and returns it instead:
pub fn reorder_joins(self) -> Self {
so that builder calls can be chained.
Gotcha, fixed
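For readers following this thread, the chaining suggestion can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: the Batch enum, the batches field, and example_optimizer are hypothetical stand-ins, and the real OptimizerBuilder presumably stores rule batches rather than enum tags.

```rust
/// Hypothetical stand-in for an optimizer rule batch (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum Batch {
    ReorderJoins,
    SimplifyExpressions,
}

/// Simplified builder; the `batches` field is an assumption for this sketch.
#[derive(Debug, Default)]
pub struct OptimizerBuilder {
    batches: Vec<Batch>,
}

/// Simplified optimizer that just records which batches were requested.
#[derive(Debug)]
pub struct Optimizer {
    pub batches: Vec<Batch>,
}

impl OptimizerBuilder {
    /// Takes `self` by value and returns it, so calls can be chained.
    pub fn reorder_joins(mut self) -> Self {
        self.batches.push(Batch::ReorderJoins);
        self
    }

    /// Same shape as `reorder_joins`: consume, mutate, return.
    pub fn simplify_expressions(mut self) -> Self {
        self.batches.push(Batch::SimplifyExpressions);
        self
    }

    /// Consumes the builder and produces the finished optimizer.
    pub fn build(self) -> Optimizer {
        Optimizer { batches: self.batches }
    }
}

/// Usage: every method returns `Self`, so the calls form one expression.
pub fn example_optimizer() -> Optimizer {
    OptimizerBuilder::default()
        .reorder_joins()
        .simplify_expressions()
        .build()
}
```

With the `&mut self` signature that returns nothing, each call would need its own statement on a mutable binding, which is what the comment above is steering away from.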
/// Returns a tuple of the logical plan builder consisting of joins, and a bitmask indicating the plan IDs
/// that are contained within the current logical plan builder. The bitmask is used for determining join
/// conditions to use when logical plan builders are joined together.
fn build_joins_from_join_order(
Spoke offline: put the join conditions into JoinOrderTree instead, since this function should just be a 1:1 translation.
Thanks for the discussion! Moved join condition resolution to join order construction time. Much easier to follow now.
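As a rough illustration of the bitmask idea in the doc comment quoted above (each subplan carries a mask of the plan IDs it contains, and the masks decide which join conditions apply when two subplans are joined), here is a small sketch. JoinCondition, mask_of, and connecting_conditions are hypothetical names, and plan IDs are assumed to be below 64 so that they fit in a u64; the PR's actual data structures may differ.

```rust
/// Hypothetical join condition between two relations, identified by plan ID.
#[derive(Debug, Clone, PartialEq)]
pub struct JoinCondition {
    pub left_id: usize,
    pub right_id: usize,
}

/// Builds a bitmask with one bit set per plan ID (assumes every ID < 64).
pub fn mask_of(ids: &[usize]) -> u64 {
    ids.iter().fold(0u64, |mask, &id| mask | (1u64 << id))
}

/// Returns the conditions that connect the two subplans: one endpoint must
/// lie in `left_mask` and the other in `right_mask`, in either orientation.
pub fn connecting_conditions(
    conditions: &[JoinCondition],
    left_mask: u64,
    right_mask: u64,
) -> Vec<JoinCondition> {
    conditions
        .iter()
        .filter(|c| {
            let l = 1u64 << c.left_id;
            let r = 1u64 << c.right_id;
            ((l & left_mask) != 0 && (r & right_mask) != 0)
                || ((l & right_mask) != 0 && (r & left_mask) != 0)
        })
        .cloned()
        .collect()
}
```

After joining two subplans, the mask of the result is simply the bitwise OR of the two input masks, which keeps the bookkeeping cheap as the join tree grows.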
Applies the naive left-deep join order from #3616 as an optimizer rule.
This optimizer rule is gated behind an environment variable, which lets us validate the rule on our current workloads.
Currently, join reordering fails during join graph building for 50% of TPC-H queries. We'll tackle these errors in a follow-up PR.
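A sketch of what gating a rule behind an environment variable can look like is shown below. The variable name EXAMPLE_ENABLE_JOIN_REORDERING and both function names are hypothetical; the description above does not name the real variable or say which values it accepts, so the accepted values here ("1" or "true") are also an assumption.

```rust
use std::env;

/// Hypothetical variable name; the PR description does not state the real one.
pub const JOIN_REORDERING_ENV_VAR: &str = "EXAMPLE_ENABLE_JOIN_REORDERING";

/// Interprets a raw flag value: "1" or "true" (any case, surrounding
/// whitespace ignored) enables the rule; anything else, or unset, disables it.
pub fn flag_is_enabled(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1") | Some("true")
    )
}

/// Reads the environment once and decides whether to add the rule.
pub fn join_reordering_enabled() -> bool {
    flag_is_enabled(env::var(JOIN_REORDERING_ENV_VAR).ok().as_deref())
}
```

Keeping the parsing in a pure function separate from the environment lookup makes the default-off behavior easy to test without mutating process state.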