Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(optimizer): Add join reordering as an optimizer rule #3642

Conversation

desmondcheongzx
Copy link
Contributor

@desmondcheongzx desmondcheongzx commented Jan 6, 2025

Applies the naive left deep join order from #3616 as an optimizer rule.
This optimizer rule is gated behind an environment variable that allows us to validate the rule on our current workloads.

Currently join reordering results in errors for 50% of TPC-H queries during join graph building. We'll tackle these in a follow-up PR.

@github-actions github-actions bot added the feat label Jan 6, 2025
Copy link

codspeed-hq bot commented Jan 6, 2025

CodSpeed Performance Report

Merging #3642 will improve performances by 16.8%

Comparing desmondcheongzx:apply-join-reorder-optimization (215b815) with main (e6c084f)

Summary

⚡ 1 improvements
✅ 26 untouched benchmarks

Benchmarks breakdown

Benchmark main desmondcheongzx:apply-join-reorder-optimization Change
test_count[1 Small File] 4 ms 3.4 ms +16.8%

@desmondcheongzx desmondcheongzx force-pushed the apply-join-reorder-optimization branch from 5a5b27a to 9075793 Compare January 6, 2025 23:03
@desmondcheongzx desmondcheongzx force-pushed the apply-join-reorder-optimization branch from 9075793 to 12da094 Compare January 7, 2025 00:29
rule_batches.push(RuleBatch::new(
vec![
Box::new(ReorderJoins::new()),
Box::new(EnrichWithStats::new()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't the stats already enriched from the previous batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but since we're creating new operators while join reordering, I re-enrich the new operators. Another possibility is to simply make operators enrich themselves with stats while we perform join reordering

@@ -90,7 +103,7 @@ pub struct Optimizer {

impl Optimizer {
pub fn new(config: OptimizerConfig) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be worthwhile to refactor this to a OptimizerBuilder where we can call methods to add batches.

let builder = OptimizerBuilder::default()
  .join_reordering()
  .simplify_expressions();

let optimizer = builder.build()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, done!

src/common/daft-config/src/lib.rs Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 54.73251% with 110 lines in your changes missing coverage. Please review.

Project coverage is 78.02%. Comparing base (2aac957) to head (215b815).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...src/optimization/rules/reorder_joins/join_graph.rs 35.36% 53 Missing ⚠️
...l-plan/src/optimization/rules/reorder_joins/mod.rs 0.00% 37 Missing ⚠️
...rc/daft-logical-plan/src/optimization/optimizer.rs 82.95% 15 Missing ⚠️
src/common/daft-config/src/lib.rs 62.50% 3 Missing ⚠️
...l-plan/src/optimization/rules/enrich_with_stats.rs 75.00% 1 Missing ⚠️
...daft-physical-plan/src/physical_planner/planner.rs 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3642      +/-   ##
==========================================
- Coverage   78.09%   78.02%   -0.08%     
==========================================
  Files         721      727       +6     
  Lines       89145    89633     +488     
==========================================
+ Hits        69621    69935     +314     
- Misses      19524    19698     +174     
Files with missing lines Coverage Δ
src/daft-logical-plan/src/builder.rs 92.30% <100.00%> (+0.07%) ⬆️
src/daft-logical-plan/src/logical_plan.rs 73.63% <100.00%> (+1.38%) ⬆️
.../rules/reorder_joins/naive_left_deep_join_order.rs 95.49% <100.00%> (+0.04%) ⬆️
...l-plan/src/optimization/rules/enrich_with_stats.rs 90.90% <75.00%> (-9.10%) ⬇️
...daft-physical-plan/src/physical_planner/planner.rs 3.37% <0.00%> (ø)
src/common/daft-config/src/lib.rs 81.48% <62.50%> (-2.58%) ⬇️
...rc/daft-logical-plan/src/optimization/optimizer.rs 94.90% <82.95%> (-3.94%) ⬇️
...l-plan/src/optimization/rules/reorder_joins/mod.rs 0.00% <0.00%> (ø)
...src/optimization/rules/reorder_joins/join_graph.rs 88.79% <35.36%> (-5.91%) ⬇️

... and 39 files with indirect coverage changes

}

impl OptimizerBuilder {
pub fn reorder_joins(&mut self) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typical builder pattern should do the following instead

pub fn reorder_joins(self) -> Self {

so you can chain builder calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, fixed

src/daft-logical-plan/src/optimization/optimizer.rs Outdated Show resolved Hide resolved
src/daft-logical-plan/src/optimization/optimizer.rs Outdated Show resolved Hide resolved
/// Returns a tuple of the logical plan builder consisting of joins, and a bitmask indicating the plan IDs
/// that are contained within the current logical plan builder. The bitmask is used for determining join
/// conditions to use when logical plan builders are joined together.
fn build_joins_from_join_order(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke offline, put join conditions into JoinOrderTree instead since this should just be a 1:1 translation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion! Moved join condition resolution to join order construction time. Much easier to follow now

@desmondcheongzx desmondcheongzx merged commit 1562569 into Eventual-Inc:main Jan 8, 2025
40 of 41 checks passed
@desmondcheongzx desmondcheongzx deleted the apply-join-reorder-optimization branch January 8, 2025 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants