feat(rust): add cluster_with_columns plan optimization #16274
I think we must try to pull as close to the first |
Currently, we only optimize sequential |
I think this CI failure is a false negative.
Codecov Report
Attention: Patch coverage is

Coverage diff, main vs. #16274:

            main      #16274    +/-
Coverage    81.39%    81.41%    +0.01%
Files       1406      1409      +3
Lines       183953    184497    +544
Branches    2958      2960      +2
Hits        149731    150206    +475
Misses      33709     33776     +67
Partials    513       515       +2
I think this PR is mostly ready. Some points to look at specifically:
Maybe additional tests are needed, as this can be quite impactful.
Yes, can you try to come up with some combinations of with_columns and data dependencies on the Python side?
I am not sure about the implementation of AExpr::Wildcard and AExpr::Function.
This will resolve itself if we iterate as proposed.
There is currently a lot of code that deals with making the Bitmap into an IndexSet. Perhaps it is worth it to just make an index set. That is what the commented-out Bitset struct is. If it is not wanted, I will remove it before merging.
I would rather expose a bitand on MutableBitmap; then the code has more general use.
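To make the trade-off above concrete, here is a minimal sketch of an index set backed by a bitmap with an in-place AND. `IndexBitset` and all of its methods are hypothetical names made up for this illustration; this is not the `MutableBitmap` API, nor the commented-out `Bitset` struct from the PR.

```rust
use std::ops::BitAndAssign;

/// Hypothetical index set stored as a packed bitmap (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct IndexBitset {
    words: Vec<u64>,
}

impl IndexBitset {
    /// Create an empty set able to hold indices in `0..capacity`.
    fn with_capacity(capacity: usize) -> Self {
        Self { words: vec![0; (capacity + 63) / 64] }
    }

    /// Insert an index. Panics if `index` is outside the capacity.
    fn insert(&mut self, index: usize) {
        self.words[index / 64] |= 1u64 << (index % 64);
    }

    /// Check membership; indices beyond the capacity are never members.
    fn contains(&self, index: usize) -> bool {
        let mask = 1u64 << (index % 64);
        self.words.get(index / 64).map_or(false, |w| (w & mask) != 0)
    }
}

impl BitAndAssign<&IndexBitset> for IndexBitset {
    /// In-place intersection: keep only indices present in both sets.
    fn bitand_assign(&mut self, other: &IndexBitset) {
        for (k, word) in self.words.iter_mut().enumerate() {
            // Words missing from a shorter `other` are treated as all zeros.
            *word &= other.words.get(k).copied().unwrap_or(0);
        }
    }
}
```

With such an operator, intersecting two sets of column indices is a single `a &= &b;` rather than a loop over individual bits, which is roughly the argument for exposing the operation on the existing bitmap type instead of adding a separate struct.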
Nice one! I have left a few comments.
Okay! This is ready to get merged 🚀
CodSpeed Performance Report: merging #16274 will not alter performance.
This adds a new optimization pass for the query engine that clusters several sequential WITH COLUMNS calls. When the optimizer spots a chain of WITH COLUMNS calls, it tries to pull expressions as close to the source as possible. If all expressions can be pulled up closer to the source, the WITH COLUMNS node is removed entirely.
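
The idea can be sketched on a toy model. This is not the code from `cluster_with_columns.rs`: `Expr`, `Node`, `expr`, `can_move` and the free function `cluster_with_columns` below are hypothetical stand-ins, and the movability rule is a deliberately conservative assumption. An expression moves one node closer to the source only if it neither reads nor overwrites a column produced by that node, and no sibling expression in its own node reads the column it writes.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy expression: writes column `output`, reads `inputs`.
#[derive(Debug, Clone, PartialEq)]
struct Expr {
    output: String,
    inputs: BTreeSet<String>,
}

/// Convenience constructor for the toy expression.
fn expr(output: &str, inputs: &[&str]) -> Expr {
    Expr {
        output: output.to_string(),
        inputs: inputs.iter().map(|s| s.to_string()).collect(),
    }
}

/// One WITH COLUMNS node. In a chain, index 0 is closest to the source.
type Node = Vec<Expr>;

/// Conservative check (an assumption of this sketch): can the expression at
/// `idx` in `current` be evaluated in `previous` instead?
fn can_move(e: &Expr, idx: usize, previous: &Node, current: &Node) -> bool {
    // It must not read or overwrite anything `previous` produces.
    let depends_on_previous = previous
        .iter()
        .any(|p| e.inputs.contains(&p.output) || p.output == e.output);
    // No sibling may read the column it writes, or that sibling would
    // start seeing the new value instead of the old one.
    let read_by_sibling = current
        .iter()
        .enumerate()
        .any(|(j, s)| j != idx && s.inputs.contains(&e.output));
    !depends_on_previous && !read_by_sibling
}

/// Pull expressions toward the source until nothing moves; drop empty nodes.
fn cluster_with_columns(mut chain: Vec<Node>) -> Vec<Node> {
    loop {
        let mut changed = false;
        for i in 1..chain.len() {
            let (head, tail) = chain.split_at_mut(i);
            let previous = &mut head[i - 1];
            let current = &mut tail[0];
            let mut idx = 0;
            while idx < current.len() {
                if can_move(&current[idx], idx, previous, current) {
                    previous.push(current.remove(idx));
                    changed = true;
                } else {
                    idx += 1;
                }
            }
        }
        // A node whose expressions were all pulled up is removed entirely.
        let before = chain.len();
        chain.retain(|node| !node.is_empty());
        changed |= chain.len() != before;
        if !changed {
            return chain;
        }
    }
}
```

For example, the chain `[b = f(a)]`, `[c = g(a)]`, `[d = h(b, c)]` collapses to two nodes under this rule: `c` does not depend on `b`, so it joins the first node and its own node disappears, while `d` has to stay behind because it reads both `b` and `c`.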