Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust): add cluster_with_columns plan optimization #16274

Merged
merged 21 commits into from
May 22, 2024

Conversation

coastalwhite
Copy link
Collaborator

@coastalwhite coastalwhite commented May 16, 2024

This adds a new optimization pass for the query engine that clusters several sequential WITH COLUMNS calls.

When the optimizer spots a chain of WITH COLUMNS, it tries to pull expressions as close to the source as possible. If all expressions can be pulled up closer to the source, the WITH COLUMNS node is removed entirely.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature rust Related to Rust Polars labels May 16, 2024
@ritchie46
Copy link
Member

When the optimizer spots a chain of WITH COLUMNS, it tries to pull expressions as close to the source as possible.

I think we must try to pull as close to the first WITH_COLUMNS. Closer to the source isn't per se beneficial, as the crossing operations like Filter and Join make those node more expensive.

@coastalwhite
Copy link
Collaborator Author

Currently, we only optimize sequential WITH_COLUMNS calls if there is nothing in between them. So at the moment, closer to source is roughly equal to first WITH_COLUMNS.

@coastalwhite
Copy link
Collaborator Author

I think this CI failure is a false negative

Copy link

codecov bot commented May 17, 2024

Codecov Report

Attention: Patch coverage is 88.84892% with 31 lines in your changes are missing coverage. Please review.

Project coverage is 81.41%. Comparing base (ec08d76) to head (f9bee6f).
Report is 4 commits behind head on main.

Files Patch % Lines
crates/polars-arrow/src/bitmap/bitmap_ops.rs 55.17% 26 Missing ⚠️
crates/polars-arrow/src/bitmap/immutable.rs 0.00% 3 Missing ⚠️
...src/logical_plan/optimizer/cluster_with_columns.rs 99.35% 1 Missing ⚠️
crates/polars-plan/src/logical_plan/visitor/lp.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16274      +/-   ##
==========================================
+ Coverage   81.39%   81.41%   +0.01%     
==========================================
  Files        1406     1409       +3     
  Lines      183953   184497     +544     
  Branches     2958     2960       +2     
==========================================
+ Hits       149731   150206     +475     
- Misses      33709    33776      +67     
- Partials      513      515       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@coastalwhite
Copy link
Collaborator Author

coastalwhite commented May 17, 2024

I think this PR is mostly ready. Some points to specifically look at are.

  • Maybe there needs to be additional tests, as this can be quite impactful.
  • I am not sure if I am handling the ProjectionOptions correctly.
  • I am not sure about the implementation of AExpr::Wildcard and AExpr::Function.
  • There is currently a lot of to deal with making the Bitmap into an IndexSet. Perhaps, it is worth it just to make an index set. That is what the commented Bitset struct is. But if it is not wanted, I will just remove that before merging.
  • I am currently ad-hoc adjusting the schemas as using IRBuilder would require more code. I think it is fine here, but maybe you don't agree.

@coastalwhite coastalwhite marked this pull request as ready for review May 17, 2024 14:25
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there needs to be additional tests, as this can be quite impactful.

Yes, can you try to come up with some combinations on with_columns and data dependencies on the python side?

I am not sure about the implementation of AExpr::Wildcard and AExpr::Function.

This will resolve itself if we iterate as proposed.

There is currently a lot of to deal with making the Bitmap into an IndexSet. Perhaps, it is worth it just to make an index set. That is what the commented Bitset struct is. But if it is not wanted, I will just remove that before merging.

I feel more for exposing a bitand to a MutableBitmap then the code is has more global use.

@ritchie46
Copy link
Member

Nice one! I have left a few comments.

@coastalwhite

This comment was marked as outdated.

@coastalwhite
Copy link
Collaborator Author

Okay! This is ready to get merged 🚀

Copy link

codspeed-hq bot commented May 21, 2024

CodSpeed Performance Report

Merging #16274 will not alter performance

Comparing coastalwhite:opt-cluster-with-columns (f9bee6f) with main (4375930)

Summary

✅ 37 untouched benchmarks

@ritchie46 ritchie46 merged commit c75dba9 into pola-rs:main May 22, 2024
26 checks passed
@coastalwhite coastalwhite deleted the opt-cluster-with-columns branch May 22, 2024 10:42
@c-peters c-peters added the accepted Ready for implementation label May 27, 2024
Wouittone pushed a commit to Wouittone/polars that referenced this pull request Jun 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature rust Related to Rust Polars
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants