-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Only push predicates depending on the subset columns past unique()
#14668
Conversation
let mut root_count = 0; | ||
|
||
// if this condition is called more than once, its a binary or ternary operation. | ||
let condition = |_| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand this wanted to block predicates that had more than 1 column input, but because mut root_count
is not reset per predicate it would end up blocking all predicates after the first call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeap. That had to reset.
true | ||
} | ||
if let Some(ref subset) = options.subset { | ||
// Predicates on the subset can pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since unique()
is equivalent to group_by().gather(index)
, so similar to group_by we can pass predicates on the subset (<=>group key) columns
@@ -132,7 +133,7 @@ where | |||
for name in root_names { | |||
if condition(name) { | |||
remove_keys.push(key.clone()); | |||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would insert the same key multiple times
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #14668 +/- ##
==========================================
+ Coverage 80.79% 80.82% +0.03%
==========================================
Files 1326 1326
Lines 173249 173215 -34
Branches 2456 2455 -1
==========================================
+ Hits 139976 140007 +31
+ Misses 32800 32736 -64
+ Partials 473 472 -1 ☔ View full report in Codecov by Sentry. |
} | ||
if let Some(ref subset) = options.subset { | ||
// Predicates on the subset can pass. | ||
let mut names_set = PlHashSet::<Arc<str>>::with_capacity(subset.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use &str
here? Then we don't have to heap allocate the Arc
.
let mut root_count = 0; | ||
|
||
// if this condition is called more than once, its a binary or ternary operation. | ||
let condition = |_| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeap. That had to reset.
I agree on the fact that this should be treated the same as group by. I've left one minor comment. |
updated to use 409 | if let Some(ref subset) = options.subset {
| ---------- borrow of `options.subset.0` occurs here
...
423 | options
| ^^^^^^^ move out of `options` occurs here
...
426 | } else {
| - borrow might be used here, when `names_set` is dropped and runs the destructor for type `hashbrown::set::HashSet<&str, polars_core::export::ahash::RandomState>` |
Which is cheap because of |
Fix #14595.
Also improves the display in
explain()
to showmaintain_order
andkeep_strategy
, as these are useful to know when debugging.