-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(rust, python): allow arithmetic operations between numeric Series and list Series #18901
Conversation
@@ -53,23 +53,112 @@ fn lists_same_shapes(left: &ArrayRef, right: &ArrayRef) -> bool { | |||
} | |||
} | |||
|
|||
/// Arithmetic operations that can be applied to a Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some existing code that does this? There's a whole bunch of similar things but I may've missed an actual implementation.
|
||
impl Op { | ||
/// Apply the operation to a pair of Series. | ||
fn apply_with_series(&self, lhs: &Series, rhs: &Series) -> PolarsResult<Series> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do these two functions as a single generic function and failed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why exactly? Seems this is possible with generics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out the constraints that work. I guess I can try another round and report back when I get stuck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue is that when you add two &Series
, you get a PolarsResult<Series>
, and when you add a &Series
and u8
you get Series
. Since the return types don't match, the constraints conflict (the former wants T: std:ops::Add<Output=PolarsResult<Series>>
, the latter wants T: std:ops::Add<Output=Series>
).
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #18901 +/- ##
=======================================
Coverage 79.77% 79.77%
=======================================
Files 1531 1531
Lines 208487 208536 +49
Branches 2913 2913
=======================================
+ Hits 166314 166370 +56
+ Misses 41622 41615 -7
Partials 551 551 ☔ View full report in Codecov by Sentry. |
CodSpeed Performance ReportMerging #18901 will not alter performanceComparing Summary
|
Hey @itamarst Thanks for the PR. Before I review, can you rebase? There seem to be many commits from previous iteration. |
Will try. |
…ic and the ls is a list.
4142998
to
ef43739
Compare
Looks like maybe one or two changes might be unnecessary, as reflected in the coverage report, so will make this draft while I fix that. |
OK, it's ready for review again. Not sure what's up with Windows failures. |
And it's both ready for review and green. |
let mut new_right_dtype = right_dtype.cast_leaf(leaf_super_dtype); | ||
|
||
// If we have e.g. Array and List, we want to convert those too. | ||
if (left_dtype.is_list() && right_dtype.is_array()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do that. I want those casts to be explicit for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By explicit do you mean "explicitly cast list to array" and vice versa as needed? This section is definitely necessary in some form, if I remove it:
[gw4] linux -- Python 3.12.3 /home/itamarst/devel/polars/.venv/bin/python3
tests/unit/datatypes/test_array.py:106: in test_array_list_supertype
result = s1 == s2
polars/series/series.py:773: in __eq__
return self._comp(other, "eq")
polars/series/series.py:753: in _comp
return self._from_pyseries(getattr(self._s, op)(other._s))
E pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `FixedSizeList`, got `list[f64]`"))
-------------------------------- Captured stderr call --------------------------------thread '<unnamed>' panicked at crates/polars-core/src/series/comparison.rs:132:9:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `FixedSizeList`, got `list[f64]`"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
_______________________________ test_eq_array_cmp_list _______________________________[gw10] linux -- Python 3.12.3 /home/itamarst/devel/polars/.venv/bin/python3
tests/unit/series/test_equals.py:57: in test_eq_array_cmp_list
result = s == [1, 2]
polars/series/series.py:773: in __eq__
return self._comp(other, "eq")
polars/series/series.py:753: in _comp
return self._from_pyseries(getattr(self._s, op)(other._s))
E pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `FixedSizeList`, got `list[i64]`"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the user should cast explicitly. And we shuold return an error instead of panicking if they pass wrong types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, but given this will involving changing the behavior of those two tests (from e.g. "these two things are equal" to "this now errors asking for a cast"), this is a backwards incompatible change. So as long as that's OK.
|
||
impl Op { | ||
/// Apply the operation to a pair of Series. | ||
fn apply_with_series(&self, lhs: &Series, rhs: &Series) -> PolarsResult<Series> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why exactly? Seems this is possible with generics.
Thanks for the PR - we want to try doing this with a more optimized kernel that operates directly on the primitive values - I will take it from here |
@@ -47,55 +47,6 @@ fn is_cat_str_binary(type_left: &DataType, type_right: &DataType) -> bool { | |||
} | |||
} | |||
|
|||
fn process_list_arithmetic( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove logic from the type_coercion
optimizer - this should already be handled by get_arithmetic_field
during conversion
Multiply => wrap!(lhs * rhs), | ||
Divide => wrap!(lhs / rhs), | ||
Remainder => wrap!(lhs % rhs), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make Op
generic but it costs a little bit of sanity I think 😄
Closed in favor of #19162 |
Fixes #8006
Fixes #14711
Followup to #9188, allow arithmetic between: