refactor(rust): Arithmetics refactor and cleanup (part 1) #7537

aldanor · 2023-03-14T02:15:48Z

This is a preliminary refactor before starting to integrate decimal arithmetics in.

Currently, there's the PolarsNumericType which (unfortunately) plays way too many roles – from question like "is it a simple numerical thing we can just copy" to assumptions like "if it's a 'numeric' type, then we can just add and subtract the underlying physical values". This logic breaks for decimals, because i128 is a physical type, however PrimitiveArray<i128> does not use native arithmetics (moreover, we'd want to reimplement those arithmetics ops for it so as to allow some flexibility in maintaining scale/precision etc). So some rearrangement is unavoidable, and I figured I should push what I currently have in progress instead of keeping it so as not to suffer from constant rebasing.

A few changes:

NumOpsDispatch / NumOpsDispatchInner are removed completely
ArrayArithmetics has been removed (will implement respective traits for decimals manually at a later time)
There's a NativeArithmetics bound being used now for native arithmetics ops (may need to add our own 1:1 trait when implementing decimals due to potential shadowing problems)
Added Try<...> traits to unify fallible ops
Some unwraps/panics in fallible methods replaced with errors
Added PolarsFloatNative, this helps simplify trait bound copy/paste
Implementations of chunked array and series arithmetics are now unified via macros (all except checked div etc)
For chunked arrays, all array vs scalar ops will now use $op_scalar() methods from arrow2 for native arithmetics (wasn't the case previously)
For owned array vs array arithmetics, there was some env::var("ASSIGN") check which seemed to have been something temporary and has been removed
Renamed arithmetic method from SeriesTrait to series_$op() for clarity
Added ChunkedArray::with_name() helper
All macros in polars-error use $crate:: now
Downcasted iteration is now done via a trait instead of copy/paste

ritchie46

Thanks. This touches a lot of old code that indeed could use some refactoring/cleaning up.

One important question: You did not change any semantics right? Especially the owned arithmetic is an optimization I don't want to break and only works if the ref count of Arc<ChunkedArray> is 1.

ritchie46 · 2023-03-14T06:58:24Z

polars/polars-ops/src/frame/pivot/mod.rs

@@ -246,7 +246,7 @@ fn pivot_impl(
                debug_assert_eq!(value_agg_phys.len(), row_locations.len());

                let mut cols = if value_agg_phys.dtype().is_numeric() {
-                    macro_rules! dispatch {
+                    macro_rules! pivot {


I don't think we should call this pivot as we don't pivot here. dispatch is clearer to me that we simply have a utility macro to dispatch the inner function.

Right. This is purely for error reporting, it will include name of the macro if it fails for whatever reason. I can change it back to dispatch or pivot_dispatch or any better name if you have any ideas.

Btw (1): here's another case when you check is_numeric(). Is decimal numeric or not? :) For pivoting purposes - sure! For many other purposes, no... 🤦

Btw (2), it just occurred to me, as this is yet another point where you check if something is numeric in point 1 but use this implicit assumption in point 2, wouldn't it be cleaner to instead have

fn as_numeric(&self) -> Option<NumericChunked<'_>> { match self.dtype() { ... } } enum NumericChunked<'a> { UInt8(&'a UInt8Chunked), ... }

Yet another case where we can potentially dodge implicit assumptions and unwraps by using type system? Then errors simply can't occur and no unwraps are needed because you can do

if let Some(ca) = agg_phys.as_numeric() { ... // do stuff, it can't fail now }

polars/polars-core/src/series/arithmetic/borrowed.rs

ritchie46 · 2023-03-14T07:03:51Z

polars/polars-core/src/series/arithmetic/borrowed.rs

-    let rhs = rhs.struct_().unwrap();
-    let s_fields = s.fields();
+) -> PolarsResult<Series> {
+    let lhs = lhs.struct_().unwrap_or_else(|_| unreachable!());


What's the benefit of this over unwrap?

I think it was a marker for myself while reading this, basically saying this is only called with struct dtypes.

This is one of my biggest gripes with polars codebase 🙂 When staring at a particular piece of code you have no idea what assumptions are currently being used here, you see a bunch of unwraps and have no idea which of them CAN potentially trigger and which cannot ever happen, which are unhandled errors and which are completely unreachable. You have to scan all usages of a particular thing trying to infer it.

I think a lot of these cases you can convert to traits/errors or just restructure the code. E.g.:

This function should probably take &StructChunked, &StructChunked and then no unwraps are needed here

And upstream instead of matching (lhs.dtype(), rhs.dtype()) = (Struct(_), Struct(_)) you could instead do (lhs.struct_(), rhs.struct_()) = (Some(_), Some(_)), so no unwraps are needed again. No implicit assumptions either.

Right. Maybe just put an unwrap with a comment why we can unwrap or expect("impl error")?

This is one of my biggest gripes with polars codebase

I agree we can do better here. All unwraps should be considered asserts, but it isn't always the case yet.

aldanor · 2023-03-14T12:12:48Z

Thanks. This touches a lot of old code that indeed could use some refactoring/cleaning up.

One important question: You did not change any semantics right? Especially the owned arithmetic is an optimization I don't want to break and only works if the ref count of Arc<ChunkedArray> is 1.

I don't think I've changed any semantics there. As you can see, all of the owned arithmetic logic. Here's a gist of recursive expansion of Add / TryAdd macro for ChunkedArray so you can see more easily what's going on: https://gist.github.com/aldanor/0f612c7ee267bf6cb70b9d6d952a1997

A few potential changes:

There was some check for ASSIGN env var when doing owned vs owned ops; I think it was temporary as it was only defined for a few ops and not all of them, and there were no other references in the codebase so it has been removed.
Ops for references were implemented as Add for &ChunkedArray<T>. This is TOO restrictive because it can be unwrapped as:

Add for &ChunkedArray<T>
Add<Self> for &ChunkedArray<T>
Add<&'a ChunkedArray<T>> for &'a ChunkedArray<T>

which is basically requiring matching lifetimes on LHS and RHS of all arithmetic ops if you're dealing with references. This is overly restrictive and a better way is to explicitly defined 'lhs, 'rhs lifetimes.

Some operations were using arrow2::$op_scalar, some weren't. Now all of them do.
Another thing is this: for "array $op scalar" operations where array is owned, there's two options (the currently used one is option 2, but could specialize to 1 for div/rem I guess):
- use $op_scalar() – which would be faster for div/rem, but requires an allocation
- use normal $op but no allocation (that's the way it currently does it); will be slower for div/rem but always avoid an allocation)

ritchie46 · 2023-03-14T12:21:57Z

Another thing is this: for "array $op scalar" operations where array is owned, there's two options (the currently used one is option 2, but could specialize to 1 for div/rem I guess):

Could you point to the snippet you are referring?

ritchie46 · 2023-03-14T12:45:12Z

Ops for references were implemented as Add for &ChunkedArray<T>. This is TOO restrictive because it can be unwrapped as:

I don't think this is the case. This compiles on master. I believe the longer living lifetime can always match the shortest one.

fn try_different_lifetimes<'a>(a: &'a Int32Chunked, b: &'static Int32Chunked) -> Int32Chunked {
    a + b
}
fn try_different_lifetimes2<'a, 'b>(a: &'a Int32Chunked, b: &'b Int32Chunked) -> Int32Chunked {
    a + b
}

fn try_different_lifetimes3<'a>(a: &'a Int32Chunked) -> Int32Chunked {
    // this lifteme is always shorter than 'a as it is local
    let ca = &Int32Chunked::full_null("", 1);
    ca + a
}

ritchie46 · 2023-03-14T12:46:38Z

There was some check for ASSIGN env var when doing owned vs owned ops; I think it was temporary as it was only defined for a few ops and not all of them, and there were no other references in the codebase so it has been removed.

I think this was a remnant of earlier testing. This can be removed indeed. 👍

ritchie46 · 2023-03-20T08:46:59Z

Any updates @aldanor? I think there is some great stuff in here.

aldanor · 2023-03-23T02:45:42Z

@ritchie46 apologies, was busy with work, will get back re: this PR tomorrow to address the comments and rebase

aldanor · 2023-06-01T17:58:48Z

Ok. My fault for not taking enough effort to get this merged :/ (although it was 99% there I thi) Now that #9123 has been merged, I'm not entirely sure it's rebase-able...

However, if there's still interest in pushing it through (I think it does clean up the whole array-arithmetics mess by quite a bit), I can try a rebase locally to see if it's possible at all.

ritchie46 · 2023-06-01T18:14:18Z

I'm not entirely sure it's rebase-able...

Me neither. :/ I recall cleaning up a bit on the arithmetics as well.

svaningelgem · 2023-10-11T17:43:48Z

@stinodego > As you requested to be pinged.
This goes a bit above my head, but a cleanup is always nice. Though as @ritchie46 said: it's probably not rebase-able.
@aldanor : what do you think? When you look over the codebase, is there still as much klunkiness in there as in March (ie: does it still warrant the refactoring?)

aldanor · 2023-10-11T19:41:50Z

@svaningelgem

I think it's pretty sad it wasn't merged in back in the day - because it did clean up tons of inconsistencies and weird points in the codebase (also simplifying NumDispatch etc, unifying fallible ops, unifying downcasting, converting some unwraps to proper errors etc). This was a preliminary step to improving arithmetics on decimals, so there's that.

But - I don't think any of the above has been fully resolved (maybe a few things here and there), perhaps it would make sense to break this apart into more sizeable PRs and resubmit them, idk. Rebasing would be pretty tough indeed due to some refactoring that happened inbetween...

(I'll double check a few things in the coming days, re: the current state of the codebase)

stinodego · 2024-01-15T10:38:12Z

I don't think there's much use in keeping this PR open any longer. I am not sure what the current state of this part of the code is, but if you feel like (some of) the improvements made here would still be applicable, you're very welcome to open a new PR! Indeed it might help to split it up in smaller parts if at all possible.

github-actions bot added refactor Code improvements that do not change functionality rust Related to Rust Polars labels Mar 14, 2023

aldanor force-pushed the feature/arithmetics-refactor branch from c854f06 to ca2bf80 Compare March 14, 2023 02:20

ritchie46 reviewed Mar 14, 2023

View reviewed changes

aldanor added 14 commits March 27, 2023 11:43

Add arithmetics module with Try* traits

fbfc1c0

Add ChunkedArray::with_name() helper

ad3e4f7

polars-core: POLARSTYPE -> PolarsType

f3b14ab

polars-error: macros use qualified $crate syntax

3967566

polars-core: add PolarsFloatNative type

2a4a6a7

Refactor ChunkedArray arithmetic ops

14d38c3

Simplify NumOpsDispatch trait

cc80b86

polars-core: remove NumDispatch::add_to()

93d7d61

polars-core: remove NumOpsDispatch altogether

32a05a7

polars-core: chunked_array/arithmetic->arithmetics

e4b4a6d

polars-core: remove copy/paste in c/a downcasting

d1a1d37

polars-core: clean up series<-->series arithmetics

cd19bd3

Series arithmetics traits cleanup

8e01e4e

(a few build warning fixes)

c548a05

aldanor force-pushed the feature/arithmetics-refactor branch from c9742a7 to c548a05 Compare March 27, 2023 11:02

aldanor mentioned this pull request Mar 27, 2023

test_streaming_join, test_reproducible_hash_with_seeds fail on M1 #7812

Open

stinodego requested review from stinodego, alexander-beedie and MarcoGorelli as code owners November 12, 2023 04:36

stinodego closed this Jan 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(rust): Arithmetics refactor and cleanup (part 1) #7537

refactor(rust): Arithmetics refactor and cleanup (part 1) #7537

aldanor commented Mar 14, 2023 •

edited

Loading

ritchie46 left a comment

ritchie46 Mar 14, 2023

aldanor Mar 14, 2023 •

edited

Loading

ritchie46 Mar 14, 2023

aldanor Mar 14, 2023

ritchie46 Mar 14, 2023

ritchie46 Mar 14, 2023

aldanor commented Mar 14, 2023

ritchie46 commented Mar 14, 2023

ritchie46 commented Mar 14, 2023 •

edited

Loading

ritchie46 commented Mar 14, 2023

ritchie46 commented Mar 20, 2023

aldanor commented Mar 23, 2023

aldanor commented Jun 1, 2023

ritchie46 commented Jun 1, 2023

svaningelgem commented Oct 11, 2023

aldanor commented Oct 11, 2023 •

edited

Loading

stinodego commented Jan 15, 2024

refactor(rust): Arithmetics refactor and cleanup (part 1) #7537

refactor(rust): Arithmetics refactor and cleanup (part 1) #7537

Conversation

aldanor commented Mar 14, 2023 • edited Loading

ritchie46 left a comment

Choose a reason for hiding this comment

ritchie46 Mar 14, 2023

Choose a reason for hiding this comment

aldanor Mar 14, 2023 • edited Loading

Choose a reason for hiding this comment

ritchie46 Mar 14, 2023

Choose a reason for hiding this comment

aldanor Mar 14, 2023

Choose a reason for hiding this comment

ritchie46 Mar 14, 2023

Choose a reason for hiding this comment

ritchie46 Mar 14, 2023

Choose a reason for hiding this comment

aldanor commented Mar 14, 2023

ritchie46 commented Mar 14, 2023

ritchie46 commented Mar 14, 2023 • edited Loading

ritchie46 commented Mar 14, 2023

ritchie46 commented Mar 20, 2023

aldanor commented Mar 23, 2023

aldanor commented Jun 1, 2023

ritchie46 commented Jun 1, 2023

svaningelgem commented Oct 11, 2023

aldanor commented Oct 11, 2023 • edited Loading

stinodego commented Jan 15, 2024

aldanor commented Mar 14, 2023 •

edited

Loading

aldanor Mar 14, 2023 •

edited

Loading

ritchie46 commented Mar 14, 2023 •

edited

Loading

aldanor commented Oct 11, 2023 •

edited

Loading