Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust, python): Decimal arithmetic #9123

Merged
merged 3 commits into from
May 31, 2023
Merged

feat(rust, python): Decimal arithmetic #9123

merged 3 commits into from
May 31, 2023

Conversation

ritchie46
Copy link
Member

@ritchie46 ritchie46 commented May 30, 2023

Still to be decided if scale should adapt or not. @aldanor, do you have any ideas on this?

@ritchie46 ritchie46 changed the title WIP: decimal arithmetic feat(rust, python): Decimal arithmetic May 31, 2023
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels May 31, 2023
@ritchie46 ritchie46 merged commit 98adaf2 into main May 31, 2023
@ritchie46 ritchie46 deleted the decimal branch May 31, 2023 07:41
@aldanor
Copy link
Contributor

aldanor commented Jun 1, 2023

Sorry I've fallen out for a while, completely loaded with daytime work right :/ (But I'll be happy to participate in discussions at least.)

I think the biggest quirk is this:

  • From the perspective of ops (not storage), decimals are absolutely NOT a logical type on top of (i128) integers. More like a physical type.
  • Logical type, especially in terms of arithmetics, implies (at least in polars it usually does) that you can do ops on terms of the underlying physical type (if this op is allowed for the logical type) and then wrap the results back into the logical type and it will work. For decimals, it doesn't work that way at all:
    • For the same scale, only add/sub will work as is. Mul/div won't.
    • For different scales, none of the ops will work as is

So:

  • The only "happy path" is +/- of decimals of the same scale (happy as in - you can use ops on the underlying i128s without alterations)
    • It's not entirely 'happy' in fact, because the scale may get bumped by +1 if it doesn't fit.
    • For decimals of different scales, you can absolutely +/- them, but you have to pick the highest scale for the result
  • For *, the happy path is when sum of scales is less than max_scale (use i128 multiply, add up the scales)
    • The unhappy path for * is when the sum of scales doesn't fit but all values fit - in this case we have to scan all values to find the scale
    • The unhappiest path for * is when it just doesn't fit
  • For /, it's kind of the same as *

Comment on lines +5 to +7
// The division is done using the numbers without scale.
// The dividend is scaled up to maintain precision after the
// division
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The a * scale will overflow very often if dealing with high-precision decimals. Basically what you're doing here implies that using scale greater than MAX_SCALE / 2 is not possible and will lead to overflow errors, I think.

Usually it's done in the 256-bit space (IIRC that's how arrow2 does it too).

// 222.222 --> 222222
// -------- -------
// 24691.308 <-- 24691308642
a * b / scale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, a * b will overflow too often. This is usually done with i256.

@aldanor
Copy link
Contributor

aldanor commented Jun 1, 2023

Re: precision, see very detailed discussion here – I don't think it should be used in any computations at all, and once you start doing any arithmetic work it can be simply discarded. The only thing that matters is scale really.

Setting precision 'at the edges' should be sufficient (coming from python's decimal, you don't even have precision); so, reading and then immediately writing should preserve it. But if you start doing arithmetic stuff, I think it can be just ditched. You can always do a cast(Decimal(scale, precision)) at the end of the operation chain manually, or before writing, might you want to.

@ritchie46
Copy link
Member Author

From the perspective of ops (not storage), decimals are absolutely NOT a logical type on top of (i128) integers. More like a physical type. Logical type, especially in terms of arithmetics, implies (at least in polars it usually does)

There are more exceptions. The Categorical is a logical on Uint32, but also has logicla rules on various functions.

The only "happy path" is +/- of decimals of the same scale (happy as in - you can use ops on the underlying i128s without alterations)
It's not entirely 'happy' in fact, because the scale may get bumped by +1 if it doesn't fit.
For decimals of different scales, you can absolutely +/- them, but you have to pick the highest scale for the result
For *, the happy path is when sum of scales is less than max_scale (use i128 multiply, add up the scales)
The unhappy path for * is when the sum of scales doesn't fit but all values fit - in this case we have to scan all values to find

Thanks, I will see if I can include that. Though, after a handful of multiplications we are out of sale range. How do people deal with that? Something like this: (a * b * c).cast(Decimal(scale=20) * d

Re: precision, see very detailed discussion #7178 – I don't think it should be used in any computations at all, and once you start doing any arithmetic work it can be simply discarded. The only thing that matters is scale really.

Yes, now that I am looking into decimals, I am wondering if we should drop precision altogether. I don't see any benefit for it yet.

c-peters pushed a commit to c-peters/polars that referenced this pull request Jul 14, 2023
@alexander-beedie alexander-beedie added the A-dtype-decimal Area: decimal data type label Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-dtype-decimal Area: decimal data type enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants