Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cast int to decimal has an unexpected result #5793

Closed
jayzhan211 opened this issue May 22, 2024 · 14 comments
Closed

Cast int to decimal has an unexpected result #5793

jayzhan211 opened this issue May 22, 2024 · 14 comments
Labels

Comments

@jayzhan211
Copy link
Contributor

Describe the bug

To Reproduce

#[test]
    fn test_int_to_decimal() {
        let array = Int64Array::from(vec![5, 6, 7, 8, 9]);
        let decimal_array = cast(&array, &DataType::Decimal128(10, 4)).unwrap();
        println!("{:?}", array);
        println!("{:?}", decimal_array);
        let decimal_array = cast(&array, &DataType::Decimal128(10, -4)).unwrap();
        println!("{:?}", decimal_array);
    }
---- cast::tests::test_int_to_decimal stdout ----
PrimitiveArray<Int64>
[
  5,
  6,
  7,
  8,
  9,
]
PrimitiveArray<Decimal128(10, 4)>
[
  50000,
  60000,
  70000,
  80000,
  90000,
]
PrimitiveArray<Decimal128(10, -4)>
[
  0,
  0,
  0,
  0,
  0,
]

Expected behavior

I think we should get

PrimitiveArray<Decimal128(10, 4)>
[
  5,
  6,
  7,
  8,
  9,
]
PrimitiveArray<Decimal128(10, -4)>
[
  50000,
  60000,
  70000,
  80000,
  90000,
]

Additional context

https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Decimal128

@jayzhan211 jayzhan211 added the bug label May 22, 2024
@jayzhan211
Copy link
Contributor Author

I notice there is value_as_string, so I think Decimal128(10, 4) is correct, but I'm not sure if 0 for Decimal128(10, -4) is expected or not, I expect to get 50000.

"5.0000"
"00000"

@tustvold
Copy link
Contributor

tustvold commented May 27, 2024

There is a good chance the conversion does not work correctly for negative scales

@ByteBaker
Copy link
Contributor

Hi, I think I've found the cause. Would like to take this up for a fix.

@jayzhan211
Copy link
Contributor Author

Hi, I think I've found the cause. Would like to take this up for a fix.

Sure

@ByteBaker
Copy link
Contributor

Hi @tustvold @jayzhan211 I've made the changes. Please review them.

@alamb
Copy link
Contributor

alamb commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

@wjones127
Copy link
Member

It's not clear to me that the original behavior is bad. It seems like it is casting numerically. That is, 2 is being translated into 2.0000 for decimal(10, 4) and 0 for decimal(10, -4). Numerically, those seem to be the closest values representable in those types.

It seems like what you are suggesting is casting uses the integer as the internal integer representation and then attaches the scale blindly. So 2 would be translated to 0.0002 for decimal(10, 4) and 2000 for decimal(10, -4). That seems like more surprising behavior to me. Am I understanding your suggestion correctly?

Is there a SQL system that sets some precedent for how this work? For example, how does postgres do this?

@ByteBaker
Copy link
Contributor

To be honest there was little reference on its behaviour, including on the internet. And naturally I'd have asked how to go about this, but then I read the documentation which said negative precision is just the padding of zeroes to the right.

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Sep 19, 2024

The proposal is based on the arrow's document so it might not be correct / expected

https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Decimal128

Exact 128-bit width decimal value with precision and scale

precision is the total number of digits
scale is the number of digits past the decimal
For example the number 123.45 has precision 5 and scale 2.

In certain situations, scale could be negative number. For negative scale, it is the number of padding 0 to the right of the digits.

For example the number 12300 could be treated as a decimal has precision 3 and scale -2.

Postgres

postgres=# select 5::decimal(10, -4);
 numeric 
---------
       0
(1 row)

I think we might need to fix the doc

@ByteBaker
Copy link
Contributor

I was just about to post this too. 😅

This also means that this issue can be closed, and the PR will be no longer required.

@ByteBaker
Copy link
Contributor

Postgres documents the behavior as below:

Beginning in PostgreSQL 15, it is allowed to declare a numeric column with a negative scale. Then values will be rounded to the left of the decimal point. The precision still represents the maximum number of non-rounded digits. Thus, a column declared as

NUMERIC(2, -3)
will round values to the nearest thousand and can store values between -99000 and 99000, inclusive

Check out the docs.

@alamb
Copy link
Contributor

alamb commented Sep 19, 2024

@jayzhan211 if you concur let's close the ticket

@alamb
Copy link
Contributor

alamb commented Sep 19, 2024

Thank you for the research @ByteBaker and @wjones127

@jayzhan211
Copy link
Contributor Author

Sure, we just need to fix the document to be consistent with the postgres's one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants