Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect float/double column comparison. #10686

Closed
abhioncbr opened this issue Apr 25, 2023 · 14 comments
Closed

Incorrect float/double column comparison. #10686

abhioncbr opened this issue Apr 25, 2023 · 14 comments
Assignees
Labels

Comments

@abhioncbr
Copy link
Contributor

We are ignoring a couple of numeric comparison queries between float and double based columns, for example

SELECT i1 != i2, i2 != i1 FROM {tbl}

It's failing because of the cast function from double to float, hence losing precision and resulting incorrect output. The Apache calcite adds the cast transformation function during query parsing. This issue tracks the implementation for correctly handling float/double comparison.

@abhioncbr
Copy link
Contributor Author

I am planning to work on this. Thanks

@Jackie-Jiang
Copy link
Contributor

We should discuss if we want to do exact comparison between floating point numbers or fuzzy comparison. I'd suggest doing some research to see what is the behavior of the commonly used DBs.
cc @richardstartin I remember you have some suggestions to this topic
cc @xiangfu0 @siddharthteotia @yupeng9 @walterddr @61yao @atris for some input

@Jackie-Jiang
Copy link
Contributor

Also related to #10637 cc @elonazoulay

@abhioncbr
Copy link
Contributor Author

I did a bit of analysis while going over the problem itself.

  1. H2 database follows the exact comparison between double and float columns. Here are the results of queries in the H2 database
$ CREATE TABLE float_x_double_comp_tbl (i1 double,i2 real);
$ INSERT INTO float_x_double_comp_tbl VALUES(1.1,1.1);
$ INSERT INTO float_x_double_comp_tbl VALUES(1.5,1.5);

$ SELECT i1, i2, i1!=i2, i2!=i1 FROM FLOAT_X_DOUBLE_COMP_TBL


I1  | I2  | I1 <> I2 | I2 <> I1
----|-----|----------|---------
1.1 | 1.1 | TRUE     | TRUE
1.5 | 1.5 | FALSE    | FALSE

Actually, initially I thought it's a bug in H2 also and created an issue for this. Got nice explanation by the developer.

  1. I have also checked the Postgress, and the results are the same as of the H2 database.

Screen Shot 2023-04-26 at 7 33 02 PM

@Jackie-Jiang
Copy link
Contributor

So seems all the DBs are doing exact match on = and !=.

Do you see the cast from double to float getting added during the query parsing? Why would it cast value from double to float, but not hoisting up to double?

@abhioncbr
Copy link
Contributor Author

Yes. Apache Calcite supports implicit type conversion between float and double and hence depends on the left operand data type; the right operand is getting cast.

So if the LHS is float and RHS is double; the cast is happening from double -> float, hence losing precision and resulting in the wrong result.
Screen Shot 2023-04-26 at 8 20 24 PM

My initial solution approach is to remove/skip this casting function in case of comparison of float/double. What do you think about it?

@Jackie-Jiang
Copy link
Contributor

Losing precision might not be the root cause though. Even if we do float to double casting, the number won't match.

System.out.println((double) 1.1f); --> 1.100000023841858

Which query is giving different result comparing to H2?

@abhioncbr
Copy link
Contributor Author

Losing precision might not be the root cause though. Even if we do float to double casting, the number won't match.

System.out.println((double) 1.1f); --> 1.100000023841858

Which query is giving different result comparing to H2?

Well, you are following the wrong order of casting. The problematic comparison involves casting from double -> float.

System.out.println((double) 1.1f); --> 1.100000023841858 (correct)
System.out.println((float) 1.1d);  --> 1.1 (incorrect)

// incorrect float/double comparison because of casting
float f = 1.1f;
double d = 1.1d;
System.out.println(Float.compare(f, (float)d)); --> 0 (Such comparison in Pinot resulting incorrect output)

We are ignoring the following queries in our test cases

@Jackie-Jiang
Copy link
Contributor

I see, so we actually expect it to cast from float to double and since we are converting from low precision to high precision it will return not equals.
I feel the difference is from the behavior of always casting the RHS value vs hoisting up to the same type. Can you also try the comparison for other data types (int, long, float, double, big_decimal) and see what is the expected behavior from H2 and Postgres?

@abhioncbr
Copy link
Contributor Author

abhioncbr commented May 4, 2023

Comparison results on H2 DB

$ CREATE TABLE Numeric_tbl (int_col INTEGER, long_col BIGINT, double_col Double, float_col REAL, bigDecimal_col DECFLOAT);
$ INSERT INTO Numeric_tbl VALUES(1,1,1.1,1.1,1.1);
$ INSERT INTO Numeric_tbl VALUES(1,1,1,1,1);

$ select int_col = long_col, long_col =int_col from Numeric_tbl;
INT_COL = LONG_COL | LONG_COL = INT_COL
------------------ | -------------------
TRUE               | TRUE
TRUE               | TRUE

$ select int_col = bigDecimal_col, bigDecimal_col = int_col,long_col = bigDecimal_col, bigDecimal_col = long_col from Numeric_tbl;
INT_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = INT_COL | LONG_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = LONG_COL
------------------------ | ------------------------ | ------------------------- | ------------------------ 
FALSE                    | FALSE                    | FALSE                     | FALSE
TRUE                     | TRUE                     | TRUE                      | TRUE

$ select float_col = double_col, double_col = float_col from Numeric_tbl;
FLOAT_COL = DOUBLE_COL | DOUBLE_COL = FLOAT_COL
---------------------- | ------------------------
FALSE                  | FALSE
TRUE                   | TRUE

$ select float_col =  bigDecimal_col, bigDecimal_col = float_col,  double_col =  bigDecimal_col, bigDecimal_col = double_col from Numeric_tbl;
FLOAT_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = FLOAT_COL | DOUBLE_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = DOUBLE_COL
-------------------------- | -------------------------- | --------------------------- | ---------------------------
TRUE                       | TRUE                       | TRUE                        | TRUE
TRUE                       | TRUE                       | TRUE                        | TRUE

On Postgres
Screen Shot 2023-05-03 at 11 04 23 PM

@Jackie-Jiang
Copy link
Contributor

Thanks for taking time testing these comparisons, very interesting results! So Postgres and H2 have different behavior on float - big-decimal conversion, but double - big-decimal conversion is lossless in both DBs.
For the float - double conversion issue, I feel we should always hoist up to the higher precision type, which aligns with the H2 and Postgres behavior.

@abhioncbr
Copy link
Contributor Author

Thanks for taking time testing these comparisons, very interesting results! So Postgres and H2 have different behavior on float - big-decimal conversion, but double - big-decimal conversion is lossless in both DBs. For the float - double conversion issue, I feel we should always hoist up to the higher precision type, which aligns with the H2 and Postgres behavior.

+1; I'll try to implement this. Thanks

@abhioncbr
Copy link
Contributor Author

We can close this one. Thanks

@walterddr
Copy link
Contributor

We can close this one. Thanks

FYI if you include this fixes #xxxx in your PR description it will automatically close the issue when PR merges

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants