Incorrect float/double column comparison. #10686

abhioncbr · 2023-04-25T13:37:06Z

We are ignoring a couple of numeric comparison queries between float and double based columns, for example

SELECT i1 != i2, i2 != i1 FROM {tbl}

It's failing because of the cast function from double to float, hence losing precision and resulting incorrect output. The Apache calcite adds the cast transformation function during query parsing. This issue tracks the implementation for correctly handling float/double comparison.

The text was updated successfully, but these errors were encountered:

abhioncbr · 2023-04-25T13:37:48Z

I am planning to work on this. Thanks

Jackie-Jiang · 2023-04-26T22:23:09Z

We should discuss if we want to do exact comparison between floating point numbers or fuzzy comparison. I'd suggest doing some research to see what is the behavior of the commonly used DBs.
cc @richardstartin I remember you have some suggestions to this topic
cc @xiangfu0 @siddharthteotia @yupeng9 @walterddr @61yao @atris for some input

Jackie-Jiang · 2023-04-26T22:24:49Z

Also related to #10637 cc @elonazoulay

abhioncbr · 2023-04-26T23:37:38Z

I did a bit of analysis while going over the problem itself.

H2 database follows the exact comparison between double and float columns. Here are the results of queries in the H2 database

$ CREATE TABLE float_x_double_comp_tbl (i1 double,i2 real);
$ INSERT INTO float_x_double_comp_tbl VALUES(1.1,1.1);
$ INSERT INTO float_x_double_comp_tbl VALUES(1.5,1.5);

$ SELECT i1, i2, i1!=i2, i2!=i1 FROM FLOAT_X_DOUBLE_COMP_TBL


I1  | I2  | I1 <> I2 | I2 <> I1
----|-----|----------|---------
1.1 | 1.1 | TRUE     | TRUE
1.5 | 1.5 | FALSE    | FALSE

Actually, initially I thought it's a bug in H2 also and created an issue for this. Got nice explanation by the developer.

I have also checked the Postgress, and the results are the same as of the H2 database.

Jackie-Jiang · 2023-04-26T23:54:13Z

So seems all the DBs are doing exact match on = and !=.

Do you see the cast from double to float getting added during the query parsing? Why would it cast value from double to float, but not hoisting up to double?

abhioncbr · 2023-04-27T00:22:54Z

Yes. Apache Calcite supports implicit type conversion between float and double and hence depends on the left operand data type; the right operand is getting cast.

So if the LHS is float and RHS is double; the cast is happening from double -> float, hence losing precision and resulting in the wrong result.

My initial solution approach is to remove/skip this casting function in case of comparison of float/double. What do you think about it?

Jackie-Jiang · 2023-04-27T18:07:19Z

Losing precision might not be the root cause though. Even if we do float to double casting, the number won't match.

System.out.println((double) 1.1f); --> 1.100000023841858

Which query is giving different result comparing to H2?

abhioncbr · 2023-04-27T19:20:20Z

Losing precision might not be the root cause though. Even if we do float to double casting, the number won't match.
System.out.println((double) 1.1f); --> 1.100000023841858
Which query is giving different result comparing to H2?

Well, you are following the wrong order of casting. The problematic comparison involves casting from double -> float.

System.out.println((double) 1.1f); --> 1.100000023841858 (correct)
System.out.println((float) 1.1d);  --> 1.1 (incorrect)

// incorrect float/double comparison because of casting
float f = 1.1f;
double d = 1.1d;
System.out.println(Float.compare(f, (float)d)); --> 0 (Such comparison in Pinot resulting incorrect output)

We are ignoring the following queries in our test cases

Jackie-Jiang · 2023-04-27T22:11:24Z

I see, so we actually expect it to cast from float to double and since we are converting from low precision to high precision it will return not equals.
I feel the difference is from the behavior of always casting the RHS value vs hoisting up to the same type. Can you also try the comparison for other data types (int, long, float, double, big_decimal) and see what is the expected behavior from H2 and Postgres?

abhioncbr · 2023-05-04T02:57:30Z

Comparison results on H2 DB

$ CREATE TABLE Numeric_tbl (int_col INTEGER, long_col BIGINT, double_col Double, float_col REAL, bigDecimal_col DECFLOAT);
$ INSERT INTO Numeric_tbl VALUES(1,1,1.1,1.1,1.1);
$ INSERT INTO Numeric_tbl VALUES(1,1,1,1,1);

$ select int_col = long_col, long_col =int_col from Numeric_tbl;
INT_COL = LONG_COL | LONG_COL = INT_COL
------------------ | -------------------
TRUE               | TRUE
TRUE               | TRUE

$ select int_col = bigDecimal_col, bigDecimal_col = int_col,long_col = bigDecimal_col, bigDecimal_col = long_col from Numeric_tbl;
INT_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = INT_COL | LONG_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = LONG_COL
------------------------ | ------------------------ | ------------------------- | ------------------------ 
FALSE                    | FALSE                    | FALSE                     | FALSE
TRUE                     | TRUE                     | TRUE                      | TRUE

$ select float_col = double_col, double_col = float_col from Numeric_tbl;
FLOAT_COL = DOUBLE_COL | DOUBLE_COL = FLOAT_COL
---------------------- | ------------------------
FALSE                  | FALSE
TRUE                   | TRUE

$ select float_col =  bigDecimal_col, bigDecimal_col = float_col,  double_col =  bigDecimal_col, bigDecimal_col = double_col from Numeric_tbl;
FLOAT_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = FLOAT_COL | DOUBLE_COL = BIGDECIMAL_COL | BIGDECIMAL_COL = DOUBLE_COL
-------------------------- | -------------------------- | --------------------------- | ---------------------------
TRUE                       | TRUE                       | TRUE                        | TRUE
TRUE                       | TRUE                       | TRUE                        | TRUE

On Postgres

Jackie-Jiang · 2023-05-23T22:54:47Z

Thanks for taking time testing these comparisons, very interesting results! So Postgres and H2 have different behavior on float - big-decimal conversion, but double - big-decimal conversion is lossless in both DBs.
For the float - double conversion issue, I feel we should always hoist up to the higher precision type, which aligns with the H2 and Postgres behavior.

abhioncbr · 2023-05-25T01:47:03Z

Thanks for taking time testing these comparisons, very interesting results! So Postgres and H2 have different behavior on float - big-decimal conversion, but double - big-decimal conversion is lossless in both DBs. For the float - double conversion issue, I feel we should always hoist up to the higher precision type, which aligns with the H2 and Postgres behavior.

+1; I'll try to implement this. Thanks

abhioncbr · 2023-06-13T19:13:28Z

We can close this one. Thanks

walterddr · 2023-06-13T23:29:08Z

We can close this one. Thanks

FYI if you include this fixes #xxxx in your PR description it will automatically close the issue when PR merges

Jackie-Jiang assigned abhioncbr Apr 26, 2023

Jackie-Jiang added the bug label Apr 26, 2023

abhioncbr mentioned this issue May 29, 2023

[multistage][float-double-comparison]: updated for CAST function addition based on correct precision. #10814

Merged

abhioncbr closed this as completed Jun 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect float/double column comparison. #10686

Incorrect float/double column comparison. #10686

abhioncbr commented Apr 25, 2023

abhioncbr commented Apr 25, 2023

Jackie-Jiang commented Apr 26, 2023

Jackie-Jiang commented Apr 26, 2023

abhioncbr commented Apr 26, 2023

Jackie-Jiang commented Apr 26, 2023

abhioncbr commented Apr 27, 2023

Jackie-Jiang commented Apr 27, 2023

abhioncbr commented Apr 27, 2023

Jackie-Jiang commented Apr 27, 2023

abhioncbr commented May 4, 2023 •

edited

Loading

Jackie-Jiang commented May 23, 2023

abhioncbr commented May 25, 2023

abhioncbr commented Jun 13, 2023

walterddr commented Jun 13, 2023

Incorrect float/double column comparison. #10686

Incorrect float/double column comparison. #10686

Comments

abhioncbr commented Apr 25, 2023

abhioncbr commented Apr 25, 2023

Jackie-Jiang commented Apr 26, 2023

Jackie-Jiang commented Apr 26, 2023

abhioncbr commented Apr 26, 2023

Jackie-Jiang commented Apr 26, 2023

abhioncbr commented Apr 27, 2023

Jackie-Jiang commented Apr 27, 2023

abhioncbr commented Apr 27, 2023

Jackie-Jiang commented Apr 27, 2023

abhioncbr commented May 4, 2023 • edited Loading

Jackie-Jiang commented May 23, 2023

abhioncbr commented May 25, 2023

abhioncbr commented Jun 13, 2023

walterddr commented Jun 13, 2023

abhioncbr commented May 4, 2023 •

edited

Loading