Implement Spark-compatible cast between decimals with different precision and scale #375
Comments
This looks simple to fix. We currently throw an exception if we cannot convert the bytes to a decimal, but it looks like Spark returns null. |
@andygrove Do you mind if I try it? In my spare time, I have been fixing Calcite's compatibility issues with Spark SQL type conversion. |
@caicancai Thanks. Please go ahead and create a PR for any open ticket that no one has claimed. |
I am working on it. |
I was trying to take a look at this one. I added this test in the CometCastSuite:
test("cast between decimals with different precision and scale") {
  val df = generateDecimalsPrecision38Scale18()
  val df1 = df.withColumn("b", col("a").cast(DataTypes.createDecimalType(10, 2)))
  df1.show(false)
  castTest(generateDecimalsPrecision38Scale18(), DataTypes.createDecimalType(10, 2))
}
It gives me a result like this:
+----------------------------------------+----------+
|a |b |
+----------------------------------------+----------+
|-99999999999999999999.999999999999000000|null |
|-9223372036854775808.234567000000000000 |null |
|-9223372036854775807.123123000000000000 |null |
|-2147483648.123123123000000000 |null |
|-2147483647.123123123000000000 |null |
|-123456.789000000000000000 |-123456.79|
|0E-18 |0.00 |
|123456.789000000000000000 |123456.79 |
|2147483647.123123123000000000 |null |
|2147483648.123123123000000000 |null |
|9223372036854775807.123123000000000000 |null |
|9223372036854775808.234567000000000000 |null |
|99999999999999999999.999999999999000000 |null |
|null |null |
+----------------------------------------+----------+
But the test fails with: Expected only Comet native operators, but found Sort.
plan: Sort [a#30 ASC NULLS FIRST], true, 0
+- Project [a#30, cast(a#30 as decimal(10,2)) AS a#32]
   +- CometCoalesce Coalesce 1, [a#30], 1
      +- CometScan parquet [a#30] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/jx/23vwhfzn2ts493_2twyz1dpc0000gn/T/spark-5f..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:decimal(38,18)>
Also, this code inside the test(....) produces the following plan:
spark.sql(s"select a, cast(a as decimal(10,2)) from t2 order by a").explain()
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- CometSort [a#4, a#9], [a#4 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(a#4 ASC NULLS FIRST, 10), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=13]
      +- LocalTableScan [a#4, a#9]
Update : Figured out that, removing val data = roundtripParquet(input, dir)\\.coalesce(1)
@viirya @andygrove can you provide guidance here on how to proceed? |
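For readers following along: the nulls and rounded values in the table above match what Spark does for a decimal-to-decimal cast when ANSI mode is off, namely round to the target scale with HALF_UP and return null if the result no longer fits the target precision. Below is a minimal sketch of that rule using plain `java.math.BigDecimal`. The object `DecimalCastSketch` and the helper `castDecimal` are hypothetical names for illustration only; they are not Spark or Comet APIs, and this is not how either engine implements the cast internally.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalCastSketch {
  // Hypothetical helper (not a Spark or Comet API): models a non-ANSI
  // decimal-to-decimal cast. None stands in for SQL NULL.
  def castDecimal(value: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
    if (value == null) {
      None // null input stays null
    } else {
      // Round to the target scale first, using HALF_UP.
      val rounded = value.setScale(scale, RoundingMode.HALF_UP)
      // If the rounded value needs more digits than the target precision,
      // treat it as overflow and return null instead of throwing.
      if (rounded.precision() > precision) None else Some(rounded)
    }
  }

  def main(args: Array[String]): Unit = {
    // A few inputs taken from the table in the comment above.
    val inputs = Seq(
      "-123456.789000000000000000",
      "0E-18",
      "2147483647.123123123000000000")
    inputs.foreach { s =>
      val out = castDecimal(new JBigDecimal(s), 10, 2)
      println(s"$s -> ${out.map(_.toPlainString).getOrElse("null")}")
    }
  }
}
```

With a target of decimal(10, 2), at most eight digits are available to the left of the decimal point, which is why values around 2147483647 already overflow to null in this sketch.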
Thanks for looking into this @himadripal. The issue is that the projection is falling back to Spark because |
You may want to set |
thank you @andygrove for the guidance and tip. I'll explore |
This Arrow PR will fix this issue completely. We are waiting for an Arrow release and then a DataFusion release that picks it up. |
What is the problem the feature request solves?
Comet is not consistent with Spark when casting between decimals. Here is a test to demonstrate this.
Spark Result
Comet Result
Describe the potential solution
No response
Additional context
No response