Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implement Spark-compatible CAST float/double to string #346

Merged
merged 12 commits into from
May 3, 2024

Conversation

mattharder91
Copy link
Contributor

@mattharder91 mattharder91 commented Apr 29, 2024

Which issue does this PR close?

Closes #312

Rationale for this change

Improve compatibility with Apache Spark

What changes are included in this PR?

  • Add custom implementation of CAST from float to string rather than delegate to DataFusion
  • Adapted test function to opt out the ansi mode exception testing as there is never a casting exception thrown between rusts(float,double) to string

How are these changes tested?

  • added unit tests for float
  • added unit tests for double

Further Considerations

A potential problem could be there for denormalized precision values.

Scalas Double.MinPositiveValue is 4.9E-324 the denormalized value and parses correctly to 4.9E-324 while rust outputs 5E-324.

@viirya viirya changed the title Issue 326/cast float string feat: Implement Spark-compatible CAST from String to Floating Point Apr 29, 2024
@viirya viirya changed the title feat: Implement Spark-compatible CAST from String to Floating Point feat: Implement Spark-compatible CAST float/double to string Apr 29, 2024
0.0,
-0.0)
).toDF("a")
castTest(testValues, DataTypes.StringType, testAnsiModeThrows = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#351 will add some data generation function with random number. May need to rebase on it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are still some inputs that produce different results, we can document that in the compatibility guide and skip fuzz testing (or have some reduced form of fuzz testing). We could also consider adding a config so that the user can opt in or out of enabling this particular cast, depending on the severity of any incompatibilities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see #362 which improves how we handle casts that are not compatible with Spark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased #351 will checkout #362

@mattharder91 mattharder91 force-pushed the ISSUE-326/cast-float-string branch from 9f2dc41 to f9e46c1 Compare May 3, 2024 09:38
Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thans @mattharder91. I left some feedback. I think this is nearly ready to approve.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @mattharder91

@@ -329,9 +329,22 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
castTest(generateFloats(), DataTypes.createDecimalType(10, 2))
}

ignore("cast FloatType to StringType") {
test("cast FloatType to StringType") {
// https://github.com/apache/datafusion-comet/issues/312
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the references to the issue now that we are resolving the issue

Suggested change
// https://github.com/apache/datafusion-comet/issues/312

@@ -374,9 +387,18 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
castTest(generateDoubles(), DataTypes.createDecimalType(10, 2))
}

ignore("cast DoubleType to StringType") {
test("cast DoubleType to StringType") {
// https://github.com/apache/datafusion-comet/issues/312
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// https://github.com/apache/datafusion-comet/issues/312

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove these comments in a future PR

@andygrove andygrove merged commit 5fc6327 into apache:main May 3, 2024
28 checks passed
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Spark-compatible CAST float/double to string
4 participants