Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Enable floating point by default #5323

Closed
sameerz opened this issue Apr 26, 2022 · 2 comments · Fixed by #5441
Closed

[FEA] Enable floating point by default #5323

sameerz opened this issue Apr 26, 2022 · 2 comments · Fixed by #5441
Assignees
Labels
feature request New feature or request

Comments

@sameerz
Copy link
Collaborator

sameerz commented Apr 26, 2022

Is your feature request related to a problem? Please describe.
Today the plugin is conservative in asking users to enable floating point calculations by default. There will always be some differences in floating point calculations between CPU and GPU. We should enable floating point by default, by setting these configurations to true:

spark.rapids.sql.castDecimalToFloat.enabled
spark.rapids.sql.castFloatToIntegralTypes.enabled
spark.rapids.sql.castFloatToString.enabled
spark.rapids.sql.castStringToFloat.enabled

spark.rapids.sql.csv.read.float.enabled
spark.rapids.sql.json.read.float.enabled

spark.rapids.sql.improvedFloatOps.enabled

spark.rapids.sql.variableFloatAgg.enabled

Describe the solution you'd like
Enable floating point by default. Allow users to disable float if they need to fall back to the CPU behavior. Add a warning in the driver log to let users know when floating point operations are used and that there may be differences between CPU and GPU calculations of floats, and point them to the documentation related to floats.

We should also turn on spark.rapids.sql.incompatibleOps.enabled as part of this, and log warnings about incompatibleOps being on by default.

If a user has explicitly set a config to enabled then we should disable the warning.

Documentation will need to be updated as well, and this should be flagged in the release notes.

Describe alternatives you've considered
We could fix as many of the differences between floating point calculations between the CPU and GPU. That will take quite a bit of time and may not get us all the way there.

Additional context
We've had floating point off by default because of concerns of users trying to do things with floating point aggregations like joins on floats or unions on floats. However without floating point operations on, there is a performance hit as many operations will fall back to the CPU.

@sameerz sameerz added feature request New feature or request ? - Needs Triage Need team to review and classify and removed ? - Needs Triage Need team to review and classify labels Apr 26, 2022
@sameerz sameerz added this to the May 2 - May 20 milestone Apr 29, 2022
@razajafri
Copy link
Collaborator

@sameerz what about spark.rapids.sql.castFloatToDecimal.enabled

@sameerz
Copy link
Collaborator Author

sameerz commented May 9, 2022

spark.rapids.sql.castFloatToDecimal.enabled

Thanks for the catch, we should default this to true as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants