[SPARK-47681][FOLLOWUP] Fix schema_of_variant for float inputs. #47058
@cloud-fan could you help review? Thanks!
@@ -372,6 +372,11 @@ object JsonInferSchema {
    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
      DoubleType

    // This branch is only used by `SchemaOfVariant.mergeSchema` because `JsonInferSchema` never
    // produces `FloatType`.
    case (FloatType, _: DecimalType) | (_: DecimalType, FloatType) =>
Is there a standard for this? These two are not compatible, because float is approximate but decimal is not.
Anyway, I'm fine with this as we already did it for double type.
I'm not aware of an existing standard, but I think we can follow the type resolution rules in the add operator:
Based on this result, I think it may be better to change the result of `float x decimal` into double.
Changing decimal into float/double may indeed lose precision, but I think it is a reasonable approach for schema inference.
@cloud-fan could you help merge it? Thanks!
thanks, merging to master!
### What changes were proposed in this pull request?

The current `schema_of_variant` depends on `JsonInferSchema.compatibleType` to find the common type of two types. This function doesn't handle the case of `float x decimal` or `decimal x float` correctly. Instead of producing the expected result, it considers the two types incompatible and produces `variant`. This change doesn't affect JSON schema inference, because that inference never produces `float` in the first place.

### Why are the changes needed?

It is a bug fix and is required to process floats correctly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test that checks all type combinations.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47058 from chenhao-db/fix_schema_of_variant_float.

Authored-by: Chenhao Li <chenhao.li@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
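
As an illustration of the behavior described above, here is a minimal, self-contained sketch of a pairwise type merge with a variant fallback. It does not use Spark: `SimpleType`, `FloatT`, `DoubleT`, `DecimalT`, `VariantT`, and `CompatibleTypeSketch` are hypothetical stand-ins, and the `double` result for `float x decimal` is an assumption taken from the suggestion in the review thread, not a confirmed copy of the merged code. Only the pairs discussed in this PR are modeled.

```scala
// Hypothetical, simplified stand-ins for Spark's data types (not the real classes).
sealed trait SimpleType
case object FloatT extends SimpleType
case object DoubleT extends SimpleType
case object VariantT extends SimpleType
final case class DecimalT(precision: Int, scale: Int) extends SimpleType

object CompatibleTypeSketch {
  // Returns a common type for the pairs modeled here, or None when no rule applies.
  def compatibleType(a: SimpleType, b: SimpleType): Option[SimpleType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    // Existing rule shown in the diff context: double x decimal gives double.
    case (DoubleT, _: DecimalT) | (_: DecimalT, DoubleT) => Some(DoubleT)
    // Assumed rule for the new branch: float x decimal gives double.
    // Without this case the pair falls through to None below.
    case (FloatT, _: DecimalT) | (_: DecimalT, FloatT) => Some(DoubleT)
    // All other rules (decimal widening, integral types, ...) are omitted in this sketch.
    case _ => None
  }

  // Mirrors the described fallback: pairs without a common type become variant.
  def mergeSchema(a: SimpleType, b: SimpleType): SimpleType =
    compatibleType(a, b).getOrElse(VariantT)
}
```

Removing the `FloatT` case from the sketch reproduces the reported symptom: `mergeSchema(FloatT, DecimalT(10, 2))` would fall back to `VariantT`.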