-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48414][PYTHON] Fix breaking change in python's fromJson
#46737
Conversation
fieldPath: str, | ||
collationsMap: Optional[Dict[str, str]], | ||
fieldPath: str = "", | ||
collationsMap: Optional[Dict[str, str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensive testing was added in #46280, this change is just so that the API for the fromJson
method doesn't change which we missed in the previous PR
python/pyspark/sql/types.py
Outdated
fieldPath: str, | ||
collationsMap: Optional[Dict[str, str]], | ||
fieldPath: str = "", | ||
collationsMap: Optional[Dict[str, str]] = None, | ||
) -> "ArrayType": | ||
elementType = _parse_datatype_json_value( | ||
json["elementType"], fieldPath + ".element", collationsMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it's .element
when empty strings are given. Are they expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if you are deserializing a schema where an array is on the column a
the collationsMap
would be {"a.element": collation}
. So if you just want to deserialize a map that is not a part of the larger schema the collationsMap
should be {".element": collation}
Added tests to verify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it has to be {"element": collation}
and {"key": collation}
. .element
sounds a bit awkward. While I am fine because this JSON format is supposed to be internal-only, is this backward compatible? E.g., can old JSON data types be read? I have seen many tickets related to this, e.g., #43474
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the second thought I agree with you, it definitely makes more sense to special case this so we don't have this weird dangling dot.
This should be completely backwards compatible; if we don't care about collations and don't pass fieldPath
and collationsMap
everything will be the same as before.
Merged to master. |
…scala ### What changes were proposed in this pull request? When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`. ### Why are the changes needed? To be consistent with the behavior on the pyspark side (#46737). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47497 from stefankandic/complexTypeDeSer. Authored-by: Stefan Kandic <stefan.kandic@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? Fix breaking change in `fromJson` method by having default param values. ### Why are the changes needed? In order to not break clients that don't care about collations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46737 from stefankandic/fromJsonBreakingChange. Authored-by: Stefan Kandic <stefan.kandic@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…scala ### What changes were proposed in this pull request? When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`. ### Why are the changes needed? To be consistent with the behavior on the pyspark side (apache#46737). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47497 from stefankandic/complexTypeDeSer. Authored-by: Stefan Kandic <stefan.kandic@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? Fix breaking change in `fromJson` method by having default param values. ### Why are the changes needed? In order to not break clients that don't care about collations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46737 from stefankandic/fromJsonBreakingChange. Authored-by: Stefan Kandic <stefan.kandic@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…scala ### What changes were proposed in this pull request? When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`. ### Why are the changes needed? To be consistent with the behavior on the pyspark side (apache#46737). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47497 from stefankandic/complexTypeDeSer. Authored-by: Stefan Kandic <stefan.kandic@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? Fix breaking change in `fromJson` method by having default param values. ### Why are the changes needed? In order to not break clients that don't care about collations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46737 from stefankandic/fromJsonBreakingChange. Authored-by: Stefan Kandic <stefan.kandic@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…scala ### What changes were proposed in this pull request? When deserializing map/array that is not part of the struct field, the key in collation map should just be `{"element": collation}` instead of `{".element": collation}`. ### Why are the changes needed? To be consistent with the behavior on the pyspark side (apache#46737). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47497 from stefankandic/complexTypeDeSer. Authored-by: Stefan Kandic <stefan.kandic@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Fix breaking change in
fromJson
method by having default param values.Why are the changes needed?
In order to not break clients that don't care about collations.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing UTs.
Was this patch authored or co-authored using generative AI tooling?
No.