-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48700][SQL] Mode expression for complex types (all collations) #47154
Conversation
129768e
to
92f284a
Compare
92f284a
to
f469c2a
Compare
@uros-db This is ready for initial review. Let me know what you think. I am working on simplifying the implementation -- suggestions for how to do so are welcome. All suggestions welcome. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
0f1ea9d
to
005c411
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Show resolved
Hide resolved
44b78f9
to
9504ba6
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Show resolved
Hide resolved
@MaxGekk so sorry about that! New computer, thus default IntelliJ settings for auto formatting. But it is all fixed now. Do you have some default IDE settings in a file form I could import into mine, for next time? If not I will adjust the settings manually before creating my next spark PR. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
…ssionsSuite.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
…ssionsSuite.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
…ssionsSuite.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
…ssionsSuite.scala Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
CollationFactory.getCollationKey(data.asInstanceOf[UTF8String], st.collationId) | ||
case _ => | ||
throw new SparkUnsupportedOperationException( | ||
"UNSUPPORTED_MODE_DATA_TYPE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UNSUPPORTED_MODE_DATA_TYPE
is a sub-condition of DATATYPE_MISMATCH
. You cannot refers to it in this way. Could you add a test for this case, please.
BTW, if you cannot reproduce the error via public API, we should consider to convert it to an internal error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk For the test, I suppose the situation where a type is not binary stable and is not covered by checkInputDataTypes would include some UDTs? checkInputType just confirms it is not a MapType (a blacklist/blocklist), whereas collationAwareTransform
is an allowlist/whitelist).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk @uros-db I am having a lot of trouble with this one! I implemented it as COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.NO_INPUT, and plan to create a different subclass that represents this situation (Maybe BAD_INPUT), once I just have it working. But When I throw the following:
SparkUnsupportedOperationException(
errorClass = "COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.NO_INPUT",
I end up getting:
// org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot
// find sub error class 'COMPLEX_EXPRESSION_UNSUPPORTED_INPUT.NO_INPUT' SQLSTATE: XX000
Yet if I do
SparkUnsupportedOperationException(
errorClass = "COMPLEX_EXPRESSION_UNSUPPORTED_INPUT",
Then org.apache.spark.ErrorClassesJsonReader#getMessageTemplate
fails during the assertion assert(errorInfo.subClass.isDefined == subErrorClass.isDefined)
As the subClass is missing.
Would either of you be able to tell me if this is the right pattern (eg. maybe it should be ComplexExpressionException("BAD_INPUT"), not sure what is the "old" pattern and which is the preferred one).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk please re-review
9dd61b7
to
afd123b
Compare
…ly formatted' in SparkThrowableSuite
@MaxGekk please review (new error condition and test is up and running as was requested -- I can change the name and details of the new error condition, now that it and the test are all set up, just let me know if it needs a rename) |
@GideonPotok Thank you for the ping. I will review this PR soon. |
+1, LGTM. Merging to master. |
### What changes were proposed in this pull request? Add support for complex types with subfields that are collated strings, for the mode operator. ### Why are the changes needed? Full support for collations as per SPARK-48700 ### Does this PR introduce _any_ user-facing change? Yes. ### How was this patch tested? Unit tests only, so far. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47154 from GideonPotok/collationmodecomplex. Lead-authored-by: Gideon P <g.potok4@gmail.com> Co-authored-by: Gideon Potok <31429832+GideonPotok@users.noreply.github.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
Add support for complex types with subfields that are collated strings, for the mode operator.
Why are the changes needed?
Full support for collations as per SPARK-48700
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Unit tests only, so far.
Was this patch authored or co-authored using generative AI tooling?
No.