-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-47357][SQL] Add support for Upper, Lower, InitCap (all collations) #46008
Conversation
@@ -89,6 +89,45 @@ class CollationStringExpressionsSuite | |||
testRepeat("UNICODE_CI", 3, "abc", 2) | |||
} | |||
|
|||
test("UPPER check output type on collated string") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's more readable to combine multiple simple checks for expressions like this into one test function like here. In this case we could also combine testUpper
, testLower
, testInitCap
into one function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed. Will refactor a bit.
# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala
testLower("UNICODE_CI", 3, "abc") | ||
} | ||
|
||
test("INITCAP check output type on collated string") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uros-db @nikolamand-db @dbatomic @stefankandic @stevomitric @cloud-fan @MaxGekk I am a bit sceptical about the recent change we introduced in this Suite with UTF8String refactoring. Why are we opting to e2e tests for each expression? Correct me if I am wrong, but these expressions are proven to work e2e with existing tests. Shouldn't we write only unit tests for each expression to test whether these "black boxes" work correctly under different types of collations and reserve e2e tests for multiple stuff, like correctly working from parsing through analysis to execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line of reasoning is to cover both unit tests and e2e tests:
- first use CollationSupportSuite.java with a lot of rigorous tests to see whether collation-aware implementation introduced in
CollationSupport.scala
is correct; these tests should only cover "brand new" stuff - i.e. execBinary/execLowercase/execICU behvaiour - then use CollationStringExpressionsSuite.scala with just a couple of SQL tests to verify whether full queries with collation-aware expressions in
stringExpressions.scala
work as expected; these tests should also cover expression return datatype checks, and test whether casting and collation mismatch is handled properly
I tried to boil this down in: https://issues.apache.org/jira/browse/SPARK-47410
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good practice to always have end-to-end tests for each function. The test coverage doesn't need to be very high (the unit tests should cover all the corner cases), but we should have one.
thanks, merging to master! |
What changes were proposed in this pull request?
Addition of InitCap support and tests for Upper, Lower, InitCap.
Why are the changes needed?
We need to support functions with collations.
Does this PR introduce any user-facing change?
Yes, we enabled more functions.
How was this patch tested?
Test in
CollationStringExpressionsSuite
.Was this patch authored or co-authored using generative AI tooling?
No.