[SPARK-49911][SQL] Fix semantic of support binary equality #48472
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java
So let me check if I understand correctly: we keep supportsBinaryEquality, which will be true for UTF8_BINARY and false for UTF8_BINARY_RTRIM. We also introduce isUtf8BinaryType, which will be true for both UTF8_BINARY and UTF8_BINARY_RTRIM. Then, we refactor the execution-related code path branches to rely on isUtf8BinaryType rather than supportsBinaryEquality?
Everything is correct except the last sentence. We will keep relying on …
I see, this makes sense - I'm on board with the approach
One small ask for you though: let's hold off on all unnecessary changes in this PR (for example, modifying the branching pattern in the CollationSupport classes). I don't like the idea of modifying code that the execution flow can't reach at this time (because analysis will block it) without adding the appropriate tests.
Please opt for one of the following:
1. Undo all needless and untested changes here, and keep only the minimally required scaffolding for further implementation.
2. Keep the supportsBinaryEquality -> isUtf8BinaryType changes, but also apply the proper space trimming policy, whitelist the expression, and add corresponding tests.
So let's just avoid half-baked code; it can be confusing for both you and the reviewers.
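To illustrate why option 2 needs a space trimming policy and not just a swap of supportsBinaryEquality for isUtf8BinaryType, here is a minimal sketch. It assumes that an RTRIM collation ignores trailing U+0020 spaces and otherwise compares raw UTF-8 bytes; the class TrimAwareEquals and the methods rtrim and utf8BinaryEquals are hypothetical names, not code from CollationSupport.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: not Spark's CollationSupport code; names are illustrative.
final class TrimAwareEquals {

  // Strip trailing ASCII spaces (0x20) from a UTF-8 byte array.
  // Assumption: RTRIM trims only U+0020, not other whitespace.
  // In UTF-8 the byte 0x20 never occurs inside a multi-byte sequence.
  static byte[] rtrim(byte[] utf8) {
    int end = utf8.length;
    while (end > 0 && utf8[end - 1] == ' ') {
      end--;
    }
    return Arrays.copyOf(utf8, end);
  }

  // Equality for a UTF8_BINARY-type collation: plain byte comparison,
  // preceded by right-trimming when the collation uses trimming.
  static boolean utf8BinaryEquals(String a, String b, boolean usesTrimCollation) {
    byte[] left = a.getBytes(StandardCharsets.UTF_8);
    byte[] right = b.getBytes(StandardCharsets.UTF_8);
    if (usesTrimCollation) {
      left = rtrim(left);
      right = rtrim(right);
    }
    return Arrays.equals(left, right);
  }

  public static void main(String[] args) {
    // The two inputs differ only in trailing spaces.
    System.out.println(utf8BinaryEquals("abc", "abc  ", false)); // false
    System.out.println(utf8BinaryEquals("abc", "abc  ", true));  // true
  }
}
```

Run as a single-file program, it prints `false` and then `true`: a branch on isUtf8BinaryType alone, without the trimming step, would give the wrong answer for a collation like UTF8_BINARY_RTRIM.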
Agreed, makes sense. Let's go with option 1, i.e. we won't have any changes without tests. I will follow up (today) with PRs that have tests and use that code path.
+1, LGTM. Merging to master.
What changes were proposed in this pull request?
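With the introduction of trim collation, the meaning of what was known as supportsBinaryEquality changes, so it is now split into isUtf8BinaryType and usesTrimCollation to give each property the correct semantics. supportsBinaryEquality remains true for UTF8_BINARY and false for UTF8_BINARY_RTRIM, while isUtf8BinaryType is true for both.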
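The split can be pictured with a small, hypothetical Java model. The flag values for UTF8_BINARY and UTF8_BINARY_RTRIM follow the review discussion above; the derivation `supportsBinaryEquality = isUtf8BinaryType && !usesTrimCollation` is an assumption for illustration, and the class CollationFlags is not Spark's CollationFactory API.

```java
import java.util.List;

// Hypothetical model of the collation flags discussed in this PR.
// Names mirror identifiers from the PR text, but this is NOT
// Spark's CollationFactory.Collation class.
final class CollationFlags {
  final String name;
  final boolean isUtf8BinaryType;  // UTF8_BINARY-based, with or without trimming
  final boolean usesTrimCollation; // e.g. an RTRIM variant

  CollationFlags(String name, boolean isUtf8BinaryType, boolean usesTrimCollation) {
    this.name = name;
    this.isUtf8BinaryType = isUtf8BinaryType;
    this.usesTrimCollation = usesTrimCollation;
  }

  // Assumption: plain binary equality holds only for a UTF8_BINARY-type
  // collation that does not trim.
  boolean supportsBinaryEquality() {
    return isUtf8BinaryType && !usesTrimCollation;
  }

  public static void main(String[] args) {
    List<CollationFlags> collations = List.of(
        new CollationFlags("UTF8_BINARY", true, false),
        new CollationFlags("UTF8_BINARY_RTRIM", true, true));
    for (CollationFlags c : collations) {
      System.out.println(c.name + ": isUtf8BinaryType=" + c.isUtf8BinaryType
          + ", supportsBinaryEquality=" + c.supportsBinaryEquality());
    }
  }
}
```

Under these assumptions, both collations report `isUtf8BinaryType=true`, but only UTF8_BINARY reports `supportsBinaryEquality=true`, matching the values agreed on in the review.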
Why are the changes needed?
With the introduction of trim collation, a collation such as UTF8_BINARY_RTRIM is still a UTF8_BINARY-type collation but no longer supports plain binary equality, so a single supportsBinaryEquality flag cannot express both properties correctly.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Everything is covered by existing tests; no new functionality is added.
Was this patch authored or co-authored using generative AI tooling?
No.