Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-47443][SQL] Window Aggregate support for collations #45568

Closed

Conversation

dbatomic
Copy link
Contributor

What changes were proposed in this pull request?

This PR introduces support for Window Aggregates when partitioning is done against expressions with non-binary collation.
The approach is same as for regular aggregates. Instead of doing byte-for-byte comparison against UnsafeRow we fall back to interpreted mode if there is a data type in grouping expressions that doesn't satisfy isBinaryStable constraint.

Why are the changes needed?

Previous implementation returned invalid results.

Does this PR introduce any user-facing change?

yes - fixes incorrect behavior.

How was this patch tested?

New test is added in CollationSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Mar 18, 2024
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except of a minor comment.

val t1 = "T_NON_BINARY"
val t2 = "T_BINARY"

withTable(t1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can list multiple tables:

Suggested change
withTable(t1) {
withTable(t1, t2) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Done.

@MaxGekk
Copy link
Member

MaxGekk commented Mar 20, 2024

+1, LGTM. Merging to master.
Thank you, @dbatomic.

@MaxGekk MaxGekk closed this in 3a3477b Mar 20, 2024
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
### What changes were proposed in this pull request?

This PR introduces support for Window Aggregates when partitioning is done against expressions with non-binary collation.
The approach is same as for regular aggregates. Instead of doing byte-for-byte comparison against `UnsafeRow` we fall back to interpreted mode if there is a data type in grouping expressions that doesn't satisfy `isBinaryStable` constraint.

### Why are the changes needed?

Previous implementation returned invalid results.

### Does this PR introduce _any_ user-facing change?

yes - fixes incorrect behavior.

### How was this patch tested?

New test is added in `CollationSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45568 from dbatomic/win_agg_support_for_collations.

Authored-by: Aleksandar Tomic <aleksandar.tomic@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants