Bug fix: Empty Record Batch handling #5131

Merged
3 commits merged into apache:master
Jan 31, 2023

mustafasrepo

Which issue does this PR close?

Closes #5090.

Rationale for this change

When a column has low cardinality and the number of target partitions is high, WindowAggExec may receive empty record batches. In these cases we get the error described in #5090. This PR adds handling for empty batches to fix the bug.
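
To illustrate why empty batches appear, here is a minimal, standard-library-only sketch. It is not the DataFusion code from this PR: `Batch`, `repartition`, `running_count`, and `count_empty` are hypothetical names, and `key % target_partitions` is a toy stand-in for hash repartitioning. It shows that two distinct keys spread over eight partitions leave most partitions empty, and that a guard lets the operator skip those instead of evaluating them.

```rust
/// Hypothetical stand-in for a record batch: a single column of integer keys.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    keys: Vec<u32>,
}

/// Distribute rows across `target_partitions` using `key % target_partitions`.
/// This is a toy substitute for hash repartitioning, assumed for illustration.
fn repartition(input: &Batch, target_partitions: usize) -> Vec<Batch> {
    let mut out = vec![Batch { keys: Vec::new() }; target_partitions];
    for &k in &input.keys {
        out[k as usize % target_partitions].keys.push(k);
    }
    out
}

/// Toy window operator: a running row count over the batch.
/// Returns `None` for an empty batch rather than trying to evaluate it.
fn running_count(batch: &Batch) -> Option<Vec<usize>> {
    if batch.keys.is_empty() {
        return None; // guard: nothing to evaluate for an empty batch
    }
    Some((1..=batch.keys.len()).collect())
}

/// Count how many partitions ended up with no rows.
fn count_empty(parts: &[Batch]) -> usize {
    parts.iter().filter(|b| b.keys.is_empty()).count()
}
```

With the input keys `[0, 1, 0, 1, 0, 1]` and 8 target partitions, only partitions 0 and 1 receive rows; the other six are empty and `running_count` returns `None` for them.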

What changes are included in this PR?

A simple change to fix the bug.

Are these changes tested?

Yes, new tests are added.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the core Core DataFusion crate label Jan 31, 2023

@alamb alamb left a comment

Thank you @mustafasrepo ❤️

Looks like cargo fmt is needed to get CI passing.

@@ -2385,6 +2385,36 @@ async fn test_window_agg_sort_orderby_reversed_partitionby_reversed_plan() -> Re
Ok(())
}

#[tokio::test]
async fn test_window_agg_low_cardinality() -> Result<()> {

👍

@alamb alamb merged commit d59b6dd into apache:master Jan 31, 2023

alamb commented Jan 31, 2023

Thanks again!

ursabot commented Jan 31, 2023

Benchmark runs are scheduled for baseline = abeb4fe and contender = d59b6dd. d59b6dd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@mustafasrepo mustafasrepo deleted the feature/bug_fix_target_window branch February 10, 2023 06:51
Successfully merging this pull request may close these issues.

Window function error: InvalidArgumentError("number of columns(27) must match number of fields(35) in schema"