fix: Average expression in Comet Final should handle all null inputs from partial Spark aggregation #261

Merged
merged 1 commit into from
Apr 12, 2024

Member

@viirya viirya commented Apr 11, 2024

Which issue does this PR close?

Part of #260.

Rationale for this change

Comet Final aggregation can run after a Spark partial aggregation when columnar shuffle is enabled. While enabling columnar shuffle by default in #250, some Spark SQL tests that use Average aggregation failed on all-null inputs.

This is because the Comet Average expression relies on the null buffer of the input array to decide whether the inputs are all null, whereas the Spark Average expression relies only on the count value: if all inputs are null, the count is 0.

So in the above case, Comet Average sees zero values in the sum and count states but still computes the average as sum / count instead of returning null. This differs from Spark's behavior and should be fixed.
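To make the mismatch concrete, here is a minimal, dependency-free sketch. It is not the code from this PR: `AvgState`, `merge`, `evaluate_by_null_buffer`, and `evaluate_by_count` are hypothetical names, and `Option<f64>` stands in for a nullable scalar. It assumes, as described above, that the Spark partial aggregation emits a non-null sum of 0 and a count of 0 when every input is null.

```rust
/// Hypothetical stand-in for the intermediate state of an average:
/// a nullable running sum and a count of non-null inputs.
#[derive(Debug, Clone, Copy)]
pub struct AvgState {
    pub sum: Option<f64>,
    pub count: i64,
}

impl AvgState {
    pub fn new() -> Self {
        AvgState { sum: None, count: 0 }
    }

    /// Merge one partial state (sum, count) produced by an upstream
    /// partial aggregation.
    pub fn merge(&mut self, partial_sum: Option<f64>, partial_count: i64) {
        self.count += partial_count;
        if let Some(s) = partial_sum {
            self.sum = Some(self.sum.unwrap_or(0.0) + s);
        }
    }

    /// Decides nullness only from whether the sum is null.
    /// With sum = Some(0.0) and count = 0 this divides 0.0 by 0.0.
    pub fn evaluate_by_null_buffer(&self) -> Option<f64> {
        self.sum.map(|f| f / self.count as f64)
    }

    /// Decides nullness from the count, which is how the Spark
    /// behavior is described above: a count of 0 means null.
    pub fn evaluate_by_count(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            self.sum.map(|f| f / self.count as f64)
        }
    }
}
```

Merging a partial state of `(Some(0.0), 0)` and calling `evaluate_by_null_buffer` yields a non-null NaN in this sketch, while `evaluate_by_count` yields `None`, the equivalent of the SQL null that Spark returns.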

What changes are included in this PR?

How are these changes tested?

} else {
    Ok(ScalarValue::Float64(
        self.sum.map(|f| f / self.count as f64),
    ))
Member Author

Actually the GroupsAccumulator below has correctly implemented this behavior. Only AvgAccumulator has this issue.

Member Author

viirya commented Apr 12, 2024

cc @sunchao @huaxingao

Contributor

@huaxingao huaxingao left a comment

LGTM. Thanks for the fix!

@viirya viirya merged commit 421f0e0 into apache:main Apr 12, 2024
28 checks passed
Member Author

viirya commented Apr 12, 2024

Merged. Thanks.

@viirya viirya deleted the fix_avg_nulls branch April 12, 2024 02:01
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024