fix: Average expression in Comet Final should handle all null inputs from partial Spark aggregation #261
Which issue does this PR close?
Part of #260.
Rationale for this change
Comet Final aggregation can run after Spark partial aggregation when columnar shuffle is enabled. While enabling columnar shuffle by default in #250, some Spark SQL tests with Average aggregation failed on all-null inputs.
This is because the Comet Average expression relies on the null buffers of the input arrays to decide whether the inputs are all null, whereas the Spark Average expression relies only on the `count` value: if all inputs are null, `count` is 0. So in the case above, Comet Average sees zero values in the `sum` and `count` states but still computes the average `sum / count` instead of returning null. This differs from Spark's behavior and should be fixed.
What changes are included in this PR?
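To make the mismatch from the rationale concrete, here is a minimal, self-contained Rust sketch. It is not the code in this PR: `AvgState`, `naive_final_avg`, and `spark_compatible_final_avg` are hypothetical names, and the state is assumed to be a plain `f64` sum and `i64` count rather than Arrow arrays.

```rust
/// Hypothetical merged partial state for one group (an assumption for
/// illustration): the running sum and the count of non-null inputs.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgState {
    sum: f64,
    count: i64,
}

/// The problematic behavior described above: always divide, even when
/// `count` is 0 because every input was null.
fn naive_final_avg(state: AvgState) -> Option<f64> {
    Some(state.sum / state.count as f64)
}

/// Spark-style behavior: `count == 0` means all inputs were null, so the
/// result is `None` (SQL NULL) instead of `sum / count`.
fn spark_compatible_final_avg(state: AvgState) -> Option<f64> {
    if state.count == 0 {
        None
    } else {
        Some(state.sum / state.count as f64)
    }
}
```

With `AvgState { sum: 0.0, count: 0 }`, the naive version divides zero by zero and yields a non-null NaN, while the count-based version yields `None`. For a non-empty group both agree.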
How are these changes tested?