Incorrect avg calculation from vector runtime #5516
Comments
Fix a bug in the vector runtime's avg aggregation where null values were incorrectly included in the count. Closes #5516
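To illustrate the class of bug described in the fix, here is a minimal Python sketch of how counting nulls skews an average. It is not the super implementation; the function names and sample values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def avg_skipping_nulls(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average that ignores nulls in both the sum and the count."""
    total = 0.0
    count = 0
    for v in values:
        if v is None:
            continue  # a null contributes to neither sum nor count
        total += v
        count += 1
    return total / count if count else None


def avg_counting_nulls(values: Sequence[Optional[float]]) -> Optional[float]:
    """Buggy variant: nulls are skipped in the sum but still counted."""
    total = 0.0
    count = 0
    for v in values:
        count += 1  # bug: the count grows even when v is null
        if v is not None:
            total += v
    return total / count if count else None


# Hypothetical sample: two real values and two nulls.
sample = [10.0, None, 20.0, None]
print(avg_skipping_nulls(sample))  # 15.0
print(avg_counting_nulls(sample))  # 7.5
```

The buggy variant divides by a larger count, so the result is pulled toward zero. That is a significant difference, not the kind of tiny discrepancy floating point rounding would produce.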
Verified in super commit 883ffd2. I now see
It's interesting that even though the Parquet and CSUP files are both generated from the same original BSUP data, the results computed from them in the vector runtime differ from each other and also differ from the result computed from BSUP in the sequential runtime. This isn't necessarily a bug, since users are accustomed to seeing small differences in precision with floating point math, so I expect it can all be explained by something about parallel operations. But I point it out in case it's at all surprising. FWIW, if we use the same Parquet and CSUP files to do the calculation with the sequential runtime, the result now matches what we saw with BSUP in the sequential runtime.
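As a general illustration of why summation order can change a floating point result, here is a small Python sketch. It is only an assumption that something similar is at play here; the chunked helper is a hypothetical stand-in for any strategy that combines partial sums and says nothing about how the vector runtime actually works.

```python
import math


def sequential_sum(values):
    """Add values left to right, one at a time."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum each chunk separately, then add the partial sums together."""
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Hypothetical values chosen to make the rounding effect obvious.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # 1.0
print(chunked_sum(values, 2))  # 0.0
print(math.fsum(values))       # 2.0 (correctly rounded reference)
```

Floating point addition is not associative, so the same inputs grouped differently can round differently. With real data the effect is usually confined to the last few digits.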
I've opened separate issue #5530 in case that's worthy of closer scrutiny. Thanks @mattnibs!
Repro is with super commit 200f373.
Test data is the attached data.bsup.gz. This is a simplification of an issue that first surfaced when running the mgbench bench3/q5 query.
I start by confirming the expected result using the original BSUP data in our sequential runtime and seeing it matches with the equivalent query in DuckDB (modulo the typical/tiny floating point differences).
Running the same through the vector runtime when querying as either Parquet or CSUP, I get a significantly different result.
If I run the query through the sequential runtime with either of those vector-friendly formats, I once again get the correct result.