-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IS NOT NULL
predicate rewrite is incorrect
#9230
Comments
I believe @appletreeisyellow plans to revert #9208 and then work on this feature as part of #9231 |
Ah, right. The |
Yeah I am embarassed I didn't catch it in review |
I feel a little strange when reviewing the PR, and thought I missed something. That is! 😄 |
This logic is very subtle. As I mentioned above, I am really thinking that to make the pruning predicate work more generally we need a different approach than this particular style of rewrite (I am partial to #7887) |
Thanks for catching it so quickly @alamb and thanks for the quick review on the revert PR #9232 @alamb and @viirya!
📌 |
Describe the bug
The logic introduced in #9208 is (very subtly) incorrect as I found while upgrading to use it in InfluxDB
To Reproduce
The data is like this
Note it has BOTH 14 null values and 14 non null values for usage_system and non null values
The original predicate was
And the rewritten predicate is (now)
Thus, the input data statistics look like this
With these statistics this predicate evaluates to false and the data is pruned,
Expected behavior
The data should not be pruned because there are rows for which
IS NOT NULL
evaluates to true (there are non null values in the data)Additional context
The pruning predicate needs to return
false
only if "there are no rows that could possibly match the predicate", which forIS NOT NULL
means that there are only null values in the data, but the current implementation checks if there are any null values in the data.So in this case, that the original predicate needs to be rewritten to
We of course don't have
row_count
information yet (but @appletreeisyellow) is working on adding in #9171This kind of subtle logic bug is another reason I think we should be seriously considering the range based analysis described in #7887
The text was updated successfully, but these errors were encountered: