-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
31.0.0 not filter is always false for missing column #17528
Comments
This sounds like expected behavior to me, given SQL compatible null handling and 3-valued logic. The nonexistent column Some time between Druid 24 and 31, SQL compatible NULL handling became the default behavior, so that's likely why upgrading changed behavior here. If you want the top level filter to be
The For the time being, you can also set the NULL handling back to legacy mode to get the old behavior. This legacy mode will however be removed in the future, so that's not a long term solution. |
Ok, thanks for the clarification. I had read about the SQL compatible NULL changes but had not considered they would impact something like this (especially in native queries). I'm not sure this type of "side-effect" was immediately obvious though. This makes a bit more sense if you know exactly which columns exist but when you have some columns that may or may not be present I guess you need to be more careful (SQL compliance really makes much less sense intuitively when there is no strict schema). Perhaps it could be added to the documentation of the selector filter that it treats UNKNOWN values as false. That does sound reasonable when you spell it out, but it may surprise less experienced users when the rows returned by the filter "not(a = b)" isn't the logical complement of the rows returned by "a = b". (EDIT: well, in one sentence on https://druid.apache.org/docs/latest/querying/filters there is a statement about the 3-valued logic, very easy to miss it and fail to understand all the consequences though). We can live with this now by educating our team and changing some wrapper code to generate logical not filters in the form of not(istrue(filter))) instead of just not(filter). I already had to fix this in at least four separate places in our application after upgrading to 31.0.0. |
Affected Version
31.0.0
Description
Hello,
We have a use case where we use string-based schemaless ingestion for a datasource where a particular column may or may not be present. We were querying that data source with a condition to filter out rows where that column is NOT equal to a particular value (
{"type": "not", "field": {"type": "selector", "dimension": "our-column", "value": "our-value"}}
). However, against our expectations, this filter was not returning rows that were missing this column entirely.We were able to reproduce this by simply querying on a completely bogus column:
Expectation: this filter should be equivalent to no filter at all, because there are no rows with this column and consequently all rows should match the condition that this row is not equal to some value.
Reality: no rows are returned. The same happens with an in-filter and even with a regex filter.
We are now able to work around this by instead filtering on a virtual column (with
nvl(our-column, 'default_value')
), but we are concerned that this behavior my silently filter out data in some other cases. Or if this behavior is by design, we would at least expect to find it in the documentation because it is very surprising.EDIT: as an update to this, these queries were working as expected in Druid version 24.0.1 which we recently upgraded from.
The text was updated successfully, but these errors were encountered: