Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contains translates to correlated query when the item is nullable #32574

Closed
roji opened this issue Dec 9, 2023 · 3 comments · Fixed by #32575
Closed

Contains translates to correlated query when the item is nullable #32574

roji opened this issue Dec 9, 2023 · 3 comments · Fixed by #32575
Assignees
Labels
area-perf closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression Servicing-approved type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Dec 9, 2023

We usually translate Contains to an uncorrelated IN subquery, e.g. WHERE x.Item IN (SELECT value FROM OPENJSON(@p)). However, when the item is nullable, we translate to a correlated EXISTS subquery instead: WHERE EXISTS (SELECT value FROM OPENJSON(..) WHERE value = x.Item.

The problem is that during translation of primitive collections, we cannot examine parameter contents, and so we must assume that there are possible nulls in the collection (unless we're dealing with a non-nullable value). Later, in SqlNullabilityProcessor, when processing the InExpression we see that both sides (item and subquery) are nullable, meaning that we can't use SQL IN for that (impossible to distinguish between a NULL result that indicates that the item wasn't found, and a NULL result that indicates that NULL was found). As a result, we currently transform the IN to an EXISTS, to make sure results are accurate.

The fix here would be to look into the parameterized collection in SqlNullabilityProcessor (where we can access parameters), and rewrite it to remove any null values, compensating with an additional OR clause. Once nulls are cleared out, we can keep the uncorrelated IN translation, rather than switching to correlated EXISTS. Note that this would be the first time we rewrite parameters in the 2nd part of the query pipeline; it also means we can't cache the resulting SQL, since the parameter rewriting must happen again for each query.

(/cc @ajcvickers regarding value converters and null: since the query pipeline cares about nulls vs. non-nulls, if value converters support converting between nulls and non-nulls, that has to happen early, otherwise we'll get incorrect results)

Note that any other type of IN+subquery will continue to be translated as before, i.e. transforming IN to correlated EXISTS when both sides are nullable. This isn't a regression from previous releases.

@roji
Copy link
Member Author

roji commented Jan 9, 2024

Reopening to consider servicing.

roji added a commit to roji/efcore that referenced this issue Jan 10, 2024
roji added a commit to roji/efcore that referenced this issue Jan 10, 2024
roji added a commit to roji/efcore that referenced this issue Jan 10, 2024
@ajcvickers ajcvickers added this to the 8.0.x milestone Jan 10, 2024
roji added a commit that referenced this issue Jan 12, 2024
@asos-martinsmith
Copy link

asos-martinsmith commented Jan 13, 2024

we can't use SQL IN for that (impossible to distinguish between a NULL result that indicates that the item wasn't found, and a NULL result that indicates that NULL was found). As a result, we currently transform the IN to an EXISTS, to make sure results are accurate.

WHERE IN and WHERE EXISTS behave identically with NULLs - both variants will only return rows where the value is not null and is matched on both sides.

The expressions can evaluate to different results in 3 valued logic https://dbfiddle.uk/Jt--BJyX but the WHERE clause only returns rows where the predicate evaluates to true and this is the same for both.

  • is this for the negated case? (as WHERE NOT IN and WHERE NOT EXISTS don't behave the same as negating "Unknown" still gives "Unknown")?

@roji
Copy link
Member Author

roji commented Jan 13, 2024

@asos-martinsmith yes, I'm aware of the 3-value logic behavior of IN and EXISTS. However, EXISTS allows customizing the predicate. For example, consider translating LINQ something.Contains(x) to SQL IN; the natural translation would be with IN: x IN (subquery), and this is what EF indeed does in the regular case. However, if x is nullable and subquery may return nullable elements, then there's no way to distinguish between the case where x was not found in the subquery results (NULL is returned), and the case where x was NULL and the subquery also returned NULL (NULL is also returned). This is an issue when trying to replicate the C# 2-value logic behavior via 3-value SQL (LINQ something.Contains(x) is expected to return true when x is null and something contains null as well).

The way EF handles this at the moment, is to use EXISTS instead, and compensate for 3-value logic in the predicate; so instead of WHERE x IN (SELECT i FROM t), we generate WHERE EXISTS (SELECT i FROM t WHERE i = x OR (i IS NULL AND x IS NULL). However, that's a more complex query, and is crucially also correlated (the reference to x which comes from the outside), and is the source of some severe performance issues.

The fix in #32575 if for EF to do parameter preprocessing, for the case where subquery represents a simple parameterized list, remove the NULL out of it, adding additional clauses to compensate; so if we detect that the list contains null, we remove it and generate: WHERE x IN (subquery) OR x IS NULL; this allows us to continue using the tighter and uncorrelated IN construct, while preserving the desired 2-value behavior.

Negation indeed introduces some further complications; you're welcome to take a look at #32575 which does this work, and I'm otherwise happy to continue exchanging around this!

@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-perf closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression Servicing-approved type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants