Fix comparison of nullable values #33757
@ranma42 additional tests are needed, specifically ones where nullable comparisons are wrapped in a Not operation. In SqlNullabilityProcessor -> VisitSqlUnary -> OptimizeNonNullableNotExpression we assume that a non-nullable expression inside a Not can undergo a bunch of simplifications:
This is not true for comparisons of nullable values. We should make sure that we do the right thing and don't apply this "optimization", given that we add coalesce(x, false) and mark the expression as non-nullable.
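To illustrate why the Not simplification is unsafe here, the following minimal sketch uses Python's `sqlite3` module to compare the C#-style negation with a naively flipped operator. It is an illustration only, not EF Core code: the helper name `negation_forms` is made up, and it assumes the compensation has the coalesce(x, false) shape mentioned above, written with SQLite's 0/1 booleans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def negation_forms(a, b):
    """Return (NOT COALESCE(a > b, 0), COALESCE(a <= b, 0)) as evaluated by SQLite."""
    row = conn.execute(
        "SELECT NOT COALESCE(? > ?, 0), COALESCE(? <= ?, 0)",
        (a, b, a, b),
    ).fetchone()
    return tuple(row)


# With non-null operands, flipping the operator is a valid way to remove NOT.
print(negation_forms(3, 5))     # (1, 1)
print(negation_forms(7, 5))     # (0, 0)

# With a NULL operand the two forms disagree: C# evaluates !(null > 5) to true,
# while (null <= 5) is false.
print(negation_forms(None, 5))  # (1, 0)
```

The last call is the case the requested tests should cover: rewriting NOT (a > b) as a <= b changes the result as soon as one operand is NULL.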
I re-enabled the Is there a way to observe code coverage using the current testing infrastructure? |
@ranma42 we don't have any dedicated code coverage infra, and as such we don't rely on code coverage too much. I debugged into the code, and the COALESCE node blocks the optimizations I was worried about, so we should be safe here. I would add a test for projection, just in case.
There are also some test failures due to SQL changes, e.g. Composite_key_join_on_groupby_aggregate_projecting_only_grouping_key
before:
SELECT [l2].[Key]
FROM [LevelOne] AS [l]
INNER JOIN (
SELECT [l1].[Key], COALESCE(SUM([l1].[Id]), 0) AS [Sum]
FROM (
SELECT [l0].[Id], [l0].[Id] % 3 AS [Key]
FROM [LevelTwo] AS [l0]
) AS [l1]
GROUP BY [l1].[Key]
) AS [l2] ON [l].[Id] = [l2].[Key] AND CAST(1 AS bit) = CASE
WHEN [l2].[Sum] > 10 THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END
now:
SELECT [l2].[Key]
FROM [LevelOne] AS [l]
INNER JOIN (
SELECT [l1].[Key], COALESCE(SUM([l1].[Id]), 0) AS [Sum]
FROM (
SELECT [l0].[Id], [l0].[Id] % 3 AS [Key]
FROM [LevelTwo] AS [l0]
) AS [l1]
GROUP BY [l1].[Key]
) AS [l2] ON [l].[Id] = [l2].[Key] AND COALESCE(CASE
WHEN [l2].[Sum] > 10 THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END, CAST(0 AS bit)) = CAST(1 AS bit)
This is not ideal: the CASE block is already non-nullable, so the COALESCE is unnecessary.
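As a quick check of the claim that the CASE block is already non-nullable, here is a small sketch using Python's `sqlite3` module. It is an assumption-laden illustration: SQLite's 1/0 stand in for SQL Server's `CAST(1 AS bit)` / `CAST(0 AS bit)`, and the names `PLAIN`, `WRAPPED`, and `evaluate` are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

PLAIN = "SELECT CASE WHEN ? > 10 THEN 1 ELSE 0 END"
WRAPPED = "SELECT COALESCE(CASE WHEN ? > 10 THEN 1 ELSE 0 END, 0)"


def evaluate(sql, value):
    """Evaluate a one-parameter scalar query; SQL NULL comes back as None."""
    return conn.execute(sql, (value,)).fetchone()[0]


for value in (None, 5, 42):
    # The ELSE branch already maps the NULL (unknown) comparison to 0,
    # so the outer COALESCE never changes the result.
    print(value, evaluate(PLAIN, value), evaluate(WRAPPED, value))
```

Both queries return the same non-NULL value for every input, including a NULL operand, which is why the extra COALESCE only adds noise to the generated SQL.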
The problem is due to the search condition converting visitor on SQL Server. What's happening in this scenario is that when going through the null semantics processor the expression is
At each step we do the right thing, but we end up with worse SQL, because the search condition visitor is already effectively doing the null protection in this case (but we can't know that at the time we apply the COALESCE, and it only happens on SQL Server).
Sorry for the noise, I will check on the full SQL Server test suite in the next runs 😇
I would eventually like the SQL Server translation to be
SELECT [l2].[Key]
FROM [LevelOne] AS [l]
INNER JOIN (
SELECT [l1].[Key], COALESCE(SUM([l1].[Id]), 0) AS [Sum]
FROM (
SELECT [l0].[Id], [l0].[Id] % 3 AS [Key]
FROM [LevelTwo] AS [l0]
) AS [l1]
GROUP BY [l1].[Key]
) AS [l2] ON [l].[Id] = [l2].[Key] AND [l2].[Sum] > 10
(no need for any I will try and check whether it is easier to preserve the original translation in this PR or go directly for the improved one.
@ranma42 one thing to keep in mind: for joins whose keys are anonymous objects we actually want C# null semantics (to mimic LINQ to Objects behavior). See #27071 (comment) for some extra context. If you decide to improve the translation, make sure to add tests that exercise this as well, i.e. joins with composite keys where the key elements could end up being null.
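The difference between the two semantics for composite join keys can be sketched with Python's `sqlite3` module. This only illustrates the behavior described above; it is not the SQL that EF Core generates, and the tables `lhs`/`rhs` and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lhs (k1 INTEGER, k2 TEXT);
    CREATE TABLE rhs (k1 INTEGER, k2 TEXT);
    INSERT INTO lhs VALUES (1, NULL), (2, 'x');
    INSERT INTO rhs VALUES (1, NULL), (2, 'x');
    """
)

# Plain SQL equality: NULL = NULL is unknown, so the (1, NULL) pair is dropped.
sql_semantics = conn.execute(
    """
    SELECT lhs.k1, lhs.k2 FROM lhs
    INNER JOIN rhs ON lhs.k1 = rhs.k1 AND lhs.k2 = rhs.k2
    ORDER BY lhs.k1
    """
).fetchall()

# C#-style equality: two null key elements compare equal, as in LINQ to Objects.
csharp_semantics = conn.execute(
    """
    SELECT lhs.k1, lhs.k2 FROM lhs
    INNER JOIN rhs
        ON lhs.k1 = rhs.k1
       AND (lhs.k2 = rhs.k2 OR (lhs.k2 IS NULL AND rhs.k2 IS NULL))
    ORDER BY lhs.k1
    """
).fetchall()

print(sql_semantics)     # [(2, 'x')]
print(csharp_semantics)  # [(1, None), (2, 'x')]
```

A test for this scenario needs at least one row whose key element is null on both sides of the join; otherwise the two translations are indistinguishable.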
Uh, I was not aware of this! In general, it is not trivial to know whether EFCore uses C# or SQL semantics 🤔 and it is even harder along the translation pipeline, as the same expression changes semantics along the way (see I found the missing optimization that would make the query ideal (see #33776), but regardless of that optimization I will try to investigate the pipeline around the changes I am making a bit more.
@ranma42 yeah, that is a problem with working in this area: there is a lot of tribal knowledge and many edge cases that need to be taken into consideration. We don't have any up-to-date documentation regarding this (the issue is tracked here: dotnet/EntityFramework.Docs#3692).
In general, we try to mimic the C# semantics as much as we can, meaning the final result of the query should (ideally) be identical to that of the same query run on LINQ to Objects (ignoring stuff we can't realistically control, like case-insensitive comparisons), with EF adding null ref protection to the mix. A good example of going to great lengths to mimic C# is 517fc18.
Internally, we have two types of expression trees: the regular expression tree, which represents the LINQ query (and uses C# semantics), and the SqlExpression tree, which represents the SQL query (and follows SQL null semantics), but they all use
Our usual solution is to add a lot of tests: the AssertQuery infra does full result verification vs LINQ to Objects, and the tests are relatively fast/cheap to run. So when in doubt, add a test, even if there is potential duplication with existing tests. Things to keep in mind:
By emitting The only remaining regression I observe in the current test suite is
--- a/test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQuerySqlServerTest.cs
+++ b/test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQuerySqlServerTest.cs
@@ -158,7 +158,10 @@ public override async Task String_contains_on_argument_with_wildcard_column_nega
SELECT [f].[FirstName] AS [fn], [f0].[LastName] AS [ln]
FROM [FunkyCustomers] AS [f]
CROSS JOIN [FunkyCustomers] AS [f0]
-WHERE NOT ([f].[FirstName] IS NOT NULL AND [f0].[LastName] IS NOT NULL AND (CHARINDEX([f0].[LastName], [f].[FirstName]) > 0 OR [f0].[LastName] LIKE N''))
+WHERE [f].[FirstName] IS NULL OR [f0].[LastName] IS NULL OR (CASE
+ WHEN CHARINDEX([f0].[LastName], [f].[FirstName]) > 0 THEN CAST(1 AS bit)
+ ELSE CAST(0 AS bit)
+END = CAST(0 AS bit) AND [f0].[LastName] NOT LIKE N'')
""");
}
which I believe is related to some issues with nullability propagation. I am still investigating (and learning things about the translation pipeline 🤩).
This is now based on: which takes care of implementing the nullability computation for |
I rebased on main so that it includes the fixes to the PRs:
Now that the stack of PRs has resolved, this should be ready for review 🎉 |
As reported in dotnet#33752, `SELECT`ing the result of a comparison between `int?` and `int` can trigger an exception in the shaper.
@ranma42 just the small comment update and it should be good to go! |
In C# an ordered comparison (<, >, <=, >=) between two nullable values always returns a boolean value: if either operand is null, the result is false; otherwise, the result is that of the comparison of the (non-null) values.
Fixes dotnet#33752
The additional `IS NOT NULL` check is not needed anymore since now we do null semantics/compensation for comparison.
Several queries are now simpler because, when filtering, we can treat NULL and FALSE as equivalent results (they both discard the record).
Now that the comparison null semantics has been implemented, it works as intended.
including negated cases.
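The commit messages above describe two effects: a projected comparison must never yield NULL, while a filter can leave NULL alone because it discards the row anyway. A minimal sketch with Python's `sqlite3` module shows both. The `items` table and the `rows` helper are hypothetical, and COALESCE(..., 0) stands in for the coalesce(x, false) compensation discussed earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER);
    INSERT INTO items VALUES (1, NULL), (2, 0), (3, 5);
    """
)


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Projection: the raw comparison leaks NULL (None) for the row with a NULL value,
# which a consumer expecting a non-nullable bool cannot represent.
raw_projection = rows("SELECT id, value > 1 FROM items ORDER BY id")
compensated_projection = rows(
    "SELECT id, COALESCE(value > 1, 0) FROM items ORDER BY id"
)

# Filter: NULL and false both discard the row, so no compensation is needed.
plain_filter = rows("SELECT id FROM items WHERE value > 1 ORDER BY id")
compensated_filter = rows(
    "SELECT id FROM items WHERE COALESCE(value > 1, 0) ORDER BY id"
)

print(raw_projection)          # [(1, None), (2, 0), (3, 1)]
print(compensated_projection)  # [(1, 0), (2, 0), (3, 1)]
print(plain_filter, compensated_filter)  # [(3,)] [(3,)]
```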
I added the tests before seeing your edit; I think these tests might be valuable because they make it easy to spot that there are some missing optimizations. Sorry for the comment, it was left that way from a previous revision 😅 |
WHERE NOT (CASE
    WHEN instr("c"."CompanyName", "c"."ContactName") > 0 THEN 1
    ELSE 0
END)
ideally this should be:
-WHERE NOT (CASE
-    WHEN instr("c"."CompanyName", "c"."ContactName") > 0 THEN 1
-    ELSE 0
-END)
+WHERE CASE
+    WHEN instr("c"."CompanyName", "c"."ContactName") > 0 THEN 0
+    ELSE 1
+END
i.e. we could remove the NOT by distributing it over the CASE.
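A rough way to convince oneself that distributing the NOT over the CASE keeps the results unchanged is to run both forms in SQLite, here through Python's `sqlite3` module. The `customers` table, its columns, and the sample rows are hypothetical stand-ins for the test data, chosen to include NULL arguments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, company TEXT, contact TEXT);
    INSERT INTO customers VALUES
        (1, 'Acme Corp', 'Acme'),
        (2, 'Foo', 'Bar'),
        (3, NULL, 'x'),
        (4, 'Baz', NULL);
    """
)

WITH_NOT = """
    SELECT id FROM customers
    WHERE NOT (CASE WHEN instr(company, contact) > 0 THEN 1 ELSE 0 END)
    ORDER BY id
"""

DISTRIBUTED = """
    SELECT id FROM customers
    WHERE CASE WHEN instr(company, contact) > 0 THEN 0 ELSE 1 END
    ORDER BY id
"""


def ids(sql):
    """Return the ids selected by the query as a flat list."""
    return [row[0] for row in conn.execute(sql)]


# instr() returns NULL when either argument is NULL; the CASE maps that to its
# ELSE branch in both forms, so the rows with NULLs are kept either way.
print(ids(WITH_NOT))     # [2, 3, 4]
print(ids(DISTRIBUTED))  # [2, 3, 4]
```

The rewrite is safe because the CASE never produces NULL, so negating it is the same as swapping the values of its two branches.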
Can we do better?
@ranma42 the tests you added are valuable and I would keep them for sure. My comment was to add more tests for un-optimized string.contains (so that we can check that CASE conversion properly compensates for the fact that we no longer do null check on the argument to string.Contains). But then I noticed that we already have |
Cosmos has some failures, let me take care of it, so you don't need to jump through all the hoops to set it up @ranma42 |
yay, big improvement! thanks again @ranma42 |
Handle comparison of nullable values
SqlNullabilityProcessor
to handle comparisons of nullable values
Fixes #33752