Fix comparison of nullable values #33757

ranma42 · 2024-05-19T23:35:58Z

Handle comparison of nullable values

add a test that reproduces the issue
extend SqlNullabilityProcessor to handle comparisons of nullable values
cleanup code that was needed because of missing handling of comparisons

maumar · 2024-05-20T22:42:11Z

@ranma42 additional tests are needed. Specifically ones where nullable comparisons are wrapped around Not operation. In SqlNullabilityProcessor -> VisitSqlUnary -> OptimizeNonNullableNotExpression we make an assumption that for non-nullable expression inside Not we can perform a bunch of simplifications:

// !(a > b) -> a <= b
// !(a >= b) -> a < b
// !(a < b) -> a >= b
// !(a <= b) -> a > b

This is not true for operations with nullable values. We should make sure that we do the right thing and don't try to apply this "optimization", given that we add coalesce(x, false) and mark nullability of the expression as false.

ranma42 · 2024-05-21T05:22:05Z

@ranma42 additional tests are needed. Specifically ones where nullable comparisons are wrapped around Not operation. In SqlNullabilityProcessor -> VisitSqlUnary -> OptimizeNonNullableNotExpression we make an assumption that for non-nullable expression inside Not we can perform a bunch of simplifications:
// !(a > b) -> a <= b
// !(a >= b) -> a < b
// !(a < b) -> a >= b
// !(a <= b) -> a > b
This is not true for operations with nullable values. We should make sure that we do the right thing and don't try to apply this "optimization", given that we add coalesce(x, false) and mark nullability of the expression as false.

I re-enabled the Negated_order_comparison_on_nullable_arguments_doesnt_get_optimized test which seems to cover exactly the risk you are describing.
Note that it only checks for the "WHERE" case, if we also want to test that the nullability processing is correct in the "SELECT" case we might want another test (OTOH optimizations in WHERE are generally going to be more aggressive than in SELECT).

Is there a way to observe code coverage using the current testing infrastructure?

maumar · 2024-05-21T18:35:08Z

@ranma42 we don't have any dedicated code coverage infra and as such we don't rely on code coverage too much. I debugged into the code and the COALESCE node blocks the optimizations I was worried about, so we should be safe here. I would add a test for projection, just in case.

There are also some test failures due to sql changes, e.g. Composite_key_join_on_groupby_aggregate_projecting_only_grouping_key

before:

SELECT [l2].[Key]
FROM [LevelOne] AS [l]
INNER JOIN (
    SELECT [l1].[Key], COALESCE(SUM([l1].[Id]), 0) AS [Sum]
    FROM (
        SELECT [l0].[Id], [l0].[Id] % 3 AS [Key]
        FROM [LevelTwo] AS [l0]
    ) AS [l1]
    GROUP BY [l1].[Key]
) AS [l2] ON [l].[Id] = [l2].[Key] AND CAST(1 AS bit) = CASE
    WHEN [l2].[Sum] > 10 THEN CAST(1 AS bit)
    ELSE CAST(0 AS bit)
END

now:

SELECT [l2].[Key]
FROM [LevelOne] AS [l]
INNER JOIN (
    SELECT [l1].[Key], COALESCE(SUM([l1].[Id]), 0) AS [Sum]
    FROM (
        SELECT [l0].[Id], [l0].[Id] % 3 AS [Key]
        FROM [LevelTwo] AS [l0]
    ) AS [l1]
    GROUP BY [l1].[Key]
) AS [l2] ON [l].[Id] = [l2].[Key] AND COALESCE(CASE
    WHEN [l2].[Sum] > 10 THEN CAST(1 AS bit)
    ELSE CAST(0 AS bit)
END, CAST(0 AS bit)) = CAST(1 AS bit)

this is not ideal since the CASE block is already non-nullable, so the COALESCE is unnecessary.

maumar · 2024-05-21T19:35:44Z

the problem is due to search condition converting visitor on sql server. What's happening in the scenario is that when going through null semantics processor the expression is (true == x.Sum > 10), that gets converted to (true == COALESCE(x.Sum > 10, false)), which makes sense.
Then, simplifying visitor kicks in and removes the redundant true == so we end up with COALESCE(x.Sum > 10, false).
And finally we go through search condition conversion which ends up with

COALESCE(CASE
    WHEN l2.Sum > 10 THEN CAST(1 AS bit)
    ELSE CAST(0 AS bit)
END, CAST(0 AS bit)) == CAST(1 AS bit)

at each step we do the right thing, but end up with worse sql, because search condition is already effectively doing the null protection in this case (but we can't know that at the time we apply the coalesce, and it's only on sql server)

ranma42 · 2024-05-21T21:06:08Z

Sorry for the noise, I will check on the full SQL Server testsuite in the next runs 😇

I would eventually like the SQL Server translation to be

SELECT [l2].[Key]
FROM [LevelOne] AS [l]
INNER JOIN (
    SELECT [l1].[Key], COALESCE(SUM([l1].[Id]), 0) AS [Sum]
    FROM (
        SELECT [l0].[Id], [l0].[Id] % 3 AS [Key]
        FROM [LevelTwo] AS [l0]
    ) AS [l1]
    GROUP BY [l1].[Key]
) AS [l2] ON [l].[Id] = [l2].[Key] AND [l2].[Sum] > 10

(no need for any CASE or COALESCE; NULL and FALSE are equivalent in the join condition).

I will try and check whether it is easier to preserve the original translation in this PR or go directly for the improved one.

maumar · 2024-05-22T00:03:53Z

@ranma42 one thing to keep in mind, for join where keys are anonymous objects we actually want c# null semantics (to mimic linq to objects behavior). See #27071 (comment) for some extra context. If you decide to improve the translation, make sure to add tests that exercise this as well, i.e. join with composite keys where the key elements could end up being nulls

ranma42 · 2024-05-22T06:19:52Z

@ranma42 one thing to keep in mind, for join where keys are anonymous objects we actually want c# null semantics (to mimic linq to objects behavior). See #27071 (comment) for some extra context. If you decide to improve the translation, make sure to add tests that exercise this as well, i.e. join with composite keys where the key elements could end up being nulls

Uh, i was not aware of this! In general, it is not trivial to know whether EFCore uses C# or SQL semantics 🤔 and it is even harder along the translation pipeline, as the same expression changes semantics along the way (see ExpressionType.Equal which AFAICT usually starts with C# semantics, but IIUC after the nullability processor should have SQL semantics). I guess I will have to study more on the subject. Is there any documentation I should look up besides https://www.youtube.com/watch?v=1Ld3dtnTrMw ?
I found dotnet/EntityFramework.Docs#1920 (comment) but if it's up to date, I might just have to explore the code (which so far has been fun, so that's not so bad 😅 )

I found the missing optimization that would make the query ideal (see #33776), but regardless of that that optimization I will try and investigate a bit more the pipeline around the changes I am making.

maumar · 2024-05-22T08:09:04Z

@ranma42 yeah, it is a problem with working in this area - there is a lot of tribal knowledge and edge cases that need to be taken into consideration. We don't have any up to date documentation regarding this (issue is tracked here: dotnet/EntityFramework.Docs#3692) In general, we try to mimic the c# semantics as much as we can, meaning the final result of the query should (ideally) be identical with the same query ran on Linq 2 objects (ignoring stuff we can't realistically control like case-insensitive comparisons) with EF adding null ref protection to the mix.

Good example of going to great lengths to mimic c# is 517fc18

Internally, we do two types of expression trees, regular expression tree which represents linq query (and uses c# semantics) and SqlExpression tree which represents sql query (and follows sql null semantics), but they all use ExpressionType to describe the operation.

Our usual solution is to add a lot of tests - AssertQuery infra does full result verification vs linq to objects, and they are relatively fast/cheap to run. So when in doubt - add a test, even if there is potential duplication with existing tests. Things to keep in mind:

filter vs projection,
operations wrapped in NOT,
making sure things work well recursively (combining multiple operations, nesting one within another),

ranma42 · 2024-05-23T06:13:43Z

By emitting CASE WHEN expr THEN TRUE ELSE FALSE END instead of COALESCE(expr, FALSE) I managed to fix most of the SQL Server regressions (I believe there are further opportunities for improvements of COALESCE and equivalent operations in SQL Server, but I will try to keep each PR as narrow as possible 😇 ).

The only remaining regression I observe in the current test suite is

--- a/test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQuerySqlServerTest.cs
+++ b/test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQuerySqlServerTest.cs
@@ -158,7 +158,10 @@ public override async Task String_contains_on_argument_with_wildcard_column_nega
 SELECT [f].[FirstName] AS [fn], [f0].[LastName] AS [ln]
 FROM [FunkyCustomers] AS [f]
 CROSS JOIN [FunkyCustomers] AS [f0]
-WHERE NOT ([f].[FirstName] IS NOT NULL AND [f0].[LastName] IS NOT NULL AND (CHARINDEX([f0].[LastName], [f].[FirstName]) > 0 OR [f0].[LastName] LIKE N''))
+WHERE [f].[FirstName] IS NULL OR [f0].[LastName] IS NULL OR (CASE
+    WHEN CHARINDEX([f0].[LastName], [f].[FirstName]) > 0 THEN CAST(1 AS bit)
+    ELSE CAST(0 AS bit)
+END = CAST(0 AS bit) AND [f0].[LastName] NOT LIKE N'')
 """);
     }

which I believe is related to some issues with nullability propagation. I am still investigating (and learning things about the translation pipeline 🤩).

ranma42 · 2024-05-25T20:14:25Z

This is now based on:

Implement nullability computation for SqlFunctionExpression #33814

which takes care of implementing the nullability computation for SqlFunctionExpressions. Thanks to that, the diffs now should look ok.

ranma42 · 2024-05-29T22:33:24Z

I rebased on main so that it includes the fixes to the PRs:

Now that the stack of PRs has resolved, this should be ready for review 🎉

src/EFCore.Sqlite.Core/Query/Internal/Translators/SqliteStringMethodTranslator.cs

As reported in dotnet#33752, `SELECT`ing the result of a comparison between `int?` and `int` can trigger an exception in the shaper.

src/EFCore.Relational/Query/SqlNullabilityProcessor.cs

maumar · 2024-05-30T20:32:06Z

@ranma42 just the small comment update and it should be good to go!

In C# an ordered comparison (<, >, <=, >=) between two nullable values always returns a boolean value: if either operand is null, the result is false; otherwise, the result is that of the comparison of the (non-null) values. Fixes dotnet#33752

The additional `IS NOT NULL` check is not needed anymore since now we do null semantics/compensation for comparison.

Several queries are now simpler as when filtering we can treat NULL and FALSE as equivalent results (they both discard the record).

Now that the comparison null semantics has been implemented, it works as intended.

including negated cases.

ranma42 · 2024-05-30T20:55:47Z

I added the tests before seeing your edit; I think these tests might be valuable because they make it easy to spot that there are some missing optimizations.
I looked for the WHEN instr() pattern in the code and I was unable to find out which tests they are duplicating; if you point me towards those tests, I will drop the new ones and instead make sure that the SQL emitted by the Sqlite provider is asserted on the original ones.

Sorry for the comment, it was left that way from a previous revision 😅

ranma42 · 2024-05-30T20:56:21Z

test/EFCore.Sqlite.FunctionalTests/Query/NorthwindFunctionsQuerySqliteTest.cs

+WHERE NOT (CASE
+    WHEN instr("c"."CompanyName", "c"."ContactName") > 0 THEN 1
+    ELSE 0
+END)


ideally this should be:

Suggested change

WHERE NOT (CASE

WHEN instr("c"."CompanyName", "c"."ContactName") > 0 THEN 1

ELSE 0

END)

WHERE CASE

WHEN instr("c"."CompanyName", "c"."ContactName") > 0 THEN 0

ELSE 1

END

i.e. we could remove the NOT by distributing it over the CASE.
Can we do better?

Good idea, filed #33857. We also have #25977 around case nullability, but it's much more complicated and arguably not that useful.

maumar · 2024-05-30T21:35:26Z

@ranma42 the tests you added are valuable and I would keep them for sure. My comment was to add more tests for un-optimized string.contains (so that we can check that CASE conversion properly compensates for the fact that we no longer do null check on the argument to string.Contains). But then I noticed that we already have Optional_navigation_type_compensation_works_with_binary_and_expression, so we have the coverage. In general, don't worry about test duplication. More tests is pretty much always welcome.

maumar · 2024-05-30T23:00:46Z

Cosmos has some failures, let me take care of it, so you don't need to jump through all the hoops to set it up @ranma42

maumar · 2024-05-31T03:21:56Z

yay, big improvement! thanks again @ranma42

ranma42 force-pushed the fix-nullable-int-compare-int branch from a8a8641 to 859b2cd Compare May 20, 2024 13:01

roji requested a review from maumar May 20, 2024 15:08

ranma42 marked this pull request as draft May 21, 2024 21:06

ranma42 mentioned this pull request May 22, 2024

Simplify TRUE = expr in filtering contexts #33776

Merged

ranma42 marked this pull request as ready for review May 22, 2024 08:54

ranma42 marked this pull request as draft May 22, 2024 08:54

ranma42 force-pushed the fix-nullable-int-compare-int branch from cdf1b2c to dd83431 Compare May 22, 2024 21:43

ranma42 force-pushed the fix-nullable-int-compare-int branch 3 times, most recently from 9154317 to e0be248 Compare May 25, 2024 19:59

ranma42 force-pushed the fix-nullable-int-compare-int branch from e0be248 to 7dc2d71 Compare May 26, 2024 05:33

ranma42 mentioned this pull request May 26, 2024

Check arity consistency when constructing SqlFunctionExpressions #33815

Merged

ranma42 force-pushed the fix-nullable-int-compare-int branch from 7dc2d71 to 743fe70 Compare May 26, 2024 06:07

ranma42 mentioned this pull request May 27, 2024

Translation of impure functions can result in wrong queries #33791

Open

ajcvickers assigned maumar May 28, 2024

ranma42 force-pushed the fix-nullable-int-compare-int branch 2 times, most recently from 49c5ab5 to 89c6c78 Compare May 29, 2024 21:42

ranma42 marked this pull request as ready for review May 29, 2024 22:33

maumar reviewed May 30, 2024

View reviewed changes

src/EFCore.Sqlite.Core/Query/Internal/Translators/SqliteStringMethodTranslator.cs Show resolved Hide resolved

Add test for (x ? null : N) > B

f6c5e58

As reported in dotnet#33752, `SELECT`ing the result of a comparison between `int?` and `int` can trigger an exception in the shaper.

maumar reviewed May 30, 2024

View reviewed changes

src/EFCore.Relational/Query/SqlNullabilityProcessor.cs Outdated Show resolved Hide resolved

ranma42 added 6 commits May 30, 2024 22:46

Fix comparison of nullable values

4db69e0

In C# an ordered comparison (<, >, <=, >=) between two nullable values always returns a boolean value: if either operand is null, the result is false; otherwise, the result is that of the comparison of the (non-null) values. Fixes dotnet#33752

Update tests

cca4de9

Simplify translation of Contains

0092272

The additional `IS NOT NULL` check is not needed anymore since now we do null semantics/compensation for comparison.

Update tests

22bebdf

Several queries are now simpler as when filtering we can treat NULL and FALSE as equivalent results (they both discard the record).

Enable negated comparison test

69273e9

Now that the comparison null semantics has been implemented, it works as intended.

Add tests for projection and filtering of string.Contains

6e0c36c

including negated cases.

ranma42 force-pushed the fix-nullable-int-compare-int branch from 89c6c78 to 6e0c36c Compare May 30, 2024 20:50

ranma42 commented May 30, 2024

View reviewed changes

maumar added 2 commits May 30, 2024 16:25

fixing cosmos test failures

48ff521

adding test baselines for sql server

5887d9b

maumar approved these changes May 31, 2024

View reviewed changes

maumar merged commit b77d2f4 into dotnet:main May 31, 2024
7 checks passed

ranma42 deleted the fix-nullable-int-compare-int branch June 3, 2024 05:46

roji added the community-contribution label Jul 3, 2024

ranma42 mentioned this pull request Jul 23, 2024

Unexpected semantics if .All() predicate evaluates to NULL for some rows? #34271

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix comparison of nullable values #33757

Fix comparison of nullable values #33757

ranma42 commented May 19, 2024

maumar commented May 20, 2024

ranma42 commented May 21, 2024 •

edited

Loading

maumar commented May 21, 2024

maumar commented May 21, 2024 •

edited

Loading

ranma42 commented May 21, 2024

maumar commented May 22, 2024

ranma42 commented May 22, 2024

maumar commented May 22, 2024

ranma42 commented May 23, 2024

ranma42 commented May 25, 2024 •

edited

Loading

ranma42 commented May 29, 2024

maumar commented May 30, 2024

ranma42 commented May 30, 2024

ranma42 May 30, 2024

ranma42 May 30, 2024

maumar May 30, 2024

maumar commented May 30, 2024

maumar commented May 30, 2024

maumar commented May 31, 2024

Fix comparison of nullable values #33757

Fix comparison of nullable values #33757

Conversation

ranma42 commented May 19, 2024

maumar commented May 20, 2024

ranma42 commented May 21, 2024 • edited Loading

maumar commented May 21, 2024

maumar commented May 21, 2024 • edited Loading

ranma42 commented May 21, 2024

maumar commented May 22, 2024

ranma42 commented May 22, 2024

maumar commented May 22, 2024

ranma42 commented May 23, 2024

ranma42 commented May 25, 2024 • edited Loading

ranma42 commented May 29, 2024

maumar commented May 30, 2024

ranma42 commented May 30, 2024

ranma42 May 30, 2024

Choose a reason for hiding this comment

ranma42 May 30, 2024

Choose a reason for hiding this comment

maumar May 30, 2024

Choose a reason for hiding this comment

maumar commented May 30, 2024

maumar commented May 30, 2024

maumar commented May 31, 2024

ranma42 commented May 21, 2024 •

edited

Loading

maumar commented May 21, 2024 •

edited

Loading

ranma42 commented May 25, 2024 •

edited

Loading