Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement IS [NOT] DISTINCT FROM translation #34048

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Jun 21, 2024

Most SQL engines support the IS [NOT] DISTINCT FROM predicate, which conveniently matches C# semantics for !=/==.
Where available (and efficient aka supported by indexes), using it can lead to simpler queries.

Fixes #10514, fixes #29624.

@ranma42
Copy link
Contributor Author

ranma42 commented Jun 21, 2024

As mentioned in #29624 (comment) this is still WIP (opened as PR for viewing convenience @roji ).

@ErikEJ
Copy link
Contributor

ErikEJ commented Jun 21, 2024

Will require Compat level 160

@ranma42
Copy link
Contributor Author

ranma42 commented Jun 21, 2024

I will work more on this in the weekend.
⚠️ This changeset might cause is a significant reduction in the test coverage for RewriteNullSemantics.
Will think about this in the weekend (maybe we can explicitly test for the interesting cases with a group of tests that intentionally "downgrade" the max sqlserver version).

@ranma42 ranma42 force-pushed the is-not-distinct-from branch from 99f3704 to adb20b7 Compare June 22, 2024 11:41
@ranma42
Copy link
Contributor Author

ranma42 commented Jun 22, 2024

I added the checks for SQLite version >= 3.8.11 and SqlServer CompatibilityVersion >= 160

Several tests are failing on SqlServer:

  • Compare_negated_bool_with_negated_bool_equal
  • Compare_bool_with_negated_bool_not_equal
  • Compare_negated_bool_with_bool_not_equal
  • Compare_negated_bool_with_bool_equal_negated
  • Compare_negated_bool_with_negated_bool_not_equal
  • Compare_negated_bool_with_bool_equal
  • Compare_bool_with_negated_bool_equal
  • Compare_bool_with_negated_bool_equal_negated
  • Compare_bool_with_negated_bool_not_equal_negated
  • Compare_negated_bool_with_negated_bool_equal_negated
  • Compare_negated_bool_with_bool_not_equal_negated
  • Compare_negated_bool_with_negated_bool_not_equal_negated

IIUC, this is because of #34001

Note that currently the test have not yet been extended to check both the IS and the relational-null-replacement code paths.

@ranma42
Copy link
Contributor Author

ranma42 commented Jun 22, 2024

I pushed a commit that avoids using IS [NOT] DISTINCT FROM when that would trigger #34001.
Obviously it would still be desirable to (also) fix #34001

@@ -14,6 +15,8 @@ namespace Microsoft.EntityFrameworkCore.Sqlite.Query.Internal;
/// </summary>
public class SqliteSqlNullabilityProcessor : SqlNullabilityProcessor
{
private static readonly bool _useIs = new Version(new SqliteConnection().ServerVersion) >= new Version(3, 8, 11);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if it is OK to assume that users will always use a newer version than this one from 2015, since the provider use Microsoft.Data.Sqlite ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is very likely, but I thought I could check the version just in case (also, it is only checked once, as it is a static field).
OTOH the documentation currently states that the provider supports >= 3.7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For EF Core 9, the supported engine version is 3.44

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware of that (I thought that was only the one shipped by default on windows; on linux there is support for using the system-provided sqlite library).

I can update the PR to drop the check if 3.44 is the new minimum version.
In that case I will also push a PR to cleanup the checks for _areJsonFunctionsSupported and _isReturningClauseSupported (respectively 3.38 and 3.35).
Should I also open a PR to bump the minimum version in https://github.com/dotnet/EntityFramework.Docs/blob/main/entity-framework/core/providers/sqlite/index.md ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update the PR to drop the check if 3.44 is the new minimum version.

I think users can opt in to use their own version, so assuming 3.44 covers everything is probably too optimistic.

In that case I will also push a PR to cleanup the checks for _areJsonFunctionsSupported and _isReturningClauseSupported (respectively 3.38 and 3.35).

These checks were added before EF Core 9, and referring to versions much more recent than 2015, so maybe they should jus be left in.

Should I also open a PR to bump the minimum version

I think there probably is a distinction between the oldest supported version of the ADO.NET driver and the oldest Sqlite version it supports versus the oldest supported version by EF Core.

But I am just a community member, let's see what the EF Core team thinks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we have a very clear policy here. As long as having the check is cheap and doesn't add any particular complexity (as seems to be the case here), I'd err towards just having it - even though a version from 2015 does indeed sound very unlikely. Of course when can discuss this on a case-by-case basis.

@ranma42
Copy link
Contributor Author

ranma42 commented Jun 23, 2024

As an intermediate step, I opened #34072 which adds exhaustive testing of the null rewriting (plus some cleanup/minor improvements ;) ).

My current plan would be to run this test for SQL Server both normally and forcing a lower compatibility version for these tests (following the model of PrimitiveCollectionsQueryOldSqlServerTest).

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good @ranma42, and is a substantial improvement for the SQL that we generate - thanks!

Note that I've reached out to someone on the Azure SQL side to confirm that this change is a good idea and there aren't any gotchas - it may be good to wait for confirmation before investing too much time into it (though I doubt there'll be a problem).

Also, given that this is quite a big and potentially risky change, I'd add an appcontext switch to allow users to opt out of this very specific translation, just in case there's trouble. I'll bring this up (along with other questions around this) in the team design meeting.

Comment on lines 370 to 371
_boolTypeMapping
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
_boolTypeMapping
);
_boolTypeMapping);

/// not used in application code.
/// </para>
/// </summary>
public class IsExpression : SqlExpression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsDistinctFromExpression? IsExpression seems underspecified, especially given that there really are other IS expressions in SQL (e.g. IS NULL)

Copy link
Contributor Author

@ranma42 ranma42 Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH IsExpression(x, SqlConstantExpression(null)) would be a perfectly normal representation for x IS NULL ;)

Jokes aside, either name works for me; I'll leave it alone for the time being, but I'll update it before marking this PR as ready for review. In the long term I would like:

  1. having expressions that represent C#/LINQ semantics more accurately (that along the pipeline get replaced as needed)
  2. expressions that do not change semantics along the pipeline (currently the SqlBinaryExpression(ExpressionType.Equal starts as C# equality and in the SqlNullabilityProcessor becomes SQL equality, which changes its behavior on NULLs).

I would also like some canonicalization, to reduce the number of cases that must be handled, but then again it does not belong to this PR (I'll add it to my notes, though).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH IsExpression(x, SqlConstantExpression(null)) would be a perfectly normal representation for x IS NULL ;)

Yeah :) PG also has IS TRUE and IS FALSE (I think that's just alternative syntax to equality), but of course what you can put after IS is very limited - that's why a specific expression time is justified...

All the rest definitely sounds interesting, looking forward to discussing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH IsExpression(x, SqlConstantExpression(null)) would be a perfectly normal representation for x IS NULL ;)

Yeah :) PG also has IS TRUE and IS FALSE (I think that's just alternative syntax to equality), but of course what you can put after IS is very limited - that's why a specific expression time is justified...

According to the docs IS TRUE and IS FALSE never evaluate to NULL, i.e. IS (when syntactically valid) it is a shorter alias for IS NOT DISTINCT FROM both in Sqlite and PG 😉 (similarly, IS NOT/IS DISTINCT FROM).

Given that the standard-SQL expression has the disadvantage of the embedded negation, would a completely different name such as StrictEqualExpression or something like that make sense? I am open to suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, I missed that detail... So yeah, in PostgreSQL, IS NOT DISTINCT FROM TRUE could be simplified to IS TRUE` in SQL generation.

Re StrictEqualExpression, we generally try to have SQL expressions correspond as much as possible to the actual syntax they represent (including the name).

I know it's a bit ugly, but do you see a concrete disadvantage in just having IsDistinctFromExpression, and then wrapping that in negation for the "strict" (or two-value) equality check? The main disadvantage I see is that it would make it a bit more difficult to pattern-match equality later; but the transformation to 2-value logic happens very late in the pipeline (SqlNullabilityProcessor), so that's not likely to be needed much; plus we'd anyway have the problem of pattern-matching both regular equality (via SqlBinaryExpression) and the new expression, so we could have some function to encapsulate that etc...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, let's stick to modeling the SQL syntax as closely as possible in SqlExpressions.
It might be convenient to devise an intermediate representation/core language (ideally to represent generic collection expressions, both for relational and cosmos/other providers 😉 ) that is tuned for analysis/transformation/optimization, but that is definitely a different topic (/epic 🤣 ).

Jumping to another different topic, do you have any pointer on where/how the transformations rely on what indexes are defined? In some cases it could be very important to take advantage of that (for example in some combinations of NOT and =) but AFAICT these expressions are not optimized taking into account indexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be convenient to devise an intermediate representation/core language (ideally to represent generic collection expressions, both for relational and cosmos/other providers 😉 )

Yes, agreed - we are indeed lacking something like that.

Jumping to another different topic, do you have any pointer on where/how the transformations rely on what indexes are defined?

I don't think we do anything like that right now - but this is something we've discussed before and would definitely be willing to do where it makes sense. We generally just try to do a translation that works well with indexes, but also doesn't perform badly without them; in other words, I can't recall a scenario where we had two translations, one that's better when there's an index but actually inferior when there isn't - though we definitely may have missed some. Do you have a concrete case in mind?

Indexes already impact EF runtime behavior in the update pipeline, and we could definitely use them to inform query generation as well. FWIW the PostgreSQL translation for Contains varied for whether the list is a column or parameter/constant, with the column translation explicitly chosen to allow (GIN) indexes to be used. But that's different from actually doing something because we know there's an index.

In any case, implementation-wise this could be slightly tricky, since once a property access is translated it becomes a ColumnExpression that no longer references the model - only a TypeMapping. So I think at this point, you can only know if an index is present if you pattern match the entire sub-tree, i.e. before a property access has been translated to ColumnExpression. Note that this loss of information is probably problematic in other cases as well - we could consider extending our SQL tree model to include this kind of information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple example in which what is the best way forward depends on the availability of indexes is NOT(x == y). Let's assume x and y are boolean columns from different tables.
The two expressions NOT(x == y) and x != y will most likely perform about the same.
I x == NOT(y) could be more efficient if there is an index on x, conversely NOT(x) == y would be more efficient if the index was on y.
I am not sure if this holds true for all SQL engines, but at least Sqlite plans match my assumption.
In fact, Sqlite is much happier with ==, even without an index: it will automatically create a query-time index.

The SELECT * FROM t1, t2 WHERE NOT(x == y) query is definitely very contrived, but it can serve as a model for more relevant cases. OTOH all of the interesting cases I found so far in practice did not involve matching two columns directly and instead could be efficiently optimized by looking at the types; namely, when possible push the NOT to the non-column expression.
This syntax-driven (opposed to dbmodel-driven) policy seems to be sufficient for most cases, just like in the PG example of Contains, so it likely be deferred to future investigations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good example, though non-equality between boolean columns indeed seems very specific (and a bit contrived). I wonder if other SQL engines do this sort of optimization themselves, but in any case it's indeed a good case where model-driven translation makes sense.

Maybe open a separate issue so we have this in the backlog? There's still the question of whether we implement something like this as a very targeted, specific optimization (i.e. recognize non-equality specifically over bool columns when translating), or think about a more general approach where model information (i.e. the IProperty) would get bubbled up, to allow model-driven optimizations on arbitrary fragments. To actually go into the latter I think we'd need more justification...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #34159 to track this

@@ -681,6 +681,7 @@ protected override bool TryGetOperatorInfo(SqlExpression expression, out int pre

CollateExpression => (900, false),
LikeExpression => (350, false),
IsExpression => (500, false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure you could verify this value, since the docs don't list it unfortunately...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no; I should probably mark it as tentative, although it makes sense that it has the same precedence as other comparisons.

maybe I could write down a script that checks precedence for the operators

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you missed it, we have OperatorsQueryTestBase where we do some operator precedence tests - definitely feel free some tests around this there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a simple program that computes priority and apparently it works reasonably well.
I will try to clean it up and post it as a gist 😇

Unfortunately it is not very relevant in this case, because SqlServer operators are basically partitioned in 3 classes:

  • "value" operators (+, *, |, ^, ~, ...), which work on value types (bit, bigint, text, ...) and return a value type
  • comparison operators (=, <, IS NULL, ...) which work on value types and return Boolean
  • Boolean operators (AND, OR, NOT) which work on Boolen arguments and return Boolean

These 3 classes cannot be mixed because they are basically incompatible: in T-SQL you cannot do (x = y) = z or (x = y) AND z (assuming x, y and z are all bit columns).

IS [NOT] DISTINCT FROM is in the "comparison" group and these operators cannot be composed with one another, so they cannot really have a different priority to each other. I believe they have been designed to have a priority between value and Boolean operators so that by default they avoid nonsensical parsing such as:

  • x + y = a * b parsed as x + (y = a) * b (nonsensical because the Boolean result of y = a cannot be used with value operators)
  • x = y AND a = b parsed as x = (y AND a) = b (nonsensical because the values y and a cannot be used as arguments of the Boolean AND operator)

{
var result = base.RewriteNullSemantics(sqlBinaryExpression, left, right, leftNullable, rightNullable, optimize, out nullable);

if (_useIs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since IS DISTINCT FROM is possibly/probably becomining a relatively standard cross-provider thing, I'd maybe provide a more specialized extension point to allow it, i.e. instead of having providers override RewriteNullSemantics, I'd consider having a ApplyEqualityNullSemantics, which gets called only for equality/inequality where compensation is needed (i.e. both sides are nullable or non-optimized mode etc.). The default implementation of this method would add IS NULL compensation as today, but SQL Server (and others) could override it to return the new IS DISTINCT FROM.

How does that sound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me; in fact, ideally I would like to completely avoid calling RewriteNullSemantics from providers that are happy with IS NOT DISTINCT FROM (maybe even more than =).

While there are 2 trivial extremes (always use IS NOT DISTINCT FROM, always RewriteNullSemantics), we should allow for some flexibility so that each provider can choose between the two translations depending on the single case (for example SqlServer currently has some trouble with bool? expressions, while Postgres might want to only use the IS NOT DISTINCT FROM for complex predicates).

I originally planned to introduce the IS NOT DISTINCT FROM predicate much earlier in the pipeline (as it matches the C# semantics), but that would require touching more parts of the translation pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should allow for some flexibility so that each provider can choose between the two translations [...]

Yep, agree - that's why an overridable method probably makes the most sense, where the default does the current behavior, and providers are free to override and do whatever decision-making they want for returning IS DISTINCT FROM instead. To be clear, all I'm advocating in this comment is making a narrower/more specialized extension point for providers to override.

I originally planned to introduce the IS NOT DISTINCT FROM predicate much earlier in the pipeline (as it matches the C# semantics), but that would require touching more parts of the translation pipeline.

Yeah, this would require knowing the nullability (or rather, nullness) of parameters, which we can't do earlier in the pipeline (before the parameter nullability cache).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this would require knowing the nullability (or rather, nullness) of parameters, which we can't do earlier in the pipeline (before the parameter nullability cache).

What I meant is that IS [NOT] DISTINCT FROM could be the "trivial" translation of the C# [in]equality, especially before the parameter cache, when the SQL cannot yet rely on a parameter being [non-]NULL.
Conversely the RewriteNullSemantics could be the (shared/reusable) lowering for providers that do not support it (or that only support it efficiently in some cases).

Also, this makes me wonder... do parameters always lead to separate cache entries depending on them being NULL/non-NULL? (aka is the current assumption that NULL vs non-NULL parameters lead to different queries?)
It might be possible (and more efficient as it could lead to better cache hit ratios) to avoid this partitioning for parameters that are used in "null-safe" expressions (such as IS [NOT] DISTINCT FROM).

What I mean is that the usual translation for tab.column == myParam would be:

  • if myParam != null (C# comparison)
tab.column = @myParam
  • if myParam == null (C# comparison)
tab.column IS NULL

With IS [NOT] DISTINCT FROM the translation could be unified:

tab.column IS NOT DISTINCT FROM @myParam

(as usual, assuming that the DB plan is as good as with the two previous translation)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that IS [NOT] DISTINCT FROM could be the "trivial" translation of the C# [in]equality, especially before the parameter cache, when the SQL cannot yet rely on a parameter being [non-]NULL.
Conversely the RewriteNullSemantics could be the (shared/reusable) lowering for providers that do not support it (or that only support it efficiently in some cases).

That's an interesting idea... One thing to keep is mind is that we have the "relational semantics" opt-in, which very few use and which means that e.g. equality is not compensated for. I think the model is generally that SQL gets to SqlNullabilityProcessor representing 3-value logic (relational semantics), and SqlNullabilityProcessor is the thing responsible for doing all the necessary compensating to convert the tree to be 2-value logic. I'm not sure if we actually keep to this model, but if so, it would be better to keep use simple equality before SqlNullabilityProcessor, and only rewrite to IS DISTINCT FROM at that point. But definitely an interesting point to discuss.

do parameters always lead to separate cache entries depending on them being NULL/non-NULL? (aka is the current assumption that NULL vs non-NULL parameters lead to different queries?)
It might be possible (and more efficient as it could lead to better cache hit ratios) to avoid this partitioning for parameters that are used in "null-safe" expressions (such as IS [NOT] DISTINCT FROM).

Yes, this is how things currently work; and you're right that if we shift to IS [NOT] DISTINCT FROM, quite a few LINQ queries which today produce different SQLs (WHERE b.Foo = @foo vs. WHERE b.Foo IS NULL) will start producing the same SQLs. We could consider a scheme where we manage caching more intelligently (rather than assume that parameter nullability is always important); I'd generally consider it somewhat low-priority in terms of memory/perf impact, but I could be wrong.

/cc @maumar on this conversation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that last point, you'd probably be interested in #17598; honestly, I don't think we've seen much need for the complexity over the years.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is how things currently work; and you're right that if we shift to IS [NOT] DISTINCT FROM, quite a few LINQ queries which today produce different SQLs (WHERE b.Foo = @foo vs. WHERE b.Foo IS NULL) will start producing the same SQLs. We could consider a scheme where we manage caching more intelligently (rather than assume that parameter nullability is always important); I'd generally consider it somewhat low-priority in terms of memory/perf impact, but I could be wrong.

I think it might be relevant for queries with "many" parameters, especially in the context of compiled queries: emitting all of the combinations would result in 2^(num_parameters) queries, which probably is "too big" with 10-12 parameters already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're probably referring to "precompiled" queries, where the actual end SQLs get pregenerated as into C# code... And you're right, because of the explosion of parameter nullness permutations, we currently only pregenerate SQL for up to 3 nullable parameters (note that not all parameters are nullable in the first place); for queries with more we fall back to generating the SQL at runtime.

Note that we have to sometime do this anyway, since we sometimes can't cache the SQL; this happens when we peek into actual parameters value (so beyond whether they're null or not) - since only the nullness is part of the cache key, we can't cache (search for doNotCache to see these cases).

@@ -14,6 +15,8 @@ namespace Microsoft.EntityFrameworkCore.Sqlite.Query.Internal;
/// </summary>
public class SqliteSqlNullabilityProcessor : SqlNullabilityProcessor
{
private static readonly bool _useIs = new Version(new SqliteConnection().ServerVersion) >= new Version(3, 8, 11);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we have a very clear policy here. As long as having the check is cheap and doesn't add any particular complexity (as seems to be the case here), I'd err towards just having it - even though a version from 2015 does indeed sound very unlikely. Of course when can discuss this on a case-by-case basis.

FROM [LevelTwo] AS [l0]
WHERE [l0].[Level1_Optional_Id] = [l].[Id]
ORDER BY [l0].[Id]) IS NULL
ORDER BY [l0].[Id]) IS DISTINCT FROM N'Foo'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

WHEN [l8].[OneToOne_Required_PK_Date] IS NOT NULL AND [l8].[Level1_Required_Id] IS NOT NULL AND [l8].[OneToMany_Required_Inverse2Id] IS NOT NULL THEN [l8].[Id]
END
WHERE [l4].[OneToOne_Required_PK_Date] IS NOT NULL AND [l4].[Level1_Required_Id] IS NOT NULL AND [l4].[OneToMany_Required_Inverse2Id] IS NOT NULL AND ([l2].[Id] = [l6].[Id] OR ([l2].[Id] IS NULL AND [l6].[Id] IS NULL))) IS NULL
WHERE [l4].[OneToOne_Required_PK_Date] IS NOT NULL AND [l4].[Level1_Required_Id] IS NOT NULL AND [l4].[OneToMany_Required_Inverse2Id] IS NOT NULL AND [l2].[Id] IS NOT DISTINCT FROM [l6].[Id]) IS DISTINCT FROM 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes

@@ -60,7 +60,10 @@ public override async Task Compare_negated_bool_with_bool_equal(bool async)
"""
SELECT [e].[Id]
FROM [Entities1] AS [e]
WHERE [e].[BoolA] <> [e].[NullableBoolB] AND [e].[NullableBoolB] IS NOT NULL
WHERE CASE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the SQL a bit more complex (though seems superior to the original as NullableBoolB is only evaluated once. What's the issue here (is it #34001)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this is related to the NOT folding/simplification in the SearchConditionConvertingExpressionVisitor.
OTOH in the current local branch I completely disabled the use of IS [NOT] DISTINCT FROM for bool?, so it neither improves nor regresses in this case.

I plan to work more on simplifying these expressions, which is part of the reason why I was looking into XOR (see #34071), but in the short term I would be happy with simply avoiding regressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, as I said the new SQL already seems better than the old (even if a bit more verbose).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformation that would take care of this is an extension of:

@roji roji assigned roji and maumar Jun 24, 2024
@ranma42 ranma42 force-pushed the is-not-distinct-from branch from b60c9fb to ef1ea97 Compare June 29, 2024 15:22
@ranma42
Copy link
Contributor Author

ranma42 commented Jun 29, 2024

I rebased this PR to resolve conflicts and to start getting to the issues that have been reported:

  • rename IsExpression to IsNotDistinctFromExpression or IsDistinctFromExpression; I would like to keep the equality case as base, but I am afraid that IsNotDistinctFromExpression is complex (IsDistinctFromExpression has the disadvantage that it is the not-equal)
  • add tests to ensure that there are no regressions in the null rewrite
    added in Simplify and improve null semantics rewrites #34072
  • check precedence of IS DISTINCT FROM in SqlServer
  • remove bad translations of nullable bool? comparisons in SqlServer Implement IS [NOT] DISTINCT FROM translation #34048 (comment)
  • improve comparisons of nullable bool? by taking advantage of SQL Server: Use XOR to translate some NOT expressions #34080 (optional, desirable)
  • get the PR to pass the tests in CI
  • introduce a reusable abstraction for providers (ApplyEqualityNullSemantics; might need some more polishing)
  • add an AppContext switch to opt out of the translation

@ranma42 ranma42 force-pushed the is-not-distinct-from branch from ef1ea97 to d0edd89 Compare June 30, 2024 10:56
@roji
Copy link
Member

roji commented Jun 30, 2024

I would like to keep the equality case as base, but I am afraid that IsNotDistinctFromExpression is complex (IsDistinctFromExpression has the disadvantage that it is the not-equal)

Yeah, the negated logic here is unfortunate. There's also the option of having two distinct expression types, though that would make it more complicated to handled in visitors needing to identify them.

@ranma42 ranma42 force-pushed the is-not-distinct-from branch from 29dab6c to 619058a Compare December 23, 2024 17:02
@ranma42
Copy link
Contributor Author

ranma42 commented Dec 23, 2024

Rebased to solve conflicts and keep an eye on SQL Server updates in CI 😉

This is currently on top of #34166 which affects the translation of boolean comparisons

@ranma42 ranma42 force-pushed the is-not-distinct-from branch from 619058a to 18bdea8 Compare December 23, 2024 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants