Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement nullability computation for SqlFunctionExpression #33814

Merged
merged 2 commits into from
May 29, 2024

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented May 25, 2024

The current implementation of SqlNullabilityProcessor.VisitSqlFunction() does not take into account the nullability of the function arguments (or the target instance) nor whether they propagate nulls to the function result.

This causes several additional null checks, which are not actually needed.

The computation is modeled after ProcessNullNotNull, specifically:

if (!sqlFunctionExpression.IsNullable)
{
// when we know that function can't be nullable:
// non_nullable_function() is null-> false
// non_nullable_function() is not null -> true
return _sqlExpressionFactory.Constant(
sqlUnaryExpression.OperatorType == ExpressionType.NotEqual,
sqlUnaryExpression.TypeMapping);
}
// see if we can derive function nullability from it's instance and/or arguments
// rather than evaluating nullability of the entire function
var nullabilityPropagationElements = new List<SqlExpression>();
if (sqlFunctionExpression is { Instance: not null, InstancePropagatesNullability: true })
{
nullabilityPropagationElements.Add(sqlFunctionExpression.Instance);
}
if (!sqlFunctionExpression.IsNiladic)
{
for (var i = 0; i < sqlFunctionExpression.Arguments.Count; i++)
{
if (sqlFunctionExpression.ArgumentsPropagateNullability[i])
{
nullabilityPropagationElements.Add(sqlFunctionExpression.Arguments[i]);
}
}
}
// function(a, b) IS NULL -> a IS NULL || b IS NULL
// function(a, b) IS NOT NULL -> a IS NOT NULL && b IS NOT NULL
if (nullabilityPropagationElements.Count > 0)
{
var result = nullabilityPropagationElements
.Select(
e => ProcessNullNotNull(
_sqlExpressionFactory.MakeUnary(
sqlUnaryExpression.OperatorType,
e,
sqlUnaryExpression.Type,
sqlUnaryExpression.TypeMapping)!,
operandNullable))
.Aggregate(
(r, e) => SimplifyLogicalSqlBinaryExpression(
sqlUnaryExpression.OperatorType == ExpressionType.Equal
? _sqlExpressionFactory.OrElse(r, e)
: _sqlExpressionFactory.AndAlso(r, e)));
return result;
}
}

@ranma42 ranma42 changed the title Implement nullability computation for SqlFunction Implement nullability computation for SqlFunctionExpression May 25, 2024
@maumar
Copy link
Contributor

maumar commented May 28, 2024

@ranma42 I really like this improvement. However, there is a slight change in the semantics of nullability propagation. The original intent for this feature was to improve null check sql (IS NULL / IS NOT NULL), where we can. Here is the original issue: #18555 and pr: #19607

The idea is that argument propagates nullability should be set to true iff (if and only if) the function can result in null based on it's argument being null (which is vast majority of sql functions). However, there is a possibility that a function can be null even if all of the arguments are not null - a good example are some of the JSON functions like JSON_VALUE. Also, users can create their own custom functions that have this property. In those cases function propagates nullability should be set to false for all the arguments, which means that we can't take any shortcuts.
This was originally improvement for our own functions and only later extended to user created functions, but that idea wasn't well documented/communicated.

If I read the code correctly, the new implementation always takes the argument propagates nullability value into account, so a function which set them all to false will be marked as non-nullable, which is not always correct. Instead, we should check if at least one of the argument propagates nullability, or instance propagates nullability is set to true, and only if that's the case, apply the logic that was added. Otherwise we can't rely on the optimization and we should just use function.IsNullable. See how things are done in ProcessNullNotNull

In hindsight, what we probably should have done is to add a property like useNullPropagationInformation, which would allow us to document/communicate this properly to users, rather than rely on this obscure pattern. Alas, to late to change now.

@ranma42
Copy link
Contributor Author

ranma42 commented May 29, 2024

@ranma42 I really like this improvement. However, there is a slight change in the semantics of nullability propagation. The original intent for this feature was to improve null check sql (IS NULL / IS NOT NULL), where we can. Here is the original issue: #18555 and pr: #19607

The idea is that argument propagates nullability should be set to true iff (if and only if) the function can result in null based on it's argument being null (which is vast majority of sql functions). However, there is a possibility that a function can be null even if all of the arguments are not null - a good example are some of the JSON functions like JSON_VALUE. Also, users can create their own custom functions that have this property. In those cases function propagates nullability should be set to false for all the arguments, which means that we can't take any shortcuts. This was originally improvement for our own functions and only later extended to user created functions, but that idea wasn't well documented/communicated.

Sorry, I was not aware of this and I ended up replicating only a part of the ProcessNullNotNull.
After learning about this, I managed to find the note here:

This optimization should only be used if the function can only return null when it's parameters are null.

I guess that the trick in this case is that Microsoft.EntityFrameworkCore.Metadata.Internal.DbFunctionParameter._propagatesNullability starts as false, so if the user never invokes PropagatesNullability() all of them end up being false, hence setting up the state you described.

Note that this behavior is not explained in the documentation of SqlFunctionExpression.
Would it make sense to open a (separate) PR to explain the behavior of the nullability propagation arguments?

If I read the code correctly, the new implementation always takes the argument propagates nullability value into account, so a function which set them all to false will be marked as non-nullable, which is not always correct. Instead, we should check if at least one of the argument propagates nullability, or instance propagates nullability is set to true, and only if that's the case, apply the logic that was added. Otherwise we can't rely on the optimization and we should just use function.IsNullable. See how things are done in ProcessNullNotNull

Ah, I see, that's the purpose of the if (nullabilityPropagationElements.Count > 0)

This means that the semantics changes radically. IIUC the semantics are:

  • if no argument (nor the instance) is set to propagate nullability, just rely on the IsNullable flag
  • otherwise, compute nullability mostly as I currently did

I will update the PR shortly to match this.

In hindsight, what we probably should have done is to add a property like useNullPropagationInformation, which would allow us to document/communicate this properly to users, rather than rely on this obscure pattern. Alas, to late to change now.

If this interface was to be redesigned, there are also other cases where it might be interesting to have a richer semantics (for example a UDF with COALESCE-like behavior).

AFAICT it might even be possible to extend this part of the SqlFunctionExpression in the future:

  • extending the public constructors should trivially be backward-compatible
  • making the semantics richer by adding more complex nullability information might require changes to providers/libraries that want to take advantage of that; as long as the members that are currently available are present, it should not break (and ideally not even regress) the consumers of the API

While this might be an interesting research direction (it would ideally remove the special cases for COALESCE etc... eventually it might even make it possible to unify the nullability computation for all operations, by simply interpreting them as "function" with appropriate arguments and nullability propagation settings), I believe that such an endeavor would require a thorough design.

@maumar
Copy link
Contributor

maumar commented May 29, 2024

Note that this behavior is not explained in the documentation of SqlFunctionExpression. Would it make sense to open a (separate) PR to explain the behavior of the nullability propagation arguments?

Good idea, filed #33833 to track this.

This means that the semantics changes radically. IIUC the semantics are:

  • if no argument (nor the instance) is set to propagate nullability, just rely on the IsNullable flag
  • otherwise, compute nullability mostly as I currently did

That's exactly the intended logic. Note also that it gives an uncertain user "a way out" - if all of them are set to false we just take the nullability of SqlFunction, we may produce less efficient sql but it will always be correct. Wheras in the previous design user had to correctly specify all the nullability propagation elements or risk data corruption in some cases.

If this interface was to be redesigned, there are also other cases where it might be interesting to have a richer semantics (for example a UDF with COALESCE-like behavior).

AFAICT it might even be possible to extend this part of the SqlFunctionExpression in the future:

  • extending the public constructors should trivially be backward-compatible
  • making the semantics richer by adding more complex nullability information might require changes to providers/libraries that want to take advantage of that; as long as the members that are currently available are present, it should not break (and ideally not even regress) the consumers of the API

While this might be an interesting research direction (it would ideally remove the special cases for COALESCE etc... eventually it might even make it possible to unify the nullability computation for all operations, by simply interpreting them as "function" with appropriate arguments and nullability propagation settings), I believe that such an endeavor would require a thorough design.

Yeah, I think it's currently not worth the effort needed to get this right. Current design should be ok, especially once we document it bit better

@ranma42 ranma42 force-pushed the compute-func-nullability branch from 5474977 to 9e0f48e Compare May 29, 2024 06:31
@ranma42
Copy link
Contributor Author

ranma42 commented May 29, 2024

I pushed an updated branch and kept the original one for ease of comparison.
As shown in https://github.com/ranma42/efcore/compare/compute-func-nullability-bad..compute-func-nullability the main differences in tests are:

  • MAX is now correctly treated as nullable even when aggregating over non-nullable expressions 👍 (NULL happens when the collection is empty)
  • Compare_function_without_null_propagation_to_null now retains the original NULL check

I had to cherry-pick ranma42@1ff7042 from #33814 , otherwise the nullability computation would fail upon the out-of-range access.

@ranma42 ranma42 force-pushed the compute-func-nullability branch from 9e0f48e to f24ca35 Compare May 29, 2024 17:50
@ranma42
Copy link
Contributor Author

ranma42 commented May 29, 2024

rebased on top of the current main, to include the nullability arity fixes (which are required to avoid failing the evaluation of nullabilities, specifically for abs and rtrim)

@maumar maumar merged commit a4dbfb5 into dotnet:main May 29, 2024
7 checks passed
@maumar
Copy link
Contributor

maumar commented May 29, 2024

another great contribution, thanks! @ranma42

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants