Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASE WHEN expression labels evaluated eagerly instead of lazily #65

Open
dallmair opened this issue Aug 14, 2024 · 2 comments · May be fixed by #66
Open

CASE WHEN expression labels evaluated eagerly instead of lazily #65

dallmair opened this issue Aug 14, 2024 · 2 comments · May be fixed by #66

Comments

@dallmair
Copy link
Contributor

Background

We have a table similar to:

Type (string) Value (string)
text abc
date 07/27/2023 00:00:00

We're trying to transform the date value to a different format using NQuery with:

SELECT  Type
        , CASE WHEN Type = 'date' THEN FORMAT(TO_DATETIME(Value), 'yyyy-MM-dd') ELSE Value END AS Value
FROM    ourTable

Expected result

Returns table with date value transformed to 2023-07-27.

Actual result

Throws exception:

System.FormatException: The string 'abc' was not recognized as a valid DateTime. There is an unknown word starting at index '0'.
   at System.Convert.ToDateTime(String value, IFormatProvider provider)
   at System.String.System.IConvertible.ToDateTime(IFormatProvider provider)
   at System.Convert.ToDateTime(Object value, IFormatProvider provider)
   at NQuery.Symbols.BuiltInFunctions.ToDateTime(Object value) in .\nquery-vnext\src\NQuery\Symbols\BuiltInFunctions.cs:line 176
   at lambda_method11(Closure )
   at NQuery.Iterators.ComputeScalarIterator.Read() in .\nquery-vnext\src\NQuery\Iterators\ComputeScalarIterator.cs:line 41
   at NQuery.Iterators.ProjectionIterator.Read() in .\nquery-vnext\src\NQuery\Iterators\ProjectionIterator.cs:line 31
   at NQuery.QueryReader.Read() in .\nquery-vnext\src\NQuery\QueryReader.cs:line 43
   at NQuery.Data.QueryReaderExtensions.ExecuteDataTable(QueryReader queryReader) in .\nquery-vnext\src\NQuery.Data\QueryReaderExtensions.cs:line 21
   at NQueryViewer.MainWindow.<>c__DisplayClass15_0.<ExecuteQuery>b__0() in .\nquery-vnext\src\NQueryViewer\MainWindow.xaml.cs:line 143
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at NQueryViewer.MainWindow.ExecuteQuery() in .\nquery-vnext\src\NQueryViewer\MainWindow.xaml.cs:line 140

Analysis

The problem is caused by the caching of expression values in variables implemented in ExpressionBuilder.BuildCachedExpression. This code transforms expressions like a.b.c.Prop to

var temp1 = a.b;
var temp2 = temp1.c;
return temp2.Prop;

And it transforms our CASE WHEN Type = 'date' THEN FORMAT(TO_DATETIME(Value), 'yyyy-MM-dd') ELSE Value END to:

var temp1 = Type;
var temp2 = 'date';
var temp3 = Value;
var temp4 = TO_DATETIME(temp3);
var temp5 = 'yyyy-MM-dd';
return temp1 = temp2 ? FORMAT(temp4, temp5) : Value;

In words: It pulls calculations out of the CASE expression, changing the semantics (searched CASE expressions are short-circuited in SQL).

Open questions

As a little experiment, I removed caching by replacing all calls to BuildCachedExpression with BuildLiftedExpression. All unit tests are still green, but I'm unsure whether that is a good idea as I do not know:

  1. Why was the caching of expression values introduced?
  2. Does it have any measurable performance effect?
  3. Is it necessary for correctness in some cases I don't understand?
  4. Can caching be removed altogether? Or would this have negative effects, and it's necessary to disable it selectively to fix the CASE WHEN bug described here?

Highly appreciate any insights.

dallmair added a commit to dallmair/nquery-vnext that referenced this issue Aug 16, 2024
@dallmair
Copy link
Contributor Author

Thinking about this some more, the existing caching is necessary for correctness due to NULL propagation semantics. For example, A() == B() must be compiled to:

var temp1 = A();
var temp2 = B();
return temp1 == null || temp2 == null ? null : temp1 == temp2;

Without caching, the evaluation would happen twice, which is at best undesired and at worst incorrect (when the function has side effects).

Thus, it makes sense to keep the general approach, and fix the problem in the compilation of CASE WHEN, because that is the only expression in SQL that has lazy-evaluation semantics.

@dallmair dallmair linked a pull request Aug 16, 2024 that will close this issue
@terrajobst
Copy link
Owner

Hmm, that's odd:

  1. The case you described should also trigger for nested queries. For example (WHEN <condition> THEN <single-row-subselect>). If the condition is indicative of whether or not the subselect will produce more than one row then for correctness the THEN expression shouldn't be executed when WHEN is false. Your fix wouldn't address that. If that problem exists, we need a different fix. And if the problem doesn't exist, then whatever handles that should handle the case for scalars as well.
  2. I don't think we need/should need lambdas for lazy evaluation. That's a pretty heavy solution. Rather, code gen should be changed to be lazy without having to make it a lambda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants