[WIP] Handle first class spans #35279

Closed
wants to merge 3 commits into from

ChrisJollyAU
Contributor

@roji Following on from the discussion in #35100, this is what I have for handling first-class spans.

This handles the Span-related methods directly.

A couple of comments:

  1. Array types end up being wrapped in an op_Implicit method call that converts them to a Span
  2. In the funcletizer, we have to jump over to its argument to handle the parameter/argument etc., but the implicit conversion needs to be left in place
  3. Translating visitors can effectively ignore/jump through the method call
  4. As seen in #35100 (InvalidOperationException when LangVersion is set to preview), the lambda .Compile() doesn't handle interpretation with Span types
  5. MemoryExtensions.Contains and other similar methods have to be handled as well
  6. As this .Contains is not a Queryable or Enumerable method, it isn't handled in the QueryableMethodTranslatingExpressionVisitor but in the RelationalSqlTranslatingExpressionVisitor
  7. Following from the above, primitive collections and the like also have to be handled in the Contains translation in RelationalSqlTranslatingExpressionVisitor (note: this potentially duplicates existing code, e.g. for creating the OPENJSON)
  8. CI: I don't think the CI is using the new Roslyn with this feature yet

A couple of things still to do:

  1. Finish the SQL Server tests (around 35 still failing, but around half look like just parameter naming differences)
  2. Sqlite/InMemory/Cosmos tests
  3. Bit of tidying up
  4. Investigate deduplicating code for Contains and primitive collections
  5. Use the new Roslyn compiler

Posting this as a draft so I can get some feedback on whether this is the direction to go in. I don't want to have to redo too much if we go a different route.

@ChrisJollyAU ChrisJollyAU mentioned this pull request Dec 12, 2024
@@ -45,7 +45,7 @@ public ByteArraySequenceEqualTranslator(ISqlExpressionFactory sqlExpressionFacto
 }
 
 if (method.IsGenericMethod
-    && method.GetGenericMethodDefinition().Equals(MemoryExtensionsMethods.SequenceEqual)
+    && MemoryExtensionsMethods.SequenceEqual.Contains(method.GetGenericMethodDefinition())
Member

@ChrisJollyAU thanks for working on this, and sorry that I didn't have time to look earlier.

As per dotnet/runtime#109757 (comment) - which I hope you agree is a good approach to handle this - I think we'd be normalizing away method calls like MemoryExtensions.Contains - replacing them with corresponding non-Span calls - very early in the pipeline, in ExpressionTreeFuncletizer. If we go down this road, no later part of the query pipeline will ever see these MemoryExtensions methods.

Does that make sense?
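A minimal sketch of the kind of rewrite being proposed here, in C# since that is the codebase's language. It assumes the tree shape described in the PR description (the array wrapped in a Span<T>/ReadOnlySpan<T> op_Implicit call that is passed to MemoryExtensions.Contains) and builds that tree by hand with System.Linq.Expressions, so it does not need a compiler with first-class spans. SpanContainsNormalizer and BuildSpanContainsLambda are hypothetical names; this is not EF Core's actual funcletizer code.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical visitor, not EF Core's actual code. It rewrites
//   MemoryExtensions.Contains(Span<T>/ReadOnlySpan<T>.op_Implicit(array), value)
// into
//   Enumerable.Contains(array, value)
// so that no ref struct is left for the LINQ interpreter or later pipeline stages.
public sealed class SpanContainsNormalizer : ExpressionVisitor
{
    private static readonly MethodInfo EnumerableContains = typeof(Enumerable).GetMethods()
        .Single(m => m.Name == nameof(Enumerable.Contains) && m.GetParameters().Length == 2);

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        // Cheap declaring-type check first; most method calls fall straight through.
        if (node.Method.DeclaringType == typeof(MemoryExtensions)
            && node.Method.Name == nameof(MemoryExtensions.Contains)
            && node.Method.IsGenericMethod
            && node.Arguments.Count == 2
            && TryUnwrapSpanImplicitCast(node.Arguments[0], out var array))
        {
            var elementType = node.Method.GetGenericArguments()[0];
            return Expression.Call(
                EnumerableContains.MakeGenericMethod(elementType),
                Visit(array),
                Visit(node.Arguments[1]));
        }

        return base.VisitMethodCall(node);
    }

    // Matches Span<T>.op_Implicit(T[]) or ReadOnlySpan<T>.op_Implicit(T[]) and extracts the array operand.
    private static bool TryUnwrapSpanImplicitCast(Expression expression, out Expression array)
    {
        if (expression is MethodCallExpression { Method.Name: "op_Implicit", Arguments.Count: 1 } call
            && call.Method.DeclaringType is { IsGenericType: true } spanType
            && (spanType.GetGenericTypeDefinition() == typeof(Span<>)
                || spanType.GetGenericTypeDefinition() == typeof(ReadOnlySpan<>))
            && call.Arguments[0].Type.IsArray)
        {
            array = call.Arguments[0];
            return true;
        }

        array = null!;
        return false;
    }
}

public static class Program
{
    // Builds by hand the shape described in the PR:
    //   id => MemoryExtensions.Contains(ReadOnlySpan<string>.op_Implicit(data), id)
    public static Expression<Func<string, bool>> BuildSpanContainsLambda(string[] data)
    {
        var id = Expression.Parameter(typeof(string), "id");
        var opImplicit = typeof(ReadOnlySpan<string>).GetMethod("op_Implicit", new[] { typeof(string[]) })!;
        var spanContains = typeof(MemoryExtensions).GetMethods()
            .Single(m => m.Name == nameof(MemoryExtensions.Contains)
                && m.IsGenericMethodDefinition
                && m.GetParameters().Length == 2
                && m.GetParameters()[0].ParameterType.IsGenericType
                && m.GetParameters()[0].ParameterType.GetGenericTypeDefinition() == typeof(ReadOnlySpan<>))
            .MakeGenericMethod(typeof(string));

        var body = Expression.Call(spanContains, Expression.Call(opImplicit, Expression.Constant(data)), id);
        return Expression.Lambda<Func<string, bool>>(body, id);
    }

    public static void Main()
    {
        var lambda = BuildSpanContainsLambda(new[] { "ALFKI", "ANATR" });
        var normalized = (Expression<Func<string, bool>>)new SpanContainsNormalizer().Visit(lambda);

        // The normalized tree only contains Enumerable.Contains over a plain array,
        // so the LINQ interpreter can evaluate it.
        var predicate = normalized.Compile(preferInterpretation: true);
        Console.WriteLine(predicate("ALFKI")); // True
        Console.WriteLine(predicate("BONAP")); // False
    }
}
```

After normalization the lambda can be compiled with preferInterpretation: true, i.e. through the LINQ interpreter, which is the path client evaluation in the funcletizer relies on. Checking DeclaringType before anything else keeps the cost for method calls that are not MemoryExtensions calls to a single comparison.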

Contributor Author

I did see that. It is one way to do it, and was the alternative I considered. Not sure what the performance cost is, though, of rewriting the query away from span (implicit) back to normal versus handling it directly.

Note that in what I have, only interpretation is done, with very minimal changes to the funcletizer; nothing goes through full compilation. It mostly just gets picked up directly in the translator.

Copy link
Contributor Author

Also, take this for example:

Span<string> data = new[] { "ALFKI" + "SomeVariable", "ANATR" + "SomeVariable", "ALFKI" + "X" };
var someVariable = "SomeVariable";

return AssertQuery(
    async,
    ss => ss.Set<Customer>().Where(c => data.Contains(c.CustomerID + someVariable)));

Currently this doesn't even compile ("Expression tree cannot contain value of ref struct or restricted type"), but should that support get added (the other thread mentioned a couple of issues that would need solving to get it working), rewriting it away would not work. Handling it directly on the translator side would cover it (without much change, if any, I think).

Copy link
Member

roji commented Dec 15, 2024

> Not sure what the performance cost is though of rewriting the query away from span (implicit) back to normal versus handling it direct.

What I had in mind would be to do the rewriting as part of the funcletization pass, rather than adding an additional pass just for that; if we do it that way, the perf impact should be completely negligible, I think.

But more importantly: when the funcletizer identifies a tree fragment that can be client-evaluated, it does that and embeds the result either as a constant or as a parameter. If such an evaluatable tree fragment happens to contain a Contains, won't that now start throwing, since client evaluation involves going through the LINQ interpreter? Am I missing something here?

If so, then the funcletizer must do the substitution early (i.e. remove any Span-based method overloads), because it has to happen before the LINQ interpreter is possibly used.

> Currently this doesn't even compile [...]

Right, but that seems pretty orthogonal to the discussion (and not necessarily extremely important). I indeed don't think the C# compiler will allow ref structs inside LINQ expression trees any time soon; that's also not a change from 9 to 10 - it has never been allowed, and as far as I know nobody has ever complained about it (there's no real reason to use a Span variable rather than an array when querying like this).

@ChrisJollyAU
Contributor Author

@roji Meant to ask: what are the thoughts from the rest of the team?

@roji
Member

roji commented Dec 16, 2024

@ChrisJollyAU on what specifically?

@ChrisJollyAU
Contributor Author

> @ChrisJollyAU on what specifically?

On which direction they would go for (rewriting back to Enumerable, or handling the MemoryExtensions functions directly in the translator)?

@roji
Member

roji commented Dec 16, 2024

Well, as I wrote above, unless my mental model is wrong, since the funcletizer does client-evaluation via the LINQ interpreter, and the latter doesn't support the ref struct MemoryExtensions functions, rewriting must be done in the funcletizer; or am I missing something?

@ChrisJollyAU
Contributor Author

> Well, as I wrote above, unless my mental model is wrong, since the funcletizer does client-evaluation via the LINQ interpreter, and the latter doesn't support the ref struct MemoryExtensions functions, rewriting must be done in the funcletizer; or am I missing something?

It only needs to be rewritten IF you plan to simplify that expression via the interpreter. Otherwise, you can leave everything in place, with no modifications, until you get to the translator.

Probably the best example is the Contains_over_concatenated_parameter_and_constant test.

Currently, when it hits the expression data.Contains(someVariable + "SomeConstant") in the Where, it can client-evaluate it because both data and someVariable are available. This produces a true result, which can then be replaced by a parameter.
Thus the WHERE clause is WHERE @__Contains_0 = CAST(1 AS bit)
If you don't client-evaluate it, it becomes similar to other queries (like the other test that uses a column reference in place of someVariable), and you get:

WHERE @__p_1 IN (
    SELECT [d].[value]
    FROM OPENJSON(@__data_0) WITH ([value] nvarchar(max) '$') AS [d]
)

Interestingly, unless I'm missing something, this is the only test that changed like this.

What I've got here in this PR is literally a one-line change (in EvaluatableExpressionFilter), and the whole funcletizer works 100% with no problems and without any other change. It doesn't try to run any of the ref struct stuff through the interpreter.

@roji
Member

roji commented Dec 16, 2024

> What I've got here on this PR, is literally a 1 line change (in EvaluatableExpressionFilter) and the whole funcletizer works 100% with no problems without a single change.

I see now, that's the part I was missing. It's definitely an interesting (and viable) approach; but here are some thoughts:

  • The funcletizer isn't the only place where EF uses the LINQ interpreter. For example, for EF 10 the current plan is to allow for dynamic queries under NativeAOT by using the LINQ interpreter to execute the shapers that EF generates (since compilation isn't possible with NativeAOT). At that point, if a ref struct overload were present in the shaper, it would fail (e.g. a query ending with x.Contains(y) that for any reason cannot be translated to SQL and must be client-evaluated).
  • In this PR, the funcletizer indeed hasn't changed, but instead lots of other places need to change in order to recognize and translate the MemoryExtensions call (which this PR does). Even if we don't want to normalize the call away in the funcletizer, then rather than forcing all providers to deal with the new overload everywhere, we'd probably want to normalize it away later in preprocessing - just like various other normalizations we perform (e.g. list.Any(x => x == y) gets normalized to list.Contains(y)).
  • Continuing on the above, identifying and replacing the MemoryExtensions call in the funcletizer would involve a very targeted (and easy) change in exactly one place in the funcletizer, compared to the many changes in this PR; these changes also affect providers, meaning that with this approach, all providers must react, whereas with a funcletizer-based rewriting just one place needs to.
  • I agree that not client-evaluating Contains in the funcletizer (as per your approach) isn't the end of the world, but it isn't great either - that's an odd exception (and who knows what other Span-based overloads are coming).

Let me know what you think about the above. Regardless, I'll bring both these options up for discussion with the team.

@roji
Member

roji commented Dec 17, 2024

@ChrisJollyAU see #35339 for what my proposed approach looks like.

@ChrisJollyAU
Contributor Author

> @ChrisJollyAU see #35339 for what my proposed approach looks like.

Yeah, just looking at it now (once I got the 10.0 SDK installed).

>   • Continuing on the above, identifying and replacing the MemoryExtensions call in the funcletizer would involve a very targeted (and easy) change in exactly one place in the funcletizer, compared to the many changes in this PR; these changes also affect providers, meaning that with this approach, all providers must react, whereas with a funcletizer-based rewriting just one place needs to.

I do see what you're going for there: ensure that the translators (including the provider-specific ones) only get one form of the expression that they need to handle.

@roji
Member

roji commented Dec 17, 2024

Yeah, exactly. In general, where there's no semantic difference between two incoming constructs (e.g. list.Any(x => x == y) and list.Contains(y)), we generally try to normalize such differences away in the preprocessing phase; that's a set of visitors that run for all EF providers, before translation, so they remove the burden of dealing with different construct variants from everyone.
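A minimal sketch of the kind of preprocessing normalization mentioned above (list.Any(x => x == y) becoming list.Contains(y)), again in C#. AnyEqualsToContainsNormalizer is a hypothetical name, not EF Core's actual preprocessing visitor; it only handles a plain equality whose other operand has the element type and does not reference x, and it does not attempt to deal with the client-evaluation caveats raised in this thread.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical visitor, not EF Core's actual preprocessing code. It rewrites
//   Enumerable.Any(source, x => x == value)   (or value == x)
// into
//   Enumerable.Contains(source, value)
// when value has the element type and does not reference x.
public sealed class AnyEqualsToContainsNormalizer : ExpressionVisitor
{
    private static readonly MethodInfo EnumerableContains = typeof(Enumerable).GetMethods()
        .Single(m => m.Name == nameof(Enumerable.Contains) && m.GetParameters().Length == 2);

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType == typeof(Enumerable)
            && node.Method.Name == nameof(Enumerable.Any)
            && node.Arguments.Count == 2
            && node.Arguments[1] is LambdaExpression
            {
                Parameters.Count: 1,
                Body: BinaryExpression { NodeType: ExpressionType.Equal } equal
            } predicate)
        {
            var x = predicate.Parameters[0];
            // Accept both `x == value` and `value == x`.
            var value = equal.Left == x ? equal.Right : equal.Right == x ? equal.Left : null;

            if (value is not null && value.Type == x.Type && !References(value, x))
            {
                return Expression.Call(
                    EnumerableContains.MakeGenericMethod(x.Type),
                    Visit(node.Arguments[0]),
                    Visit(value));
            }
        }

        return base.VisitMethodCall(node);
    }

    // True if `parameter` occurs anywhere inside `expression`.
    private static bool References(Expression expression, ParameterExpression parameter)
    {
        var finder = new ParameterFinder(parameter);
        finder.Visit(expression);
        return finder.Found;
    }

    private sealed class ParameterFinder : ExpressionVisitor
    {
        private readonly ParameterExpression _parameter;

        public ParameterFinder(ParameterExpression parameter) => _parameter = parameter;

        public bool Found { get; private set; }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            if (node == _parameter)
            {
                Found = true;
            }

            return node;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Expression<Func<string[], string, bool>> query = (list, y) => list.Any(x => x == y);
        var normalized = (Expression<Func<string[], string, bool>>)new AnyEqualsToContainsNormalizer().Visit(query);

        // The Any call and its lambda are replaced by a single Contains call.
        Console.WriteLine(normalized.Body);
        var evaluate = normalized.Compile();
        Console.WriteLine(evaluate(new[] { "a", "b" }, "b")); // True
        Console.WriteLine(evaluate(new[] { "a", "b" }, "c")); // False
    }
}
```

For strings, == and the default equality comparer used by Contains agree, so the rewrite is safe here; for other element types they can differ, which is the kind of client-side semantic difference to watch for.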

It is worth keeping in mind, however, that some fragments may not get translated at all, but rather end up getting client-evaluated (part of an untranslatable top-level Select). Because of this, we must be careful with what we normalize, since sometimes two constructs that can't possibly be different when translated (e.g. to SQL) can definitely be different when executed locally; I was thinking about normalizing Equals to == (equality operator) in preprocessing a few months back, since there's no real meaning to the distinction in e.g. SQL, but that would alter the client-side semantics for top-level select etc.

Finally, this case with the ref structs is slightly different, since I'm proposing to do the "normalization" in the funcletizer, as opposed to in preprocessing: that's even earlier (since the funcletizer uses the LINQ interpreter). Doing anything in the funcletizer must be done with special care, since it runs before the query cache, and so runs on every query execution; in contrast, preprocessing happens after the query cache, and so only runs once, the first time a query shape is seen. However, the actual perf impact in this case (see #35339) should really be negligible - a single string comparison for most queries, and a tiny bit more for queries with Contains/SequenceEquals.

@roji
Member

roji commented Dec 18, 2024

@ChrisJollyAU as #35339 has been merged, I'll go ahead and close this. Thanks for your work here and for the valuable conversation - it's still possible we'll do something along these lines in the future, and also if you see any further trouble with the approach in #35339 please let us know! I'll do my best to review your other PRs soon.

@roji roji closed this Dec 18, 2024
@ChrisJollyAU ChrisJollyAU deleted the firstclassspan branch December 19, 2024 06:06
@ChrisJollyAU
Contributor Author

@roji Thanks. This ended up an interesting problem and even though we didn't go for this approach, I think it was still good for exploring the problem and seeing some of the pros, cons and limitations.

In the long term, we might only need this sort of thing if they add support for using a ref struct directly (like in the snippet from earlier). But from the other threads, there's a lot of other work to be done before that could be supported.

@roji
Member

roji commented Dec 19, 2024

@ChrisJollyAU yeah, absolutely - and thanks for the valuable discussion around all this.

I really doubt ref struct will end up being supported in expression trees any time soon - expression tree support for C# features has been stuck in a much earlier C# version; I'm also not convinced using typed Spans inside EF LINQ queries is super useful - but we can always cross that bridge when we get to it.
