Simplify the semantic model caching scheme we use to reuse semantic models during typing. #45565
Conversation
Force-pushed from a4fa12f to 27baafe
Force-pushed from a3b4b04 to 499e554
@@ -9,13 +9,13 @@
 using Microsoft.CodeAnalysis.Host.Mef;
 using Microsoft.CodeAnalysis.Shared.Extensions;

-namespace Microsoft.CodeAnalysis.SemanticModelWorkspaceService
+namespace Microsoft.CodeAnalysis.SemanticModelReuse
renamed namespace and types to be clearer about what purpose they serve.
@@ -978,71 +979,6 @@ public bool ContainsInMemberBody(SyntaxNode node, TextSpan span)
 private static TextSpan GetBlockBodySpan(BlockSyntax body)
     => TextSpan.FromBounds(body.OpenBraceToken.Span.End, body.CloseBraceToken.SpanStart);

-public int GetMethodLevelMemberId(SyntaxNode root, SyntaxNode node)
these members were errantly placed in ISyntaxFacts. i moved them to a dedicated service for semanticmodel reuse as that's all they're intended for.
I'll never complain about the number of methods of SyntaxFacts going down!
namespace Microsoft.CodeAnalysis.CSharp.SemanticModelReuse
{
    [ExportLanguageService(typeof(ISemanticModelReuseLanguageService), LanguageNames.CSharp), Shared]
    internal class CSharpSemanticModelReuseLanguageService : ISemanticModelReuseLanguageService
note: there is a workspace-service and a lang-service. The former is what is used for the basic semantic-model caching. The latter is what is used for doing VB/C#-specific work in that process.
…orkspaceServiceFactory.SemanticModelWorkspaceService.cs Co-authored-by: Sam Harwell <sam@tunnelvisionlabs.com>
{
    internal static partial class SyntaxTreeExtensions
    {
        public static bool IsPreProcessorDirectiveContext(this SyntaxTree syntaxTree, int position, SyntaxToken preProcessorTokenOnLeftOfPosition, CancellationToken cancellationToken)
this is just a move.
@@ -81,6 +81,7 @@ internal partial interface ISyntaxFacts
     /// preprocessor directive. For example `if` or `pragma`.
     /// </summary>
     bool IsPreprocessorKeyword(SyntaxToken token);
+    bool IsPreProcessorDirectiveContext(SyntaxTree syntaxTree, int position, CancellationToken cancellationToken);
moved this from ISemanticFacts. This is not a semantic fact; it's a syntax fact.
also, the inconsistency between Preprocessor and PreProcessor bugs me. but i'm not changing that now.
@jasonmalinowski ptal. thanks!
Looks overall good; I do have one general concern still about the perf impacts of this. This assumes that some feature already primed the cache first with the non-speculative model, but is that actually happening in practice? Consider:
- Open a file.
- We produce a semantic model for classification since it runs. But classification doesn't prime the cache since it isn't calling into this.
- First keystroke happens in a method. Completion is calling this, but this is the call that actually primes the cache. As a side effect, we produced a new semantic model from scratch for this request.
- Second keystroke happens. Classification and everything else is still using the full model, and completion is just filtering our existing data. It's not until this commits and a second completion comes up that we get a benefit.
What might be useful even if this isn't a concern: have callers to this API pass in their feature name, and just record a per-feature cache hit/miss rate. Then it'd be easy to see that this is actually working at all, and if so, which feature is actually getting the win. I could totally imagine it might work out awkwardly: completion is still saddled with the full cost of having to make a new semantic model, but hey, signature help (which isn't as critical since it's not on a blocking path) gets the benefit.
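The per-feature hit/miss recording suggested here is easy to sketch. The following is a hypothetical, language-agnostic illustration in Python; `ReuseTelemetry`, `record_lookup`, and the feature names are all invented and are not Roslyn APIs:

```python
from collections import defaultdict

# Hypothetical sketch of the per-feature telemetry suggested above.
# All names here are invented for illustration.
class ReuseTelemetry:
    def __init__(self):
        # feature name -> [hits, misses]
        self._stats = defaultdict(lambda: [0, 0])

    def record_lookup(self, feature, hit):
        # Each caller passes its feature name so wins can be attributed.
        self._stats[feature][0 if hit else 1] += 1

    def hit_rate(self, feature):
        hits, misses = self._stats[feature]
        total = hits + misses
        return hits / total if total else 0.0
```

With this, it would be simple to log that, say, completion sees mostly misses while signature help sees mostly hits, which is exactly the asymmetry the comment above worries about.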
{
    public Task<SemanticModel> GetSemanticModelForNodeAsync(Document document, SyntaxNode node, CancellationToken cancellationToken = default)
    public SemanticModelReuseWorkspaceService(Workspace _)
Why the constructor that isn't doing anything with this?
(If I'm guessing it's because you want to imply this service should be per workspace due to event subscriptions but that probably just deserves a comment than me guessing.)
basically, we have multiple impls of this: one for the codefix layer, and one for the other layers. the shared code just unilaterally calls `new SemanticModelReuseWorkspaceService(workspace)`. So i need the codefix impl to take in a workspace, even if it doesn't use it.
        return null;
    }

    return previousAccessors[currentAccessors.IndexOf(currentAccessor)];
Potential indexing violation here if GetAccessors didn't return a set of accessors that contained currentAccessor. Not sure if there are some icky broken-code cases, but hopefully we don't get this far?
there should be no indexing violation. First, we guarantee we have the same count of accessors. Second, we're getting the index of currentAccessor within currentAccessors. The second part must succeed; i.e. it would/should be impossible to have an accessor not be within the accessor list that contains it.
The debug-asserts are for parts of this code that i don't have full confidence are always certain. i.e. i'm not 100% certain that if the "top level version" stays the same that all these correspondences hold. But i am 100% certain that an accessor must be in its containing accessor list, or else the wheels have fallen off the wagon entirely.
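The invariant being argued here, same accessor count in both lists, and the current accessor guaranteed to be a member of its own list, can be shown with a tiny sketch. This is a hypothetical Python mirror of the C# expression under discussion, not Roslyn code; all names are illustrative:

```python
# Hypothetical mirror of `previousAccessors[currentAccessors.IndexOf(currentAccessor)]`.
# The caller guarantees both lists have the same count, and current_accessor
# is by construction a member of current_accessors, so index() cannot fail.
def get_previous_accessor(previous_accessors, current_accessors, current_accessor):
    assert len(previous_accessors) == len(current_accessors)
    return previous_accessors[current_accessors.index(current_accessor)]
```

The positional mapping relies only on the two lists lining up, which is what the debug-asserts elsewhere are guarding.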
SyntaxNode? TryGetContainingMethodBodyForSpeculation(SyntaxNode node);

/// <summary>
/// Given a previous semantic model, and a method-eque node in the current tree for that same document, attempts
-/// Given a previous semantic model, and a method-eque node in the current tree for that same document, attempts
+/// Given a previous semantic model, and a method-esque node in the current tree for that same document, attempts
    return;

var solution = e.NewSolution;
foreach (var (docId, _) in map)
Just do map.Keys? Or is this actually faster for some reason?
no. you're correct. i've been coding in `go` a little too much recently :-/
{
    if (!solution.ContainsDocument(docId))
    {
        _semanticModelMap = ImmutableDictionary<DocumentId, SemanticModelReuseInfo?>.Empty;
I think this code is great, but it obviously looks like a terrifying race since this could potentially run well after a new solution has been added, and somebody has called ReuseExistingSpeculativeModelAsync with a document from that new solution. Given this is a cache and that's really rare, it's fine, but may be worth a comment.
(and to be 100% clear I wouldn't try to address that race in any way, but this does stick out like somebody forgot to use an Interlocked...)
i will add a comment.
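The benign race being discussed can be sketched in a few lines. This is a hypothetical Python illustration of the event-handler behavior, not the actual Roslyn code; the worst case of racing a concurrent reader is that a freshly primed entry is cleared and must be re-primed:

```python
# Hypothetical sketch: when the workspace changes, if any cached document no
# longer exists in the new solution, drop the entire cache in one shot. As
# noted above, this can run after a newer solution has already been observed
# by a caller; since this is only a cache, a stale clear is harmless.
def on_workspace_changed(cache_map, new_solution_doc_ids):
    for doc_id in cache_map:
        if doc_id not in new_solution_doc_ids:
            return {}  # clear everything; callers will simply re-prime
    return cache_map  # all cached documents still exist; keep the cache
```

Returning a fresh empty map (rather than mutating in place) mirrors the single-assignment style the comment compares to an `Interlocked` swap.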
var document = CreateDocument("", LanguageNames.CSharp);

// trying to get a model for null should return a non-speculative model
var model = await document.ReuseExistingSpeculativeModelAsync(null, CancellationToken.None);
Nit: named parameter. (Or add @akhera99's feature to GitHub, whichever is easier.)
Assert.False(model2.IsSpeculativeSemanticModel);

// Should be the same models.
Assert.Equal(model1, model2);
There's Assert.Same() if you want to explicitly state object instances versus any (potential) Equals override.
// This should be able to get a speculative model using the original model we primed the cache with.
var model2 = await document2.ReuseExistingSpeculativeModelAsync(source.IndexOf("return"), CancellationToken.None);
Assert.True(model2.IsSpeculativeSemanticModel);
There is ParentModel if you want to assert it's parented by the prior one, although I can imagine that might be a bit too much "asserting implementation details" if you want to stay away from that.
Also worth adding an assert that model2.SyntaxTree.ToString() gets you in-body-edited text back, just to make sure we didn't make a speculative model but it's somehow speculating on the wrong thing?
// This should be able to get a speculative model using the original model we primed the cache with.
var model3 = await document3.ReuseExistingSpeculativeModelAsync(source.IndexOf("return"), CancellationToken.None);
Assert.True(model3.IsSpeculativeSemanticModel);
Ditto: ParentModel assertion? (Answer can also be "no" if you'd like.)
var tree1 = await document.GetSyntaxTreeAsync();
var basemethod1 = tree1.FindTokenOnLeftOfPosition(position, CancellationToken.None).GetAncestor<CSharp.Syntax.BaseMethodDeclarationSyntax>();

// Ensure we prime the reuse cache with the true semantic model.
Will raise concern back on main thread.
This analysis is spot on. However, it focuses overly on just the first edit in a file/body. In that case, it is correct that you pay the full initial cost of getting that primed semantic model. However, importantly, from that point on, as long as you are editing within that body (and not doing anything to throw off the top-level-version info), that primed semantic model will be used. I typed out several full statements in a method body and got a 100% hit rate on that semantic model for all edits post-priming. This is the value here: paying the price initially for the first semantic model (which we're always going to pay because at least someone is going to ask for semantics), but then being able to use it for a set of edits, which is significant in terms of how people write code.
Thank you for the deep and thorough review @jasonmalinowski !
This PR will be easier to review one commit at a time.
--
The new model is drastically simpler than the old one, and bases its implementation on the observation that the majority of typing tends to be localized. i.e. users do not type one char, then jump elsewhere and type another thing, then jump elsewhere and type another thing**. Instead, they will have a batch of typing in one location, will routinely edit in a single method body, and will eventually go elsewhere to do their next bit of work.
Given that, the semantic model cache we have is simple: We start by retrieving a real semantic model `S` for a document at time `T`. This is used normally. When an edit happens and all the edits that happened between `T` and now were intra-method edits, then we use `S` to retrieve a speculative model `S'` that can be used to answer questions that typing features need.

In a similar vein, because many features (like all the completion providers) will light up at the same time asking for the same semantic model, we cache `S'` as well, returning it if no edits have occurred.
--
** Note: that jumping/typing pattern does occur when cycling through things and potentially replacing matches. However, even for that case, we should be no worse than we were before.
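The caching scheme the description lays out can be condensed into a short sketch. This is a hypothetical, language-agnostic Python illustration, not the Roslyn implementation; all names are invented. `build_real` produces `S`, `build_speculative` derives `S'` from `S`, a matching `top_level_version` models "nothing outside method bodies changed since time `T`", and `intra_method_edit` models "all edits since `T` were inside a method body":

```python
# Hypothetical sketch of the semantic-model reuse cache described above.
class ReuseCache:
    def __init__(self):
        # document id -> (top_level_version, real_model, speculative_model)
        self._map = {}

    def get_model(self, doc_id, top_level_version, intra_method_edit,
                  build_real, build_speculative):
        entry = self._map.get(doc_id)
        if entry is not None and entry[0] == top_level_version and intra_method_edit:
            version, real, spec = entry
            if spec is None:
                # First intra-method request since priming: derive S' from S
                # and cache it, since many features light up at the same time
                # asking for the same model.
                spec = build_speculative(real)
                self._map[doc_id] = (version, real, spec)
            return spec
        # Cache miss (or the top level changed): build and prime the real model S.
        real = build_real()
        self._map[doc_id] = (top_level_version, real, None)
        return real
```

The first call pays full price and primes the cache; every subsequent intra-method request is served from `S'` until a top-level change invalidates the entry.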