
Simplify the semantic model caching scheme we use to reuse semantic models during typing. #45565

Merged
merged 52 commits into from
Jul 7, 2020

Conversation

CyrusNajmabadi
Member

@CyrusNajmabadi CyrusNajmabadi commented Jun 30, 2020

This PR will be easier to review one commit at a time.

--

The new model is drastically simpler than the old one, and bases its implementation on the observation that the majority of typing tends to be localized. i.e. users do not type one char, then jump elsewhere and type another thing, then jump elsewhere and type yet another thing**. Instead, they will have a batch of typing in one location, will routinely edit within a single method body, and will eventually go elsewhere to do their next bit of work.

Given that, the semantic model cache we have is simple: we start by retrieving a real semantic model S for a document at time T. That model is used normally. When an edit happens, and all the edits that happened between T and now were intra-method edits, we use S to retrieve a speculative model S' that can be used to answer the questions that typing features need.

In a similar vein, because many features (like all the completion providers) will light up at the same time asking for the same semantic model, we cache S' as well, returning it if no edits have occurred.


--

** Note: that jumping/typing pattern does occur when cycling through search results and potentially replacing matches. However, even for that case, we should be no worse off than we were before.
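In rough terms, the scheme described above could be sketched like this (a simplified illustration only, with invented type and member names — not the PR's actual code; the real service is per-document, language-aware, and must also handle cancellation and concurrent callers):

```csharp
// Sketch only: SemanticModelReuseSketch, ReuseInfo, GetModelForBodyAsync and
// CreateSpeculativeModel are invented names for illustration.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;

internal sealed class SemanticModelReuseSketch
{
    // Cache entry: the real model S primed at time T, the top-level
    // (non-method-body) semantic version of the project at that time, the
    // body being edited, and the speculative model S' shared by features.
    private sealed record ReuseInfo(
        VersionStamp TopLevelVersion,
        SemanticModel RealModel,
        SyntaxNode? BodyNode,
        SemanticModel? SpeculativeModel);

    private ReuseInfo? _info;

    public async Task<SemanticModel> GetModelForBodyAsync(
        Document document, SyntaxNode body, CancellationToken cancellationToken)
    {
        // Any non-intra-method edit changes the top-level semantic version
        // and invalidates the cached entry.
        var version = await document.Project
            .GetDependentSemanticVersionAsync(cancellationToken).ConfigureAwait(false);

        var info = _info;
        if (info is null || info.TopLevelVersion != version)
        {
            // Prime the cache with a real model S.
            var realModel = await document
                .GetSemanticModelAsync(cancellationToken).ConfigureAwait(false);
            _info = new ReuseInfo(version, realModel!, BodyNode: null, SpeculativeModel: null);
            return realModel!;
        }

        // Many features ask at the same time; if no further edits occurred,
        // hand every caller the same cached S'.
        if (info.SpeculativeModel is not null && info.BodyNode == body)
            return info.SpeculativeModel;

        // Only intra-method edits since T: derive speculative S' from S
        // (done by a language-specific service in the real implementation,
        // e.g. via TryGetSpeculativeSemanticModel).
        var speculative = CreateSpeculativeModel(info.RealModel, body);
        _info = info with { BodyNode = body, SpeculativeModel = speculative };
        return speculative;
    }

    private static SemanticModel CreateSpeculativeModel(SemanticModel model, SyntaxNode body)
        => throw new NotImplementedException(); // language-specific in the real code
}
```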

@CyrusNajmabadi CyrusNajmabadi force-pushed the semanticModelWork branch 2 times, most recently from a4fa12f to 27baafe Compare July 1, 2020 00:37
@CyrusNajmabadi CyrusNajmabadi changed the title Semantic model work Simplify the semantic model caching scheme we use to reuse semantic models during typing. Jul 1, 2020
@@ -9,13 +9,13 @@
using Microsoft.CodeAnalysis.Host.Mef;
using Microsoft.CodeAnalysis.Shared.Extensions;

-namespace Microsoft.CodeAnalysis.SemanticModelWorkspaceService
+namespace Microsoft.CodeAnalysis.SemanticModelReuse
Member Author

renamed namespace and types to be clearer about what purpose they serve.

@CyrusNajmabadi CyrusNajmabadi (Member Author) left a comment

c

@@ -978,71 +979,6 @@ public bool ContainsInMemberBody(SyntaxNode node, TextSpan span)
private static TextSpan GetBlockBodySpan(BlockSyntax body)
=> TextSpan.FromBounds(body.OpenBraceToken.Span.End, body.CloseBraceToken.SpanStart);

public int GetMethodLevelMemberId(SyntaxNode root, SyntaxNode node)
Member Author

these members were errantly placed in ISyntaxFacts. i moved them to a dedicated service for semantic model reuse, as that's all they're intended for.

Member

I'll never complain about the number of methods of SyntaxFacts going down!

@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review July 1, 2020 00:59
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner July 1, 2020 00:59
namespace Microsoft.CodeAnalysis.CSharp.SemanticModelReuse
{
[ExportLanguageService(typeof(ISemanticModelReuseLanguageService), LanguageNames.CSharp), Shared]
internal class CSharpSemanticModelReuseLanguageService : ISemanticModelReuseLanguageService
Member Author

note: there is a workspace-service and a lang-service. the former is what is used for the basic semantic-model caching. The latter is what is used for doing VB/C#-specific work in that process.

{
internal static partial class SyntaxTreeExtensions
{
public static bool IsPreProcessorDirectiveContext(this SyntaxTree syntaxTree, int position, SyntaxToken preProcessorTokenOnLeftOfPosition, CancellationToken cancellationToken)
Member Author

this is just a move.

@@ -81,6 +81,7 @@ internal partial interface ISyntaxFacts
/// preprocessor directive. For example `if` or `pragma`.
/// </summary>
bool IsPreprocessorKeyword(SyntaxToken token);
bool IsPreProcessorDirectiveContext(SyntaxTree syntaxTree, int position, CancellationToken cancellationToken);
Member Author

moved this from ISemanticFacts. This is not a semantic fact; it's a syntax fact.

Member Author

also, the inconsistency between Preprocessor and PreProcessor bugs me. but i'm not changing that now.

@CyrusNajmabadi
Member Author

@jasonmalinowski ptal. thanks!

@jasonmalinowski jasonmalinowski (Member) left a comment

Looks good overall, but I still have one general concern about the perf impact of this. This assumes that some feature already primed the cache with the non-speculative model, but is that actually happening in practice? Consider:

  1. Open a file.
  2. We produce a semantic model for classification since it runs. But classification doesn't prime the cache since it isn't calling into this.
  3. First keystroke happens in a method. Completion is calling this, but this is the call that actually primes the cache. As a side effect, we produced a new semantic model from scratch for this request.
  4. Second keystroke happens. Classification and everything else is still using the full model, and completion is just filtering our existing data. It's not until this commits and a second completion comes up that we get a benefit.

What might be useful even if this isn't a concern: have callers to this API pass in their feature name, and just record a per-feature cache hit/miss rate. Then it'd be easy to see whether this is actually working at all, and if so, which feature is actually getting the win. I could totally imagine it might awkwardly turn out that completion is still saddled with the full cost of having to make a new semantic model, but hey, signature help (which isn't as critical since it's not on a blocking path) gets the benefit.
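The per-feature hit/miss idea could be as small as something like the following (a hypothetical sketch with invented names; a real change would presumably route through Roslyn's existing telemetry/logging infrastructure instead of a standalone dictionary):

```csharp
// Hypothetical sketch of per-feature cache hit/miss counters (invented names).
using System.Collections.Concurrent;

internal static class ReuseCacheTelemetrySketch
{
    private static readonly ConcurrentDictionary<string, (int Hits, int Misses)> s_counts = new();

    // Callers would pass their feature name, e.g. "Completion" or "SignatureHelp",
    // and whether the reuse cache satisfied their request.
    public static void Record(string featureName, bool cacheHit)
        => s_counts.AddOrUpdate(
            featureName,
            addValueFactory: _ => cacheHit ? (1, 0) : (0, 1),
            updateValueFactory: (_, counts) => cacheHit
                ? (counts.Hits + 1, counts.Misses)
                : (counts.Hits, counts.Misses + 1));
}
```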

{
public Task<SemanticModel> GetSemanticModelForNodeAsync(Document document, SyntaxNode node, CancellationToken cancellationToken = default)
public SemanticModelReuseWorkspaceService(Workspace _)
Member

Why the constructor that isn't doing anything with this?

Member

(If I'm guessing it's because you want to imply this service should be per workspace due to event subscriptions but that probably just deserves a comment than me guessing.)

Member Author

basically, we have multiple impls of this: one for the codefix layer, and one for the other layers. the shared code just unilaterally calls new SemanticModelReuseWorkspaceService(workspace). So i need the codefix impl to take in a workspace, even if it doesn't use it.

return null;
}

return previousAccessors[currentAccessors.IndexOf(currentAccessor)];
Member

Potential indexing violation here if, for some reason, GetAccessors didn't return a set of accessors that contained currentAccessor. Not sure if there are some icky broken-code cases, but hopefully we don't get this far?

Member Author

there should be no indexing violation. First, we guarantee we have the same count of accessors. Second, we're getting the index of currentAccessor within currentAccessors. The second part must succeed. i.e. it would/should be impossible for an accessor not to be within the accessor list that contains it.

The debug-asserts are for parts of this code that i don't have full confidence in. i.e. i'm not 100% certain that if the "top level version" stays the same, all these correspondences hold. But i am 100% certain that an accessor must be in its containing accessor list, or else the wheels have fallen off the wagon entirely.
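The invariant being argued here can be made concrete with a small sketch (hypothetical surrounding shape; the only assumption carried over from the discussion is that the caller has already verified both lists have the same count):

```csharp
// Sketch of the lookup under discussion (invented method/parameter names).
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.CodeAnalysis;

static SyntaxNode MapToPreviousAccessor(
    List<SyntaxNode> previousAccessors,
    List<SyntaxNode> currentAccessors,
    SyntaxNode currentAccessor)
{
    // Precondition (checked earlier in the real code): same accessor count.
    Debug.Assert(previousAccessors.Count == currentAccessors.Count);

    // currentAccessor was taken out of currentAccessors, so IndexOf cannot
    // return -1 unless the tree is fundamentally broken; the positional
    // mapping into previousAccessors is therefore safe.
    var index = currentAccessors.IndexOf(currentAccessor);
    Debug.Assert(index >= 0);

    return previousAccessors[index];
}
```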

SyntaxNode? TryGetContainingMethodBodyForSpeculation(SyntaxNode node);

/// <summary>
/// Given a previous semantic model, and a method-eque node in the current tree for that same document, attempts
Member

Suggested change
/// Given a previous semantic model, and a method-eque node in the current tree for that same document, attempts
/// Given a previous semantic model, and a method-esque node in the current tree for that same document, attempts

return;

var solution = e.NewSolution;
foreach (var (docId, _) in map)
Member

Just do map.Keys? Or is this actually faster for some reason?

Member Author

no. you're correct. i've been coding in go a little too much recently :-/

{
if (!solution.ContainsDocument(docId))
{
_semanticModelMap = ImmutableDictionary<DocumentId, SemanticModelReuseInfo?>.Empty;
Member

I think this code is great, but it obviously looks like a terrifying race since this could potentially run well after a new solution has been added, and somebody has called ReuseExistingSpeculativeModelAsync with a document from that new solution. Given this is a cache and that's really rare, it's fine, but may be worth a comment.

Member

(and to be 100% clear I wouldn't try to address that race in any way, but this does stick out like somebody forgot to use an Interlocked...)

Member Author

i will add a comment.

var document = CreateDocument("", LanguageNames.CSharp);

// trying to get a model for null should return a non-speculative model
var model = await document.ReuseExistingSpeculativeModelAsync(null, CancellationToken.None);
Member

Nit: named parameter. (Or add @akhera99's feature to GitHub, whichever is easier.)

Assert.False(model2.IsSpeculativeSemanticModel);

// Should be the same models.
Assert.Equal(model1, model2);
Member

There's Assert.Same() if you want to explicitly state object instances versus any (potential) Equals override.


// This should be able to get a speculative model using the original model we primed the cache with.
var model2 = await document2.ReuseExistingSpeculativeModelAsync(source.IndexOf("return"), CancellationToken.None);
Assert.True(model2.IsSpeculativeSemanticModel);
Member

There is ParentModel if you want to assert it's parented by the prior one, although I can imagine that might be a bit too much "asserting implementation details" if you want to stay away from that.

Member

Also worth adding an assert that model2.SyntaxTree.ToString() gets you the in-body-edited text back, just to make sure we didn't make a speculative model that's somehow speculating on the wrong thing?


// This should be able to get a speculative model using the original model we primed the cache with.
var model3 = await document3.ReuseExistingSpeculativeModelAsync(source.IndexOf("return"), CancellationToken.None);
Assert.True(model3.IsSpeculativeSemanticModel);
Member

Ditto: ParentModel assertion? (Answer can also be "no" if you'd like.)

var tree1 = await document.GetSyntaxTreeAsync();
var basemethod1 = tree1.FindTokenOnLeftOfPosition(position, CancellationToken.None).GetAncestor<CSharp.Syntax.BaseMethodDeclarationSyntax>();

// Ensure we prime the reuse cache with the true semantic model.
Member

Will raise this concern back on the main review thread.

@CyrusNajmabadi
Member Author

Consider:

This analysis is spot on. However, it focuses overly on just the first edit in a file/body. For that edit, it is correct that you pay the full initial cost of getting that primed semantic model. But importantly, from that point on, as long as you are editing within that body (and not doing anything to throw off the top-level-version info), that primed semantic model will be used. I typed out several full statements in a method body and got a 100% hit rate on that semantic model for all edits post-priming. This is the value here: we pay the price for the first semantic model (which we're always going to pay, because at least someone is going to ask for semantics), but then get to reuse it for a run of edits, which matches how people actually write code.

@CyrusNajmabadi CyrusNajmabadi merged commit 840dc9b into dotnet:master Jul 7, 2020
@ghost ghost added this to the Next milestone Jul 7, 2020
@CyrusNajmabadi CyrusNajmabadi deleted the semanticModelWork branch July 7, 2020 19:01
@CyrusNajmabadi
Copy link
Member Author

Thank you for the deep and thorough review @jasonmalinowski !
