Converting StrongInject to incremental generators #59105
Replies: 3 comments 19 replies
-
This absolutely must not happen.The compiler intends to cache source generator outputs across passes of the command line compiler. The approach seen here would reuse stale (and potentially invalid) outputs from a previous run of the compiler, producing unspecified behavior of the compiler and the resulting assembly. I have a longer reply for the other items separately. |
Beta Was this translation helpful? Give feedback.
-
The primary problem with the above presentation is the manner in which the problem was approached. The approach I've been recommending for incremental source generators is the following:
This approach has many benefits in practice. It's easy to see that it completely eliminates the original problem where the compilation changing leads to re-running the output step (the output generation now only runs when the model changes, which means the output itself may have changed). It also allows the generator author to optimize the incremental data extraction over time (e.g. adopting future compiler APIs for more efficient semantic filtering) without requiring changes to the source generation step itself. In practice, this approach has proven relatively straightforward for new source generators. It's much more complex for converting an existing |
Beta Was this translation helpful? Give feedback.
-
Tagging @jasonmalinowski and @chsienki here. I think IGs were a good start and certainly suitable for certain classes of generators. However, its (understandable) restrictions really make it non-suitable precisely for the types of expensive generators we would want it to be most viable for. Yair isn't doing something strange or undesirable. Indeed, i think StrongInject is a poster child for a great place to do some very interesting generation work. Unfortunately, the current design of generators (incremental or otherwise) seem to make it nigh impossible to get good perf and not cause huge complexity in the generator itself. I would like us all to brainstorm what is possible here, and what we think may be acceptable to expose if we think have a desire for highly powerful generators like this to be possible without huge complexity for the author, and with great perf for the user. Importantly, i think this means brainstorming new options. The current IG system is good at some thigns, but has shortcomings elsewhere. Unless there is an amazing idea on how stronginject could work with those shortcomings, we should focus on what we can do to make generators like that viable in a way we feel is sensible. |
Beta Was this translation helpful? Give feedback.
-
I am the author of StrongInject, a source generator which provides a compile time IOC container.
It is one of the most popular SourceGenerators, and also one of the largest, and most production ready. As such it serves as a useful testcase for the practicality of IncrementalGenerators.
@sharwell reported in YairHalberstadt/stronginject#179 that StrongInject was causing performance issues in @jnm2's solution in VS. This is because running StrongInject took a long time on this particular very large project, and it was rerunning on every single edit. @sharwell suggested moving to Incremental Generators to alleviate this issue.
This discussion serves as an overview of the challenges involved in this, the suggested paths I looked into, the current approach I took, my future ideas for how to improve this, and a plea for suggestions as to how I can do better, possibly with the help of additional APIs provided by roslyn.
StrongInject overview
Every time StrongInject runs it runs the following stages (high level, and details not one hundred percent accurate):
Dictionary<ITypeSymbol, Registration>
A module might look like this:
A Container might look like this:
Stronginject would then generate a graph that represented something like this:
And lower that into code:
Investigated incremental approaches
Running stages 3 and 4 incrementally seems particularly tricky, since small changes to registrations can cause large changes to the resolution graph.
Instead my approach focused on reducing how often stages 3 and 4 have to run at all, by only rerunning when relevant changes were made to registrations or the registered types.
1. Check for changes in the
Dictionary<ITypeSymbol, Registration>
This approach runs stages 1 and 2 on every edit, and checks if the output of stage 2 has changed at all. If not it doesn't need to rerun stages 3 and 4.
pseudocode:
Unfortunately this approach is dead in the water because Symbols cannot be compared between compilations.
Also calculating the Dictionary is still somewhat expensive and doing that on every change is undesirable.
There are a couple of related approaches that solve the problem that symbols cannot be compared between compilations:
1.1 Convert symbols to custom StrongInject specific types.
Instead of working with symbols directly, we could convert the symbols to some stronginject specific type, which just stored the information necessary to be able to generate code, and could be compared across compilations.
E.g.
This unfortunately suffers from a number of issues.
The main problem is that the amount of information required is actually quite high. StrongInject needs to know:
This leads to a number of issues:
class A : IEquatable<A>
).It also leads to the problem that I can't use roslyn APIs any more. For example, I need to be able to check if
List<string>
can be cast toIEnumerable<object>
. Reimplementing this code forStrongInjectTypeSymbol
s is error prone.1.2 Use a custom comparer to compare the symbols across compilations
We could use a custom comparer to check whether any changes have been made to the symbols in the Dictionary that StrongInject cares about, and continue using the actual roslyn symbols in future steps.
This alleviates problems 1 and 2, and mitigates problem 4.
Comparing the symbols for changes is still complex, and still leads to the risk of infinite recursion.
This also creates some new problems:
2. Look for changes in classes that may be relevant.
Instead of building up a
Dictionary<ITypeSymbol, Registration>
on every single change, what if instead we find a way to cheaply look for any changes that might be relevant to StrongInject, ideally in an incremental fashion. Once we know that somethings changed, we find a way to to rerun all 4 changes from scratch, and simply discard any of the information we used to find out what changed.Ideally we want to have an API that looks something like this:
The advantage of this is we don't need to tie how we check for changes to how we run the generator afterwards. This makes it easier for us to design change checking to be incremental, and also means we don't need to calculate and cache all the information we might possibly need in the future. We only need to store information relevant to checking whether any relevant changes have in fact occurred.
To do this we need to check whether any of the following has changed:
Checking for any changes to a module or container is easy - checking whether the SyntaxTree has changed is a good way to do this.
In order to do number 2 efficiently we want to restrict ourselves to only checking syntactic changes as much as possible. Purely syntactic checks only need to rerun when their SyntaxTree changes, whilst semantic checks need to rerun on every change.
The ideal algorithm would look something like this:
Doing this in the current API is tricky:
I'll show you how I do the magic in the section below on the current solution.
CreateDataClassContainingOnlyThingsWeCareAboutIfTheyveChanged has many of the same problems as earlier solutions:
Checking whether there's changes to type declarations in MetadataReferences is easy when the MetadataReference is a PortableExecutableReference reference or similar - these change infrequently, so it's fine to rerun whenever they change, So we can store a list of MetadataIDs and rerun if those ever change.
What's more complec is checking whether a CompilationReference has changed. These change all the time. What we will have to do is again store a list of all trees in the referenced compilation, and rerun whenever there's any relevant changes to referenced compilations too.
Current Solution
The current solution is similar to solution 2. but sacrifices correctness for the purpose of simplicity.
StrongInject will only ever report diagnostics on the module or container classes. So even though changes to other classes may effect those diagnostics, the use probably doesn't mind if the diagnostics don't immediately update if they're not focused on the module/container files.
As a proxy for whether the user is focused on the module/container files, StrongInject checks whether any of those files have been edited, and only reruns if so.
This solution aims to create a reasonable user experience at the sacrifice of correctness. For example imagine StrongInject reported an error because a user registered
Service
asIService
butService
doesn't implementIService
. Then if the user fixed that StrongInject wouldn't update the diagnostic until the user edited the module where it was registered.This is okish in an IDE, but will lead to problems in hot reload for example. This always works in actual builds since builds are never incremental - at least for now.
Tricking the compiler
Generating StrongInject code requires the use of a compilation. And if our code relies on a Compilation, the IncrementalGenerator API will rerun it on every single change.
In order to avoid this happening we use this lovely hack:
CompilationWrapper is a type defined here.
It stores a Compilation, and implements IEqualityComparer. Equals always returns true, ensuring that changes to the compilation won't cause the generator to rerun. However it's important to always run with the latest compilation when the generator needs to rerun for another reason, so on every equals comparison it swaps out the compilation of whichever CompilationWrapper is earlier with the compilation from the later CompilationWrapper.
This seems to work well but is of course risky. For example it could theoretically fail in a concurrent environment, where a CompilationWrapper might be created for an earlier Compilation after a CompilationWrapper is created for a later Compilation, leading to us swapping out the later compilation for the earlier compilation on an Equality comparison, rather than the other way round.
Desired API
It would be useful if Roslyn provided a way to separate out the question of whether a generator should rerun, from what it uses when it runs. This does require trusting the user to get this right, unlike the current API which is correct by design, but I think is necessary for implementing something like StrongInject incrementally in an efficient fashion.
Conclusion
I hope it's clear from this:
If any of that isn't clear, I'm happy to clarify further.
I would also love to hear any suggestions you have for alternative designs to run StrongInject incrementally!
Thanks!
cc @CyrusNajmabadi
Beta Was this translation helpful? Give feedback.
All reactions