Proposal: provide trigger info to generators to enable more expensive workloads #51497

Closed
Sergio0694 opened this issue Feb 25, 2021 · 4 comments

@Sergio0694
Contributor

Sergio0694 commented Feb 25, 2021

Overview

Let me start by saying that source generators are absolutely brilliant, they're super powerful and a lot of fun to use 🎉
This proposal is a follow-up to a conversation in #roslyn on the C# Discord (as suggested by @CyrusNajmabadi). It's about giving more info to source generators so that they can better organize their workflow, and in particular opt in to more expensive code paths that would not be sustainable if the generator is invoked multiple times a second. This would be extremely useful in more advanced scenarios. Here's a complete breakdown of the problem this would solve, and the proposed solution.

The problem

Source generators are by default invoked multiple times a second by Roslyn, which makes their overhead pile up pretty fast, especially when there are multiple generators in use in the project being worked on. Because of this, quoting Cyrus, they should "try to take 1ms or less". That is absolutely fine in many common scenarios, such as:

  • Generating automatic service registration for DI.
  • Generating automatic constructors or simple methods for some types.
  • Generating some auto-implemented classes (e.g. Refit).
  • Generating readonly UTF8 byte sequences from Unicode string constants.
  • Etc.

In general, when the generator is doing some reasonable work within the tight constraints of the official guidelines and just generating some code, things are absolutely fine. But, what if a generator needs to do some more expensive work, say anything that takes >= 100ms? Right now that's simply not doable, because it would immediately cripple the entire IDE.

There is a big consideration here: not all work has to be done at every keystroke or cursor movement. Especially for more expensive work, it would often be absolutely fine to have the generator run only when the project is actually being built (from VS or from dotnet build, i.e. in all cases where an assembly is being produced). In these cases, a generator taking longer to do its work would be perfectly acceptable, as the UX wouldn't suffer much. After all, developers would have had to stop for a bit anyway to wait for the build to finish before doing more work.

Yes, one could set up an MSBuild pre-build task to achieve something like this. But that has lots of downsides:

  • It's a whole other thing to set up, separate from a source generator.
  • MSBuild tasks are... super clunky. I'd much rather just stick to source generators and Roslyn APIs 😄
  • It doesn't offer the same easy access to the whole compilation, syntax tree and metadata.
  • What if you want to both do some expensive work and add some code to the compilation?
  • What if it's precisely that code being added to the compilation that is expensive to generate?

The proposed solution

We could solve this with minimal changes to the public API surface by adding something like this to GeneratorExecutionContext (possibly to GeneratorInitializationContext too, though that may not be necessary):

namespace Microsoft.CodeAnalysis
{
    public readonly struct GeneratorExecutionContext
    {
        public ExecutionTriggerType ExecutionTrigger { get; }
    }

    public enum ExecutionTriggerType
    {
        SourceChanged,
        SelectionChanged,

        // ...
        // These are just some example names. The important
        // one is really just this one here (or equivalent):
        BuildStarted
    }
}

This single change would make source generators much more flexible and allow developers to:

  • Only execute their expensive build-time generators when code is actually being built. In all other cases, they could literally just do a single check on ExecutionTrigger and return immediately.
  • Have generators that can do both, with eg. an initial part that is executed always, and then a more expensive part that is only scheduled when a build is actually taking place. This approach makes it extremely easy to mix the two things and make changes.
  • Still reuse 100% of the existing infrastructure and setup. It's the same exact source generators we're already using, no need to delve into MSBuild tasks or anything like that.
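
To make the first two points concrete, here is a minimal sketch of how a generator might branch on the proposed property. Everything in it is an assumption for illustration: ExecutionTriggerType is the enum from the proposal (it does not exist in Roslyn), and FakeExecutionContext and TriggerAwareGenerator are hypothetical stand-ins so the sketch compiles without any Roslyn reference.

```csharp
using System;
using System.Collections.Generic;

// Mirrors the enum from the proposal; NOT a real Roslyn type.
public enum ExecutionTriggerType { SourceChanged, SelectionChanged, BuildStarted }

// Hypothetical stand-in for GeneratorExecutionContext with the proposed property.
public sealed class FakeExecutionContext
{
    public FakeExecutionContext(ExecutionTriggerType trigger) => ExecutionTrigger = trigger;

    public ExecutionTriggerType ExecutionTrigger { get; }

    // Records hint names only, in place of real generated sources.
    public List<string> AddedSources { get; } = new List<string>();

    public void AddSource(string hintName) => AddedSources.Add(hintName);
}

public static class TriggerAwareGenerator
{
    public static void Execute(FakeExecutionContext context)
    {
        // Cheap part: runs on every invocation, whatever the trigger.
        context.AddSource("Cheap.g.cs");

        // Expensive part: a single check, then return immediately unless building.
        if (context.ExecutionTrigger != ExecutionTriggerType.BuildStarted)
        {
            return;
        }

        context.AddSource("Expensive.g.cs");
    }
}
```

With this shape, an invocation triggered by SourceChanged would only record Cheap.g.cs, while one triggered by BuildStarted would record both files.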

One practical example: ComputeSharp

I've talked with a few developers in the C# Discord server about this and many have expressed interest in such a feature, as it would give them more freedom to do expensive operations in a generator that are not "dynamic", as in, they don't need to be run all the time, but just need to be executed "at least once" before the project is built. That is something that currently just cannot be expressed in a source generator, and that this proposal would solve.

To add to that, I want to give a practical example to show where/how this proposal could be useful. I have a project called ComputeSharp, which is a library that allows C# developers to execute code on the GPU. They simply write a compute shader in C#, and then the library transforms their code to HLSL, dynamically compiles a DX12 compute shader and runs it on the GPU. The general workflow is as follows:

  1. The user writes a compute shader (a simple struct type with an IComputeShader interface), in C#.
  2. My IComputeShaderSourceGenerator generator kicks in, performs C# -> HLSL transpilation (I'm sorry @333fred, I know 🤣), and generates some methods to invoke each shader type appropriately (they're defined here).
  3. The user launches the app and tries to execute a shader.
  4. ComputeSharp can access those generated methods through the type constraint on the input shader type, and uses them to grab the HLSL source, load up the DXC compiler and compile the shader to DXIL, as well as loading the dispatch data, etc.
  5. Then after that it's just a bunch of DX-related stuff and eventually the code actually runs on the GPU 🚀

This approach works great, but there are some obvious downsides:

  • I need to ship my library with 20MB worth of native dependencies for the DXC compiler and another related DLL.
  • The previous point also means I can only target x64 right now, as that's the only platform where DXC is available. Whereas the compiled DXIL binaries are platform agnostic, so eg. they could run on x86/ARM/ARM64 too.
  • I need to actually compile the shader at runtime every time a given application is launched. Shaders are cached, but there's still some pretty heavy overhead for the first launch (compiling a shader takes about ~100ms).

With the proposed solution, my IComputeShaderSourceGenerator generator could just do this:

  1. If the trigger says the user is doing stuff, just analyze the shader and emit all the diagnostics.
  2. If the project is actually being built, do 1) but then also load DXC and actually build all the shaders and embed the DXIL code directly into the consuming applications (as a span property, an array, etc.).
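
As a rough illustration of the "embed the DXIL code as a span property" part of step 2, the sketch below formats an already-compiled byte array as C# source text. It is not ComputeSharp's actual code: BytecodeEmitter, EmitSpanProperty and the Bytecode property name are hypothetical, and the DXC compilation step itself is deliberately left out, so the bytes passed in are just placeholders.

```csharp
using System;
using System.Linq;

public static class BytecodeEmitter
{
    // Hypothetical helper: turns precompiled shader bytes into a partial type
    // exposing them as a ReadOnlySpan<byte> property.
    public static string EmitSpanProperty(string typeName, byte[] bytecode)
    {
        // Render each byte as a two-digit hex literal, e.g. 0x0A.
        string literals = string.Join(", ", bytecode.Select(b => $"0x{b:X2}"));

        return
            $"partial struct {typeName}\n" +
            "{\n" +
            $"    public static System.ReadOnlySpan<byte> Bytecode => new byte[] {{ {literals} }};\n" +
            "}\n";
    }
}
```

A build-only step could pass the resulting string to AddSource, which is what would let the consuming application skip the runtime compilation described above.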

Hope this helps! 🙂

@dotnet-issue-labeler (bot) added the Area-Language Design and untriaged labels on Feb 25, 2021
@HurricanKai
Member

For reference, we at Silk.NET would also benefit a lot from this.
Personally I would prefer a system similar to what analyzers have, but I expect this to be much simpler, and to cover the vast majority of cases! Thanks Sergio for actually taking the time to come up with a proposal 🙂

@jaredpar added the Feature Request and New Feature - Source Generators labels and removed the untriaged label on Mar 3, 2021
@jaredpar added this to the 16.10 milestone on Mar 3, 2021
@jaredpar
Member

jaredpar commented Mar 3, 2021

@chsienki @jasonmalinowski relevant to the discussions around Step based APIs

@jasonmalinowski
Member

@chsienki Given all we've done for incremental APIs and then also the proposed API for implementation-only stuff, do we have anything more for this?

@Sergio0694
Contributor Author

I would consider this issue resolved now. Incremental generators have addressed the performance issues with expensive generation, and further performance improvements for build-time specific steps are already possible with RegisterImplementationSourceOutput, and the possible mirror API RegisterReferenceSourceOutput (#57589), if needed. Special thanks again to @sharwell for all his help, suggestions and advice while rewriting all the source generators in ComputeSharp to be incremental 🙌
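
For readers landing here later, a minimal sketch of what that split can look like in an incremental generator. It assumes a reference to the Microsoft.CodeAnalysis.CSharp package (4.x); the class name BuildOnlyGenerator and the hint names are hypothetical, and the generated text is a placeholder. As far as I understand it, outputs registered through RegisterImplementationSourceOutput are expected not to change what the rest of the code can see (e.g. method bodies only), which is what allows the host to skip them while editing.

```csharp
using Microsoft.CodeAnalysis;

[Generator]
public sealed class BuildOnlyGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Reduce the compilation to a small, comparable value so that
        // downstream steps are only re-run when it actually changes.
        IncrementalValueProvider<string?> assemblyName =
            context.CompilationProvider.Select(static (compilation, _) => compilation.AssemblyName);

        // Cheap output: may run as part of normal IDE analysis.
        context.RegisterSourceOutput(assemblyName, static (spc, name) =>
            spc.AddSource("Cheap.g.cs", $"// cheap output for {name}"));

        // Expensive output: registered as implementation-only.
        context.RegisterImplementationSourceOutput(assemblyName, static (spc, name) =>
            spc.AddSource("Expensive.g.cs", $"// expensive output for {name}"));
    }
}
```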
