Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate strings instead of Roslyn SyntaxNodes in interop source generators #95882

Open
jkoritzinsky opened this issue Dec 12, 2023 · 4 comments
Labels
area-System.Runtime.InteropServices source-generator Indicates an issue with a source generator feature
Milestone

Comments

@jkoritzinsky
Copy link
Member

Since the start of the interop source generators, we've always decided to use Roslyn SyntaxNodes as both our source-of-truth as well as our go-to emit strategy. This choice has provided us a number of benefits:

  • C# constructs are represented the same way as the C# compiler represents them.
  • We can easily define an API surface that specifies "this API returns a name/expression/statement" without building our own type hierarchy.
  • We can copy tokens from the source of truth to our emitted code.
  • We never try emitting code that is only understood by Roslyn compilers newer than we can safely confirm we'll run against as the APIs don't exist.
  • We could reuse the code in a code-fix to inline the source-generated code.

However, it has also lead to a number of problems:

  • The SyntaxFactory APIs are very verbose. We've basically had to add comments above each block to explain what the code is doing.
  • NormalizeWhitespace (what we use to handle adding whitespace) is extremely slow.
  • We have at times accidentally added additional GC roots to the SyntaxNodes in the source SyntaxTrees due to how we structure some of our internal data types.
  • The source generator API surface requires us to pass strings to Roslyn, so we need to go from SyntaxNode to string for Roslyn to parse anyway.
  • We have a few places where we produce invalid syntax nodes because writing out the truly accurate node is a pain (in particular type names).
  • We never did the code-fix reuse idea and I doubt we'll ever do so (and we can't without additional work due to the above item).

Additionally, the Roslyn team recommends using a string-based approach.

Finally, @Sergio0694 updated ComputeSharp to use an "indented text writer" approach and saw a ~460x performance improvement in CPU time and a ~36x Memory footprint improvement. Even if we only get a fraction of that improvement due to our desire to have an extensibility model, doing the work to rewrite the APIs seems worthwhile.

Additionally, with that amount of improvement, we may also see build-time improvements for assemblies with lots of LibraryImports, such as CoreLib.

@jkoritzinsky jkoritzinsky added area-System.Runtime.InteropServices source-generator Indicates an issue with a source generator feature labels Dec 12, 2023
@ghost
Copy link

ghost commented Dec 12, 2023

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

Issue Details

Since the start of the interop source generators, we've always decided to use Roslyn SyntaxNodes as both our source-of-truth as well as our go-to emit strategy. This choice has provided us a number of benefits:

  • C# constructs are represented the same way as the C# compiler represents them.
  • We can easily define an API surface that specifies "this API returns a name/expression/statement" without building our own type hierarchy.
  • We can copy tokens from the source of truth to our emitted code.
  • We never try emitting code that is only understood by Roslyn compilers newer than we can safely confirm we'll run against as the APIs don't exist.
  • We could reuse the code in a code-fix to inline the source-generated code.

However, it has also lead to a number of problems:

  • The SyntaxFactory APIs are very verbose. We've basically had to add comments above each block to explain what the code is doing.
  • NormalizeWhitespace (what we use to handle adding whitespace) is extremely slow.
  • We have at times accidentally added additional GC roots to the SyntaxNodes in the source SyntaxTrees due to how we structure some of our internal data types.
  • The source generator API surface requires us to pass strings to Roslyn, so we need to go from SyntaxNode to string for Roslyn to parse anyway.
  • We have a few places where we produce invalid syntax nodes because writing out the truly accurate node is a pain (in particular type names).
  • We never did the code-fix reuse idea and I doubt we'll ever do so (and we can't without additional work due to the above item).

Additionally, the Roslyn team recommends using a string-based approach.

Finally, @Sergio0694 updated ComputeSharp to use an "indented text writer" approach and saw a ~460x performance improvement in CPU time and a ~36x Memory footprint improvement. Even if we only get a fraction of that improvement due to our desire to have an extensibility model, doing the work to rewrite the APIs seems worthwhile.

Additionally, with that amount of improvement, we may also see build-time improvements for assemblies with lots of LibraryImports, such as CoreLib.

Author: jkoritzinsky
Assignees: -
Labels:

area-System.Runtime.InteropServices, source-generator

Milestone: -

@Sergio0694
Copy link
Contributor

Love it! For context, here's my IndentedTextWriter, if you want to reuse or adapt this for the interop generators. It also has a couple of interpolated string handlers included (I'd imagine you already have polyfills for the necessary types, but if not you can either just link them from corelib or perhaps use PolySharp) 🙂

@stephentoub
Copy link
Member

FWIW, the regex generator has been using the built-in IndentedTextWriter and it's worked fine.

var writer = new IndentedTextWriter(sw);

@huoyaoyuan
Copy link
Member

In case if anyone is unaware, there is also an indented string builder proposed at roslyn side: dotnet/roslyn#71162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Runtime.InteropServices source-generator Indicates an issue with a source generator feature
Projects
Status: No status
Development

No branches or pull requests

5 participants