Source Generator configuration via attributes instead of .json file #1736

Merged
merged 22 commits into from
Jan 22, 2024

AArnott
Collaborator

@AArnott AArnott commented Jan 14, 2024

  • Entirely remove support for configuring the analyzers or source generator via a MessagePackAnalyzer.json file.
  • All configuration is done via attributes. In particular:
    • Annotate a partial class with [GeneratedMessagePackResolverAttribute] to activate source generation of the resolver and formatters for all discoverable serializable types. This new attribute has several properties that may be used to customize code generation.
    • New assembly-level attributes:
      • [assembly: MessagePackKnownFormatterAttribute(Type)] points to implementations of IMessagePackFormatter<T> that should be included in the source generated resolver. This interface may be implemented more than once per class. All type arguments will be automatically considered to be serializable by the analyzer.
      • [assembly: MessagePackAssumedFormattableAttribute(Type)] points to types that should be assumed to have formatters defined (but may not be available to reference directly via MessagePackKnownFormatterAttribute). The formatters that justify this will have to be combined with the source generated resolver at runtime (possibly using the CompositeResolver class).
  • Some diagnostics (MsgPack003 and later) are no longer produced by a traditional analyzer, but rather by the 'source generator' (even when no source will be generated because GeneratedMessagePackResolverAttribute does not appear anywhere in the compilation).
  • Formatters are generated as private, nested types within the source generated resolver instead of directly inside a user-specified namespace.
  • The two source generator packages (one targets unity's roslyn 3.8, one targets 4.4 for incremental source generator support) will be merged into one, following the pattern in Support multi-targeting for Roslyn components dotnet/sdk#20355 so that one package can host multiple roslyn-targeted builds for the analyzer/source generator.
  • The source generator nuget package introduced in the v2.6 prerelease will be discontinued and the source generator moved into the longer-lived analyzer package.
  • Notice when a MessagePackAnalyzer.json file exists and emit a warning advising users to visit a migration document online.
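
For illustration, the attribute-based configuration described above might look roughly like the sketch below. The attribute names come from this description; everything else is an assumption made for the example: that the attributes live in the MessagePack namespace, and the hypothetical MyApp types (MyResolver, Point, PointFormatter, LegacyRecord).

```csharp
using MessagePack;
using MessagePack.Formatters;

// Hand-written formatter that should be included in the generated resolver.
[assembly: MessagePackKnownFormatter(typeof(MyApp.PointFormatter))]

// Type whose formatter is supplied elsewhere and combined in at runtime.
[assembly: MessagePackAssumedFormattable(typeof(MyApp.LegacyRecord))]

namespace MyApp
{
    // Declaring this partial class is what activates source generation.
    [GeneratedMessagePackResolver]
    public partial class MyResolver
    {
    }

    // Discovered automatically because of [MessagePackObject].
    [MessagePackObject]
    public class Person
    {
        [Key(0)]
        public string Name { get; set; }

        [Key(1)]
        public Point Home { get; set; }
    }

    // Hypothetical type covered by the hand-written formatter below.
    public readonly struct Point
    {
        public Point(int x, int y)
        {
            X = x;
            Y = y;
        }

        public int X { get; }

        public int Y { get; }
    }

    // Hypothetical type that is assumed to be formattable.
    public class LegacyRecord
    {
    }

    // Writes a Point as a two-element array: [X, Y].
    public sealed class PointFormatter : IMessagePackFormatter<Point>
    {
        public void Serialize(ref MessagePackWriter writer, Point value, MessagePackSerializerOptions options)
        {
            writer.WriteArrayHeader(2);
            writer.Write(value.X);
            writer.Write(value.Y);
        }

        public Point Deserialize(ref MessagePackReader reader, MessagePackSerializerOptions options)
        {
            reader.ReadArrayHeader(); // element count is not validated in this sketch
            int x = reader.ReadInt32();
            int y = reader.ReadInt32();
            return new Point(x, y);
        }
    }
}
```

In this sketch the formatter for LegacyRecord is deliberately absent: as the description says, it would have to be combined with the source generated resolver at runtime, possibly using the CompositeResolver class.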

Remaining investigations

  • Correct the error that occurs when two types with the same name in different namespaces both have formatters created for them.
  • Adjust source generator tests to parse with the C# language version that comes with the oldest unity version we intend to support.
  • Generate code that follows the FormatterCache<T> pattern.
  • Remove support for roslyn3.8
  • Move diagnostics reporting from the source generator back to the analyzer, per the roslyn team's recommendation

Potential directions

This PR does not do this, but a subsequent change could do it if there is interest:

  • The MessagePack.Annotations package may add a dependency onto the analyzers+source generator package so that analyzers are present and on by default for messagepack users, and source generation is ready to go when the user applies the source generation attribute.
    This may be a bit complicated to pull off (though, far less than this PR was) because it means either the MessagePack project will have to depend on the analyzer building package project (adding several projects to its build dependency tree) or we'll have to play tricks to add it as a nuget dependency without being a build dependency.

Closes #1691

The goal is to eliminate all dependencies on MSBuild properties (which Unity cannot provide) or AdditionalFiles (which MSBuild and Unity support but are a pain to use).
Instead, we'll rely on a partial class with an attribute to trigger source generation. "custom types" that used to be specified via the AdditionalFiles will now be specified via assembly-targeted attributes.
These support packages.config projects, which are very obsolete.
Also given where we're going with multiple roslyn target assemblies, they wouldn't work properly anyway.
@AArnott AArnott marked this pull request as ready for review January 16, 2024 02:23
@AArnott AArnott requested a review from neuecc January 16, 2024 02:24
@neuecc
Member

neuecc commented Jan 17, 2024

Thanks, I'll respond soon (reviewing now).

Member

@neuecc neuecc left a comment

Thank you, very good!

  • All formatters are now nested classes of the resolver, so when formatters exist for same-named types in different namespaces, it is an error.
namespace Namespace1
{
    [MessagePackObject]
    public class MogeMoge
    {
        [Key(0)]
        public int MyProperty { get; set; }
    }
}

namespace Namespace2
{
    [MessagePackObject]
    public class MogeMoge
    {
        [Key(0)]
        public int MyProperty { get; set; }
    }
}
  • InstanceWithStandardAotResolver should generate FormatterCache<T>

This is also true for StandardAotResolver.Instance.
I don't think CompositeResolver.Create should be used in the library.

  • Avoid file-scoped namespaces

Since Unity is up to C# 9.0, file-scoped namespaces (C# 10.0) cannot be used.
Since we would rather not branch the generated code by language version, it is better to assume C# 9.0 for the generated code.
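
To make the constraint concrete, here is a small sketch of the two namespace forms. The namespace MyApp.Generated and the type FooFormatter are hypothetical stand-ins, not names the generator is known to emit.

```csharp
// C# 10 and later only, so not usable under Unity's C# 9.0 setting:
//
//     namespace MyApp.Generated;
//     internal sealed class FooFormatter { }

// C# 9.0 compatible: a block-scoped namespace.
namespace MyApp.Generated
{
    // Hypothetical stand-in for a generated formatter type.
    internal sealed class FooFormatter
    {
    }
}
```

Both forms declare the same type in the same namespace; only the block-scoped one parses under language version 9.0.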

  • Package name

I am not sure if the name MessagePackAnalyzer is a good name ...... when the main use is Source Generator.
I think it's OK to use another package name (MessagePackGenerator, MessagePackSourceGenerator, MessagePack.SourceGenerator).
We don't have to worry about the package name if we bundle them, though.

Unity

Once again, let's get the Unity compiler version straight.

  • Unity 2022.3.12f1 or newer: 4.3.0
  • Unity 2022.2 or newer: 4.1.0
  • Unity 2021.2 or newer: 3.8.0

We believe that our target should be "Unity 2022.3 (LTS)".
But the problem is that minor versions have different compilers.
As of 2022.3.0 (released 2023-05-30), it is 4.1.0.
In 2022.3.12 (released 2023-10-26), it was updated to 4.3.0.

Ideally, though, it would be good to support all of them.
However, in order to keep the code base simple, it is okay to support only 4.3.0.
However, since the 3.8.0 version already exists, 3.8.0 + 4.3.0 would be best.
In that case, the minimum supported version of Unity would be 2021.2.

In this PR, the Unity version will be released as a standalone package.
I think it would be better to bundle it with MessagePackAnalyzer after #1734.
In other words, no .Unity suffix,

analyzers/dotnet/roslyn3.8/cs/MessagePack.SourceGenerator.dll
analyzers/dotnet/roslyn4.1/cs/MessagePack.SourceGenerator.dll

is a good packing project name.

@AArnott
Collaborator Author

AArnott commented Jan 18, 2024

Thanks for the review. I know it was a lot of code churn. I'll get to work on the updates. In some cases, I added points you made to the PR description so I don't lose track.

when formatters exist for same-named types in different namespaces, it is an error.

Ah yes, I anticipated that and meant to fix it, but forgot.

Since Unity is up to C# 9.0

Elsewhere it is said that "Unity 2021.3 is using Roslyn v3 and it dies in 2024 April". Does that mean in April a later C# version is available? If so, can we adopt it now in our prerelease?

However, in order to keep the code base simple, it is okay to support only 4.3.0.
However, since the 3.8.0 version already exists, 3.8.0 + 4.3.0 would be best.

Oh, I'd love to eject the 3.8.0 version. A lot of complexity comes from that.
Also compiler version 4.3.0 comes with support for language version 11, so we could do a lot more things (including namespace statements) in the generated code, which would be nice.
That said, as you say the support is already there.

We believe that our target should be "Unity 2022.3 (LTS)".
But the problem is that minor versions have different compilers.

I can't say for unity for sure, but LTS typically means that version will be supported for a long time, provided you update to the latest servicing release. Given that, 2022.3.0 may not be supported by unity any more, and if so no one should be using it and we wouldn't need to support it either.

I am not sure if the name MessagePackAnalyzer is a good name ...... when the main use is Source Generator.

I'm not settled with the name either. But MessagePackAnalyzer is the existing package's name, so I thought keeping it had some value. It is only a source generator when activated though -- the analyzer is on by default. So analyzer or source generator in the name are both a touch misleading since both are present. I thought about MessagePack.CompilerExtensions, but no one would know what that meant, probably.

I don't think CompositeResolver.Create should be used in the library.

Oh, it's so convenient though. :) How do you measure the perf difference between that and a direct FormatterCache<T>? I'd like to get some data to back this policy.

In this PR, the Unity version will be released as a standalone package.

Not so. This PR only builds analyzers into one package, that includes the roslyn 3.8 and 4.3 analyzers together. The .Unity suffixes you see are only in the project names and the names of the files within the package.

@pCYSl5EDgo
Contributor

pCYSl5EDgo commented Jan 18, 2024

#1734 (comment) it is said that "Unity 2021.3 is using Roslyn v3 and it dies in 2024 April". Does that mean in April a later C# version is available? If so, can we adopt it now in our prerelease?

For regular Unity users, no.
Unfortunately, Unity Technologies updated their Roslyn compiler but did not update the default C# LangVersion.
In fact, users who use an additional extension library such as Cysharp/CsprojModifier can use C# 10/11.

@neuecc
Member

neuecc commented Jan 18, 2024

Unity explicitly fixes the langversion to 9.0 at compile time.
Therefore, even with the latest Unity version where the compiler version has been updated, the current available language version remains at 9.0.
(There are hacks to increase the language version, such as explicitly specifying and overriding the langversion, but these are not common practices.)

Deciding the minimum version for Unity is quite troubling.
However, let's make a decision.
Considering future development efficiency, let's set "2022.3.12f1" as the lower limit!
In other words, I agree to remove the .Unity Analyzer and only use 4.3.0.
This means that (apart from the language version of the generated code) we no longer need to worry about Unity.

I have only been able to make this decision since the compiler was updated three months ago!
It may have been a good thing that it was delayed until now.

How do you measure the perf difference between that and a direct FormatterCache<T>?

If you run a microbenchmark, you will see a difference because of the lookup on every GetFormatter call.
However, there is no need to measure the difference, since most of the resolvers already use FormatterCache.
There should be no reason to skimp on this part.
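
As a rough illustration of the point about lookups, the sketch below contrasts a dictionary lookup on every call with the static generic cache pattern. It is self-contained and does not use the MessagePack types: IFormatter<T>, IntFormatter, DictionaryResolver, and CachedResolver are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a formatter interface.
public interface IFormatter<T>
{
    string Describe();
}

public sealed class IntFormatter : IFormatter<int>
{
    public string Describe() => "int";
}

// Performs a hash lookup on every GetFormatter<T>() call.
public static class DictionaryResolver
{
    private static readonly Dictionary<Type, object> Formatters = new Dictionary<Type, object>
    {
        { typeof(int), new IntFormatter() },
    };

    public static IFormatter<T> GetFormatter<T>() =>
        Formatters.TryGetValue(typeof(T), out object f) ? (IFormatter<T>)f : null;
}

// FormatterCache<T> pattern: the lookup runs once per closed generic type,
// when the static field is initialized. Later calls only read that field.
public static class CachedResolver
{
    public static IFormatter<T> GetFormatter<T>() => FormatterCache<T>.Formatter;

    private static class FormatterCache<T>
    {
        internal static readonly IFormatter<T> Formatter = DictionaryResolver.GetFormatter<T>();
    }
}
```

The runtime keeps a separate copy of the static field for each T, so the dictionary is consulted at most once per type rather than once per call.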

Not so. This PR only builds analyzers into one package, that includes the roslyn 3.8 and 4.3 analyzers together.

Oh, sorry, I see. I read the csproj carefully and I understand now, thank you.

Regarding the package name, it might be good to keep it as MessagePackAnalyzer.

Collaborator Author

@AArnott AArnott left a comment

I've been giving some thought to how we could make AOT generation more "on by default". With source generation being so much easier than the old mpc flow, it seems we may be within striking distance. So here's a thought I had tonight:

Suppose the source generator always generates the formatters, even if the user doesn't define a partial class with the [GeneratedMessagePackResolver] attribute. Suppose further that the DynamicObjectResolver looks for these formatters in the same assembly as the type to be formatted before it generates one dynamically and uses the pre-compiled one if it's found.
Now, we could 'discover' the formatter for a given type a variety of ways. But the most performant way is probably to... generate the resolver too! Then we have just one type to find via reflection for the whole assembly, and if we find it, we activate it (from the DynamicObjectResolver) and use it to search for pre-created formatters before dynamically creating them. How do we find the resolver? Well, we could emit one assembly-level attribute that we search for at runtime that points directly at the resolver. This could work whether the resolver is declared partially by the user code (and thus in a user-determined namespace) or whether we just generated it fully automatically.
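
The discovery idea could be sketched as follows. This is not code from the PR: the attribute GeneratedResolverLocatorAttribute, the GeneratedResolver type, and the ResolverDiscovery helper are all hypothetical names, and the sketch assumes the generated resolver exposes a public static Instance field.

```csharp
using System;
using System.Reflection;

// One assembly-level attribute that points directly at the resolver.
[assembly: Sketch.GeneratedResolverLocator(typeof(Sketch.GeneratedResolver))]

namespace Sketch
{
    // Hypothetical attribute that a source generator could emit once per assembly.
    [AttributeUsage(AttributeTargets.Assembly)]
    public sealed class GeneratedResolverLocatorAttribute : Attribute
    {
        public GeneratedResolverLocatorAttribute(Type resolverType)
        {
            ResolverType = resolverType;
        }

        public Type ResolverType { get; }
    }

    // Hypothetical stand-in for the generated resolver.
    public sealed class GeneratedResolver
    {
        public static readonly GeneratedResolver Instance = new GeneratedResolver();

        private GeneratedResolver()
        {
        }
    }

    public static class ResolverDiscovery
    {
        // Returns the resolver instance for the assembly, or null if none is advertised.
        public static object FindResolver(Assembly assembly)
        {
            var locator = assembly.GetCustomAttribute<GeneratedResolverLocatorAttribute>();
            if (locator == null)
            {
                return null;
            }

            FieldInfo field = locator.ResolverType.GetField("Instance", BindingFlags.Public | BindingFlags.Static);
            return field?.GetValue(null);
        }
    }
}
```

A dynamic resolver could call something like ResolverDiscovery.FindResolver(typeof(T).Assembly) once per assembly and cache the result, which keeps the reflection cost to a single attribute lookup per assembly.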

Now what about strictly AOT environments where reflection to find the formatters or resolver doesn't work? Well, that can work the same way today (in this PR): the user declares the partial class for the resolver to effectively control the namespace and name of the resolver so the user can write code that finds it up-front, thereby avoiding all reflection.

The user can also opt into declaring the partial class explicitly in order to specify non-default options for code generation in the attribute on the resolver class.

This proposal means that anyone compiling against MessagePack v3 would effectively get AOT performance 'for free'. It also means these AOT code generators had better be good, because they'll be promoted from being used for a (small?) subset of projects to all projects. We tend to get bug reports fairly regularly that are unique to AOT formatters, so since the new source generator is based on the same T4 templates, we'll inherit those and need to be prepared to respond quickly to incoming bug reports.

@neuecc
Member

neuecc commented Jan 19, 2024

In MemoryPack, I have discontinued IL Generation and switched to using only Source Generators. Having multiple code generation processes makes it difficult to maintain quality, so it has always been desirable to consolidate them into a single process. However, it would be challenging to discontinue IL Generation in MessagePack...

For your reference, MemoryPack has two Formatter registration processes.
One is through the static constructor.

partial class Foo
{
    static Foo()
    {
        MemoryPackFormatterProvider.Register<Foo>();
    }

    static void IMemoryPackFormatterRegister.RegisterFormatter()
    {
        if (!MemoryPackFormatterProvider.IsRegistered<Foo>())
        {
            MemoryPackFormatterProvider.Register(new MemoryPackableFormatter<Foo>());
        }
        if (!MemoryPackFormatterProvider.IsRegistered<Foo[]>())
        {
            MemoryPackFormatterProvider.Register(new ArrayFormatter<Foo>());
        }
    }
}

The static constructor does not trigger unless the type is accessed, so as a fallback, it was also triggered during GetFormatter<T>.

static bool TryInvokeRegisterFormatter(Type type)
{
    if (typeof(IMemoryPackFormatterRegister).IsAssignableFrom(type))
    {
        // currently C# can not call like `if (T is IMemoryPackFormatterRegister) T.RegisterFormatter()`, so use reflection instead.
        var m = type.GetMethod("MemoryPack.IMemoryPackFormatterRegister.RegisterFormatter", BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static);
        if (m == null)
        {
            throw new InvalidOperationException("Type implements IMemoryPackFormatterRegister but can not found RegisterFormatter. Type: " + type.FullName);
        }
        m!.Invoke(null, null); // Cache<T>.formatter will set from method
        return true;
    }

    return false;
}

This is possible because in MemoryPack, the Formatter is attached to the target type, making it easy to find. However, this is not something that can be directly applied to the MessagePack system, but it's worth mentioning for reference.

I've also experimented with applying [ModuleInitializer], but the inability to apply it to generic types was a bottleneck. If the goal isn't to achieve complete AOT, then registering a formatter factory with ModuleInitializer and using MakeGeneric might be a viable approach.
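
A minimal sketch of the ModuleInitializer plus MakeGenericType idea, assuming .NET 5 or later and C# 9.0. It is not MemoryPack or MessagePack code: FormatterFactoryRegistry, Box<T>, and BoxFormatter<T> are hypothetical names, and the approach relies on reflection, so it is not suitable for strictly AOT environments.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;

// Maps an open generic target type (e.g. Box<>) to an open generic formatter type.
public static class FormatterFactoryRegistry
{
    private static readonly ConcurrentDictionary<Type, Type> OpenFormatters =
        new ConcurrentDictionary<Type, Type>();

    public static void Register(Type openTarget, Type openFormatter)
    {
        OpenFormatters[openTarget] = openFormatter;
    }

    // Builds a formatter for a closed generic type such as Box<int>, or returns null.
    public static object Create(Type closedTarget)
    {
        if (!closedTarget.IsGenericType)
        {
            return null;
        }

        Type openTarget = closedTarget.GetGenericTypeDefinition();
        if (!OpenFormatters.TryGetValue(openTarget, out Type openFormatter))
        {
            return null;
        }

        // Close the formatter over the same type arguments as the target.
        Type closedFormatter = openFormatter.MakeGenericType(closedTarget.GetGenericArguments());
        return Activator.CreateInstance(closedFormatter);
    }
}

// Hypothetical generic type and its formatter.
public class Box<T>
{
    public T Value { get; set; }
}

public class BoxFormatter<T>
{
}

internal static class Registration
{
    // Module initializers cannot be generic or live in generic types,
    // so only the open generic factory mapping is registered here.
    [ModuleInitializer]
    internal static void Init()
    {
        FormatterFactoryRegistry.Register(typeof(Box<>), typeof(BoxFormatter<>));
    }
}
```

With this in place, FormatterFactoryRegistry.Create(typeof(Box<int>)) would produce a BoxFormatter<int> without any per-type registration code.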

@AArnott
Collaborator Author

AArnott commented Jan 19, 2024

Thanks for the ideas from MemoryPack. Cool stuff you can do when you have a clean slate.

However [ModuleInitializer] isn't something I'm comfortable with automatically creating many of. Total avoidance of reflection is not a goal for me (though certainly I want to continue to support the AOT paths that do avoid it totally). I just want to bring the Ref.Emit and JIT time currently lost at startup to messagepack to zero or close to it.
Exactly one ModuleInitializer that registered the resolver might be OK, except that would guarantee that loading that assembly would also load the MessagePack assembly (and possibly other dependencies) before serialization was necessary, which would make adoption harder to justify for apps like Visual Studio that monitor those types of things. So I think it would be best to avoid any automatically-executing code paths like that.

The roslyn team strongly urges diagnostics to come from analyzers. Source generators should only report diagnostics that are specific to code generation. This, even if it means we have to perform analysis twice: once in the source generator (which comes first) and again in the analyzer.
The analyzers appear to support other languages, but not really, because in formulating diagnostics, they 'downcast' to C# syntax nodes to get good locations.
We can 'fix' this by making language-specific analyzer projects that share a lot of code, but unless we know there are VB/F# users out there, it's not worth it.
@AArnott
Collaborator Author

AArnott commented Jan 20, 2024

I've made all the changes originally scoped (this excludes the "always AOT" ideas, which I think I'll save for a follow-up PR), except for the proposal to drop roslyn 3.8 support. The support is already there (which took a lot of work in this PR), and removing it would be a little more work. We may also want to retain the special build authoring that supports two roslyn versions (whatever they are), in case we later drop 3.8 support but retain 4.3 support while also doing something special for 4.5. So I think keeping the 3.8 support for now may make our lives a bit easier later on.

So... I'm ready for a final review.

@AArnott AArnott mentioned this pull request Jan 21, 2024
@neuecc
Member

neuecc commented Jan 22, 2024

@AArnott
Thank you, I've checked. It's OK.

@AArnott AArnott merged commit e2b0dbd into MessagePack-CSharp:develop Jan 22, 2024
1 of 3 checks passed
@AArnott AArnott deleted the sgattributes branch January 22, 2024 12:45