Provide fast-path serialization logic in JSON source generator #51945

Closed
layomia opened this issue Apr 27, 2021 · 19 comments

Labels: api-needs-work (API needs work before it is approved, it is NOT ready for implementation), area-System.Text.Json, tenet-performance (Performance related issue)

@layomia
Contributor

layomia commented Apr 27, 2021

The JSON source generator (#45448) takes a metadata-based approach where reflection-based type metadata gathering is moved from run-time to compile time. This primarily helps improve start-up time, private bytes usage, and app size.

For simple JsonSerializerOptions usages, we can also generate serialization logic that uses Utf8JsonWriter directly, which can help improve serialization throughput.
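
To make the idea concrete, here is a minimal hand-written sketch of the kind of logic a generated fast path could contain for a type with a single string property. JsonMessage matches the type used later in this issue; the FastPathSketch class and its SerializeMessage and ToJson methods are hypothetical names chosen for illustration and are not what the generator emits.

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

public struct JsonMessage
{
    public string? Message { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class FastPathSketch
{
    // Writes the object directly with Utf8JsonWriter:
    // no reflection and no metadata lookup at run-time.
    public static void SerializeMessage(Utf8JsonWriter writer, JsonMessage value)
    {
        writer.WriteStartObject();
        writer.WriteString("Message", value.Message);
        writer.WriteEndObject();
    }

    // Convenience wrapper that returns the JSON text.
    public static string ToJson(JsonMessage value)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            SerializeMessage(writer, value);
        } // Disposing the writer flushes pending bytes to the stream.

        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

Calling FastPathSketch.ToJson(new JsonMessage { Message = "Hello" }) yields {"Message":"Hello"}. Property names here are hard-coded; applying a naming policy at generation time would simply change the literal passed to WriteString.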

API Proposal

namespace System.Text.Json.Serialization.Metadata
{
    public abstract partial class JsonTypeInfo<T>
    {
        // Existing:
        // internal JsonTypeInfo() { }

        // A method that can be called to serialize an instance of T, using <see cref="JsonSerializerOptions"/> specified at compile-time.
        public Action<Utf8JsonWriter, T>? Serialize { get; set; }
    }

    // Instructs the System.Text.Json source generator to assume the specified options will be used at run-time via <see cref="JsonSerializerOptions"/>.
    [AttributeUsage(AttributeTargets.Assembly, AllowMultiple = false)]
    public class JsonSerializerOptionsAttribute : JsonAttribute
    {
        // Specifies the default ignore condition.
        public JsonIgnoreCondition DefaultIgnoreCondition { get; set; }

        // Specifies whether to ignore read-only fields.
        public bool IgnoreReadOnlyFields { get; set; }

        // Specifies whether to ignore read-only properties.
        public bool IgnoreReadOnlyProperties { get; set; }

        // Specifies whether to ignore custom converters provided at run-time.
        public bool IgnoreRuntimeCustomConverters { get; set; }

        // Specifies whether to include fields for serialization and deserialization.
        public bool IncludeFields { get; set; }

        // Specifies a built-in naming policy used to convert JSON property names.
        public JsonKnownNamingPolicy NamingPolicy { get; set; }

        // Specifies whether JSON output should be pretty-printed.
        public bool WriteIndented { get; set; }
    }
}

namespace System.Text.Json.Serialization
{
    // Existing: [AttributeUsage(AttributeTargets.Assembly, AllowMultiple = true)]
    [AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Interface, AllowMultiple = true)]
    public sealed class JsonSerializableAttribute : Attribute
    {
        // Existing:
        // public string TypeInfoPropertyName { get; set; }
        // public JsonSerializableAttribute(Type type) { }

        // Constructor to be placed directly on types.
        public JsonSerializableAttribute() { }

        // Instructs the generator on what to generate for the type.
        public JsonSourceGenerationMode GenerationMode { get; set; }
    }
  
    public enum JsonKnownNamingPolicy
    {
        Unspecified = 0,
        BuiltInCamelCase = 1
    }

    // The mode for source generation by the System.Text.Json source generator.
    public enum JsonSourceGenerationMode
    {
        // Instructs the JSON source generator to generate serialization logic
        // and type metadata to fall back to when the run-time options don't match.
        MetadataAndSerialization = 0,

        // Instructs the JSON source generator to generate type-metadata initialization logic.
        Metadata = 1,

        // Instructs the JSON source generator to generate serialization logic.
        Serialization = 2,
    }
}

Feature behavior

  • Fast-path methods are only overridden when [JsonSerializerOptionsAttribute] is used.
  • The generated JsonContext.Default property will generate/use an options instance populated with the values from the [JsonSerializerOptionsAttribute].

Scenarios

Given a simple type:

[JsonSerializable]
public struct JsonMessage
{
    public string Message { get; set; }
}

Calling fast-path directly

[assembly: JsonSerializerOptions(NamingPolicy = JsonKnownNamingPolicy.BuiltInCamelCase)]

JsonTypeInfo<JsonMessage> messageInfo = JsonContext.Default.JsonMessage;

var ms = new MemoryStream();
using (var writer = new Utf8JsonWriter(ms))
{
    messageInfo.Serialize!(writer, message);
}

Using fast-path via JsonSerializer

Some features, such as reference-loop handling and async (de)serialization, are not supported in generated fast-path logic. For those cases, the context or type info should be passed to the serializer directly, and the serializer will detect whether the fast path can be called. For example, with reference handling enabled, the serializer would know that it can still call the fast-path logic for primitives and structs.

[assembly: JsonSerializerOptions]

JsonContext context = new JsonContext(new JsonSerializerOptions() { ReferenceHandler = ReferenceHandler.Preserve });
JsonSerializer.Serialize(new JsonMessage(), context.JsonMessage);
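
The fallback behavior described above can be sketched roughly as follows. This is not the serializer's actual implementation: FallbackSketch and its Write method are hypothetical, the fastPath parameter stands in for the proposed JsonTypeInfo<T>.Serialize delegate, and the compatibility check is deliberately simplified to two conditions (no reference handler, no run-time converters).

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class FallbackSketch
{
    public static void Write<T>(
        Utf8JsonWriter writer,
        T value,
        Action<Utf8JsonWriter, T>? fastPath,
        JsonSerializerOptions options)
    {
        // Assumption for this sketch: the fast path is usable only when it was
        // generated (non-null) and the run-time options request neither
        // reference handling nor custom converters.
        bool fastPathUsable = fastPath is not null
            && options.ReferenceHandler is null
            && options.Converters.Count == 0;

        if (fastPathUsable)
        {
            fastPath!(writer, value);
        }
        else
        {
            // Fall back to the general-purpose serializer logic.
            JsonSerializer.Serialize(writer, value, options);
        }
    }
}
```

A real implementation would be finer-grained, for example still taking the fast path for primitives and structs when reference handling is enabled, as noted above.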
@layomia layomia added this to the 6.0.0 milestone Apr 27, 2021
@layomia layomia self-assigned this Apr 27, 2021
@ghost

ghost commented Apr 27, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

The JSON source generator (#45448) takes a metadata-based approach where reflection-based type metadata gathering is moved from run-time to compile time. This primarily helps improve start-up time, private bytes usage, and app size.

For simple POCOs and simple JsonSerializerOptions usages, we can also generate serialization logic that uses Utf8JsonWriter directly, which can help improve serialization throughput.

[assembly: JsonSerializable(typeof(JsonMessage))]

public struct JsonMessage
{
    public string message { get; set; }
}

JsonTypeInfo<JsonMessage> messageInfo = JsonContext.Default.JsonMessage;

var ms = new MemoryStream();
using (var writer = new Utf8JsonWriter(ms))
{
    messageInfo.SerializeObject!(writer, message);
}

// Recall metadata can also be passed to serializer and be used when nested in other object graphs:
JsonSerializer.SerializeToUtf8Bytes(message, messageInfo);
Author: layomia
Assignees: layomia
Labels:

area-System.Text.Json, tenet-performance

Milestone: 6.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 27, 2021
@layomia
Contributor Author

layomia commented Apr 27, 2021

"Simple options":

Run-time options (JsonSerializerOptions)

Supported for fast-path               | Fallback to serializer logic
DefaultIgnoreCondition                | IgnoreNullValues (obsolete)
IncludeFields                         | ReferenceHandler
PropertyNamingPolicy (Built-in camel) | NumberHandling
WriteIndented (writer handles this)   | Async serialization
IgnoreReadOnlyFields                  | Encoder
IgnoreReadOnlyProperties              | Converters

Design-time options (Attributes)

Supported for fast-path   | Fallback to serializer logic
JsonIgnoreAttribute       | JsonConverterAttribute
JsonIncludeAttribute      | JsonExtensionDataAttribute
JsonPropertyNameAttribute | JsonNumberHandlingAttribute

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Apr 27, 2021
@Tornhoof
Contributor

Are the fallbacks per type or global? I.e., if I add a converter, is fast-path disabled for the type in the converter or for all types?

@layomia
Contributor Author

layomia commented Apr 28, 2021

@Tornhoof the fallbacks would be per type. So if a custom converter is used for the type or one of its members, a fast-path action will not be generated. Other types that fit the characteristics would have fast-path logic generated.

@terrajobst terrajobst added blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 13, 2021
@jkotas
Member

jkotas commented May 14, 2021

Other source generators are often adopting a pattern of partial methods that get filled in by the source generator. In this case, it would look like this:

[GeneratedJsonSerializer(JsonSerializerOptions.Default)]
static partial void Serialize(MyType value, Utf8JsonWriter writer);

Would this be a better alternative for the lean fast serializers? What are the pros and cons of this simple static method that just gets the job done vs. what is proposed above?

@layomia
Contributor Author

layomia commented May 14, 2021

@jkotas We could absolutely adopt a pattern like this. It is succinct and efficient.

The downside is that it cannot be used within the JsonSerializer. The initial proposal attaches the serialization logic to a JsonTypeInfo<T> so that it can be used within the serializer, as well as called directly by users.

Applications/services that use non-trivial features of JsonSerializer (such as Bing) want the benefits of source generation highlighted in #45448. The metadata approach enables their most important scenarios (faster start-up, reduced private set, AOT-friendliness, rich serializer feature set). Layering lean, fast serialization on top of the metadata approach allows them to also benefit from improved throughput, whereas lean/fast generation based on partial methods alone won't be compatible.


For improved usability in really simple scenarios, I think it would be good to add such a pattern alongside the one initially proposed above. The generator's implementation could be as follows, depending on configuration:

Assuming the default generation mode (Serialization)

[GeneratedJsonSerializer(JsonSerializerDefaults.General)]
static partial void Serialize(Utf8JsonWriter writer, MyType value)
{
    writer.WriteStartObject();
    ...
    writer.WriteEndObject();
}
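
As a compilable illustration of the partial-method pattern, the sketch below pairs a user-written declaration with a hand-written implementation that stands in for what a generator might emit. Everything here is an assumption made for illustration: the MySerializers class name, the members of MyType (the snippet above elides them), and the omission of the [GeneratedJsonSerializer] attribute, which is only proposed in this thread and does not exist.

```csharp
using System.Text.Json;

// Hypothetical type; its members are invented for this sketch.
public struct MyType
{
    public int Id { get; set; }
    public string? Name { get; set; }
}

// User-authored part: declares only the desired method shape.
public static partial class MySerializers
{
    public static partial void Serialize(Utf8JsonWriter writer, MyType value);
}

// Stand-in for generator output: the implementing half of the partial method.
public static partial class MySerializers
{
    public static partial void Serialize(Utf8JsonWriter writer, MyType value)
    {
        writer.WriteStartObject();
        writer.WriteNumber("Id", value.Id);
        writer.WriteString("Name", value.Name);
        writer.WriteEndObject();
    }
}
```

Because the declaration carries an explicit accessibility modifier, it relies on the extended partial methods introduced in C# 9, which require an implementing declaration to be present at compile time.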

Assuming Metadata or SerializationWithMetadataFallback mode

In this case we could reuse the code generated with JsonContext to avoid code duplication.

[assembly: JsonSourceGenerationMode(JsonSourceGenerationMode.Metadata)]

static partial void Serialize(Utf8JsonWriter writer, MyType value)
{
    JsonContext context = JsonContext.GetOrAdd(JsonSerializerDefaults.General); // Lazy create and cache a compatible context
    context.MyType.Serialize(writer, value);
}

@jkotas
Member

jkotas commented May 14, 2021

rich serializer feature set

Do we know exactly what is the rich feature set needed by Bing, etc., that would not be available via the simple fast mode?

For example, do they really need SerializationWithMetadataFallback? Having worked with Bing, this option sounds like something they would like to actively avoid since they would not want to generate code at runtime to compensate for wrong build time configuration.

Also, should we have a similar simple fast deserialization path?

[GeneratedJsonDeserializer]
static partial MyType Deserialize(Utf8JsonReader reader);

@layomia
Contributor Author

layomia commented May 14, 2021

Do we know exactly what is the rich feature set needed by Bing, etc., that would not be available via the simple fast mode?

For Bing, the major features are async (de)serialization & custom converters. To represent other use cases, I highlighted all the unavailable serialization features in #51945 (comment).

For example, do they really need SerializationWithMetadataFallback? Having worked with Bing, this option sounds like something they would like to actively avoid since they would not want to generate code at runtime to compensate for wrong build time configuration.

Bing would not be a likely user of this feature. It was included for scenarios where multiple options instances with different values are used in a project. We can pull it out if it's not a first-class consideration.

If we do pull it out, we would not need the JsonSourceGenerationMode types since we can auto-detect it based on the partial methods and/or JsonSerializableAttribute usage.

Also, should we have a similar simple fast deserialization path?

Yes, this issue is focused on the preview 5 goal of adding fast-path serialization. Deserialization is planned for preview 6. Here's the expected matrix of support:

Run-time options
Supported Not supported (fallback to serializer logic)
DefaultIgnoreCondition IgnoreNullValues (obsolete)
IncludeFields ReferenceHandler
PropertyNamingPolicy (Built-in camel)
Async deserialization
Converters
Encoder
PropertyNameCaseInsensitive

Design-time options

Supported                 | Not supported (fallback to serializer logic)
Using parameterless ctor  | Using parameterized ctor
JsonIgnoreAttribute       | JsonConverterAttribute
JsonIncludeAttribute      | JsonExtensionDataAttribute
JsonPropertyNameAttribute | JsonNumberHandlingAttribute

@layomia
Contributor Author

layomia commented May 14, 2021

@jkotas the idea of using static abstract methods on interfaces helps with the issue of how generated metadata or serialization logic can be used within JsonSerializer without passing things around. It is blocked on language support. However, it isn't compatible with non-IJsonSerializable types, e.g. framework types.

interface IJsonSerializable
{
    void Write(IJsonSerializable value, Utf8JsonWriter writer);
    static IJsonSerializable Read(Utf8JsonReader reader);
}

static class JsonSerializer
{
    static byte[] SerializeToUtf8Bytes(IJsonSerializable value) { throw null; }
    static IJsonSerializable Deserialize(ReadOnlySpan<byte> json) { throw null; }
}

@jkotas
Member

jkotas commented May 14, 2021

If we do pull it out, we would not need the JsonSourceGenerationMode types since we can auto-detect it based on the partial methods and/or JsonSerializableAttribute usage.

Yes, I think simpler would be better. I had not realized that JsonSerializableAttribute only exists to support SerializationWithMetadataFallback, and that you can get away with auto-detection otherwise. How would the auto-detection work exactly?

the idea of using static abstract methods on interfaces helps with the issue of how generated metadata or serialization logic can be used within JsonSerializer

Directly including JSON serialization logic in the type implementation itself via an interface is an anti-pattern that should be avoided. For example, it is unfriendly to IL trimming: IL trimming will end up keeping the serialization logic even on types that are never actually serialized by the application, since it won't be able to prove that the interface is not used indirectly.

Also, I am not sure how the shape you have proposed actually works. Where would IJsonSerializable Deserialize(ReadOnlySpan<byte> json) get the list of types to use for the deserialization from?

@terrajobst
Member

terrajobst commented May 14, 2021

Video

  • Instead of generating a single JsonContext type per assembly, we should let the user define the JsonContext type (controlling namespace, type name, etc.), apply the assembly-level attributes to this type, and have the generator fill in the body of the type. This allows the user to have multiple different JSON options for different types in a single assembly, e.g.
    • Fabrikam.Telemetry.TelemetryJsonContext
    • Fabrikam.Shopping.ShoppingJsonContext
  • We should rename JsonSerializerOptionsAttribute. The name implies that it's related to JsonSerializerOptions, but that type describes the run-time serialization options, while this attribute describes the static, source generator-based options. These are similar but generally disjoint, because they need to be statically describable (e.g. no converters, no polymorphic naming conventions, etc.).
  • In fact, we should consider taking all the attributes and base types that are relevant to source-generated JSON serializer into a dedicated namespace (e.g. System.Text.Json.Serialization.Generator or System.Text.Json.Serialization.Metadata).
namespace System.Text.Json.Serialization.Metadata
{
    public abstract partial class JsonTypeInfo<T>
    {
        // Existing:
        // internal JsonTypeInfo() { }
        public Action<Utf8JsonWriter, T>? Serialize { get; set; }
    }
    [AttributeUsage(AttributeTargets.Assembly, AllowMultiple = false)]
    public class JsonSerializerOptionsAttribute : JsonAttribute
    {
        public JsonIgnoreCondition DefaultIgnoreCondition { get; set; }
        public bool IgnoreReadOnlyFields { get; set; }
        public bool IgnoreReadOnlyProperties { get; set; }
        public bool IgnoreRuntimeCustomConverters { get; set; }
        public bool IncludeFields { get; set; }
        public JsonKnownNamingPolicy NamingPolicy { get; set; }
        public bool WriteIndented { get; set; }
    }
}
namespace System.Text.Json.Serialization
{
    // Existing: [AttributeUsage(AttributeTargets.Assembly, AllowMultiple = true)]
    [AttributeUsage(AttributeTargets.Assembly |
                    AttributeTargets.Class |
                    AttributeTargets.Struct |
                    AttributeTargets.Interface, AllowMultiple = true)]
    public partial class JsonSerializableAttribute : Attribute
    {
        public JsonSourceGenerationMode GenerationMode { get; set; }
    }
    public enum JsonKnownNamingPolicy
    {
        Unspecified = 0,
        BuiltInCamelCase = 1
    }
    public enum JsonSourceGenerationMode
    {
        MetadataAndSerialization = 0,
        Metadata = 1,
        Serialization = 2
    }
}

@terrajobst terrajobst added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels May 14, 2021
@layomia
Contributor Author

layomia commented May 17, 2021

@jkotas

How would the auto-detection work exactly?

In the updated proposal following API review, the generator would by default generate both metadata and serialization code for all types. It can be changed to serialization code only on a per-type basis. This is what the user would provide to kick off generation for serialization logic only (with default options):

[JsonSerializable(typeof(JsonMessage), GenerationMode = JsonSourceGenerationMode.Serialization)]
public partial class MyJsonContext : JsonSerializerContext
{
}

Then to call the generated fast-path serialization code:

using MemoryStream ms = new();
using Utf8JsonWriter writer = new(ms);
MyJsonContext.Default.JsonMessage.Serialize!(writer, new JsonMessage { Message = "Hello" });

Directly including JSON serialization logic in the type implementation itself via an interface is an anti-pattern that should be avoided. For example, it is unfriendly to IL trimming: IL trimming will end up keeping the serialization logic even on types that are never actually serialized by the application, since it won't be able to prove that the interface is not used indirectly.

Makes sense to avoid a pattern that roots all serialization logic/metadata even when not used. cc @eerhardt / @davidfowl who are interested in the interface approach.

Also, I am not sure how the shape you have proposed actually works. Where would IJsonSerializable Deserialize(ReadOnlySpan<byte> json) get the list of types to use for the deserialization from?

This should be static T Deserialize<T>(ReadOnlySpan<byte> json) where T : IJsonSerializable { throw null; }
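
For illustration only, the corrected generic shape could look something like the sketch below once static abstract interface members are available (they shipped later, in C# 11). Note that this sketch departs from the snippet earlier in the thread: it uses a self-referencing generic interface, IJsonSerializable<TSelf>, so that Read can return the concrete type. The interface, the SketchSerializer class, and the hand-written Read/Write bodies are all hypothetical and are not part of System.Text.Json.

```csharp
using System;
using System.Text.Json;

// Hypothetical interface; requires C# 11 for static abstract members.
public interface IJsonSerializable<TSelf> where TSelf : IJsonSerializable<TSelf>
{
    void Write(Utf8JsonWriter writer);
    static abstract TSelf Read(ref Utf8JsonReader reader);
}

public struct JsonMessage : IJsonSerializable<JsonMessage>
{
    public string? Message { get; set; }

    public void Write(Utf8JsonWriter writer)
    {
        writer.WriteStartObject();
        writer.WriteString("Message", Message);
        writer.WriteEndObject();
    }

    // Minimal reader: scans tokens and picks up the "Message" property.
    public static JsonMessage Read(ref Utf8JsonReader reader)
    {
        var result = new JsonMessage();
        while (reader.Read())
        {
            if (reader.TokenType == JsonTokenType.PropertyName &&
                reader.ValueTextEquals("Message"))
            {
                reader.Read(); // advance to the property value
                result.Message = reader.GetString();
            }
        }
        return result;
    }
}

public static class SketchSerializer
{
    // The type to deserialize comes from the generic argument,
    // so no global list of types is needed.
    public static T Deserialize<T>(ReadOnlySpan<byte> json) where T : IJsonSerializable<T>
    {
        var reader = new Utf8JsonReader(json);
        return T.Read(ref reader);
    }
}
```

The trimming concern raised above still applies to this shape, since the serialization logic lives on the type itself.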

@jkotas
Member

jkotas commented May 17, 2021

Serialize!

Why do I need the ! here? When is this going to be null?

@layomia
Contributor Author

layomia commented May 17, 2021

Why do I need the ! here? When is this going to be null?

It would be null if Metadata-only is specified:

[JsonSerializable(typeof(JsonMessage), GenerationMode = JsonSourceGenerationMode.Metadata)]
public partial class MyJsonContext : JsonSerializerContext
{
}

The user would use the generated code as follows, calling the serializer. There'd still be some throughput gain (for small POCOs, 10-15%) from passing the type metadata directly (and avoiding the dictionary lookup that existing serializer methods perform to fetch it).

byte[] json = JsonSerializer.SerializeToUtf8Bytes(messageInstance, MyJsonContext.Default.JsonMessage);

@jkotas
Member

jkotas commented May 17, 2021

It feels like unnecessary ceremony to me. I would like to write just:

[JsonSerializable(typeof(JsonMessage))]
public partial class MyGeneratedJsonSerializers : JsonSerializerContext
{
}

MyGeneratedJsonSerializers.Serialize(writer, new JsonMessage { Message = "Hello" });

@layomia
Contributor Author

layomia commented May 17, 2021

As it aligns with the proposal, my understanding is that this feedback is to:

  • Make the Serialization-only mode the default JsonSourceGenerationMode value so it doesn't need to be specified for the really simple scenarios
  • Generate a top level static Serialize method on the generated context which assumes the default options (or whatever JsonSerializerOptionsAttribute values were statically indicated)

I've noted the feedback for discussion in the next review. The second bullet sounds really good to me. The first one needs a policy discussion on what scenario we want the least ceremony for. In API review, we landed on the mode that works for all serializer features being the default. Definitely open to change when we look at it again.

Concretely, we could also add a [JsonSourceGenerationMode(Mode)] attribute that is applied throughout the context so that even if the Serialization-only mode needs to be specified, it can be specified just once rather than per-type.

[JsonSourceGenerationMode(JsonSourceGenerationMode.Serialization)]
[JsonSerializable(typeof(JsonMessage))]
[JsonSerializable(typeof(Foo))]
[JsonSerializable(typeof(Bar))]
public partial class MyGeneratedJsonSerializers : JsonSerializerContext
{
}

MyGeneratedJsonSerializers.Serialize(writer, new JsonMessage { Message = "Hello" });
MyGeneratedJsonSerializers.Serialize(writer, new Foo());
MyGeneratedJsonSerializers.Serialize(writer, new Bar());

The per-type option would override the global one when generating for a type. The property would change to nullable so that we know when nothing was specified:

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public sealed class JsonSerializableAttribute : Attribute
{
    // Existing:
    // public string? TypeInfoPropertyName { get; set; }
    // public JsonSerializableAttribute(Type type) { }

    // Instructs the generator on what to generate for the type.
    public JsonSourceGenerationMode? GenerationMode { get; set; } // Now nullable
}

@eerhardt
Member

It feels like unnecessary ceremony to me. I would like to write just:

MyGeneratedJsonSerializers.Serialize(writer, new JsonMessage { Message = "Hello" });

An issue is that all the overloads that JsonSerializer.Serialize provides would also need to be generated on MyGeneratedJsonSerializers. For example:

  • Returns a string
  • Returns a byte[]
  • Takes a Utf8JsonWriter
  • Takes a Stream

And as we add new overloads in the future, we would need to add the overloads to the generated class as well.

@jkotas
Member

jkotas commented May 18, 2021

Yup, it would be nice to only generate the shapes that the user actually wants, e.g. by specifying the desired shape using a partial method (#51945 (comment)). It would also allow pay-for-play async shapes.

@layomia
Contributor Author

layomia commented Jul 1, 2021

Closing this issue as done. #55043 tracks fast-path deserialization using the reader.

@layomia layomia closed this as completed Jul 1, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 31, 2021