Add a design document for System.Text.Json polymorphic serialization #226

Closed
1 change: 1 addition & 0 deletions INDEX.md
@@ -73,6 +73,7 @@ Use update-index to regenerate it:
| 2021 | [Compile-time source generation for strongly-typed logging messages](accepted/2021/logging-generator.md) | [Maryam Ariyan](https://github.com/maryamariyan), [Martin Taillefer](https://github.com/geeknoid) |
| 2021 | [Objective-C interoperability](accepted/2021/objectivec-interop.md) | [Aaron Robinson](https://github.com/AaronRobinsonMSFT) |
| 2021 | [Preview Features](accepted/2021/preview-features/preview-features.md) | [Immo Landwerth](https://github.com/terrajobst) |
| 2021 | [System.Text.Json polymorphic serialization](accepted/2021/json-polymorphism.md) | [Eirik Tsarpalis](https://github.com/eiriktsarpalis) |
| 2021 | [TFM for .NET nanoFramework](accepted/2021/nano-framework-tfm/nano-framework-tfm.md) | [Immo Landwerth](https://github.com/terrajobst), [Laurent Ellerbach](https://github.com/Ellerbach), [José Simões](https://github.com/josesimoes) |
| 2021 | [Tracking Platform Dependencies](accepted/2021/platform-dependencies/platform-dependencies.md) | [Matt Thalman](https://github.com/mthalman) |

275 changes: 275 additions & 0 deletions accepted/2021/json-polymorphism.md
@@ -0,0 +1,275 @@
# System.Text.Json polymorphic serialization

**Owner** [Eirik Tsarpalis](https://github.com/eiriktsarpalis)

This document describes the proposed design for extending [polymorphism support](https://github.com/dotnet/runtime/issues/45189) in System.Text.Json.

[Draft Implementation PR](https://github.com/dotnet/runtime/pull/53882).

## Background

By default, System.Text.Json will serialize a value using a converter derived from its declared type,
regardless of what the runtime type of the value might be. This behavior is in line with the
Liskov substitution principle, in that the serialization contract is unique (or "monomorphic") for a given type `T`,
regardless of what subtype of `T` we end up serializing at runtime.

A notable exception to this rule is members of type `object`, in which case the runtime type of the value
is looked up and serialization is dispatched to the converter corresponding to that runtime type.
This is an instance of _polymorphic serialization_, in the sense that the schema might vary depending on
the runtime type a given `object` instance might have.
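
The difference between the two behaviours can be observed with the existing `JsonSerializer` API. The following is a minimal sketch; the `Foo` and `Bar` types mirror the hierarchy used later in this document, and the exact property order of the output is deliberately not asserted.

```csharp
using System;
using System.Text.Json;

Foo value = new Bar { A = 1, B = 2 };

// Declared type Foo: only the Foo contract is used, so B is omitted.
string asDeclared = JsonSerializer.Serialize<Foo>(value);

// Declared type object: the serializer looks up the runtime type (Bar),
// so B is included in the output.
string asObject = JsonSerializer.Serialize<object>(value);

Console.WriteLine(asDeclared);
Console.WriteLine(asObject);

public class Foo { public int A { get; set; } }
public class Bar : Foo { public int B { get; set; } }
```

The second call is the only form of polymorphic serialization available before this proposal.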

Conversely, in _polymorphic deserialization_ the runtime type of a deserialized value might vary depending on the
shape of the input encoding. Currently, System.Text.Json does not offer any form of support for polymorphic
deserialization.

We have received a number of user requests to add polymorphic serialization and deserialization support
to System.Text.Json. This can be a useful feature in domains where exporting type hierarchies is desirable,
for example when serializing tree-like data structures or discriminated unions.

It should be noted, however, that polymorphic serialization comes with a few security risks:

* Polymorphic serialization applied indiscriminately can result in unintended data leaks,
since properties of unexpected derived types may end up written on the wire.
* Polymorphic deserialization can be vulnerable when deserializing untrusted data,
in certain cases leading to remote code execution attacks.

## Introduction

The proposed design for polymorphic serialization in System.Text.Json can be split into two
largely orthogonal features:

1. Simple polymorphic serialization: extends the existing serialization infrastructure for `object` types to
arbitrary classes that can be specified by the user. It dispatches to the converter corresponding
to the runtime type without emitting any metadata on the wire, and makes no provision for
polymorphic deserialization.
2. Polymorphism with type discriminators ("tagged polymorphism"): classes can be serialized and deserialized
polymorphically by emitting a type discriminator ("tag") on the wire. Users must explicitly associate each
supported subtype of a given declared type with a string identifier.

## Simple Polymorphic Serialization

Consider the following type hierarchy:
```csharp
public class Foo
{
    public int A { get; set; }
}

public class Bar : Foo
{
    public int B { get; set; }
}

public class Baz : Bar
{
    public int C { get; set; }
}
```
Currently, when serializing a `Bar` instance as type `Foo`
the serializer will apply the JSON schema derived from the type `Foo`:
```csharp
Foo foo1 = new Foo { A = 1 };
Foo foo2 = new Bar { A = 1, B = 2 };
Foo foo3 = new Baz { A = 1, B = 2, C = 3 };

JsonSerializer.Serialize<Foo>(foo1); // { "A" : 1 }
JsonSerializer.Serialize<Foo>(foo2); // { "A" : 1 }
JsonSerializer.Serialize<Foo>(foo3); // { "A" : 1 }
```
Under the new proposal we can change this behaviour by annotating
the base class (or interface) with the `JsonPolymorphicType` attribute:
```csharp
[JsonPolymorphicType]
public class Foo
{
    ...
}
```
which will result in the above values now being serialized as follows:
```csharp
JsonSerializer.Serialize<Foo>(foo1); // { "A" : 1 }
JsonSerializer.Serialize<Foo>(foo2); // { "A" : 1, "B" : 2 }
JsonSerializer.Serialize<Foo>(foo3); // { "A" : 1, "B" : 2, "C" : 3 }
```
Note that the `JsonPolymorphicType` attribute is not inherited by derived types.
In the above example `Bar` inherits from `Foo` yet is not polymorphic in its own right:
```csharp
Bar bar = new Baz { A = 1, B = 2, C = 3 };
JsonSerializer.Serialize<Bar>(bar); // { "A" : 1, "B" : 2 }
```
If annotating the base class with an attribute is not possible,
polymorphism can alternatively be opted in for a type using the
new `JsonSerializerOptions.SupportedPolymorphicTypes` predicate:
```csharp
public class JsonSerializerOptions
{
    public Func<Type, bool> SupportedPolymorphicTypes { get; set; }
}
```

> **Review comment (Member):** This property name seems to suggest it's a collection of types? 🤔
>
> **Reply (Member Author):** I was having similar thoughts. What would be some suggested alternatives? `SupportedPolymorphicTypesPredicate` perhaps?

Applied to the example above:
```csharp
var options = new JsonSerializerOptions { SupportedPolymorphicTypes = type => type == typeof(Foo) };
JsonSerializer.Serialize<Foo>(foo2, options); // { "A" : 1, "B" : 2 }
JsonSerializer.Serialize<Foo>(foo3, options); // { "A" : 1, "B" : 2, "C" : 3 }
```
The same setting can also be used to enable polymorphism for _every_ serialized type:
```csharp
var options = new JsonSerializerOptions { SupportedPolymorphicTypes = _ => true };

// `options` treats both `Foo` and `Bar` members as polymorphic
Baz baz = new Baz { A = 1, B = 2, C = 3 };
JsonSerializer.Serialize<Foo>(baz, options); // { "A" : 1, "B" : 2, "C" : 3 }
JsonSerializer.Serialize<Bar>(baz, options); // { "A" : 1, "B" : 2, "C" : 3 }
```
As mentioned previously, this feature makes no provision for deserialization.
If deserialization is a requirement, users need to opt for
polymorphism with type discriminators instead.
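
For comparison, similar write-only behaviour can be approximated today with a custom converter. The sketch below is not part of the proposal: `RuntimeTypeWriteConverter<T>` is a hypothetical name, and it relies only on the existing `JsonConverter<T>` and `JsonSerializer.Serialize(Utf8JsonWriter, object, Type, JsonSerializerOptions)` APIs. It assumes the `Foo`/`Bar`/`Baz` hierarchy from above.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

var options = new JsonSerializerOptions { Converters = { new RuntimeTypeWriteConverter<Foo>() } };
Baz baz = new Baz { A = 1, B = 2, C = 3 };

// Foo is handled by the converter, so the Baz contract is used.
Console.WriteLine(JsonSerializer.Serialize<Foo>(baz, options));
// Bar is not registered, so only the Bar contract is used.
Console.WriteLine(JsonSerializer.Serialize<Bar>(baz, options));

public class Foo { public int A { get; set; } }
public class Bar : Foo { public int B { get; set; } }
public class Baz : Bar { public int C { get; set; } }

// Hypothetical helper: dispatches serialization to the runtime type of the value.
public sealed class RuntimeTypeWriteConverter<T> : JsonConverter<T> where T : class
{
    // Options without this converter, used when the runtime type is exactly T.
    // Re-entering with the caller's options would recurse into this converter forever.
    // Note that this fallback ignores any other settings on the caller's options.
    private static readonly JsonSerializerOptions s_fallback = new JsonSerializerOptions();

    public override T Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => throw new NotSupportedException("This converter only supports serialization.");

    public override void Write(Utf8JsonWriter writer, T value, JsonSerializerOptions options)
    {
        Type runtimeType = value.GetType();
        JsonSerializerOptions effective = runtimeType == typeof(T) ? s_fallback : options;
        JsonSerializer.Serialize(writer, value, runtimeType, effective);
    }
}
```

As with the proposed attribute, the behaviour is not inherited: only values declared as `Foo` are dispatched on their runtime type.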

## Polymorphism with type discriminators

This feature allows users to opt in to polymorphic serialization for a given type
by associating string identifiers with particular subtypes in the hierarchy.
These identifiers are written to the wire so this brand of polymorphism is roundtrippable.

At the core of the design is the introduction of the `JsonKnownType` attribute, which can
be applied to type hierarchies like so:
```csharp
[JsonKnownType(typeof(Derived1), "derived1")]
[JsonKnownType(typeof(Derived2), "derived2")]
public class Base
{
    public int X { get; set; }
}

public class Derived1 : Base
{
    public int Y { get; set; }
}

public class Derived2 : Base
{
    public int Z { get; set; }
}
```

> **Review comment:** Would users be able to specify the name of the discriminator field in the serialized model here? The examples use `$type`, but not all web APIs which use discriminated union-like types use that naming.
>
> **Review comment:** Furthermore: are non-string values permitted? It's also often the case that web APIs use integers for these discriminator values, which would often be mapped to an enum in C# code.
>
> **Reply (Member Author):**
>
> > Would users be able to specify the name of the discriminator field in the serialized model here? The examples use `$type`, but not all web APIs which use discriminated union-like types use that naming.
>
> Yes, that is something we are considering although it has not been included in the prototype for now. There are a few infrastructural problems to be sorted out, for example our metadata parsing implementation currently requires that all metadata properties start with `$` and are placed before any other properties in the object.
>
> > Furthermore: are non-string values permitted? It's also often the case that web APIs use integers for these discriminator values, which would often be mapped to an enum in C# code.
>
> Not in the prototype, but you're making a good point that this should be supported. We would probably need to make the `JsonKnownTypeAttribute` accept `object` parameters but require that each type uses a unique identifier type, so the following configuration would be invalid:
>
>     [JsonKnownType(typeof(Bar), 1)]
>     [JsonKnownType(typeof(Baz), "baz")]
>     public class Foo {}
>
> As far as supported identifier types go, am I right to assume that only `string` and `int` values should be accepted?
>
> **Reply:**
>
> > for example our metadata parsing implementation currently requires that all metadata properties start with `$` and are placed before any other properties in the object.
>
> Yeah - in my own converters I've resorted to making a copy of the reader, finding and deserializing the discriminator, and then using the reader copy to deserialize as the actual type, updating the original reader with the copy after the fact. While this fixes the "must be the first field in the JSON object" issue, it's not that great because it parses the value multiple times.
>
> > As far as supported identifier types go, am I right to assume that only `string` and `int` values should be accepted?
>
> As far as I'm aware, yes, only `string` and `int` are used for these. However, I think it may be convenient to support enum types too, as they're easily converted to `int`, and are (IMO) the natural type to use for a discriminator.

This allows roundtrippable polymorphic serialization using the following schema:
```csharp
var json1 = JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1", "X" : 0, "Y" : 0 }
var json2 = JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2", "X" : 0, "Z" : 0 }

JsonSerializer.Deserialize<Base>(json1); // uses Derived1 as runtime type
JsonSerializer.Deserialize<Base>(json2); // uses Derived2 as runtime type
```
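
To make the wire format concrete, the following sketch emulates the same `$type` schema with a hand-written converter built on existing System.Text.Json APIs. It is an illustration under stated assumptions, not the proposed implementation: `BaseDiscriminatorConverter` and its lookup table are hypothetical, the payload is buffered through `JsonDocument` for simplicity, and unknown runtime types are rejected.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

var options = new JsonSerializerOptions { Converters = { new BaseDiscriminatorConverter() } };

string json = JsonSerializer.Serialize<Base>(new Derived1 { X = 1, Y = 2 }, options);
Base roundTripped = JsonSerializer.Deserialize<Base>(json, options);
Console.WriteLine(json);
Console.WriteLine(roundTripped.GetType().Name);

public class Base { public int X { get; set; } }
public class Derived1 : Base { public int Y { get; set; } }
public class Derived2 : Base { public int Z { get; set; } }

// Hypothetical converter emulating the proposed "$type" schema.
public sealed class BaseDiscriminatorConverter : JsonConverter<Base>
{
    private static readonly Dictionary<string, Type> s_idToType = new Dictionary<string, Type>
    {
        ["derived1"] = typeof(Derived1),
        ["derived2"] = typeof(Derived2),
    };

    public override Base Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // Buffer the whole object so the discriminator may appear at any position.
        using JsonDocument document = JsonDocument.ParseValue(ref reader);
        JsonElement root = document.RootElement;

        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("$type", out JsonElement tag) ||
            tag.ValueKind != JsonValueKind.String ||
            !s_idToType.TryGetValue(tag.GetString(), out Type derivedType))
        {
            throw new JsonException("Missing or unknown type discriminator.");
        }

        // Derived types are not handled by this converter, so this call does not recurse.
        return (Base)JsonSerializer.Deserialize(root.GetRawText(), derivedType, options);
    }

    public override void Write(Utf8JsonWriter writer, Base value, JsonSerializerOptions options)
    {
        Type runtimeType = value.GetType();
        string id = null;
        foreach (KeyValuePair<string, Type> entry in s_idToType)
        {
            if (entry.Value == runtimeType) id = entry.Key;
        }
        if (id is null)
        {
            throw new NotSupportedException($"Runtime type '{runtimeType}' has no type discriminator.");
        }

        // Serialize using the runtime type's contract, then copy its properties after the tag.
        byte[] payload = JsonSerializer.SerializeToUtf8Bytes(value, runtimeType, options);
        using JsonDocument document = JsonDocument.Parse(payload);

        writer.WriteStartObject();
        writer.WriteString("$type", id);
        foreach (JsonProperty property in document.RootElement.EnumerateObject())
        {
            property.WriteTo(writer);
        }
        writer.WriteEndObject();
    }
}
```

Buffering keeps the sketch short at the cost of parsing each value twice; a built-in implementation would presumably avoid that overhead.
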
Alternatively, users can specify known type configuration using the
`JsonSerializerOptions.TypeDiscriminatorConfigurations` property:
```csharp
public class JsonSerializerOptions
{
    public IList<TypeDiscriminatorConfiguration> TypeDiscriminatorConfigurations { get; }
}
```
which can be used as follows:
```csharp
var options = new JsonSerializerOptions
{
    TypeDiscriminatorConfigurations =
    {
        new TypeDiscriminatorConfiguration<Base>()
            .WithKnownType<Derived1>("derived1")
            .WithKnownType<Derived2>("derived2")
    }
};
```
or alternatively

> **Review comment:** Why provide two models? Could we get away with just this one (non-generic)?
>
> **Reply (Member Author):** Having just the non-generic one should be sufficient, however the generic one is type-safe so users might appreciate having that option.

```csharp
var options = new JsonSerializerOptions
{
    TypeDiscriminatorConfigurations =
    {
        new TypeDiscriminatorConfiguration(typeof(Base))
            .WithKnownType(typeof(Derived1), "derived1")
            .WithKnownType(typeof(Derived2), "derived2")
    }
};
```

### Open Questions

The type discriminator semantics could be implemented following two possible alternatives,
which for the purposes of this document I will be calling "strict mode" and "lax mode".
Each approach comes with its own set of trade-offs.

#### Strict mode

"Strict mode" requires that any runtime type used during serialization must explicitly specify a type discriminator.
For example:
```csharp
[JsonKnownType(typeof(Derived1),"derived1")]
[JsonKnownType(typeof(Derived2),"derived2")]
public class Base { }

public class Derived1 : Base { }
public class Derived2 : Base { }
public class Derived3 : Base { }

public class OtherDerived1 : Derived1 { }

JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1" }
JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2" }
JsonSerializer.Serialize<Base>(new Derived3()); // throws NotSupportedException
JsonSerializer.Serialize<Base>(new OtherDerived1()); // throws NotSupportedException
JsonSerializer.Serialize<Base>(new Base()); // throws NotSupportedException
```
Any runtime type that is not associated with a type discriminator will be rejected,
including instances of the base type itself. This approach has a few drawbacks:

* Does not work well with open hierarchies: any new derived types will have to be explicitly opted in.
* Each runtime type must use a separate type identifier.
* Interfaces or abstract classes cannot specify type discriminators.
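
The strict-mode lookup amounts to an exact match on the runtime type. A minimal sketch, assuming the known types are held in a dictionary (`StrictModeResolver` is a hypothetical helper, not part of the proposed API):

```csharp
using System;
using System.Collections.Generic;

var knownTypes = new Dictionary<Type, string>
{
    [typeof(Derived1)] = "derived1",
    [typeof(Derived2)] = "derived2",
};

Console.WriteLine(StrictModeResolver.ResolveDiscriminator(typeof(Derived1), knownTypes));

try
{
    // OtherDerived1 derives from Derived1 but is not registered itself.
    StrictModeResolver.ResolveDiscriminator(typeof(OtherDerived1), knownTypes);
}
catch (NotSupportedException e)
{
    Console.WriteLine(e.Message);
}

public class Base { }
public class Derived1 : Base { }
public class Derived2 : Base { }
public class OtherDerived1 : Derived1 { }

public static class StrictModeResolver
{
    // Returns the discriminator only when the runtime type itself is registered.
    public static string ResolveDiscriminator(Type runtimeType, IReadOnlyDictionary<Type, string> knownTypes)
    {
        if (knownTypes.TryGetValue(runtimeType, out string id))
        {
            return id;
        }

        throw new NotSupportedException($"Runtime type '{runtimeType}' has no type discriminator.");
    }
}
```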

#### Lax mode

"Lax mode", as the name suggests, is more permissive: runtime types without discriminators
are serialized using the nearest type ancestor that does specify a discriminator.
Using the previous example:
```csharp
[JsonKnownType(typeof(Derived1),"derived1")]
[JsonKnownType(typeof(Derived2),"derived2")]
public class Base { }

public class Derived1 : Base { }
public class Derived2 : Base { }
public class Derived3 : Base { }

public class OtherDerived1 : Derived1 { }

JsonSerializer.Serialize<Base>(new Derived1()); // { "$type" : "derived1" }
JsonSerializer.Serialize<Base>(new Derived2()); // { "$type" : "derived2" }
JsonSerializer.Serialize<Base>(new Derived3()); // { } serialized as `Base`
JsonSerializer.Serialize<Base>(new OtherDerived1()); // { "$type" : "derived1" } inherits schema from `Derived1`
JsonSerializer.Serialize<Base>(new Base()); // { } serialized as `Base`
```
This approach is more flexible and supports interface and abstract type hierarchies:
```csharp
[JsonKnownType(typeof(Foo), "foo")]
[JsonKnownType(typeof(IBar), "bar")]
public interface IFoo { }
public abstract class Foo : IFoo { }
public interface IBar : IFoo { }

public class FooImpl : Foo {}

JsonSerializer.Serialize<IFoo>(new FooImpl()); // { "$type" : "foo" }
```
However, it does come with its own set of problems:
```csharp
[JsonKnownType(typeof(Foo), "foo")]
[JsonKnownType(typeof(IBar), "bar")]
public interface IFoo { }
public class Foo : IFoo { }
public interface IBar : IFoo { }

public class Baz : Foo, IBar { }

JsonSerializer.Serialize<IFoo>(new Baz()); // diamond ambiguity, could either be "foo" or "bar",
// throws NotSupportedException.
```
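
One way to read "nearest ancestor" is: among the known types that the runtime type is assignable to, pick the one that is itself assignable to all the others, and report an ambiguity if no such type exists. The sketch below implements that interpretation, which is an assumption rather than the specified algorithm; `LaxModeResolver` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var knownTypes = new Dictionary<Type, string>
{
    [typeof(Foo)] = "foo",
    [typeof(IBar)] = "bar",
};

Console.WriteLine(LaxModeResolver.ResolveDiscriminator(typeof(FooImpl), knownTypes));

try
{
    // Baz matches both Foo and IBar, and neither is more specific than the other.
    LaxModeResolver.ResolveDiscriminator(typeof(Baz), knownTypes);
}
catch (NotSupportedException e)
{
    Console.WriteLine(e.Message);
}

public interface IFoo { }
public class Foo : IFoo { }
public interface IBar : IFoo { }
public class FooImpl : Foo { }
public class Baz : Foo, IBar { }

public static class LaxModeResolver
{
    // Returns the discriminator of the most specific known ancestor,
    // or null when no known type applies (the value is serialized as the base type).
    public static string ResolveDiscriminator(Type runtimeType, IReadOnlyDictionary<Type, string> knownTypes)
    {
        List<Type> candidates = knownTypes.Keys
            .Where(known => known.IsAssignableFrom(runtimeType))
            .ToList();

        if (candidates.Count == 0)
        {
            return null;
        }

        // A candidate is "nearest" if every other candidate is one of its ancestors.
        List<Type> nearest = candidates
            .Where(candidate => candidates.All(other => other.IsAssignableFrom(candidate)))
            .ToList();

        if (nearest.Count != 1)
        {
            throw new NotSupportedException($"Runtime type '{runtimeType}' matches multiple unrelated known types.");
        }

        return knownTypes[nearest[0]];
    }
}
```

Under this reading, `FooImpl` resolves to `"foo"`, while `Baz` is rejected because `Foo` and `IBar` are unrelated to each other.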