Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add a JSON schema exporting component for STJ contracts #102788

Closed
Tracked by #100159
eiriktsarpalis opened this issue May 28, 2024 · 14 comments · Fixed by #103322
Closed
Tracked by #100159

[API Proposal]: Add a JSON schema exporting component for STJ contracts #102788

eiriktsarpalis opened this issue May 28, 2024 · 14 comments · Fixed by #103322
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented May 28, 2024

Introduction

This issue defines the JSON schema exporting APIs largely following the design of the stj-schema-mapper prototype. Rather than introducing a JSON schema exchange type, the proposed exporter methods generate schema documents represented as JsonNode instances which can be modified or mapped to other schema models as required.

The exporter employs a callback model allowing users to enrich the generated schema for every node in the generated type graph using metadata from arbitrary attribute annotations.

Contributes to #100159

API Proposal

namespace System.Text.Json.Schema; // New namespace

public static class JsonSchemaExporter
{
    public static JsonNode GetJsonSchemaAsNode(this JsonSerializerOptions options, Type type, JsonSchemaExporterOptions? exporterOptions = null);
    public static JsonNode GetJsonSchemaAsNode(this JsonTypeInfo typeInfo, JsonSchemaExporterOptions? exporterOptions = null);
}

public sealed class JsonSchemaExporterOptions
{
    /// The default options singleton.
    public static JsonSchemaExporterOptions Default { get; } = new();

    /// Determines whether schema references should be generated for recurring schema nodes.
    public bool AllowSchemaReferences { get; init; } = true;

    /// Determines the compatibility mode used by the exporter.
    public JsonSchemaExporterCompatibilityMode CompatibilityMode { get; init; } = JsonSchemaExporterCompatibilityMode.Strict;

    /// Defines a callback that is invoked for every schema that is generated within the type graph.
    public Action<JsonSchemaExporterContext, JsonObject>? OnSchemaNodeGenerated { get; init; }
}

/// Context of the current schema node being generated.
public readonly struct JsonSchemaExporterContext
{
    /// The parent collection type if the schema is being generated for a collection element or dictionary value.
    public ReadOnlySpan<string> Path { get; }

    /// The JsonTypeInfo for the type being processed.
    public JsonTypeInfo TypeInfo { get; }

    /// The JsonPropertyInfo if the schema is being generated for a property.
    public JsonPropertyInfo? PropertyInfo { get; }
}

public enum JsonSchemaExporterCompatibilityMode
{
    /// Generates schemas that are strict with respect to nullability annotations and constructor parameter requiredness.
    Strict = 0,

    /// Generates schemas that are more permissive but preserve compatibility with the specified JsonSerializerOptions.
    JsonSerializer = 1,
}

API Usage

Here's an example using the callback API to extract schema information from other attributes:

var options = new JsonSchemaExporterOptions
{
    OnSchemaNodeGenerated = static (ctx, schema) =>
    {
        DescriptionAttribute? descriptionAttribute = ctx.PropertyInfo?.AttributeProvider?
             .GetCustomAttribute<DescriptionAttribute>();

        if (descriptionAttribute != null)
        {
            schema["description"] = (JsonNode)descriptionAttribute.Description;
        }
    }
});

JsonNode node = JsonSerializerOptions.Default.GetJsonSchemaAsNode(typeof(MyPoco), options);
// { "type" : "object", "properties" : { "X" : { "type" : "integer", "description" : "This is property X" } } }

public class MyPoco
{
    [Description("This is property X")]
    public int X { get; set; }
}

cc @captainsafia @stephentoub @gregsdennis

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work labels May 28, 2024
@gregsdennis
Copy link
Contributor

gregsdennis commented May 29, 2024

As a maintainer of the JSON Schema specification, I'm very interested in the mapping between types and their constituents and the schemas that are generated. I consider this mapping part of the API. I would like to suggest very limited functionality (just obvious stuff, like what you show in your example) to start. I recommend steering away from defining edge cases.

We have a project in JSON Schema that seeks to define that mapping in a language-agnostic way. While there are plenty of generators (codegen / schemagen), so far none of these have jumped in to help define these rules, instead merely adding to the pool of non-interoperability by doing what makes sense to them. I'd love for this effort to contribute back into our project to help define the mapping between JSON Schema and types.

What I'd like to see out of this effort is a beginning toward generating a schema from a type, then being able to take that schema and generate equivalent types in other languages.


As a .Net developer, I understand that this isn't the final vision for JSON Schema provided by .Net. As such, I wonder where this fits in the future when the final vision (whatever that ends up being) is realized. Will these types and methods still be available? Are we comfortable with this existing alongside some potential dedicated JSON Schema model?

@gregsdennis
Copy link
Contributor

Specific notes about API:

JsonSchemaExporterCompatibilityMode

"Strict" has a different meaning in a JSON Schema context than what's listed in the comment. For JSON Schema, "strict" generally means that only the declared properties are allowed, i.e. the schema contains additionalProperties: false. (This also has a side effect of not being able to use the schema as a "base type".)

Also be aware that declaring a property with type: null means that the property is present with a null value. It has no effect if the property doesn't exist in the instance. It seems that having Strict (allowing null property values) as the default is in conflict with the serializer's default behavior of skipping properties with default values.

AllowSchemaReferences

I'm curious how generation works when setting this false and trying to generate for a recursive structure or some other circumstance that requires a $ref.

In my generation lib, I found that a lot of cases required that I have optimization (collecting similar types into $defs) enabled.

ParentCollection

Why have this separate from the type to generate for? List<T> is generally quite different from T.

@eiriktsarpalis
Copy link
Member Author

I'm very interested in the mapping between types and their constituents and the schemas that are generated. I consider this mapping part of the API. I would like to suggest very limited functionality (just obvious stuff, like what you show in your example) to start. I recommend steering away from defining edge cases.

To clarify, this is not a .NET type to JSON schema mapping component, it is an STJ contract (aka JsonTypeInfo) to JSON schema mapping component. As such it neither exactly corresponds to the shape of a type (e.g. because of the presence of attributes, custom converters, user-defined contracts or other known STJ quirks) nor is it responsible for driving the contract derivation logic. That responsibility solely sits with whatever IJsonTypeInfoResolver you are using, and the schema exporter's responsibility would be to provide a faithful and complete mapping of the contracts to a rudimentary schema document. The general principle is that whatever JSON is produced from or accepted by a particular JsonTypeInfo should be valid under the schema exported from that same JsonTypeInfo.

I'd love for this effort to contribute back into our project to help define the mapping between JSON Schema and types.

What I'd like to see out of this effort is a beginning toward generating a schema from a type, then being able to take that schema and generate equivalent types in other languages.

I'm a firm believer of schema-first approaches, namely using schemas to generate types and not the other way around. Code-first largely relies on convention and can be imprecise because not all type system concepts have first-class representation or are relevant in a particular schematization format. I realize I'm stating this in an API proposal that caters to code-first scenaria (still a huge deal in .NET!), however longer-term I would personally place my bets on schema-first.

As a .Net developer, I understand that this isn't the final vision for JSON Schema provided by .Net. As such, I wonder where this fits in the future when the final vision (whatever that ends up being) is realized. Will these types and methods still be available? Are we comfortable with this existing alongside some potential dedicated JSON Schema model?

Longer term, we need an exchange type that offers basic validation and keyword extensibility. I've purposely designed the proposal so that corresponding APIs targeting the exchange type can be added in the future by qualifying method names with "Node", e.g. "GetJsonSchemaAsNode" and "OnSchemaNodeGenerated".

@eiriktsarpalis
Copy link
Member Author

It seems that having Strict (allowing null property values) as the default is in conflict with the serializer's default behavior of skipping properties with default values.

That's changing in .NET 9 (for properties at least). We stopped short of making it the default behaviour for fear of breaking changes, but given that this is a new component we have the luxury of being more deliberate about the desirable behaviour.

I'm curious how generation works when setting this false and trying to generate for a recursive structure or some other circumstance that requires a $ref.

Similar to the serializer, it will throw an exception if the JsonSerializerOptions.MaxDepth value is reached.

Why have this separate from the type to generate for? List is generally quite different from T.

It provides context when visiting the schema for T that it is being generated as an element for List<T>. Users can use that information to, say extract attribute annotations from the parent collection or make other modifications to the schema document.

@gregsdennis
Copy link
Contributor

gregsdennis commented May 29, 2024

To clarify, this is not a .NET type to JSON schema mapping component, it is an STJ contract (aka JsonTypeInfo) to JSON schema mapping component.

This is a nuance. I think most people are going to see/use it as generating a schema from a type; it just has to go through the STJ source generator a little, which kinda makes sense because it's a JSON schema. I doubt (public) users will really see the difference.

Regardless, there's still some underlying logic that dictates what kinds of .Net constructions yield what kinds of schema constructions. That's what the IDL vocab project (basically me) is interested in.

I'm a firm believer of schema-first approaches, namely using schemas to generate types and not the other way around.

I'm excited to see this take hold across .Net! (cc: @markrendle / @mwadams)

I've purposely designed the proposal so that corresponding APIs targeting the exchange type can be added in the future by qualifying method names with "Node", e.g. "GetJsonSchemaAsNode" and "OnSchemaNodeGenerated".

Yeah, I'm just curious if these are expected to be utilized still when the future version is implemented. If not, is this considered acceptable bloat? I'm not sure how this kind of thing is handled in .Net.

That's changing in .NET 9 (for properties at least).

You're changing the default to serialize nulls? That seems inefficient. Do you have an issue/PR for that?

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented May 29, 2024

Yeah, I'm just curious if these are expected to be utilized still when the future version is implemented. If not, is this considered acceptable bloat? I'm not sure how this kind of thing is handled in .Net.

We can't ever remove APIs once they have been shipped, so it's expected that they would live on next to the future APIs that target an exchange type. It shouldn't be a huge issue size-wise, by that point they just be stub methods that convert to JsonNode. This is consistent with the variety of methods present in JsonSerializer, e.g. Serialize, SerializeToNode, SerializerToElement, SerializeToBytes etc.

You're changing the default to serialize nulls? That seems inefficient. Do you have an issue/PR for that?

It's adding non-nullable reference type enforcement. See #100144

@gregsdennis
Copy link
Contributor

I think I'm interpreting things a bit wrong.

The default for the serializer is to skip null properties when writing, and the default for this is to generate a schema that allows null properties.

My thinking was that these are related, but really the schema generation is related to serializer reads where the default behavior is to allow nulls, which does align with the default schema generation.

Disallowing nulls in the schema aligns with the non-default behavior of disallowing nulls in deserialization, per that issue.

I think we're good here.

@eiriktsarpalis
Copy link
Member Author

The default for the serializer is to skip null properties when writing

That's not the default behavior, unless IgnoreNullValues/JsonIgnoreCondition have been configured to do so. Obviously the exporter should be honoring whatever configuration is being used.

but really the schema generation is related to serializer reads

That's an interesting observation. The prototype implementation is actually generating a schema that satisfies the union of serialization outputs and valid deserialization inputs. In other words, the schema of a property allows null if either the getter or the setter are nullable (it's possible for the two to differ in annotation).

@bartonjs
Copy link
Member

bartonjs commented Jun 4, 2024

  • We moved the JsonSchemaExporter methods from extension to simple static
  • We cut the compatibility mode enum, callers have to just get the JsonSerializerOptions correct ahead of time
    • The copy ctor on JsonSerializerOptions makes this fairly easy
  • We changed the return type of GetJsonSchemaAsNode to be JsonObject, as that's always guaranteed by implementation.
  • The question of a JsonSerializerOptionsDefaults.Schema was asked, and it was decided that belongs in a different proposal.
  • We talked about the ROS<string> Path property, and after bouncing it to string[] decided that ReadOnlySpan<string> is both fine, and better.
namespace System.Text.Json.Schema; // New namespace

public static class JsonSchemaExporter
{
    public static JsonObject GetJsonSchemaAsNode(JsonSerializerOptions options, Type type, JsonSchemaExporterOptions? exporterOptions = null);
    public static JsonObject GetJsonSchemaAsNode(JsonTypeInfo typeInfo, JsonSchemaExporterOptions? exporterOptions = null);
}

public sealed class JsonSchemaExporterOptions
{
    /// The default options singleton.
    public static JsonSchemaExporterOptions Default { get; } = new();

    /// Determines whether schema references should be generated for recurring schema nodes.
    public bool AllowSchemaReferences { get; init; } = true;

    /// Defines a callback that is invoked for every schema that is generated within the type graph.
    public Action<JsonSchemaExporterContext, JsonObject>? OnSchemaNodeGenerated { get; init; }
}

/// Context of the current schema node being generated.
public readonly struct JsonSchemaExporterContext
{
    /// The parent collection type if the schema is being generated for a collection element or dictionary value.
    public ReadOnlySpan<string> Path { get; }

    /// The JsonTypeInfo for the type being processed.
    public JsonTypeInfo TypeInfo { get; }

    /// The JsonPropertyInfo if the schema is being generated for a property.
    public JsonPropertyInfo? PropertyInfo { get; }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 4, 2024
@gregsdennis
Copy link
Contributor

gregsdennis commented Jun 5, 2024

I think the API looks good. Just reiterating that I'm still interested in the eventual logic behind it.

What code constructs produce what schema constructs? To be answered later, I assume, but it would be good to have this publicly documented, and I'd like to help guide these decisions.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Jun 5, 2024

Just reiterating that I'm still interested in the eventual logic behind it.

Essentially it's just going to be a port of https://github.com/eiriktsarpalis/stj-schema-mapper, the unit tests should provide an idea of how it's meant to work.

@gregsdennis
Copy link
Contributor

gregsdennis commented Jun 5, 2024

I've submitted eiriktsarpalis/stj-schema-mapper#2 for some recommendations.

@eiriktsarpalis
Copy link
Member Author

I would like to propose the following amendments to the API as approved:

  • Instead of using a mutation-based callback that only works with JsonObject, this change uses a mapping delegate that can additionally handle false and true schemas.
  • The TreatNullObliviousAsNullable flag controls the nullability of the generated schema in the case of null oblivious reference types. By default all oblivious reference types are treated as nullable, echoing the semantics of JsonSerializer.
  • The return type of the GetJsonSchemaAsNode methods is changed from JsonObject to JsonNode to reflect the fact that false and true schema values are now supported.
namespace System.Text.Json.Schema; 

public static class JsonSchemaExporter
{
-    public static JsonObject GetJsonSchemaAsNode(JsonSerializerOptions options, Type type, JsonSchemaExporterOptions? exporterOptions = null);
-    public static JsonObject GetJsonSchemaAsNode(JsonTypeInfo typeInfo, JsonSchemaExporterOptions? exporterOptions = null);
+    public static JsonNode GetJsonSchemaAsNode(JsonSerializerOptions options, Type type, JsonSchemaExporterOptions? exporterOptions = null);
+    public static JsonNode GetJsonSchemaAsNode(JsonTypeInfo typeInfo, JsonSchemaExporterOptions? exporterOptions = null);
}

public sealed class JsonSchemaExporterOptions
{
-    public Action<JsonSchemaExporterContext, JsonObject>? OnSchemaNodeGenerated { get; init; }
+    public Func<JsonSchemaExporterContext, JsonNode, JsonNode>? TransformGeneratedSchemaNode { get; init; }
+    public bool TreatNullObliviousAsNullable { get; init; } = true;
}

@github-actions github-actions bot locked and limited conversation to collaborators Jul 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants