Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discriminators (not) changing the JSON Schema Validator behavior #2516

Closed
FWiesner opened this issue Apr 1, 2021 · 8 comments
Closed

Discriminators (not) changing the JSON Schema Validator behavior #2516

FWiesner opened this issue Apr 1, 2021 · 8 comments

Comments

@FWiesner
Copy link

FWiesner commented Apr 1, 2021

As discussed on the TSC on March 25th, discriminator is supposed to not affect JSON Schema validation.

https://github.com/OAI/OpenAPI-Specification/blame/master/versions/3.1.0.md#L2703

and the following examples puzzle me here, as to me it clearly reads that properties/required in combination with polymorphic use of allOf are to be ignored, if they are not on a nested object definition underneath allOf themselves

foo required with otherSchema carrying the discriminator:

allOf:
  - $ref: otherSchema
  - type: object
    properties:
      foo:
        type: string
    required: ["foo"]

foo ignored/not required:

properties:
  foo:
    type: string
required: ["foo"]
allOf:
  - $ref: otherSchema

This to me is not compatible with the intention to have discriminator to only be a hint.

https://github.com/OAI/OpenAPI-Specification/blame/master/versions/3.1.0.md#L2701 also specifically lists deserialization, serialization and validation

@jdesrosiers
Copy link
Contributor

When using the discriminator, inline schemas will not be considered.

I'm not sure what value that constraint has, but it's not consistent with the processing model of JSON Schema. In 2019-09 and 2020-12, $ref is a keyword like any other. There is no distinction between a reference and an inline schema. { "$ref": "otherSchema" } is not a reference, it's a schema that contains a reference. That schema could include other keywords as well.

In previous JSON Schema drafts, { "$ref": "otherSchema" } is just a reference, so the distinction between an inline schema and a referenced schema made sense. The problem is that those references are intended to be transparent when processing the a schema.

When request bodies or response payloads may be one of a number of different schemas, a discriminator object can be used to aid in serialization, deserialization, and validation.

I don't think there is a problem here other than being a bit misleading. The "can be used to aid in" language sounds consistent with the "hint" language. It "can" (not "must") be used to "aid" (as is in help achieve the goal, not change goal) in validation. As we discussed last week, in some narrow cases, discriminator could make the schema processing more efficient and it certainly could be used to "aid" in producing better error messages.

The problematic part of the spec is https://github.com/OAI/OpenAPI-Specification/blame/master/versions/3.1.0.md#L2725

a discriminator MAY act as a "hint" to shortcut validation

It can't be both a hint and a "shortcut" because shortcutting changes the validation behavior. Given this contradiction, we decided in the TSC meeting that we wanted to interpret the "hint" language to be correct and the "shortcut" language to be wrong.

@FWiesner
Copy link
Author

FWiesner commented Apr 2, 2021

Thanks @jdesrosiers, also for pointing out item three on the hint vs shortcut.

Regarding item one on the inline schemas, you mentioned that it made sense in previous releases. Just, I don’t see the value even in the case of earlier OpenAPI releases and ignored the constraint when I was recently implementing discriminators support for networknt/json-schema-validator.

I wonder whether the constraints can be safely ignored for now and would be removed (ideally also in a 3.0.4 maintenance release).

@jdesrosiers
Copy link
Contributor

I think the references-only constraint exists for the benefit of code-gen tools. If the schema is inline, tools don't have a class name they can use to create an instance. The reference presumably points to an entry in /components/schemas such as /components/schemas/Foo where "Foo" indicates the class name. However, references aren't intended to convey any information other than where to go next when evaluating a schema. Passing information like this using a reference is an abuse of the referencing mechanism in JSON Schema. A better solution would be to introduce an annotation keyword to allow users to declare the class name to use for code generation purposes.

For validation, the references-only constraint definitely has no value.

(@FWiesner, have you seen json-schema-org/json-schema-spec#1082 yet? It's a proposal for a discriminator-like keyword designed for validation. Thought you might find it interesting.)

@FWiesner
Copy link
Author

FWiesner commented Apr 6, 2021

@jdesrosiers I had a look at the proposal and still try to digest it in terms of what it would mean in our particular use case. We neither are building a code generator nor a JSON Schema validator. The problem we have to solve (as we build a SaaS solution) is the dynamic extension of existing OpenAPIs without the need to generate new code. Then we interpret requests based on the dynamic extensions and validate payloads.

At the moment we still are on OpenAPI 3.0.3 and have not upgraded to OpenAPI 3.1 yet.

Today the beauty of discriminators is that in case of B extending A, it is B telling that it extends A. But, it is A that defines the field of choice to discriminate on, not B and not the anyOf which is listing A and B. A with OpenAPI discriminators also does not need to know all of its sub-types. So, you can add B, C, D, etc extending from A without updating A itself and also without having the need to discriminate on the reference to A and its polymorphic alternatives.

As I understand the proposal, A would have all the different alternatives to which it in the end finally resolves in allOf. Certainly beautiful in terms of validation, but problematic when you want A to be an immutable entity. Also a problem comes up when A should accept certain extensions not all the time.

Think of the overlay proposal. With today's discriminator you just merge a new type that extends from A, say X and you add X to existing anyOfs where X should be an accepted alternative.

When A is actually Address and X, Y, Z are certain country-specific extensions like USAddress, GermanyAddress and ChinaAddress, you might have some operations where only generic Address, USAddress and ChinaAddress are legitimate candidates, but not GermanyAddress. Other operations might accept all alternatives, so you simply list them all.

With ifs your overlay has to patch each operation with the alternatives in an allOf/if block. This means composition now would happen from the referencing point (the operation) and no more from GermanyAddress which inherits from Address having a discriminator or you'd have two allOf sections to deal with - the one from the operation and the one on GermanyAddress. I understand your propertyDependencies proposal as a "macro" for ifs, so wonder whether it would change the modeling in any way. It would be great if the proposal thread could have an example that has both the extensible schemas as well as schemas referencing the extensible schemas.

I don't think your proposal (or the alternatives discussed on the thread) won't work, I'm just not yet fully clear of the impacts and whether I like it or not :)

Btw it would be really cool if we could have an extension point constraining its extensions in a way. For example if A is managed by one party and B by a different one with B extending from A, it would be awesome if A could specify that extension attribute names have to follow certain patterns (as an example).

Think of Address again and then SaaSVendorExtendedAddress being implemented by some SaaS vendor. A constraint on SaaSVendorExtendedAddress could then specify that an arbitrary SIExtendedSaaSVendorExtendedAddress extending from SaaSVendorExtendedAddress directly or via multiple levels of inheritance must not define fields matching the pattern ^saasVendor.:* to prevent future clashes when SaaSVendorExtendedAddress gets later extended with fields that SIExtendedSaaSVendorExtendedAddress already has defined.

I don't know whether there is a namespacing proposal for OpenAPIs and JSON Schemas today, but this would be my poor-mans approach and I would raise a separate discussion if it makes sense.

Also, it would be great if your base type could specify the types it supports on extensions. So, if you know that your SaaSVendorExtendedAddress can only support additional string values and only up to eight with a maximum length of 64kb, there is no way to express that on SaaSVendorExtendedAddress in a standard way.

@jdesrosiers
Copy link
Contributor

Thank you for this feedback. While the purpose of propertyDependencies is to serve the validation use case, it's good to have feedback from a different perspectives as well. I'm not sure I fully understand your use case, but I'll try to show some examples where I hope to address as many of the cases you identified as possible. I'll leave out the $defs, so just imagine they are there and point to a valid definition.

Starting with the simplest case, this is effectively B extending A where the "type" property is the discriminator.

{
  "$ref": "#/$defs/A",
  "propertyDependencies": {
    "type": {
      "B": { "$ref": "#/$defs/B" }
    }
  }
}

Adding a new subtype C means adding it's definition and adding one line to the above example. This is effectively, B extending A or, C extending A.

{
  "$ref": "#/$defs/A",
  "propertyDependencies": {
    "type": {
      "B": { "$ref": "#/$defs/B" },
      "C": { "$ref": "#/$defs/C" }
    }
  }
}

If you use references for all the types as we did in the previous examples, the subtype declaration using propertyDependencies is effectively decoupled from both the parent type and the children types making it very easy to define alternate sets of subtypes the same way you defined the full set of subtypes. This schema (B or D) can be used in some cases while the previous schema (B or C) can be used in other cases.

{
  "$ref": "#/$defs/A",
  "propertyDependencies": {
    "type": {
      "B": { "$ref": "#/$defs/B" },
      "D": { "$ref": "#/$defs/D" }
    }
  }
}

I think that addresses most of the use cases you brought up although I think it requires you to look at the problem from a different angle. The one thing, propertyDependencies doesn't support is this,

A with OpenAPI discriminators also does not need to know all of its sub-types.

It does that by assuming a location to find the schema and constructing a reference. That doesn't fit well with JSON Schema, for a number of reasons, but I'll point out one. discriminator expects schemas to be found at /components/schemas/{propertyValue}. That location only exists in the context of an OpenAPI document. If we change it to /$defs/{propertyValue} to make it fit with JSON Schema, then it wouldn't work in an OpenAPI document. I don't see a way to support automatic mapping in a generic way. I think it needs to be explicit.

it would be great if your base type could specify the types it supports on extensions

If I understand you correctly, this can already be done with JSON Schema. Here's an example

{
  "type": "object",
  "properties": {
    ...
  },
  "additionalProperties": {
    "type": "string",
    "maxLength": 65536
  }
}

Any schema that extends that schema will fail validation if the properties it adds does not meet the constraints in additionalProperties.

@FWiesner
Copy link
Author

FWiesner commented Apr 7, 2021

Thanks @jdesrosiers for the detailed response. While you've shown ways to construct things in JSON Schema to make it work, I still think OpenAPI discriminator has its merits. As you pointed out, you can simulate it in JSON Schema in most aspects. But, discriminator still has some beauties despite the pretty poor level of detail of the spec.

It does that by assuming a location to find the schema and constructing a reference. That doesn't fit well with JSON Schema, for a number of reasons, but I'll point out one. discriminator expects schemas to be found at /components/schemas/{propertyValue}. That location only exists in the context of an OpenAPI document. If we change it to /$defs/{propertyValue} to make it fit with JSON Schema, then it wouldn't work in an OpenAPI document. I don't see a way to support automatic mapping in a generic way. I think it needs to be explicit.

In networknt/json-schema-validator I've implemented the validation in a way that implicit parent-child relationship is established by validating the child, which has the allOf reference and traversing back to the parent (yes, it is not truly clean...). Therefore I didn't need any convention. As soon as you have ambiguity as you have the same named schema name in two different places, then explicit mapping you could mandate. But, as long as you use discriminators only with anyOf/allOf you might even skip the test for ambiguity when your referencing location is an anyOf listing A, B, C, D. If A thinks it is a match based on the discriminator and C does so as well (say both extend from X), then ambiguity doesn't really matter as the first match wins. With sequential evaluation it would then always be an A. With parallel evaluation it's certainly first match, first winner (the discriminator spec doesn't btw say whether you should evaluate anyOf sequentially or whether parallelization is allowed)

I think I now also have a grasp on why I don't truly like

{
  "$ref": "#/$defs/A",
  "propertyDependencies": {
    "type": {
      "B": { "$ref": "#/$defs/B" },
      "C": { "$ref": "#/$defs/C" }
    }
  }
}

I'm coming from the Java world and the example you gave is a composition case (guess that's the very heart of JSON Schema). Here A, B and C have no built in relationship. B and C don't really need to have an allOf relationship to A, so they are not As, but mixins. In my world that is very counter-intuitive :). discriminator leads to a clear parent/child relationship which is much closer to OOP type inheritance.

{
  "type": "object",
  "properties": {
    ...
  },
  "additionalProperties": {
    "type": "string",
    "maxLength": 65536
  }
}

That's actually a truly smart idea to enable what I've asked for. Unfortunately I mis-phrased my actual intent.

What I was actually was intending to ask for is whether there is a chance to see forward constraints on what schema definitions are legitimate on extensions.

Think of an of-the-shelf solution that has an extension base defined and the solution does not have support for ECMA 262 regular expressions evaluation on validation, so wants to disallow this constraint on strings. It might have significantly more technical restrictions, like it might lack support for floating point numbers completely, might only support two levels of nested objects added by extensions, have the given maximum string length you've solved above, etc.

@jdesrosiers
Copy link
Contributor

jdesrosiers commented Apr 13, 2021

I've implemented the validation in a way that implicit parent-child relationship is established by validating the child

I think I can see how that could work.

yes, it is not truly clean...

I'm sure it isn't. Because it goes against the grain of how JSON Schema works, it won't easily fit into a standard implementation. I skipped that explanation in my previous comment because it was hard to explain. I'm glad to learn that you have implemented it and understand those challenges even better than I do.

I'm coming from the Java world and the example you gave is a composition case

Yes, that example is not idiomatic Java, but it is idiomatic JSON Schema. However, that's not the only way to use it. Perhaps something like this would be familiar to you.

{
  "$defs": {
    "extendsA": {
      "propertyDependencies": {
        "type": {
          "B": { "$ref": "#/$defs/B" },
          "C": { "$ref": "#/$defs/C" }
        }
      }
    },
    "A": { ... },
    "B": {
      "$ref": "#/$defs/A",
      ...
    },
    "C": {
      "$ref": "#/$defs/A",
      ...
    }
  }
}

B and C now explicitly extend A and it's just as easy to add new types that extend A as the previous example. The only difference between this and the last discriminator example in the spec is that you have to separate A and extendsA (otherwise you would have a circular reference).

@handrews
Copy link
Member

I'm pretty sure that anything that could be done here was done in PR #2618, which clarified the interaction with validation.

There's too much here for me to tell if there were any other specific asks. Please file them separately if necessary, although keep in mind that we are likely replacing disciriminator entirely in OAS 4 (a.k.a. "Moonwalk"), and adding any complexity to it in the 3.x line is probably not something we want to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants