Proposal: Official Support for Extended Use Cases #740

gregsdennis · 2024-05-31T03:00:27Z · Maintainer

Problem

JSON Schema was originally designed as a JSON validation tool, and as the spec evolved, it has added annotation and even mentions hyper-schema operations.

Since then, users of JSON Schema have invented new ways to utilize JSON Schema beyond validation and annotation of JSON data. However, the specifications continue to be very targeted toward only supporting validation and annotation, making it difficult to create extension specifications that define these new use cases but don't need all of the declared behaviors.

The result is extended functionality without a specification (which means little to no interoperability) that has to resort to partially implementing the specification by leaving out support for the parts that the particular functionality doesn't need.

Proposal

The specifications should be rewritten/reorganized to describe a more abstract evaluation model that enables and supports secondary specifications that define these and future use cases. To facilitate this, the rewrite should focus on defining behaviors and assigning them to keywords rather than fully defining keywords for any particular context.

Core currently explicitly defines the following keyword behaviors:

Applicability - defines how a keyword applies subschemas
Assertion - defines the "correctness" of a value and produces a boolean result
Annotation - defines adding contextual information to a value

There is also an implicit behavior:

Directive - the Core keywords themselves; these provide mechanisms that inform tooling how to process the schema. There are no non-Core directive keywords.

Of these behaviors, directives and applicability are essential to the JSON Schema evaluation process across all known use cases. These behaviors must remain in Core.

However, since the assertion and annotation behaviors are not common to all use cases, they should be removed from Core and defined in appropriate secondary specification(s).
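To make the split concrete, here is a minimal Python sketch that models the four behaviors as data and filters a keyword down to what Core alone would define under this proposal. The `Behavior` enum, the `KEYWORD_BEHAVIORS` table and `core_view` are hypothetical illustrations, and the table covers only a handful of keywords.

```python
from enum import Enum, auto


class Behavior(Enum):
    """Hypothetical labels for the keyword behaviors described above."""
    DIRECTIVE = auto()      # Core keywords that tell tooling how to process the schema
    APPLICABILITY = auto()  # how a keyword applies subschemas
    ASSERTION = auto()      # boolean "correctness" of a value
    ANNOTATION = auto()     # contextual information attached to a value


# Illustrative, incomplete assignment of behaviors to a few keywords.
KEYWORD_BEHAVIORS = {
    "$schema": {Behavior.DIRECTIVE},
    "$id": {Behavior.DIRECTIVE},
    "properties": {Behavior.APPLICABILITY, Behavior.ASSERTION, Behavior.ANNOTATION},
    "allOf": {Behavior.APPLICABILITY, Behavior.ASSERTION},
    "minimum": {Behavior.ASSERTION},
    "title": {Behavior.ANNOTATION},
}

# Under the proposal, only these behaviors stay in Core.
CORE_BEHAVIORS = {Behavior.DIRECTIVE, Behavior.APPLICABILITY}


def core_view(keyword: str) -> set:
    """Return the behaviors of `keyword` that Core alone would define."""
    return KEYWORD_BEHAVIORS.get(keyword, set()) & CORE_BEHAVIORS
```

With this table, `core_view("properties")` keeps only applicability, while `core_view("minimum")` is empty because its only behavior, assertion, would move to a secondary spec.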

Schema Evaluation

Currently the spec defines evaluation in terms of having an instance present. However many of these new use cases operate without an instance. In order to support them, we will need to define evaluation outside of the context of having an instance.

Schema evaluation will need to be something along the lines of "applying all behaviors from all keywords present in the schema". Then the behaviors that need an instance, like assertion, are simply defined as such.

By using this evaluation model, Core can define the applicator behavior without an instance, and Validation (and Annotation if that becomes its own spec) can separately define the assertion behavior as requiring an instance.

This opens the door for non-instance use case specs to directly reference Core while avoiding having to declare some sort of partial support of it.
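A minimal Python sketch of that evaluation model, assuming each behavior simply declares whether it needs an instance. `BehaviorImpl`, `evaluate` and the two sample behaviors are hypothetical. A sentinel object marks "no instance" because JSON `null` (Python `None`) is itself a valid instance.

```python
from dataclasses import dataclass
from typing import Any, Callable

NO_INSTANCE = object()  # sentinel: evaluation started without an instance


@dataclass
class BehaviorImpl:
    name: str
    requires_instance: bool
    run: Callable[[dict, Any], Any]  # (schema, instance) -> behavior result


def evaluate(schema: dict, behaviors: list, instance: Any = NO_INSTANCE) -> dict:
    """Apply every behavior to the schema, skipping those that need a missing instance."""
    results = {}
    for behavior in behaviors:
        if behavior.requires_instance and instance is NO_INSTANCE:
            continue
        results[behavior.name] = behavior.run(schema, instance)
    return results


# Applicability can be described without an instance: list the subschema locations.
all_of_applicability = BehaviorImpl(
    "applicability", False,
    lambda schema, _: [f"/allOf/{i}" for i in range(len(schema.get("allOf", [])))])

# An assertion needs an instance; minLength only constrains strings.
min_length_assertion = BehaviorImpl(
    "assertion", True,
    lambda schema, inst: not isinstance(inst, str) or len(inst) >= schema.get("minLength", 0))
```

Evaluating `{"allOf": [{}, {}], "minLength": 3}` with no instance yields only the applicability result; passing an instance such as `"ab"` adds the assertion result as well.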

Split Keyword Definitions

The primary trouble that presents itself with this approach is that many keywords exhibit multiple behaviors which will ultimately be defined across multiple specifications. For example, properties exhibits:

applicability - it contains subschemas which apply to object property values
assertion - it produces a boolean value based on the assertion result of all of its subschemas' evaluations
annotation - it produces an annotation of the properties which were both declared in the keyword and found in the instance

For a use case like code generation, only applicability is needed (and it's only needed insofar as it can be described without an instance).
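As a minimal Python sketch, the three behaviors of `properties` can be written as separate functions. Only the first needs no instance, which is the part a code generator would use. The function names and the caller-supplied `validate` callback are hypothetical.

```python
from typing import Any, Callable, Dict

Validate = Callable[[Any, Any], bool]  # (subschema, instance) -> valid?


def properties_applicability(schema: dict) -> Dict[str, Any]:
    """Instance-free: which subschema applies to which property name."""
    return dict(schema.get("properties", {}))


def properties_assertion(schema: dict, instance: Any, validate: Validate) -> bool:
    """True unless a declared property present in the instance fails its subschema."""
    if not isinstance(instance, dict):
        return True  # properties only constrains objects
    applied = properties_applicability(schema)
    return all(validate(sub, instance[name]) for name, sub in applied.items() if name in instance)


def properties_annotation(schema: dict, instance: Any) -> set:
    """Property names both declared in the keyword and found in the instance."""
    if not isinstance(instance, dict):
        return set()
    return set(properties_applicability(schema)) & set(instance)
```

The assertion reuses the applicability result, so it cannot run without it, whereas a tool that needs only applicability can stop at the first function.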

If Core defines properties, it can only assign the applicability behavior. It can't assign assertion behavior because that behavior isn't defined. The only solution is to have secondary specifications define the assertion and annotation behaviors and then augment those behaviors onto Core's definition of properties.

The end result is that if you're working in a validation context, properties is not fully defined in a single place; rather its full behavior is defined across both Core and Validation.

This makes things easier (read: "possible") for authors of extension specs, but could make things more difficult for tooling authors and even users. I think the solution here is to have good documentation that collates those disparate behavior definitions into a single location.
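One way to picture the split, as a Python sketch: each spec contributes behaviors to a keyword by name, and documentation (or a tool) collates the pieces for the specs it supports. `KeywordRegistry` is hypothetical, and "Annotation" stands in for the hypothetical Annotation spec.

```python
from collections import defaultdict
from typing import Dict, Set


class KeywordRegistry:
    """Hypothetical registry in which several specs define behaviors for one keyword."""

    def __init__(self) -> None:
        self._defs: Dict[str, Dict[str, str]] = defaultdict(dict)  # keyword -> {behavior: spec}

    def define(self, spec: str, keyword: str, behavior: str) -> None:
        owner = self._defs[keyword].get(behavior)
        if owner is not None and owner != spec:
            raise ValueError(f"{keyword}: {behavior} is already defined by {owner}")
        self._defs[keyword][behavior] = spec

    def collate(self, keyword: str, specs: Set[str]) -> Dict[str, str]:
        """The keyword's full definition as seen by a tool supporting `specs`."""
        return {b: s for b, s in self._defs[keyword].items() if s in specs}


registry = KeywordRegistry()
registry.define("Core", "properties", "applicability")
registry.define("Validation", "properties", "assertion")
registry.define("Annotation", "properties", "annotation")  # hypothetical spec
```

Rejecting a second spec that tries to define the same behavior for the same keyword is one possible design choice for keeping the pieces unambiguous.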

Tooling Support

Tooling will need to declare the specifications they support, which implicitly declares the behaviors they support.

As an example, a validator would claim support for the Validation spec, which references Core, so the validator must support applicability (inherited from Core) and assertions. A code generator, however, could claim support for a codegen spec, which references a hypothetical Annotations spec, which in turn references Core; the code generator would therefore support any behaviors defined by the codegen spec as well as annotation and applicability.

Finally, if the same schema were passed through multiple tools, each tool would interpret the schema in different ways, applying only the set of behaviors that that tool supports. This would allow the same schema to be used by, e.g., a code generator and a validator.
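A minimal Python sketch of the "claim a spec, inherit its behaviors" idea, and of the same keyword being applied differently by two tools. The spec graph mirrors the example above; every name in it is an assumption.

```python
from typing import Dict, List, Set

# Which specs each spec references. "Annotations" and "Codegen" are hypothetical.
SPEC_REFERENCES: Dict[str, List[str]] = {
    "Core": [],
    "Validation": ["Core"],
    "Annotations": ["Core"],
    "Codegen": ["Annotations"],
}

# Behaviors each spec defines; "codegen" stands in for whatever a codegen spec adds.
SPEC_BEHAVIORS: Dict[str, Set[str]] = {
    "Core": {"directive", "applicability"},
    "Validation": {"assertion"},
    "Annotations": {"annotation"},
    "Codegen": {"codegen"},
}


def supported_behaviors(claimed_spec: str) -> Set[str]:
    """Behaviors a tool must support when it claims `claimed_spec` (follows references)."""
    seen: Set[str] = set()
    behaviors: Set[str] = set()
    stack = [claimed_spec]
    while stack:
        spec = stack.pop()
        if spec in seen:
            continue
        seen.add(spec)
        behaviors |= SPEC_BEHAVIORS[spec]
        stack.extend(SPEC_REFERENCES[spec])
    return behaviors


def behaviors_applied(keyword_behaviors: Set[str], claimed_spec: str) -> Set[str]:
    """The subset of a keyword's behaviors that a given tool actually applies."""
    return keyword_behaviors & supported_behaviors(claimed_spec)
```

For `properties` (applicability, assertion, annotation), the validator applies applicability and assertion, while the code generator applies applicability and annotation.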

Backward Compatibility

I recognize that from a specification point of view, this is a significant architectural change. However, I believe this can be achieved in a way that preserves functional backward compatibility, even if we do this after the first stable release. The key is preserving requirements. Moving requirements between specs or extracting them to new specs shouldn't affect the functionality of a tool. The tool is still going to claim the same feature set, regardless of the documents that specify those features. At most, they should only need to update documentation to indicate which specifications they support.

My preference would be to make these changes before the first stable release because I think it would be easier, but I'm not convinced it's necessary.

Discussion

The main two things I'd like to discuss here are:

the proposed solution to split the spec on behavior and the implications I've discussed above
other options for supporting these extended use cases

Ultimately, official support for extended use cases is the primary outcome that we want. The proposal above is what makes most sense to me.

There are likely other aspects and implications that I haven't considered yet, so please feel free to start threads on those topics as well.

mwadams · 2024-05-31T08:38:18Z · Collaborator

I'm currently going through the process of refactoring Corvus.JsonSchema to switch to a keyword/vocabulary based model. This is partly to support vocab in its current state, partly to make it quicker and easier to extend, and partly to make it quicker and easier to grok.

The keyword dependencies are the biggest PITA :-)

What I've discovered is an interesting layering.

Fundamental capabilities

So far there are only three things I've found that I would call "fundamental"
a. The ability to walk a JSON tree
b. The ability to resolve a JSON reference (using standard RFC semantics)
c. The ability to inspect a JSON schema and infer its vocabulary. It looks superficially like this is a keyword layer issue, but, in fact, you don't have keywords without a vocab so this is essentially a bootstrapping problem closely allied with the existence of vocabularies
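For (a) and the JSON Pointer part of (b), a minimal Python sketch follows. It assumes the reference has already been reduced to an RFC 6901 pointer within a single document; base-URI resolution (RFC 3986), `$id`/anchor lookup and percent-decoding of URI fragments are left out. Both function names are hypothetical.

```python
from typing import Any, Iterator, Tuple


def walk(node: Any, pointer: str = "") -> Iterator[Tuple[str, Any]]:
    """(a) Yield (JSON Pointer, value) for every node of a JSON tree, depth-first."""
    yield pointer, node
    if isinstance(node, dict):
        for key, child in node.items():
            escaped = key.replace("~", "~0").replace("/", "~1")  # RFC 6901 escaping
            yield from walk(child, f"{pointer}/{escaped}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, f"{pointer}/{index}")


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """(b) Resolve an RFC 6901 JSON Pointer against `doc`."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    node = doc
    for token in pointer[1:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # decode ~1 before ~0
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

Every pointer produced by `walk` should resolve back to the value it was yielded with, which makes the pair easy to sanity-check.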

Keyword capabilities

These are fundamental capabilities any keyword may be capable of providing.

They will be use-case specific.

Typical things an implementation might want to support include:

"reference resolver", "anchor provider", "subschema provider", "property provider", "numeric validation provider", "string validation provider", "annotator" etc.

You may also need to define custom capabilities for a custom vocabulary.

The implementation of these capabilities on a keyword is where keyword dependencies manifest.

There are two types of dependency:

direct - where the specification demands the presence of a particular keyword
indirect - where one keyword requires a particular keyword capability defined by another, but doesn't care what keyword actually provides it. Anchors and References are a good example of this.

The spec tends towards the former but it may be better to formally define capabilities and talk in terms of capabilities in future. It would make vocab interoperability simpler.

For example my if keyword doesn't need the then keyword specifically, it needs something capable of providing the then capability for an if capability.
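A minimal Python sketch of the direct/indirect distinction, assuming keywords declare the capabilities they provide and require. The `Keyword` shape and the "anchor provider" capability name are hypothetical; the `$ref`/anchor pairing follows the example above rather than any wording in the spec.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Keyword:
    name: str
    provides: frozenset = frozenset()
    requires_keywords: frozenset = frozenset()      # direct: a specific keyword
    requires_capabilities: frozenset = frozenset()  # indirect: any provider will do


def unmet_dependencies(keywords: List[Keyword]) -> List[str]:
    """Report dependencies that no keyword in the set satisfies."""
    present = {k.name for k in keywords}
    available = set().union(*(k.provides for k in keywords))
    problems = []
    for k in keywords:
        problems += [f"{k.name} needs keyword {d}" for d in sorted(k.requires_keywords - present)]
        problems += [f"{k.name} needs capability {c}"
                     for c in sorted(k.requires_capabilities - available)]
    return problems


ref = Keyword("$ref", requires_capabilities=frozenset({"anchor provider"}))
anchor = Keyword("$anchor", provides=frozenset({"anchor provider"}))
dynamic_anchor = Keyword("$dynamicAnchor", provides=frozenset({"anchor provider"}))
then = Keyword("then", requires_keywords=frozenset({"if"}))
```

`$ref` is satisfied by either `$anchor` or `$dynamicAnchor`, because it asks for a capability rather than a keyword, whereas `then` names `if` directly.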

Orchestration

Something that knows how to prioritise and apply the keyword capabilities to an instance for e.g. code generation, validation, annotation etc.
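One possible orchestration step, sketched in Python with the standard-library `graphlib` module (Python 3.9+): order the keywords of a schema object so that keywords whose results others consume run first. The `RUNS_AFTER` table is an illustrative assumption, not an exhaustive list taken from the spec.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set

# Hypothetical "must run after" edges between keywords; illustrative, not exhaustive.
RUNS_AFTER: Dict[str, Set[str]] = {
    "additionalProperties": {"properties", "patternProperties"},
    "then": {"if"},
    "else": {"if"},
    "unevaluatedProperties": {"properties", "patternProperties", "additionalProperties", "allOf"},
}


def evaluation_order(keywords: Iterable[str]) -> List[str]:
    """A dependency-respecting order for the keywords present in one schema object."""
    present = set(keywords)
    # Keep only edges between keywords that are actually present.
    graph = {k: RUNS_AFTER.get(k, set()) & present for k in present}
    return list(TopologicalSorter(graph).static_order())
```

Keywords with no ordering constraint between them may come out in any order, so only the relative order of dependent keywords is guaranteed.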

2 replies

mwadams May 31, 2024
Collaborator

I don't think this has a direct bearing on the spec refactoring proposal but, if these are broken up in this way, it shows why there is a need for some meta-definitions at a top level which permit individual specifications to talk about a common capability/term without taking an unnecessary dependency on a sibling specification.

It should be possible for any spec to be defined in terms of this "shared concepts" spec and itself, alone, without reference to sibling specs. Where new capabilities are introduced that are solely the domain of some subset of sibling spec, those capabilities can be defined in another metaspec shared between them. They might later be good candidates for inclusion in a future version of the top level metaspec.

This is also quite a powerful tool for future compatibility.

gregsdennis Jun 1, 2024
Maintainer Author

So far there are only three things I've found that I would call "fundamental"
a. The ability to walk a JSON tree
b. The ability to resolve a JSON reference (using standard RFC semantics)
c. The ability to inspect a JSON schema and infer its vocabulary. It looks superficially like this is a keyword layer issue, but, in fact, you don't have keywords without a vocab so this is essentially a bootstrapping problem closely allied with the existence of vocabularies

I'd say that (a) is the applicability behavior I described above, and (b) and (c) are part of the directives. Both of these I've recommended remain in Core, while other things be moved out.

So I think we agree on these, though we're coming at it from different directions.

Keyword capabilities

These are what I'm calling "behaviors". The proposal is suggesting that each of these behaviors receive their own specification and assign their respective behaviors to the appropriate keywords, perhaps defining those keywords if they haven't been defined elsewhere.

I agree that they are very use-case specific, which is why I want to separate them. Not everyone is going to need validation, but if you do need validation, then it's likely you'll need it for strings, arrays, etc. I don't think it's worthwhile to isolate "string validation".

Some of those other behaviors you mention are part of the Core keywords, like "reference resolution" and "anchor provider". I think these fall into the "ability to walk a JSON tree" that you mentioned before.

The implementation of these capabilities on a keyword is where keyword dependencies manifest.

I don't understand this. These capabilities are orthogonal to keyword dependencies.

additionalProperties depends on properties not because of its applicability, assertive, or annotative behaviors, but because it's just defined to. Then there are others, like allOf, which have applicability and assertive behaviors, but they have no dependencies.

So the presence of these behaviors doesn't manifest dependencies.
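To illustrate a dependency that exists only because the keyword is defined that way, here is a minimal Python sketch of how `additionalProperties` picks the instance properties it applies to by reading its sibling keywords (in current drafts that means `patternProperties` as well as `properties`). The function name is hypothetical, and Python's `re` only approximates the ECMA-262 regex dialect that JSON Schema specifies.

```python
import re
from typing import Any, List


def additional_property_names(schema: dict, instance: Any) -> List[str]:
    """Instance property names not covered by sibling `properties` or `patternProperties`."""
    if not isinstance(instance, dict):
        return []
    declared = schema.get("properties", {})
    # Patterns are not implicitly anchored, hence re.search rather than re.match.
    patterns = [re.compile(p) for p in schema.get("patternProperties", {})]
    return [name for name in instance
            if name not in declared and not any(p.search(name) for p in patterns)]
```

For `{"properties": {"a": {}}, "patternProperties": {"^x-": {}}}` and the instance `{"a": 1, "x-foo": 2, "b": 3}`, only `"b"` falls to `additionalProperties`.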

For example my if keyword doesn't need the then keyword specifically, it needs something capable of providing the then capability for an if capability.

Strictly speaking, if doesn't need a then or an else. if does nothing on its own. In reality, it's the other way around: then and else require the presence of if to do anything.

(I respect that in the particular case of Corvus.JsonSchema, you need both to build an appropriate C# if-then statement.)
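A minimal Python sketch of that relationship, assuming a caller-supplied `validate(subschema, instance)` function (hypothetical): `then` and `else` are ignored without `if`, and `if` alone never changes the result. The toy validator handles boolean schemas only, just enough to exercise the logic.

```python
from typing import Any, Callable

Validate = Callable[[Any, Any], bool]


def conditional_result(schema: dict, instance: Any, validate: Validate) -> bool:
    """Combined assertion result of if/then/else in one schema object."""
    if "if" not in schema:
        return True  # then/else do nothing without if
    if validate(schema["if"], instance):
        # if passed: only then matters (an absent then imposes no constraint)
        return validate(schema["then"], instance) if "then" in schema else True
    return validate(schema["else"], instance) if "else" in schema else True


def boolean_schemas_only(subschema: Any, instance: Any) -> bool:
    """Toy validator for this sketch: supports boolean schemas only."""
    if isinstance(subschema, bool):
        return subschema
    raise NotImplementedError("toy validator handles boolean schemas only")
```

With it, `{"then": False}` passes (no `if`), while `{"if": True, "then": False}` fails.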

Something that knows how to prioritise and apply the keyword capabilities to an instance for e.g. code generation, validation, annotation etc.

I rather see an implementation identifying which behaviors they support, and then all of those behaviors can be independently applied. Validation doesn't necessarily need applicability to run before it, unless perhaps the keyword requires it. Keywords like properties provide a validation result based on the results of their subschemas, so, yeah, it needs to run the applicability before it runs validation, but we could theoretically devise a keyword that provides validation independently from how it applies its subschemas.

It should be possible for any spec to be defined in terms of this "shared concepts" spec and itself, alone, without reference to sibling specs. Where new capabilities are introduced that are solely the domain of some subset of sibling spec, those capabilities can be defined in another metaspec shared between them. They might later be good candidates for inclusion in a future version of the top level metaspec.

If I follow what you're saying, I think I covered this in my off-topic rant. The original idea was to have a spec document that defined the behavior, and then have separate vocab documents that referenced the behavior spec and (re)defined keywords to exhibit that behavior.

In the end, I felt that was a bit extreme, but I like the logic separation.

gregsdennis · 2024-06-01T06:46:16Z · Maintainer Author

Another option is to continue to define validation and annotation in Core and simply provide a statement that says these behaviors are optional depending on the application. This would allow something like codegen to be compliant with the spec yet continue to ignore the validation behaviors of all of the keywords.

I would want that allowance to apply wholesale per behavior, though. Implementations shouldn't be able to support validation for some keywords but ignore it for others: if a tool claims to support validation, it must conform to all of the validation behavior defined in the spec.
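A minimal Python sketch of that all-or-nothing rule, assuming a spec can be summarized as "behavior -> keywords it is assigned to" and a tool as "behavior -> keywords it implements". Both shapes, the sample `SPEC` table and the function name are hypothetical.

```python
from typing import Dict, Set


def conformance_gaps(spec: Dict[str, Set[str]], tool: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each behavior the tool claims, the keywords it leaves out.

    Behaviors the tool does not claim at all are treated as optional and produce no gap.
    """
    gaps = {}
    for behavior, implemented in tool.items():
        missing = spec.get(behavior, set()) - implemented
        if missing:
            gaps[behavior] = missing
    return gaps


# Hypothetical spec summary: which keywords each behavior is assigned to.
SPEC = {"assertion": {"minimum", "maxLength", "required"}, "annotation": {"title"}}
```

A code generator claiming only annotation has no gaps, while a tool claiming assertion for `minimum` alone is reported as missing `maxLength` and `required`.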

0 replies

SorinGFS · 2024-06-01T13:05:00Z

I think there is sense in your proposal; personally (if I got it right) I would go even more radical. So, as I understand it:

defining behaviors and assigning them to keywords rather than fully defining keywords for any particular context

Would this look something like:

{
    "applicator": {
        "anyOf": [ "additionalProperties", "allOf", "anyOf", "oneOf", ... ]
    },
    ...
}

...or, instead, would the keywords themselves be their own $ref, with each keyword in fact being a standalone schema like:

File: $id/schema.json

{
    "format": "iri-reference",
    "type": "string",
    "$comment": "Fragments not allowed.",
    "pattern": "^[^#]*$"
}

?

Anyway, I'm in favor of decentralizing json-schema, and I think your proposal is in fact just that. Correct me if I'm wrong. More important than the way this would be achieved is that this discussion is now open and there is interest in this goal. I wrote a long time ago that the current structure of json-schema, with releases that each contain a specific snapshot of keyword definitions (which, btw, are redefined in every release), is neither scalable nor flexible. The solution is to release keywords instead of full schemas, setting their meaning in stone, and to move the specific schema releases outside the json-schema specifications, opening the road to chaining schemas as you mentioned here:

if the same schema were passed through multiple tools, each tool would interpret the schema in different ways, applying only the set of behaviors that that tool supports

As I see this change, the core, applicator and validation keywords can become what we later refer to as json-schema: a foundation that can be used as is (probably even without mentioning its reference) in the vast majority of cases, and on top of which we can add the building blocks for specific use cases.

11 replies

SorinGFS Jun 1, 2024

The above object, if placed in a separate file $id/schema.json, would mean nothing until it is referenced somewhere. Placing every keyword definition in its own file would give us the flexibility to combine them however we want. Meanwhile, the textual definition of each keyword can refer to the corresponding file instead of the definitions now specified in each schema release. Basically, a keyword can be defined once and reused in later releases. As long as you are committed not to change the meaning of keywords in the future, this will work just fine. Later, when a new keyword is needed, it would be published in a separate file, the specs updated with it, and new schema releases would just reference the new keyword.

SorinGFS Jun 1, 2024

Yeah, this discussion isn't about any of that. It's not about retrieving content. It's not about meta-schemas. It's about the specifications: https://github.com/json-schema-org/json-schema-spec/blob/main/jsonschema-core.md.

So your solution would not imply rearranging the shape of schemas as they stand today?

gregsdennis Jun 1, 2024
Maintainer Author

It may be a downstream impact, but, no, this discussion is not about that.

SorinGFS Jun 1, 2024

Ok then. In the first part of this discussion you were saying:

JSON Schema was originally designed as a JSON validation tool, and as the spec evolved, it has added annotation and even mentions hyper-schema operations.
Since then, users of JSON Schema have invented new ways to utilize JSON Schema beyond validation and annotation of JSON data. However, the specifications continue to be very targeted toward only supporting validation and annotation, making it difficult to create extension specifications that define these new use cases but don't need all of the declared behaviors.

When you said "JSON Schema was originally designed as a JSON validation tool" and after that you said "the specifications continue to be very targeted toward only supporting validation and annotation, making it difficult to create extension specifications that define these new use cases but don't need all of the declared behaviors", I thought that the most logical solution would be to separate the concerns, not to pile them up, since even today the immense majority of use cases are for validation, and probably there is a reason why most used versions are draft-04 and draft-07, which are very simple compared to last ones. Therefore, in my mind, a simple solution to the problem you mention would be to split it: provide a simple foundation for most use cases (validation) that would be stable and need no updates, and put the dynamic part outside, with a reference to the foundation. IMHO even the output part should be outside the foundation as separate schema and specs.
Anyway, I'll wait and see, but I think that just changing the specs will only let the problem grow over time.

gregsdennis Jun 1, 2024
Maintainer Author

I thought that the most logical solution would be to separate the concerns...

That's precisely what's being proposed: separate the behaviors into individual specs, then to get back to the original functionality, you have to reassemble the keywords by combining the specs.

Since even today the immense majority of use cases are for validation...

This hasn't been my recent experience. I've seen more people wanting code generation and form generation capabilities than mere validation. This comes out of API specs like OpenAPI and others.

probably there is a reason why most used versions are draft-04 and draft-07, which are very simple compared to last ones

The most-used versions are a draft 4 variant for OpenAPI 3.0 and earlier because the tooling for OpenAPI 3.1 is just catching up.

even the output part should be outside the foundation as separate schema and spec

We've already extracted output into its own spec.

SorinGFS · 2024-06-02T05:37:48Z

@gregsdennis
The way json-schema is designed today is centered on the main schema, which provides access to inner content. The way I see it, access goes from leaf to root, meaning if I reference let's say validation, I would get that file, and that file would tell me what its dependencies are and I would get them too, if some of dependencies also have dependencies I would get them too. This method would redefine how vocabularies work, because we would be able to retrieve them directly. Also, from the specs' point of view, you would be able to write vocabulary-specific specs rather than centralized specs like today.

off-topic: format-assertion is kind of useless, since we may need both format-annotation and format-assertion in the same schema, which is impossible with two vocabularies referring to the same keyword. IMO we need a new keyword like hasFormat inside validation, and the format keyword, which is now used by default, should be restored to its original location. Both the format-annotation and format-assertion vocabularies could be removed after that. Also, I didn't understand why unevaluatedItems and unevaluatedProperties need their own vocabulary...

4 replies

gregsdennis Jun 3, 2024
Maintainer Author

Please don't include off-topic topics. If you want to discuss additional topics, please create new discussions.

gregsdennis Jun 3, 2024
Maintainer Author

if I reference let's say validation, I would get that file, and that file would tell me what its dependencies are and I would get them too, if some of dependencies also have dependencies I would get them too

There is no consumable file for specifications. Are you talking about meta-schemas? We're not considering that as part of this discussion. We just want to support other use cases.

This method would redefine how vocabularies work

We need to do this anyway. But again, that's not part of this discussion.

SorinGFS Jun 3, 2024

@gregsdennis

We're not considering that as part of this discussion. We just want to support other use cases.

I didn't even imagine it would be possible to support other use cases without a different organization of everything, including meta-schemas. Json-schema used to be a standard draft, and now it is becoming more like a specific project that collects new concerns beyond its original purpose. I have no problem with the new concerns, but I see them as separate from the main concern, which in my mind is to provide a unique "language" for data interchange.

gregsdennis Jun 3, 2024
Maintainer Author

Meta-schemas are an artifact. They reflect what the specifications do, but they are not prescriptive. We have to first figure out the specs. It's not helpful to talk about the meta-schemas until we figure out what we're doing with the specs.

Replies: 4 comments 17 replies
