
proposal to add keyword metadata to vocabularies by defining a new vocabulary #1257

Closed
wants to merge 5 commits into from


gregsdennis
Member

This will likely be of most interest to @handrews and maybe @jdesrosiers.

Since the introduction of vocabularies, the idea of making them self-descriptive has been discussed in the abstract, but little has been done concretely. This proposal provides some of the metadata we've been looking for via a new vocabulary designed specifically for vocabulary meta-schemas.

This proposal adds three new keywords:

applicators

The value of this keyword is an object whose keys are all of the keywords defined in the vocab which have applicator behavior.

The value of each property is the kind of applicator that keyword is: objectChild, arrayChild, or inPlace.

This references #602, in which @handrews classified applicators as being "object-child" (meaning they look at the children of an object instance), "array-child" (they look at the children of an array instance), or "in-place" (they look at the instance itself).
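As a rough illustration, the sketch below shows how an implementation might read the proposed `applicators` keyword out of a vocabulary meta-schema and group keywords by kind. It is a minimal Python sketch that rests on a few assumptions. The meta-schema fragment is illustrative rather than copied from this PR, and `group_applicators` is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict

# Illustrative fragment of a vocabulary meta-schema that uses the proposed
# `applicators` keyword. The entries are assumptions for this sketch, not
# copied from the PR's meta-schemas.
APPLICATOR_META_SCHEMA = {
    "$id": "https://json-schema.org/draft/next/meta/applicator",
    "applicators": {
        "properties": "objectChild",
        "patternProperties": "objectChild",
        "additionalProperties": "objectChild",
        "prefixItems": "arrayChild",
        "items": "arrayChild",
        "allOf": "inPlace",
        "anyOf": "inPlace",
        "oneOf": "inPlace",
        "not": "inPlace",
        "dependentSchemas": "inPlace",
    },
}

# The three applicator kinds named in the proposal.
VALID_KINDS = {"objectChild", "arrayChild", "inPlace"}


def group_applicators(meta_schema: dict) -> dict[str, list[str]]:
    """Group a vocabulary's applicator keywords by their declared kind."""
    groups: dict[str, list[str]] = defaultdict(list)
    for keyword, kind in meta_schema.get("applicators", {}).items():
        if kind not in VALID_KINDS:
            raise ValueError(f"unknown applicator kind {kind!r} for {keyword!r}")
        groups[kind].append(keyword)
    # Sort names so the result does not depend on property order.
    return {kind: sorted(names) for kind, names in groups.items()}


print(group_applicators(APPLICATOR_META_SCHEMA)["inPlace"])
# ['allOf', 'anyOf', 'dependentSchemas', 'not', 'oneOf']
```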

assertions

The value of this keyword is an array which contains the names of all keywords defined in the vocab which provide assertion behavior.

annotations

The value of this keyword is an object whose keys are all of the keywords defined in the vocab which either produce annotations or collect them from subschemas.

The value of each property is an object with two properties:

  • kind identifies whether the keyword produces annotations, collects them, or both. The value is producer, collector, or an array of these values (much like the array format we use for type). This property is required.
  • producedAnnotation provides a schema for the annotation value if the keyword is an annotation producer. This property is optional. If the keyword is an annotation producer and this property is missing, then the annotation produced by the keyword is expected to be the value of the keyword itself (e.g. title produces an annotation equal to its value).
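Here is a minimal Python sketch of how the `annotations` metadata might be consumed, assuming the shape described above. `annotation_kinds` normalizes `kind`, which may be a string or an array (as with `type`). `annotation_is_keyword_value` applies the default rule for producers that lack `producedAnnotation`. Both helpers and the metadata entries are hypothetical. The sketch leaves out validating an annotation against `producedAnnotation`, which would need a real JSON Schema validator.

```python
from __future__ import annotations

# Illustrative `annotations` metadata in the shape described above. The
# entries are assumptions for this sketch, not the PR's meta-schemas.
STRING_ARRAY = {"type": "array", "items": {"type": "string"}}
ANNOTATIONS = {
    "title": {"kind": "producer"},
    "properties": {"kind": "producer", "producedAnnotation": STRING_ARRAY},
    "unevaluatedProperties": {
        "kind": ["producer", "collector"],
        "producedAnnotation": STRING_ARRAY,
    },
}

VALID_KINDS = {"producer", "collector"}


def annotation_kinds(entry: dict) -> set[str]:
    """Normalize `kind`, which may be a string or an array (like `type`)."""
    kind = entry["kind"]  # required by the proposal
    kinds = {kind} if isinstance(kind, str) else set(kind)
    unknown = kinds - VALID_KINDS
    if unknown:
        raise ValueError(f"unknown annotation kind(s): {sorted(unknown)}")
    return kinds


def annotation_is_keyword_value(keyword: str) -> bool:
    """True when the keyword's annotation is simply its own value.

    Per the proposal, a producer without `producedAnnotation` is expected to
    produce the keyword's value (as `title` does).
    """
    entry = ANNOTATIONS.get(keyword)
    if entry is None:
        return False
    return "producer" in annotation_kinds(entry) and "producedAnnotation" not in entry


print(annotation_is_keyword_value("title"))       # True
print(annotation_is_keyword_value("properties"))  # False
```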

I've also updated all of the existing meta-schemas to use this new vocab so that you can see how it works. Interestingly [nods to @jdesrosiers], the core keywords don't have any behavior. They are informative only and are self-descriptive rather than instance-descriptive.

I plan on writing up an actual vocabulary for this, but I wanted to get what I had in front of people before I put in that effort.

I know that previous discussions on doing something like this had wanted the vocab URI to actually function as a URL to point to some separate machine-readable document, but I rather like the idea of a vocab-describing vocab and having everything just embedded directly into the meta-schemas.

I think that these keywords go a long way toward building a generic validator. For instance, unevaluated* depends on annotation results from in-place applicators. With the definitions provided in this proposal, a validator can know what all of the in-place applicators are just from knowing the vocabularies. The implementation of unevaluated* then doesn't need to change to know that it must also wait for the results of new in-place applicator keywords defined by extra vocabularies. (This is something that my implementation can't currently handle. I'd need code specifically for each new in-place applicator, which is really awkward.)
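To make the unevaluated* point concrete, here is a hedged Python sketch. It assumes the `applicators` metadata from each vocabulary has already been loaded into a dictionary keyed by vocabulary URI. The `https://example.com/vocab/extra` vocabulary, its `myInPlace` keyword, and the helper names are hypothetical. The sketch only covers finding in-place subschemas. It does not show the adjacent keywords (`properties`, `patternProperties`, and so on) that unevaluated* also consults.

```python
from __future__ import annotations

# Applicator metadata per vocabulary URI, as the proposed `applicators` keyword
# could supply it. The example.com vocabulary and its `myInPlace` keyword are
# hypothetical, standing in for an extension vocabulary.
APPLICATORS_BY_VOCAB = {
    "https://json-schema.org/draft/next/vocab/applicator": {
        "properties": "objectChild",
        "items": "arrayChild",
        "allOf": "inPlace",
        "anyOf": "inPlace",
        "oneOf": "inPlace",
    },
    "https://example.com/vocab/extra": {"myInPlace": "inPlace"},
}


def in_place_keywords(vocab_uris) -> set[str]:
    """All in-place applicators declared by the given vocabularies."""
    return {
        keyword
        for uri in vocab_uris
        for keyword, kind in APPLICATORS_BY_VOCAB.get(uri, {}).items()
        if kind == "inPlace"
    }


def in_place_subschema_keywords(schema: dict, vocab_uris) -> list[str]:
    """Keywords in `schema` whose subschema annotations unevaluated* must wait for."""
    in_place = in_place_keywords(vocab_uris)
    return sorted(keyword for keyword in schema if keyword in in_place)


schema = {
    "properties": {"a": True},
    "allOf": [{"properties": {"b": True}}],
    "myInPlace": {"properties": {"c": True}},
    "unevaluatedProperties": False,
}
# Iterating the dict yields its keys, i.e. every known vocabulary URI.
print(in_place_subschema_keywords(schema, APPLICATORS_BY_VOCAB))
# ['allOf', 'myInPlace']
```

No code in this sketch names `myInPlace` directly. Adding the extension vocabulary's metadata is enough for it to be picked up.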

That brings me to the other thing that could be included that I haven't added yet: annotation dependencies. For example, additionalProperties depends upon the annotation result from properties and patternProperties.
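Annotation dependencies are not part of this proposal yet. If they were expressed as metadata, though, an implementation could derive evaluation order with a topological sort. The `ANNOTATION_DEPENDENCIES` map below is a hypothetical encoding, not a proposed keyword. The sketch uses Python's standard `graphlib` module (3.9+).

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Hypothetical annotation-dependency metadata (keyword -> keywords whose
# annotation results it needs). The proposal does not define this yet; the
# additionalProperties entry mirrors the example above.
ANNOTATION_DEPENDENCIES = {
    "additionalProperties": {"properties", "patternProperties"},
    "unevaluatedProperties": {"properties", "patternProperties", "additionalProperties"},
}


def evaluation_order(schema: dict) -> list[str]:
    """Order the keywords in `schema` so that dependencies are evaluated first."""
    present = set(schema)
    graph = {
        # Only keep dependencies that actually appear in this schema.
        keyword: ANNOTATION_DEPENDENCIES.get(keyword, set()) & present
        for keyword in present
    }
    # static_order() raises graphlib.CycleError if the metadata is cyclic.
    return list(TopologicalSorter(graph).static_order())


order = evaluation_order({
    "type": "object",
    "additionalProperties": False,
    "properties": {"a": True},
    "patternProperties": {"^x-": True},
})
print(order.index("properties") < order.index("additionalProperties"))  # True
```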


I had started with an alternate design of adding this information to the subschemas under the properties keyword, but I realized that doesn't really work for the same reason that required was moved out of the subschema into the parent schema going from draft 3 to draft 4.

I think introducing these aggregate keywords is more in line with what we already have, even though it can feel somewhat redundant to list the keywords multiple times.

@gregsdennis gregsdennis requested review from handrews, jdesrosiers and a team and removed request for jdesrosiers and handrews July 14, 2022 02:23
@karenetheridge
Member

Shouldn't new vocabularies be prototyped in https://github.com/json-schema-org/json-schema-vocabularies first?

@gregsdennis
Member Author

gregsdennis commented Jul 14, 2022

Probably, but not for this. That is for extension vocabs. This is intended to be integrated into the spec as it proposes modifications to the spec's meta-schemas.

From the readme:

This repository is for discussing possible extension vocabularies to be designed and documented outside of the formal JSON Schema organization's effort.

@karenetheridge
Member

This is intended to be integrated into the spec.

Why should it be in the spec rather than as an extension?

@gregsdennis
Member Author

gregsdennis commented Jul 14, 2022

Because it's intended to augment the specification's meta-schemas.

Note how the spec's meta-schemas now declare this new vocab in the $vocabulary keyword and incorporate the new keywords. Without it being integrated, you end up with a circular reference where the specification defines what an external vocabulary can be, but then uses this external vocabulary to help define itself.

@jdesrosiers
Member

I'll have to look at this in more detail later, but here are my thoughts after a quick look.

Knowing which keywords are applicators is useful because it tells us which values are schemas, but I can't see any real use for the rest of the information at the moment, other than documentation.

It occurs to me that this would lock down applicators to only being object-value, array-value, or in-place. We don't have any other examples and we've avoided keywords with more complex structure than that, but do we want to effectively disallow them?

@gregsdennis
Member Author

but the rest of the information I can't see any real use for at the moment other than documentation.

It may not be useful to you, but for me, it's useful knowing what type an annotation is expected to be.

@gregsdennis
Member Author

we've avoided keywords with more complex structure than that

I don't understand. An applicator can look at the current instance (in-place) or the child of an object or array. There's not really anything else given the JSON data model.

I can see where you may want to look at a combination of these, but I don't see any new options.
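The three kinds line up with where a subschema can land in the JSON data model. The following Python sketch lists the instance locations each kind could reach, expressed as relative JSON Pointers. It uses a hypothetical `reachable_locations` helper. A real keyword usually selects a subset; for example, `properties` only visits the names it lists.

```python
from __future__ import annotations

from typing import Any, Iterator


def escape_token(token: str) -> str:
    """Escape a reference token per RFC 6901 ("~" first, then "/")."""
    return token.replace("~", "~0").replace("/", "~1")


def reachable_locations(kind: str, instance: Any) -> Iterator[tuple[str, Any]]:
    """Yield (relative JSON Pointer, value) pairs an applicator of `kind` could target."""
    if kind == "inPlace":
        yield "", instance  # the current instance itself
    elif kind == "objectChild":
        if isinstance(instance, dict):  # other types are simply not applicable
            for name, value in instance.items():
                yield "/" + escape_token(name), value
    elif kind == "arrayChild":
        if isinstance(instance, list):
            for index, value in enumerate(instance):
                yield f"/{index}", value
    else:
        raise ValueError(f"unknown applicator kind: {kind!r}")


print(list(reachable_locations("objectChild", {"a/b": 1, "c": 2})))
# [('/a~1b', 1), ('/c', 2)]
```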

@jdesrosiers
Member

It may not be useful to you, but for me

I didn't mean to imply that wasn't useful. I just meant I hadn't had a chance to think about it enough to see how it was useful. Please do share your vision of how you would use those values.

I don't understand. An applicator can look at the current instance (in-place) or the child of an object or array.

I'm referring to a keyword that has (for lack of a better term) sub-keywords. I don't have time to come up with a good example, so hopefully a bad one will get the point across for now.

{
  "myKeyword": {
    "foo": 3,
    "bar": { "type": "integer" }
  }
}

This is an applicator that is an object, but only one of its properties is a schema; the other is a number. Like I said, we don't have any keywords like this and we avoid doing things like this, but do we want to forbid it? Maybe we do. I'm not presenting an opinion at this point, just showing the example.

@notEthan
Contributor

notEthan commented Jul 15, 2022

@jdesrosiers I think that, for that example, myKeyword would still be described as doing the same kind of application as if it directly specified a schema. This is comparable to how dependentSchemas does in-place application even though it does not directly contain a schema but has schemas on arbitrary properties.

But a keyword I've considered having a crack at a vocabulary for, which would fall outside of the ones described here, would be one that applies a schema to an arbitrary location identified by pointer, something like:

{
  "descendentsByPointer": {
    "/foo/bar": {"type": "object"},
    "/baz/name": {"type": "string"}
  }
}

This would be an in-place applicator (for an empty pointer), a child applicator, or an applicator for an arbitrary descendant.

I'm not suggesting this vocabulary needs to address my half-baked unimplemented keyword idea, just throwing out thoughts.

@notEthan
Contributor

More broadly (than my reply above), I'm quite interested in this. It would help centralize metadata for each keyword, information that each implementation currently has to implement in code, to some degree. (At the least, to make unevaluated* rely on any in-place applicator, an implementation would have to encode some form of in-place-applicator metadata.)

previous discussions on doing something like this had wanted the vocab URI to actually function as a URL to point to some separate machine-readable document, but I rather like the idea of a vocab-describing vocab and having everything just embedded directly into the meta-schemas.

I'm uncertain about this. It may be outside the scope of the metaschema. I would say the metaschema's purpose is to describe the structure and validity of schema documents, whereas this has to do with the operations involved in processing the schema. This would significantly broaden the idea of what the metaschema is for.
I don't want to say that broadening is bad or wrong, at least not without some better suggestion to offer. I do think that for my own implementation, having this data in a separate vocabulary document would probably be better to work with, though that's a bit speculative until I actually implement anything.

Did you experiment with that idea of a separate document and end up finding that it worked better in the metaschema?

I had started with an alternate design of adding this information to the subschemas under the properties keyword, but I realized that doesn't really work for the same reason that required was moved out of the subschema into the parent schema going from draft 3 to draft 4.

I questioned this at first and tried to find a structuring that would put the information on subschemas describing each keyword, which seemed nicer. But you are right, it doesn't fit well there.

@gregsdennis
Member Author

gregsdennis commented Jul 15, 2022

Did you experiment with that idea of a separate document and end up finding that it worked better in the metaschema?

When we had previously discussed this metadata, we had explored options for the separate file and came up with nothing that really worked well. This is the first thing I've seen that seems to accomplish this goal.

EDIT: Unfortunately, I think a lot of that conversation is buried in history between myself and @handrews as he and I played with several ideas.

@handrews
Contributor

@gregsdennis I finally remembered to take a look at this 😅

There are a lot of great ideas in here. A fair amount of this overlaps with the (at least) three keyword description formats I've come up with since we originally talked about having a separate file. (You haven't seen any of them because I haven't felt any of them were compelling enough to be worth proposing).

However, I'm skeptical that a vocabulary of JSON Schema keywords is the right way to accomplish this, even if we want to inline such descriptions into meta-schemas.

There are a couple of reasons for this:

  1. $vocabulary only applies to the first dynamic scope, which is why it needs to be redefined in each meta-schema, rather than being automatically produced by taking the union of all values of $vocabulary that apply to the (non-meta-)schema resource's root. From how you've written this, it looks like you are assuming that the fact that the default dialect meta-schema uses allOf on all of these vocabulary meta-schemas will pull in the necessary keyword descriptions, but it's not clear to me how that is supposed to work. The concept of a "vocabulary meta-schema" is an informal one, and not distinguishable at runtime from some other bit included in a meta-schema with allOf. So there's a divergence between how the set of vocabularies is determined and how you have the keywords specified. This could perhaps be overcome, but...
  2. These just don't feel like JSON Schema keywords to me, although they could perhaps be elements of a more complex version of $vocabulary. You mentioned that the core vocabulary keywords (aside from the applicators) "don't have any behavior", but I think my keyword behaviors concept has clarified that. Their behaviors are essential to the runtime functioning of JSON Schema in a way that could not be handled otherwise. Identifiers write to lexical or dynamic scopes, for example. But having independent applicator, assertion, annotation, etc. keywords doesn't make sense: they are only relevant in the context of $vocabulary. Even if you treated them as annotations similar to $vocabulary, they don't make sense unless associated with a specific URI in $vocabulary. They need to either be associated with that URI by an expanded $vocabulary syntax, or by being in a resource identified by that URI. Making them keywords in their own right just gives them the ability to be used in places that don't have any clear or useful semantics.

@gregsdennis
Member Author

gregsdennis commented Sep 23, 2022

$vocabulary only applies to the first dynamic scope, which is why it needs to be redefined in each meta-schema ... From how you've written this, it looks like you are assuming that the fact that the default dialect meta-schema uses allOf on all of these vocabulary meta-schemas will pull in the necessary keyword descriptions, but it's not clear to me how that it supposed to work.

I'm not sure I follow this.

Are you talking about how it's not included in the main schema.json? That was intentional because that meta-schema doesn't need to use this new vocab; only the child meta-schemas need to use those keywords. I understand that vocabularies don't get "pulled up."

The vocabulary vocabulary defines keywords that only serve to provide details about the keywords another vocabulary defines.

You mentioned that the core vocabulary keywords (aside from the applicators) "don't have any behavior", but I think my keyword behaviors concept has clarified that.

Yes, I wrote that before you shared your slides. The core keywords have behavior, but that behavior is toward the schema rather than toward the instance.

This proposal could (and probably should) be expanded to encompass your analysis. But for now, let's consider it a subset and evaluate it as such.

But having independent applicator, assertion, annotation, etc. keywords doesn't make sense: they are only relevant in the context of $vocabulary. Even if you treated them as annotations similar to $vocabulary, they don't make sense unless associated with a specific URI in $vocabulary. They need to either be associated with that URI by an expanded $vocabulary syntax, or by being in a resource identified by that URI. Making them keywords in their own right just gives then the ability to be used in a places that don't have any clear or useful semantics.

None of this makes sense to me. I think you think this proposal is doing more than is intended.

@handrews
Contributor

Are you talking about how it's not included in the main schema.json? That was intentional because that meta-schema doesn't need to use this new vocab; only the child meta-schemas need to use those keywords. I understand that vocabularies don't get "pulled up."

The vocabulary vocabulary defines keywords that only serve to provide details about the keywords another vocabulary defines.

OK, but how does an implementation figure out how to read the new keywords in, say, https://json-schema.org/<whatever>/meta/applicator and take action based on their descriptions of the applicator keywords when it sees a $schema of https://json-schema.org/<whatever>/schema?

@gregsdennis
Member Author

gregsdennis commented Sep 26, 2022

OK, but how does an implementation figure out how to read the new keywords in, say, https://json-schema.org/<whatever>/meta/applicator and take action based on their descriptions of the applicator keywords when it sees a $schema of https://json-schema.org/<whatever>/schema?

This data is still mostly informational. I don't expect an implementation to be able to figure out what to do with an applicator or assertion without some additional coding, just like today. I don't think we'll ever get to the point where an implementation can "just know" what to do with a new keyword that's not a pure annotation.

If this isn't what you're talking about, perhaps you can elaborate on what you mean by "take action." What action would you expect from an implementation?

@@ -2,7 +2,8 @@
   "$schema": "https://json-schema.org/draft/next/schema",
   "$id": "https://json-schema.org/draft/next/meta/applicator",
   "$vocabulary": {
-    "https://json-schema.org/draft/next/vocab/applicator": true
+    "https://json-schema.org/draft/next/vocab/applicator": true,
+    "https://json-schema.org/draft/next/vocab/vocabulary": false
Contributor

This says that schemas using https://json-schema.org/draft/next/meta/applicator as a meta-schema are able to use the vocabulary vocabulary, which I don't think is what you mean. You need this meta-schema's meta-schema (currently declared as https://json-schema.org/draft/next/schema) to include the vocabulary vocabulary in $vocabulary (buffalo buffalo buffalo buffalo buffalo...). In which case it probably would not be the default meta-schema because most schemas won't need to (and shouldn't) use the vocabulary vocabulary.

Member Author

@gregsdennis gregsdennis Sep 28, 2022

You need this meta-schema's meta-schema (currently declared as https://json-schema.org/draft/next/schema) to include the vocabulary vocabulary in $vocabulary

But if that's just https://json-schema.org/draft/next/schema, it doesn't solve the problem you're describing.

What I want is for meta-schemas that describe vocabularies to be able to use these new keywords. Does that mean that we need a dedicated vocabulary-describing meta-schema? Then have (e.g.) applicator meta-schema reference that one in $schema?

So

{
  "$schema": "https://json-schema.org/draft/next/schema",
  "$id": "https://json-schema.org/draft/next/meta/vocabulary",
  "$vocabulary": {
    "https://json-schema.org/draft/next/vocab/core": true,
    "https://json-schema.org/draft/next/vocab/applicator": true,
    "https://json-schema.org/draft/next/vocab/unevaluated": true,
    "https://json-schema.org/draft/next/vocab/validation": true,
    "https://json-schema.org/draft/next/vocab/meta-data": true,
    "https://json-schema.org/draft/next/vocab/format-annotation": true,
    "https://json-schema.org/draft/next/vocab/content": true,
    "https://json-schema.org/draft/next/vocab/vocabulary": true
  },
  "$ref": "https://json-schema.org/draft/next/schema",
  ... // new vocab-vocab properties
}

Then https://json-schema.org/draft/next/meta/applicator and friends all have $schema: https://json-schema.org/draft/next/meta/vocabulary?

I think this also provides a pre-packaged meta-schema for other custom vocab meta-schemas to use.

Member Author

@gregsdennis gregsdennis Sep 28, 2022

But if it has $schema: https://json-schema.org/draft/next/schema then it can't itself use the keywords it defines. So does it need to be its own meta-schema? That line of logic leads us to needing to resolve #217 (Keyword for identifying bootstrapping rules) first.

Contributor

$vocabulary is (effectively) an annotation. So when you use it in a meta-schema, it annotates the (non-meta-)schema and says "this schema can use this vocabulary".

If this were not the case, then every non-meta-schema would have to declare $vocabulary, which would be a mess.

So yes, if you want other vocab meta-schemas to use the vocabulary vocabulary (VV), then the VV needs to be in their meta-schema's $vocabulary. Since we would not want to put the VV in the default meta-schema, yes that means they would need a different meta-schema.
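A minimal Python sketch of that lookup rule follows: the vocabularies a schema may use come from its meta-schema's `$vocabulary`, not from the schema itself. It assumes an in-memory registry in place of real URI retrieval, and the registry entries are abbreviated. `meta/vocabulary` is the meta-schema proposed earlier in this thread, and the helper names are hypothetical.

```python
from __future__ import annotations

CORE = "https://json-schema.org/draft/next/vocab/core"
APPLICATOR = "https://json-schema.org/draft/next/vocab/applicator"
VOCABULARY = "https://json-schema.org/draft/next/vocab/vocabulary"

# Abbreviated in-memory stand-ins for two meta-schemas (an assumption; a real
# implementation would retrieve them by URI). Only $vocabulary matters here.
META_SCHEMAS = {
    "https://json-schema.org/draft/next/schema": {
        "$vocabulary": {CORE: True, APPLICATOR: True},
    },
    "https://json-schema.org/draft/next/meta/vocabulary": {
        "$vocabulary": {CORE: True, APPLICATOR: True, VOCABULARY: True},
    },
}


def vocabularies_for(schema: dict) -> dict[str, bool]:
    """Read $vocabulary from the schema's meta-schema, not from the schema itself."""
    meta_schema = META_SCHEMAS[schema["$schema"]]
    return meta_schema.get("$vocabulary", {})


def may_use(schema: dict, vocab_uri: str) -> bool:
    return vocab_uri in vocabularies_for(schema)


applicator_meta = {
    "$schema": "https://json-schema.org/draft/next/meta/vocabulary",
    "$id": "https://json-schema.org/draft/next/meta/applicator",
}
user_schema = {"$schema": "https://json-schema.org/draft/next/schema"}

print(may_use(applicator_meta, VOCABULARY))  # True
print(may_use(user_schema, VOCABULARY))      # False
```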

Contributor

@gregsdennis

This data is still mostly informational. I don't expect an implementation to be able to figure out what to do with an applicator or assertion without some additional coding, just like today. I don't think we'll ever get to the point where an implementation can "just know" what to do with a new keyword that's not a pure annotation.

If this isn't what you're talking about, perhaps you can elaborate on what you mean by "take action." What action would you expect from an implementation?

At minimum, we need to be able to use any vocabulary description to automatically determine what keywords are part of that vocabulary, which will let us distinguish known keywords from optional (and not directly-supported) vocabularies from completely unknown keywords. Otherwise, it's just JSON-formatted documentation, and that doesn't seem useful to me.
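Here is a hedged Python sketch of that minimum requirement. It assumes vocabulary descriptions written with the proposed `applicators`, `assertions`, and `annotations` keywords. The `https://example.com/vocab/extra` vocabulary and its `evenNumber` keyword are hypothetical. This approach has a limitation: a keyword with none of the three behaviors (such as core's `$id`) never appears in these keywords, so the sketch would report it as unknown.

```python
from __future__ import annotations

APPLICATOR = "https://json-schema.org/draft/next/vocab/applicator"
EXTRA = "https://example.com/vocab/extra"  # hypothetical optional vocabulary

# Vocabulary descriptions using the proposed keywords (illustrative entries).
DESCRIPTIONS = {
    APPLICATOR: {"applicators": {"allOf": "inPlace", "properties": "objectChild"}},
    EXTRA: {"assertions": ["evenNumber"]},
}


def keywords_of(description: dict) -> set[str]:
    """Every keyword named by a description's applicators/assertions/annotations."""
    return (
        set(description.get("applicators", {}))
        | set(description.get("assertions", []))
        | set(description.get("annotations", {}))
    )


def classify_keywords(schema: dict, vocabularies: dict[str, bool],
                      supported: set[str]) -> dict[str, str]:
    """Label each keyword "supported", "known-unsupported", or "unknown"."""
    for uri, required in vocabularies.items():
        if required and uri not in supported:
            # A required vocabulary the implementation lacks: refuse to process.
            raise ValueError(f"required vocabulary not supported: {uri}")
    result = {}
    for keyword in schema:
        owner = next(
            (uri for uri in vocabularies
             if keyword in keywords_of(DESCRIPTIONS.get(uri, {}))),
            None,
        )
        if owner is None:
            result[keyword] = "unknown"
        elif owner in supported:
            result[keyword] = "supported"
        else:
            result[keyword] = "known-unsupported"
    return result


labels = classify_keywords(
    {"allOf": [], "evenNumber": True, "foo": 1},
    vocabularies={APPLICATOR: True, EXTRA: False},
    supported={APPLICATOR},
)
print(labels)
# {'allOf': 'supported', 'evenNumber': 'known-unsupported', 'foo': 'unknown'}
```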

Member Author

So yes, if you want other vocab meta-schemas to use the vocabulary vocabulary (VV), then the VV needs to be in their meta-schema's $vocabulary. Since we would not want to put the VV in the default meta-schema, yes that means they would need a different meta-schema.

I think this means that you agree with my commented approach.

Contributor

@gregsdennis yes, I think so (I think I got confused by the next comment after that)

@gregsdennis
Member Author

This is an interesting idea, but I think it still needs work. It was good to get it down and have a chat about it.
