Specify process for validating a compound schema document against meta-schema(s) #936
Comments
No, we do not. draft-04 is not forwards compatible in this way. Besides the fact that we want a hard "you should upgrade if you can" message, supporting this would put a burden on developers that I do not want. Realistically we should RECOMMEND that schema authors do not embed or reference JSON Schema documents which are constructed for different versions.
What I was attempting to convey, in a very sleepy state, is the following... Say your root schema is 2020-whatever. You process the schema against the meta-schema (if you do that kinda thing), and if you encounter any embedded resource which identifies as using a different feature set, you treat it as its own document and validate it accordingly. Effectively, if you identify an embedded schema resource, you don't validate it as part of the meta-data validation. This is achievable using a meta-schema, although it may look a little gnarly to look for a It means that you kind of re-take apart the embedded documents and process them individually.
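A minimal sketch of the detection step described above, assuming an embedded resource is any subschema that carries its own string "$id" and that a resource without "$schema" uses the root's dialect. The function name is hypothetical, and the naive walk treats every nested object as a potential schema, which a real implementation would not do for keywords such as enum or const.

```python
def embedded_with_other_dialect(schema):
    """Return (id, dialect) pairs for embedded resources whose "$schema"
    differs from the root schema's "$schema"."""
    root_dialect = schema.get("$schema")
    found = []

    def walk(node, is_root=False):
        if isinstance(node, dict):
            # Assumption: no "$schema" means "same dialect as the root".
            dialect = node.get("$schema", root_dialect)
            if not is_root and isinstance(node.get("$id"), str) and dialect != root_dialect:
                found.append((node["$id"], dialect))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema, is_root=True)
    return found
```

Each pair returned identifies a resource that would be validated as its own document rather than as part of the root's meta-schema validation.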
My thoughts were, if no In the case where |
I think this makes sense. @jdesrosiers is this what you said you're doing already?
This is covered in 2019-09 for independent documents, although it took me a while to find it (we don't talk about it under TL;DR: behavior when the vocabulary can't be determined is implementation-defined, because that's essentially what it always was before.
Yes, that's in the PR and AFAICT not controversial.
Part of the reason for doing this at all is acknowledging the reality that in a large ecosystem you do not necessarily control all schemas involved. If you're relying on stuff from some officially maintained set of schemas (say, 3rd-party data format schemas) but need to use newer features in your local schemas, this will happen. With the OAS folks, we'd talked about being able to retrofit an Technically, the way the spec has always been written, each schema document is loaded under its own rules. Because nothing in the spec says that the processing is determined by the It also might be worth noting that transcluding an older draft resource into a 2020-NN resource does not mean that that older embedded resource suddenly gains the ability to further embed resources with different |
I don't think we should require (as in the RFC "MUST" language) any earlier draft to be supported via $schema at all. IMO the advantage of $schema keywords in subschemas is to switch vocabularies, not to switch keyword semantics to older specification definitions. To properly support earlier schema versions will be a real PITA -- e.g. earlier drafts even allowed for $refs to point to anything, not just a subschema (for example, |
@karenetheridge yes, in the long run I expect this to be more about switching among different meta-schemas and/or vocabularies in bundled resource documents. I don't personally care if anyone implements old drafts or not. However, draft-06 and later would not be particularly hard ( I would be fine with restricting this to 2020-NN and later, but either way we still need to handle the rest of the questions raised here.
Continued usage of old drafts suggests that the answer is yes, that's too idealistic. :( |
This is actually an edge case that I missed. My implementation doesn't handle this properly, but I don't see why it couldn't. I just need to check for a sibling As for custom dialects/vocabularies/meta-schemas, I have two ways of dealing with those. If the custom meta-schema uses the
Schema:
{
  "id": "https://example.com/my-schema",
  "$schema": "https://example.com/my-dialect",
  ...
}
Meta-schema:
{
  "id": "https://example.com/my-dialect",
  "$schema": "http://json-schema.org/draft-04/schema#",
  ...
}
This is just a nicety to make it easier to make simple meta-schema changes without needing to write any configuration. However, if the custom meta-schema uses some of its new keywords in the meta-schema itself (like the hyper-schema meta-schema), you can't use the extended/modified schema as the
Meta-validation can be handled properly without any change to meta-schemas if schemas are processed in a certain way when they are loaded. My implementation splits embedded schemas into the equivalent two schemas with a reference when schemas are loaded. The schemas are then validated separately against the meta-schema that applies to that schema. (example uses old version of
{
  "$id": "https://example.com/schema1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "foo": {
      "$id": "https://example.com/schema2",
      "$schema": "http://json-schema.org/draft-06/schema#",
      "type": "string"
    }
  }
}
This gets converted to two schemas when loaded.
{
  "$id": "https://example.com/schema1",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "foo": { "$ref": "https://example.com/schema2" }
  }
}
{
  "$id": "https://example.com/schema2",
  "$schema": "http://json-schema.org/draft-06/schema#",
  "type": "string"
}
This all works great if you know you are working with a schema and can split it out when needed. The problem comes when you validate a schema as an instance against a meta-schema. The validator doesn't know that an instance is a schema and should therefore be processed as a schema rather than plain JSON. I can think of two solutions, but they are both awkward.
The first is to define bundled schemas as a transport encoding and not a valid schema by itself. Sticking that bundle into a validator and expecting it to validate against a meta-schema is just wrong.
The other solution is to introduce some kind of special keyword or type that indicates that a value is a schema of any dialect (which is something a recursive reference can't do). You could then replace all recursive references in meta-schemas with the new keyword/type and validators would have the context necessary to validate a bundled schema. I would probably go with
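The load-time split described in this comment could look roughly like the following. This is a sketch under stated assumptions, not the commenter's actual implementation: the function name is hypothetical, any subschema with a non-fragment string "$id" is treated as an embedded resource, relative "$id" values are not resolved against a base URI, and every nested object is assumed to be a schema.

```python
def split_embedded(schema):
    """Split a schema into separate resources, replacing each embedded
    resource (a subschema with its own "$id") by a "$ref" to that id.
    Returns a list with the rewritten root first, then the extracted resources."""
    resources = []

    def walk(node, is_root):
        if isinstance(node, dict):
            node_id = node.get("$id")
            # A fragment-only "$id" (e.g. "#foo") names a location, not a new resource.
            is_resource = isinstance(node_id, str) and not node_id.startswith("#")
            if not is_root and is_resource:
                # Extract the embedded resource (recursively) and leave a reference.
                resources.extend(split_embedded(node))
                return {"$ref": node_id}
            return {key: walk(value, False) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item, False) for item in node]
        return node

    root = walk(schema, True)
    return [root] + resources
```

Applied to the draft-07 schema with the embedded draft-06 resource shown in this comment, the function yields the two separate schemas, each of which can then be checked against its own meta-schema.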
@jdesrosiers I'm not totally clear on the second solution which you've implemented in your comment. Could you maybe expand on it a little please? What we're effectively saying is that schemas need to have a different processing model than simply applying the meta-schema, right? I don't think we CAN form a correct meta-schema for these situations, unless we re-create old meta-schemas in newer versions of JSON Schema (which I think would cause some confusion).
Yes, I think we need to make sure that's covered! Good call. |
That's true, but it only addresses the part of the problem that's effectively solved. There are two distinct situations.
Sure. The problem is that we can't treat an instance as a schema because we don't know that it's a schema. The solution I was proposing is to annotate it somehow so the validator knows that it's a schema and can process it differently. My suggestion was to add a new An instance is valid against This solution would also eliminate the need for special processing when loading a schema. If it works when we don't know it's a schema, we don't need to do anything special when we do know it's a schema.
Right. That's the biggest problem with Honestly, at this point I like the other solution better, "define bundled schemas as a transport encoding and not a valid schema by itself". It's simple, doesn't require new keywords, and it's backwards compatible. |
I've been away from this issue for a while so I'm going to try and summarise what I feel is the consensus here, ask people to agree or disagree on that summary of consensus, and look to move this to a PR.
Yup, in fact, I came to this conclusion about 5 mins before reading the first comment of this issue again.
So, an implementation needs to recognise when they are handling a Schema which has an embedded resource, and act as if that resource has been
I think what's being said here is, in this regard, we don't need to change anything. (Although as an aside, I find it worrying that when presented with an unknown I'm not sure my last quote and comment requires any action. This issue is blocking #914 I'm not 100% sure if there's anything relating to handling meta schemas in this issue or not... I'd need to re-parse. |
I merged #914 because I think a follow up PR is fine, and the basis of that PR is correct. |
You mean, "handling a schema", not "handling an instance", right? We can't make any assumptions about the semantics of instances. The important bit that's missing from your summary is addressing the problem that a meta schema can't properly describe a schema with embedded schemas. It should be included in the spec that when validating a schema (as the instance) against a meta-schema (as the schema), the schema can't have embedded schemas. |
Thanks @jdesrosiers I've fixed /instance/schema/. Ah yes, good point... taking an earlier quote....
Essentially, JSON Schema makes no provision for validating a bundled schema by simply applying a meta-schema without the foreknowledge that the instance is a schema. Simply saying
I'm not sure is what we want. We DO want implementations to be able to validate a bundled schema document... but only by virtue of handling each resource individually with the appropriate JSON Schema dialect. To paraphrase, a bundled schema document may be formed out of resources with different JSON Schema dialects, and as such an implementation that applies a meta-schema to the bundled schema document as if it were simply JSON is likely to encounter incorrect validation results. Implementers may provide a configuration option to allow a user to identify the provided instance as a bundled schema document, which allows the implementation to apply meta-schemas to individual schema resources for which they know the meta-schema associated with the dialect identifier ( This is sort of paraphrasing from #936 (comment) also. I guess the main difference to the processing model is: if the instance is identified as a schema, I don't know how you define exactly the processing of the root instance location which contains the embedded resource. Extracting from the link above, I suggested that the validation result be Does this make any sense @jdesrosiers |
@Relequestual This doesn't feel right to me. It violates one of the core principles of JSON Schema. JSON Schema validates plain JSON instances only. Nothing in the schema has any semantics that affects validation. This is the principle we cite when we say Because we are violating a core principle, it could be challenging for implementers to adapt their implementations to support a completely different validation mode. If this is a path we want to take, I think there should be some proof-of-concept to make sure it's feasible. |
OK. Assuming you meant "nothing in the instance has any semantics that affects validation" as opposed to "nothing in the schema has any semantics that affects validation". If this were so, by default, the validation process would remain as is, and validation of the bundle (as an instance) against the meta-schema might fail. If the config option is set, I see two approaches to processing the instance we now know is a bundled schema document. Both approaches would avoid needing to add a new type, which I'd like to avoid. It has implications for tooling beyond just validation, and I don't feel it's necessary. Two possible approaches similar to your suggested approach:
Does the 2nd approach sound feasible and similar enough to your currently working approach? |
Ha. Yes, that's what I meant.
I wasn't suggesting that either. The point isn't how it identifies as a schema. The point is that it needs to be treated as something other than plain JSON. In this case, it's After thinking about implementation a bit more, I realized that if we treat an instance as a schema, then the schema is not necessary. No matter what the schema (meta-schema) is, we are actually validating against all meta-schemas the implementation supports, not just that meta-schema. The instance (schema) determines what schema ( My suggestion of adding a |
The issue here is that people want to validate a bundled schema. We can specify that AS IS you cannot, but that it must be deconstructed into the individual schema resources first, then validated. I guess the key question here is, will implementations be required to provide a means / function to do this? If we do require implementations to provide a means, then we have to define the approach. A viable approach would be as was quoted in the first comment of this issue. If we do NOT require implementations to provide a means, then we leave it up to each implementation to implement it if they choose, and end up with N potential solutions, where the resulting output may differ depending on approach. My expectation (if my suggested approach was followed) is that this results in multiple validation results (because the validation mode has been indicated as validating a schema document with bundled schema resources).
I believed the simplest approach would be to, extract out each schema resource, replacing them with |
They can still do that, just not through the standard
Even simpler than replacing embedded schemas with If it helps, this is what my implementation does during each phase of the process.
I think we got some cross talking happening here, because I think we're on the same page based on your last comment! =D Schema resource bundles can be validated, but cannot be validated using the standard "apply schema (meta-schema) to instance (schema resource bundle)". Instead, implementations may provide another means by which to allow validation of schema resource bundles. And yes, your approach is actually a LOT cleaner, because once decomposed, you can then apply the standard validation process. I imagine there are some implications here for verbose validation output... Anyway, I'm going to mark this as accepted and move to writing a PR. |
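To illustrate "once decomposed, you can then apply the standard validation process", here is a minimal sketch that checks each already-separated schema resource against the meta-schema named by its own "$schema". Everything in it is an assumption for illustration: the function name, the `validators` mapping from dialect URI to a checking callable, the fallback `default_dialect` for resources without "$schema", and the choice to report an unknown dialect as None (the thread notes that behavior in that case is implementation-defined).

```python
def meta_validate_bundle(resources, validators, default_dialect=None):
    """Validate each schema resource against the meta-schema for its own dialect.

    resources: list of schema resources (dicts), already split out of the bundle.
    validators: hypothetical mapping of dialect URI -> callable(resource) -> bool.
    Returns a dict of resource "$id" -> True, False, or None (dialect unknown).
    """
    results = {}
    for resource in resources:
        # Assumption: a resource without "$schema" uses the caller's default.
        dialect = resource.get("$schema", default_dialect)
        check = validators.get(dialect)
        results[resource.get("$id")] = None if check is None else check(resource)
    return results
```

The result is one validation outcome per resource rather than a single outcome for the bundle, which matches the expectation of multiple validation results raised earlier in the thread.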
Progress: I've written some notes on this based on the above discussions. I will look to have a PR ready before mid next week. |
I've made some progress, and have a further consideration which I think may have been mentioned on Slack or another issue... When bundling a schema, you cannot simply replace the schema object which contains the I started to reason about how to do this. I considered an approach where you must wrap the embedded schema resource in an I considered an approach where we allow Finally, I settled on an approach where embedded schema resources MUST be put into My only idea here is the Please give me your suggestions. Alternatively, we say NOTHING on how to construct a Compound Schema Document. But I KNOW the most popular schema de-referencing library is full of holes that people WILL trip over, as it JUST replaces the schema object containing I would feel bad to say nothing, but even if we DO specify, people are still likely to try to use that library, and run into problems. At least if we said something, it would be easier for someone to create a standardised bundler. (Please do not re-open unrelated concerns around this issue thread, for now. PR in progress.) |
Define Compound Schema Document and associated concerns
@Relequestual It was @handrews that brought up these bundling issues when making the I like the option of allowing Let's take this example of a draft-07 schema.
In this example,
So, the values of Now let's interpret the same schema as a draft 2019-09 schema. Now, the reference (the part that can be replaced) is no longer an object, it's a string:
At this point the value of The problem we have with 2019-09 references is that
The way I see it, the only downside to allowing this syntax is that tools can't identify a reference without knowing the vocabulary of the schema it's working with. I only know that In summary, I think it makes sense to allow the value of |
I didn't quite understand this at first, but I'm coming around. I think this should work. How this would be referenced is irrelevant, right? The actual references won't change. They will still reference an identifier. The names of these The only thing I'm not sure of is if the |
Addressing the latest comment only... Kinda.
I don't like it, but I can't think of an alternative solution that doesn't mess with other things (such as "the value of |
I'm a moron. Let me update this comment before reply. Spoke to Henry. It's obvious and I'm dumb. |
My example was a really bad example. It's kind of obvious, we (I) just forgot because you can't do it within a SINGLE resource anymore.
The key of the definition COULD be anything. It doesn't matter. |
Define compound schema documents (#936)
PR #914 is blocked on the question of how to validate schema document containing embedded schema resources with differing meta-schemas.
In my view, when loading a schema document, an implementation should be able to recognize embedded schema resources and treat them separately, as if they were $ref'd. There are two concerns:
id.
I think we should say no, if you want to use draft-04 with draft 2020-NN, you have to $ref it. Since draft-04 did not have vocabularies, there is no way to tell the difference between a custom draft-04 meta-schema relying on id and a custom meta-schema of another draft that happens to have id next to it, either b/c of a typo or because someone did a really weird extension.
@Relequestual says:
Which I almost follow but it's late and I think with the other points this deserves to be visible in an issue.