Recursive schema composition #558
If we were to replace all occurrences of {
"$schema": "http://json-schema.org/draft-07/hyper-schema#",
"$id": "http://json-schema.org/draft-07/hyper-schema#",
"title": "JSON Hyper-Schema",
"allOf": [ { "$ref": "http://json-schema.org/draft-07/schema#" } ],
"properties": {
"base": {
"type": "string",
"format": "uri-template"
},
"links": {
"type": "array",
"items": {
"$ref": "http://json-schema.org/draft-07/links#"
}
}
},
"links": [
{
"rel": "self",
"href": "{+%24id}"
}
]
} This is 24 lines. The current file is 69 lines, and every time we add, remove, or change an applicator those other lines need to be updated. If we also replaced {
"$id": "http://example.com/abcxyz",
"$schema": "http://json-schema.org/draft-07/hyper-schema#",
"allOf": [ {"$schema": "http://json-schema.org/draft-07/hyper-schema#" } ],
"properties": {
"abc": {...},
"links": {
"items": {
"properties": {
"xyz": {...}
}
}
}
}
} Without And of course, if "abc" and "xyz" are schema fields, then without etc. etc. etc. |
I think I understand this, but let me check. So the two schemas are the equivalent of:
However because bar-schema has an |
@Relequestual yes that is the equivalent of bar-schema (with foo-schema more or less inlined). However, the presence or absence of
So even if you are just creating and working with these in-memory, they implicitly have different base URIs so |
Right, but my point was, if I took the two schemas and made one, which would have the equivalent behaviour, then that would be it. WHICH looks like I've dereferenced the ref, but not included the $id. It's not suggesting an alternative, but more that's what this feature is aiming to achieve. |
Right- I was responding to your
which to me implies that it would have worked without an |
Note that I've updated the "Alternatives" section in the initial comment with a discussion of #322 (parametrized/higher-order/templatized schemas) as a possible alternative solution. |
It looks like the problem statement is: we want to be able to extend some schema, and have recursive references refer back to the extended version. So what if I'm writing a JSON document, and I want a JSON Schema to be one of the values? {
type: "object",
properties: {
"name": { type: "string" },
"label": { type: "string" },
"range": { $ref: "http://json-schema.org/draft-07/hyper-schema#" }
}
} |
Perhaps there can be an argument that specifies substitutions to make when evaluating sub-schemas: "in sub-schemas, when it refers to < |
@awwright please don't change the problem statement. Or if you must, please explain why your problem statement is better to solve than the one I am stating. Otherwise the answer to "what if I want to do something different than what this issue is talking about" is "file an issue for that and we'll talk about it."
Why would I need to do that? I am following your oft-cited principle of least power, and proposing a feature that solves the use case that I see coming up a lot. And not anything beyond that. I do not need fully templated schemas, nor have I heard a huge clamoring of demand for them. I referenced an issue above about that topic as one of the alternatives, so I am aware of the possibility. We do, however have a real problem around extending recursive schemas. I am not aware of any way to solve the vocabulary problem effectively without being able to easily extend recursive schemas. And there is a lot of demand for multiple vocabulary support. So. I'm trying to solve recursive schema extension. Please do not derail my issue without an extremely compelling justification for why we should abandon this problem in favor of yours. |
@awwright I can see how your use case makes sense on its own, but I don't think it's needed for this. Perhaps when I write it as a PR it will make sense to you, or it will flush out problems pointing in the direction of your proposal. My feeling is that having an external thing that causes one URI to be treated as another is counter-intuitive. Part of the goal with I think there is room for expansion in this direction if we want to do so, by changing Given all of that and the lack of further objections, and given that this is required for #561 vocabularies, which has received support over the past several months from various quarters, I'm marking this accepted and moving to PRs. I do have to apologize for the rather harsh response before. While (as usual) it's no excuse, April was a very difficult month for me personally, for reasons that have since resolved. Hopefully I'll be able to stay a bit more even-keeled. |
Fair enough, thanks! I'll see about prototyping the proposal here and some alternatives. |
So, @awwright spotted a fatal, glaring flaw in this proposal in #589 (comment), namely that embedding a schema with {
"$schema": "http://json-schema.org/draft-08/hyper-schema#",
"type": "object",
"properties": {
"embedded": {"$ref": "http://json-schema.org/draft-08/hyper-schema#"}
}
} every ...and this is why we do reviews, folks! :-P I'm going to write up a few proposals, each in its own comment. Feel free to thumbs-up the comment of whichever proposal(s) you like, particularly if you don't have any further comments on them. And of course further comments/proposals are also welcome. The new proposals will obviously need to be a bit more complex. We should balance that complexity against the current error-prone, tedious process of re-declaring all references to the "base" schema in an extension that is required now. We want it to be as easy as possible to extend meta-schemas, but not at the expense of a feature that hardly anyone will understand. The current situation is verbose and annoying, but straightforward. It would still be possible to support modular vocabularies with the current approach. We'd just prefer it to be more elegant. I have the following requirements for any solution:
I prefer to just solve the recursive extension problem, but that is because I believe it to be simpler. If it turns out that solving a more general form of dynamic reference targets has more or less the same complexity, and there are use cases for it, I am open to expanding the problem space (contrary to what I posted earlier in this issue). |
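For readers following along, here is one reading of the embedding flaw, as a minimal Python sketch. Everything in it is an assumption made for illustration: `META` is a tiny stand-in for a recursive meta-schema (not the real hyper-schema), `EMBEDDER` is a hypothetical document that embeds a schema as a value, and `is_valid` supports only `type`, `properties`, `$ref`, and an entry-point-targeted `$recurse`.

```python
# Hypothetical stand-in for a recursive meta-schema (NOT the real hyper-schema).
META = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "items": {"$recurse": True},
    },
}

# Hypothetical document that embeds a schema as one of its values.
EMBEDDER = {
    "type": "object",
    "properties": {"embedded": {"$ref": "meta"}},
}

# Simplified registry keyed by short names instead of full URIs.
SCHEMAS = {"meta": META, "embedder": EMBEDDER}


def is_valid(instance, schema, entry_root):
    """Validate a tiny keyword subset; entry_root is where processing began."""
    if schema.get("$recurse") is True:
        # Entry-point semantics: always jump to the document we started with.
        return is_valid(instance, entry_root, entry_root)
    if "$ref" in schema:
        return is_valid(instance, SCHEMAS[schema["$ref"]], entry_root)
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub, entry_root):
                return False
    return True


# A "schema" whose nested title is not a string, so META should reject it.
BAD_SCHEMA = {"items": {"title": 42}}

standalone = is_valid(BAD_SCHEMA, META, META)
embedded = is_valid({"embedded": BAD_SCHEMA}, EMBEDDER, EMBEDDER)
```

In this sketch the same bad schema is rejected when validation starts at `META` but accepted when it is embedded, because the nested `$recurse` is redirected to `EMBEDDER` rather than back to `META`.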
So... I said I'd post some alternatives but it turns out they all get quite messy. I'm going to work more on the keyword behavioral classification (some of which appears in #602, with more to be filed) and annotation gathering process (#530) which will, I think, clarify some mechanisms that could make this work in a way people will find usable. For now, I am going to close the PR unmerged, and continue with the vocabulary work (#561) without this. While I think |
In chat @handrews pointed out some weaknesses (it's confusing if you don't already know that this uses a URI Reference, or if you use a validator that automatically loads schemas from the filesystem), but for the record I'll post the idea I had: "$alias" keyword, which accepts 2-3 arguments: (1) A schema to reference/apply; (2) a schema to extend/alias to another, which is usually the same as (1); and (3) the schema to apply instead. Note this isn't terribly different than the merge/patch proposal, except less powerful, but much simpler. Here's how hyper-schema could import and extend the validation meta-schema: {
$ref: "http://json-schema.org/draft-07/schema",
$alias: { "http://json-schema.org/draft-07/schema": "http://json-schema.org/draft-07/hyper-schema" }
} The argument could be reduced to an empty string since it's a URI reference: {
$ref: "http://json-schema.org/draft-07/schema",
$alias: { "http://json-schema.org/draft-07/schema": "" }
} Alternatively, define the keyword so that it does both: {
$refAlias: { "http://json-schema.org/draft-07/schema": "" }
} |
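A rough Python sketch of how the `$alias` substitution could work during reference resolution. This is not an implementation of any real validator: the function name `resolve_with_alias` and its parameters are hypothetical, and the only library call used is the standard `urllib.parse.urljoin`. It assumes alias keys and values are URI references resolved against the base URI of the document that declares `$alias`.

```python
from urllib.parse import urljoin

SCHEMA = "http://json-schema.org/draft-07/schema"
HYPER = "http://json-schema.org/draft-07/hyper-schema"


def resolve_with_alias(ref, referrer_base, aliases, alias_base):
    """Resolve `ref`, then substitute its target if an alias matches.

    ref           -- the URI reference found in a "$ref"
    referrer_base -- base URI of the schema containing that "$ref"
    aliases       -- the "$alias" object: {uri_to_replace: replacement}
    alias_base    -- base URI of the schema that declared "$alias"
    """
    target = urljoin(referrer_base, ref)
    for source, replacement in aliases.items():
        if urljoin(alias_base, source) == target:
            # An empty replacement is a URI reference to alias_base itself.
            return urljoin(alias_base, replacement)
    return target


# A reference to the validation meta-schema is redirected to hyper-schema,
# whether the replacement is spelled out or given as the empty string.
explicit = resolve_with_alias(SCHEMA, SCHEMA, {SCHEMA: HYPER}, HYPER)
empty = resolve_with_alias(SCHEMA, SCHEMA, {SCHEMA: ""}, HYPER)
# References that are not aliased resolve as usual.
untouched = resolve_with_alias("links", HYPER, {SCHEMA: ""}, HYPER)
```

The empty-string form works here only because it is resolved against the aliasing document's base URI, which is the source of confusion mentioned at the start of the comment.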
THIS WILL NOT BE MERGED. This PR is being used to illustrate a proposal for issue json-schema-org#558.
If the root schema of the entry point schema (the schema document at which schema evaluation begins) contains `"$recursiveRoot": true` then that entry point is set throughout the schema evaluation process as the target of all `"$recursiveRef"` references, regardless of their values. Encountering additional `"$recursiveRoot"` keywords in non-entry point schema documents has _no effect_. The keyword MUST be ignored in subschemas, and in non-entry-point root schemas.
If the entry point schema did **not** have `"$recursiveRoot": true`, then `"$recursiveRef"` is evaluated exactly as if it were `"$ref"`. Its value is a URI reference, which is resolved according to the usual rules involving `"$id"`.
-----------
The following changes were made:
* schema.json, hyper-schema.json, and the additional example of hyper-operations.json (further extending hyper-schema.json) all have `"$recursiveRoot": true`
* links.json does not use `"$recursiveRoot"`
* The reference from each extension meta-schema to its "base" is a _normal_ `"$ref"`. Otherwise it would be an infinite loop.
* All other schema references become `"$recursiveRef"`, with the same URI Reference value as before
* All of the properties and $defs duplicated from schema.json to hyper-schema.json can now be removed
Note that there were several odd trailing "#" fragments, which should not be present in `"$id"` in particular, so I dropped those. They are not part of this change; I just found them surprising. Also, "propertyNames" had a bug. How does nobody notice this stuff? How do the meta-schemas have an apparently endless stream of bugs in them? UGH.
THIS WILL NOT BE MERGED. This PR is being used to ***illustrate*** a proposal for issue json-schema-org#558.
If the root schema of a schema document contains `"$recursiveRoot": true` then that root schema is set for that schema document and any to which it refers as the target of all `"$recursiveRef"` references, regardless of their values. Encountering further `"$recursiveRoot"` keywords in root schemas of referenced documents does **not** further change the target. This will be explained in detail with comments added to the diff. `"$recursiveRoot"` MUST be ignored in subschemas.
If no `"$recursiveRoot": true` has been encountered, then `"$recursiveRef"` is evaluated exactly as if it were `"$ref"`. Its value is a URI reference, which is resolved according to the usual rules involving `"$id"`.
The key point is that someone reading the schema will know that a `"$recursiveRef"` might have its target changed dynamically at runtime, while `"$ref"` never will.
-----------
The following changes were made:
* schema.json, hyper-schema.json, and the additional example of hyper-operations.json (further extending hyper-schema.json) all have `"$recursiveRoot": true`
* links.json does not use `"$recursiveRoot"`
* The reference from each extension meta-schema to its "base" is a _normal_ `"$ref"`. Otherwise it would be an infinite loop.
* All other schema references become `"$recursiveRef"`, with the same URI Reference value as before
* All of the properties and $defs duplicated from schema.json to hyper-schema.json can now be removed
Note that there were several odd trailing "#" fragments, which should not be present in `"$id"` in particular, so I dropped those. They are not part of this change; I just found them surprising. Also, "propertyNames" had a bug. How does nobody notice this stuff? How do the meta-schemas have an apparently endless stream of bugs in them? UGH.
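The second PR description above (target fixed by the first `"$recursiveRoot": true` root schema encountered) can be sketched roughly in Python as follows. This is an illustration under assumptions, not the PR's code: the schemas `base`, `ext`, `plain`, and `wrap` are hypothetical, the registry uses short names instead of URIs, and only `type`, `properties`, `allOf`, `$ref`, `$recursiveRoot`, and `$recursiveRef` are handled.

```python
# Hypothetical schemas, keyed by short names instead of full URIs.
SCHEMAS = {
    "base": {
        "$recursiveRoot": True,
        "type": "object",
        "properties": {"child": {"$recursiveRef": "base"}},
    },
    "ext": {
        "$recursiveRoot": True,
        "allOf": [{"$ref": "base"}],  # a normal "$ref" to the base schema
        "properties": {"label": {"type": "string"}},
    },
    "plain": {  # no "$recursiveRoot": "$recursiveRef" acts like "$ref"
        "type": "object",
        "properties": {"child": {"$recursiveRef": "plain"}},
    },
    "wrap": {  # embeds "ext" without opting in itself
        "properties": {"x": {"$ref": "ext"}},
    },
}


def is_valid(instance, schema, target=None, is_root=True):
    """target is the recursive root fixed so far, or None if none was seen."""
    if is_root and target is None and schema.get("$recursiveRoot") is True:
        target = schema  # first opted-in root wins; later ones are ignored
    if "$recursiveRef" in schema:
        resolved = target if target is not None else SCHEMAS[schema["$recursiveRef"]]
        return is_valid(instance, resolved, target)
    if "$ref" in schema and not is_valid(instance, SCHEMAS[schema["$ref"]], target):
        return False
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    for sub in schema.get("allOf", []):
        if not is_valid(instance, sub, target, is_root=False):
            return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub, target, is_root=False):
                return False
    return True


nested = {"child": {"label": 42}}
against_base = is_valid(nested, SCHEMAS["base"])
against_ext = is_valid(nested, SCHEMAS["ext"])
against_plain = is_valid({"child": "oops"}, SCHEMAS["plain"])
against_wrap = is_valid({"x": nested}, SCHEMAS["wrap"])
```

In this sketch the nested `label` is only checked when evaluation passes through `ext`, including when `ext` is embedded by `wrap`, since the target is fixed when the first opted-in root is reached rather than at the entry point.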
Thanks, @awwright! I've been thinking about this more since our discussion. I feel that if we really want to go for the aliasing approach, we need to consider a generic syntax such as that proposed by #322 (parametrized/templatized/higher-order schemas). We might want to restrict where it can be used at least at first, but what I like about that proposal is that the thing being replaced needs to opt-in with What concerns me about the #322 proposal is that it raises a lot of questions about keywords needing to allow the I'd like to try again to stay focused on the recursion case, as I still feel that it is better motivated. And I think that "this schema allows recursive extension" is less of a problem for managing schema identity than full-on parameterization. Although I don't have a clear argument for that so I might be wrong. I think a double-opt-in approach is key: It needs to be clear which references are dynamic, and the dynamic target needs to be explicit rather than implicit. The latter point is where We can solve that with a keyword I'm calling Like This provides the double opt-in:
Rather than paste a bunch of examples in here, I have created PR #605 to show how schema.json, hyper-schema.json, links.json, and a hypothetical hyper-operations.json that further extends hyper-schema.json, work with these keywords. This covers several extension and embedding cases (I think all of them, but I might be missing something somewhere). |
I think a better term to use here than "extending" is "refining". JSON Schema is a constraint system. The empty (meta-)schema allows everything. Using Hyper-Schema's meta-schema doesn't really add The hyper-schema meta-schema refines the core/validation meta-schema by constraining those two keywords into a syntax that supports well-defined semantics. This is a bit mind-bending at first, but I've found that when I can get it across, usually by starting from explaining the empty schema, a lot of things seem to click for people. We could explain this in detail with examples on the web site (not just for the |
Very interesting. Given this, if I have some custom schema keywords should I bother creating my own meta-schema with all the complexity it entails? Or just informally document that we've added 2 keywords and explain what they mean? Given that this issue is still open it appears that the easiest way to create a custom meta-schema is to |
@mgwelch at the moment, either option is fine. If you use a validator that pays attention to And yes, copy-pasting the hyper-schema meta-schema is the best option for now. This will improve in draft-08 (I just need to update those PRs with review feedback and do the other part with the |
Thanks @handrews, I really appreciate the feedback. I think I will take a crack at the custom meta-schema but would appreciate a recommendation for a tool you use for validating schemas against custom meta-schemas. I find it hard to google for answers to anything involving meta-schemas. |
@mgwelch most validators will work if you just pass the schema as the instance and the meta-schema as the schema. A few will handle meta-schemas specially (Ajv in JavaScript, for example). |
PRs merged! |
Awesome! And this is a real live example?: https://github.com/json-schema-org/json-schema-spec/blob/master/hyper-schema.json That's nice. |
@mgwelch yup! And yes, it is! As we write fine-grained meta-schemas for vocabularies (assuming that goes the way I expect- you can see a bit of it in #671) you'll see us depend on the feature even more. I don't know how much it will get used outside of that context, but it will be available in conforming validators, so I guess we'll find out :-) |
TL;DR:
* Re-using recursive schemas is a challenge.
* `$recurse` is a specialized version of `$ref` with a context-dependent target
* `$recurse` is always `true` (discussed in the "alternatives" section)
Example
APPARENTLY MANDATORY DISCLAIMER: This is a minimal contrived example, please do not point out all of the ways in which it is unrealistic or fails to be a convincing use case because you can refactor it. It's just showing the mechanism.
foo-schema:
bar-schema:
The instance:
is valid against the first schema, but not the second.
It is valid against foo-schema because the `"$recurse": true` is in foo-schema, which is the same document that we started processing. Therefore it behaves exactly like `"$ref": "#"`. The recursive "foo" works as you'd expect with `"$ref": "#"`, and foo-schema doesn't care about "bar" being there (additional properties are not forbidden).
However, it is not valid against bar-schema because in that case, the `"$recurse": true` in foo-schema behaves like `"$ref": "http://example.com/bar-schema"`, as bar-schema is the document that we started processing. Taking this step by step from the top down:
* `$recurse` being involved
* `allOf` and `$ref` to foo-schema. The top-level instance is an object, so we pass the `type` constraint
* `"$recurse": true`. Since we started processing with bar-schema, this is the equivalent of `"$ref": "bar-schema"`
* `allOf` and `$ref` back to foo-schema, and pass the `"type": "object"` constraint
* `"$recurse": true` to go into the next level "foo", and once again this is treated as `"$ref": "bar-schema"`
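The walk-through above can be reproduced with a small Python sketch. The bodies of foo-schema and bar-schema and the instance used here are assumptions chosen to match the description (a recursive "foo" property, a string-typed "bar" property, and an `allOf` plus `$ref` from bar-schema to foo-schema); the helper names `is_valid` and `validate_from` are hypothetical, and only the keywords needed for the example are supported.

```python
# Assumed stand-ins for foo-schema and bar-schema, matching the description.
SCHEMAS = {
    "http://example.com/foo-schema": {
        "$id": "http://example.com/foo-schema",
        "type": "object",
        "properties": {"foo": {"$recurse": True}},
    },
    "http://example.com/bar-schema": {
        "$id": "http://example.com/bar-schema",
        "allOf": [{"$ref": "http://example.com/foo-schema"}],
        "properties": {"bar": {"type": "string"}},
    },
}

TYPE_CHECKS = {
    "object": lambda value: isinstance(value, dict),
    "string": lambda value: isinstance(value, str),
}


def is_valid(instance, schema, entry_root):
    """Validate a tiny keyword subset; entry_root is where processing began."""
    if schema.get("$recurse") is True:
        # Proposed behavior: jump to the root of the entry-point document.
        return is_valid(instance, entry_root, entry_root)
    if "$ref" in schema and not is_valid(instance, SCHEMAS[schema["$ref"]], entry_root):
        return False
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    for sub in schema.get("allOf", []):
        if not is_valid(instance, sub, entry_root):
            return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub, entry_root):
                return False
    return True


def validate_from(uri, instance):
    """Start processing at the schema identified by uri."""
    root = SCHEMAS[uri]
    return is_valid(instance, root, root)


# Assumed instance: "bar" is a number two levels down inside "foo".
INSTANCE = {"foo": {"foo": {"bar": 42}}}

valid_against_foo = validate_from("http://example.com/foo-schema", INSTANCE)
valid_against_bar = validate_from("http://example.com/bar-schema", INSTANCE)
```

Starting from foo-schema, the nested "bar" is never constrained; starting from bar-schema, each `"$recurse": true` re-enters bar-schema, so the innermost "bar" is checked and fails the string constraint.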
Use cases
The primary use case for this is meta-schemas. For example, the hyper-schema meta-schema has to re-define all of the applicator keywords from the core and validation meta-schema. And if someone wanted to extend hyper-schema, not only would they have to re-declare all of the core applicators a third time, but also re-declare all of the LDO keywords that use `"$ref": "#"`.
As we make more vocabularies and encourage more extensions, this rapidly becomes untenable.
I will show what the hyper-schema meta-schema would look like with `$recurse` in a subsequent comment.
There are some other use cases in hypermedia with common response formats, but they are all simpler than the meta-schema use case.
Alternatives
Doca's `cfRecurse`
This is a simplified version of an extension keyword, `cfRecurse`, used with Doca. That keyword takes a JSON Pointer (not a URI fragment) that is evaluated with respect to the post-`$ref`-resolution in-memory data structure. [EDIT: Although don't try it right now, it's broken, long story that is totally irrelevant to the proposal.]
If that has you scratching your head, that's part of why I'm not proposing `cfRecurse`'s exact behavior.
In fact, Doca only supports `""` (the root JSON Pointer) as a `cfRecurse` value, and no one has ever asked for any other path. The use case really just comes up for us with pure recursion.
Specifying any other pointer requires knowing the structure of the in-memory document. And when the whole point is that you don't know what your original root schema (where processing began) will be until runtime, you cannot know that structure.
One could treat the JSON Pointer as an interface constraint: "this schema may only be used with an initial document that has a `/definitions/foo` schema", but that is a lot of complexity for something that has never come up in practice.
For this reason, `$recurse` does not take a meaningful value. I chose `true` because `false` or `null` would be counter-intuitive (you'd expect those values to not do things), and a number, string, array, or object would be much more subject to error or misinterpretation.
Parametrized schemas
#322 proposes a general schema parametrization feature, which could possibly be used to implement this feature. It would look something like:
Parameterized schema for `oneOf`:
Using the parametrized schema:
See #322 for an explanation of how this works.
I'd rather not open the schema parametrization can of worms right now. `$recurse` is a much simpler and easier-to-implement proposal and meets the core need for meta-schema extensibility. It does not preclude implementing schema parametrization, either in a later draft or as an extension vocabulary of some sort (it makes an interesting test case for vocabulary support, actually).
Summary
Runtime resolution (whether `$recurse` or parametrized schemas) is sufficiently new and powerful that I feel we should lock it down to the simplest case with a clear need. We can always extend it later, but it's hard to pull these things back.