Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve/simplify "$recursiveAnchor" and "$recursiveRef" #909

Closed
handrews opened this issue May 3, 2020 · 16 comments · Fixed by #930
Closed

Improve/simplify "$recursiveAnchor" and "$recursiveRef" #909

handrews opened this issue May 3, 2020 · 16 comments · Fixed by #930

Comments

@handrews
Copy link
Contributor

handrews commented May 3, 2020

These keywords depend on the concept of the dynamic scope, the runtime schema traversal structure including which references were followed how often to reach the current schema+instance locations.

Note that @awwright has also filed #907 regarding this same problems space. While I don't currently see how that proposal solves the entire use case, I am very much open to being sold on other proposals for this.


Currently, $recursiveAnchor takes a boolean which MUST be true, and $recursiveRef takes a URI-reference which MUST either be an empty fragment or an absolute-URI. (Note: The text says it MUST be "#", but we actually use absolute-URIs as well so the text is in error.).

The mechanism for how these works is described a complex process of evaluating the $recursiveRef twice with different base URIs, depending on the presence of $recursiveAnchor.

This is why there's a (incorrectly specified) restriction on what kind of URI-references are legal for $recursiveRef. Path references (e.g. foo/bar) when used with this process could produce very confusing results without any clear use case.

It's not actually that important to understand exactly how it works currently. It's probably better to think of this as a new proposal.

We at minimum need to fix the 'MUST be "#"' problem, and it would be nice to make this less complex.


I propose changing $recursiveAnchor so that it takes the same sort of value as $anchor, and produces a plain name fragment identifier, just like $anchor.

The difference between the two is that when a $recursiveAnchor is targeted by a $recursiveRef, the final resolved target will not necessarily be the locally declared fragment, but the fragment of the same name in the outermost dynamic scope that also declares that fragment name with $recursiveAnchor. That's a lot of word salad, so we'll go through an example. But first:

  • $recursiveRefs that do not point to a $recursiveAnchor act just like $ref
  • $refs that point to a $recursiveAnchor work just a if it were $anchor
  • Only when $recursiveRef targets a $recursiveAnchor do they behave differently.

This ensures that all schemas involved (the schema object making the reference, the local subschema with the plain name fragment, and the final resolved subschema with the plain name fragment) all intentionally opt-in to this behavior. We do not want magical behavior involving $ref or $anchor, ever. Most people will use $ref and $anchor (or at least $ref), so they need to be simple and predictable. $recursiveRef and $recursiveAnchor are advanced keywords suitable for a very specific use case.

To see how this works, let's look at:

In all cases, I'm omitting $vocabulary, title, and other inessentials to save space, and of course I'm using the new $recursiveAnchor syntax.

Applicator Vocabulary

I'm only including enough applicators here to get the point across

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://json-schema.org/draft/2019-09/meta/applicator",

    "$recursiveAnchor": "meta",

    "type": ["object", "boolean"],
    "properties": {
        "additionalProperties": { "$recursiveRef": "#meta" },
        "unevaluatedProperties": { "$recursiveRef": "#meta" },
        "properties": {
            "type": "object",
            "additionalProperties": { "$recursiveRef": "#meta" },
            "default": {}
        },  
        "allOf": { "$ref": "#/$defs/schemaArray" },
        "anyOf": { "$ref": "#/$defs/schemaArray" },
        "oneOf": { "$ref": "#/$defs/schemaArray" }
    },
    "$defs": {
        "schemaArray": {
            "type": "array",
            "minItems": 1,
            "items": { "$recursiveRef": "#meta" }
        }
    }
}

Standard Dialect

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://json-schema.org/draft/2019-09/schema",

    "$recursiveAnchor": "meta",

    "allOf": [
        {"$ref": "meta/core"},
        {"$ref": "meta/applicator"},
        {"$ref": "meta/validation"},
        {"$ref": "meta/meta-data"},
        {"$ref": "meta/format"},
        {"$ref": "meta/content"}
    ],  
    "type": ["object", "boolean"],
}

When we evaluate just the applicator meta-schema against a schema, there is only one meta-schema resource involved. In this situation, $recursiveAnchor and $recursiveRef behave exactly like $anchor and $ref.

When we evaluate the standard dialect meta-schema, the $recursive* keywords behave differently, as follows:

  1. We begin evaluating from the root of the standard dialect meta-schema. This is the outermost dynamic scope
  2. Next we follow each allOf branch in turn, most notably the one to "meta/applicator"
  3. Within the applicator meta-schema, we look at specific property schemas and eventually see a "$recursiveRef": "#meta"
  4. We look for the "#meta" fragment declaration in the current schema resource, and find it- it is defined by "$recursiveAnchor", so we have more steps to take.
  5. We look at each resource in the dynamic scope, looking for the outermost resource that also declares a #meta fragment with "$recursiveAnchor"
  6. The dialect meta-schema also has "$recursiveAnchor": "meta", and is the outermost scope that does (it's also the only other scope in this example).
  7. The resolved runtime target of the "$recursiveRef": "#meta" in the applicator vocabulary is the "#meta" fragment in the dialect meta-schema.

The effect here is that since we started from the dialect schema, which declares a "$recursiveAnchor": "meta", and the applicator vocab schema also declares a "$recursiveAnchor": "meta", the "$recursiveRef": "#meta" in the applicator vocab schema resolves to the anchor in the dialect schema. Same plain name fragment, different resource.

@gregsdennis
Copy link
Member

I am not opposed to this. It would allow multiple recursive paths, which could yield some interesting schema sets.

@jdesrosiers
Copy link
Member

The implementation will be a little more complicated, but this should work. The symmetry with $anchor does make it easier to understand.

@handrews
Copy link
Contributor Author

handrews commented May 7, 2020

It occurs to me that these keywords work kind of like setjmp()/longjmp(), with the recursive anchor name serving as the jump environment variable. I'm probably dating myself to even mention them, and I'm not sure if that analogy is helpful, but it did come to mind.

@handrews
Copy link
Contributor Author

handrews commented May 7, 2020

I'm going to proceed with a PR for this b/c it has been reasonably well-received and definitely improves some important things. If #907 or #911 end up being how we want to go, we can still do those instead if that's the right move. It just helps me to go with this for now.

@Relequestual
Copy link
Member

I'm still catching up with the slack chatter, but I'm not sure I fully understand what the change is here.

In your example, if the $recursiveAnchor was simply true, and the $recursiveRef was #, is it not the same behaviour?

It would allow multiple recursive paths, which could yield some interesting schema sets. - @gregsdennis

If I'm understanding right, this change simply allows for multiple recursive anchor targets. Is this correct?

If I am, what's the usecase? I didn't see it here (sorry if I've missed it). I'm not saying there isn't a use case, just I'm not sure what it is right now. If there isn't a clear use case, this seems like adding complexity good reason (Which I don't believe anyone here would do), so I'm going to assume I'm missing something...?

@handrews
Copy link
Contributor Author

handrews commented May 8, 2020

@Relequestual

If I'm understanding right, this change simply allows for multiple recursive anchor targets. Is this correct? If I am, what's the usecase?

That's more of an interesting side effect than the main point. I'll come back to the use case question at the end.

this seems like adding complexity

Boolean form:

  • Unique and therefore unintuitive: anchors in all other contexts/media types are strings, not booleans
  • Relies on the base URI concept, which people really do not understand
  • Unrestricted form (if we allowed things like "$recursiveRef": "foo/bar") produces really weird behavior
  • If you combine two sets of recursive schemas, the $recursiveRef in one set will break (because whichever one is entered first will capture the refs in the other one)

Name form:

  • Matches $anchor syntax exactly, and behavior is closely related
  • Does not rely on base URI concept
  • No restrictions necessary: $recursiveRef targets that are not $recursiveAnchors behave like normal refs, as opposed to other parts of the URI reference e.g. "foo/bar" causing weirder behavior.

The goals are:

  1. Get the base URI crap out: it's both confusing and unnecessary
  2. Make recursive and non-recursive anchors look more alike
  3. Remove restrictions on $recursiveRef values (we were violating the existing restriction in the links schema used by the Hyper-Schema meta-schema)

There is definitely a use case for named anchors vs only #, which is when you cannot put the recursive thing in the resource root. At one point (before we came up with the idea of embedded schema resources, which makes this irrelevant), some were asking us to restrict $schema to the document root. The schema document root is always applied to the instance document root, so you have to put the subschema (which is the thing that really recurses, excluding $schema) somewhere else:

{
    "type": "object",
    "$recursiveRef": "#subschema",
    "properties": {"$schema": {"type": "string", "format": "uri"}},
    "$defs": {
        "subschema": {
            "$recursiveAnchor": "subschema",
            "properties": {
                "additionalProperties": {"$recursiveRef": "#subschema"},
                "not": {"$recursiveRef": "#subschema"},
                ...
            }
        }
    }
}

Now it would be possible to rework this by making it multiple schema resources and using the boolean $recursiveAnchor, but it starts getting more and more convoluted. You could try to allow things like "$recursiveRef": "#/$defs/subschema", but now you're relying on parallel structures in the base vs extended schemas, and that's fragile. Using a named anchor is more robust.

As for multiple recursive anchors at once, if you have two sets of recursive schemas and you want them to refer to each other, this should work intuitively, as opposed to the boolean form where you'll recurse to whichever one you entered first.

Do we need that? Unclear. But I like that it's no longer egregiously broken like it is with the boolean form.

@Relequestual
Copy link
Member

Wow. Totally sold.

I don’t know how the issue was identified, but your response clearly illustrates there’s an issue with an easy solution.

People are going to struggle because of “base URI (I still forget this confuses people) is a good reason to make the change. This is already an advanced feature that’s only intended for meta schema authors.

If you want to do a minimal PR on this, or at least kick it off, I’m happy to pick it up. I should have some time this week!

@jdesrosiers
Copy link
Member

jdesrosiers commented May 12, 2020

I made this change in my implementation and it seem to work fine. It took a while to figure out exactly what I needed to do, but when I figured it out, the actual code change was small and only took a couple minutes. In fact, it was so small a change that it took me a while to convince myself that it worked and I wasn't missing something.

The next issue I'd have to tackle is how to support the 2019-09 version in 2019-09 and the 2020-XX version in 2020-XX. What should happen when a 2020-XX schema extends a 2019-09 schema? If I figure that out, I'll deploy a dialect to https://json-schema.hyperjump.io so people can test it out themselves.

@handrews
Copy link
Contributor Author

@jdesrosiers thanks for trying this out!

I've been struggling a bit with 2019-09 vs 2020-XX. To the best of my knowledge, 2019-09 isn't substantially deployed anywhere. Until the last couple weeks, there was only one implementation.

My ideal approach would actually be to discourage implementing or supporting 2019-09 and asking folks to just update to 2020-XX. (And if they want to support draft-06 and draft-07 that's fine, just skip 2019-09).

2019-09 has served its purpose: it provoked feedback and was a successful proof-of-concept for re-converging with OpenAPI. I don't see any need for it to be a production draft in any sort of ongoing way.

@handrews
Copy link
Contributor Author

It's not like 2020-XX will be a radical departure. If you implemented 2019-09, then rolling that forward to 2020-XX should be pretty simple.

@handrews
Copy link
Contributor Author

@jdesrosiers @Relequestual @gregsdennis the more I think about this the more $dynamicAnchor / $dynamicRef (to align with the concept of "dynamic scope") makes sense. How would y'all feel about that change?

That would also mean that if anyone really wanted to keep 2019-09 as an option, it wouldn't conflict anymore.

@jdesrosiers
Copy link
Member

I'm ok with ditching 2019-09 and/or changing the name.

However, I did figure out a way to support both versions (mostly). If we consider "$recursiveAnchor": true to be equivalent to "$recursiveAnchor": "" a 2020-XX schema can extend a 2019-09 schema. However, not everything is going to be possible the other way around because 2020-XX can do more things than 2019-09 can express.

You can try it out on http://json-schema.hyperjump.io by using the cheat code "$schema": "https://json-schema.org/draft/2020-XX/schema"

@gregsdennis
Copy link
Member

@handrews currently $recursiveAnchor needs to be present at each link in the reference chain. Is that still true with your proposal? Would we have to have $dynamicAnchor present at each link with the same name, or would it suffice to define it once in the "dynamic scope" so that it acts more like the new $anchor (previously a function of $id)?

(I'm fine with calling it $dynamic*.)

@jdesrosiers
Copy link
Member

currently $recursiveAnchor needs to be present at each link in the reference chain.

Wait, is that true? That's not how I implemented it. The dynamic scope gets passed through every reference whether $recrusiveAnchor is present or not.

@handrews
Copy link
Contributor Author

I think @jdesrosiers is right, although there may have been a stage where it was discussed more like @gregsdennis states. Honestly I had to go look it up.

This is part of why I want to revamp these: the idea that every point along an extension needs to signal its participation is the logical conclusion of the cooperative model, but then you get into whether an intervening scope is part of the extension chain or some other scope. For example, the recursion for Hyper-Schema bounces through the links schema, potentially switching among the links schema and various recursive meta-schemas many times.

would it suffice to define it once in the "dynamic scope" so that it acts more like the new $anchor (previously a function of $id)?

@gregsdennis I seriously considered this which is one reason that the PR is taking so long.

After going back and forth, I decided that it was too magical. A schema having a reference with no clear resolution is somewhat confusing- it's hard to tell whether it was intentional or a mistake, although the fact that it's a $dynamicRef would lean towards intentional.

The idea is that the same $dynamicAnchor should be used on schemas that are playing the same role, just with one superseding the other. That is more clear if at least the initial and final schema object have the same name. Otherwise we're kind of back in the situation where a stray boolean $recursiveAnchor could capture the $recursiveRef to an unrelated schema object.

So that initial $dynamicAnchor now identifies which schema is an extension point, and then some later person reading the schemas can probably figure out whether the $dynamicAnchor with the same name elsewhere is actually extending that point or is just an unfortunate naming clash.

@gregsdennis
Copy link
Member

I think @jdesrosiers is right, although there may have been a stage where it was discussed more like @gregsdennis states. Honestly I had to go look it up.

[reads code]

Yeah, I'm not requiring it at every level. Okay. @handrews and I definitely discussed that though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants