Why do we need a special "Internal References" section? #545

Closed
handrews opened this issue Feb 7, 2018 · 11 comments

@handrews
Contributor

handrews commented Feb 7, 2018

I've lost track of the reason for Section 9.2.1 Internal References. Is this intended to produce any difference in actual behavior from just relying on the natural behavior of defining a plain name fragment with $id?

The only thing that I can see that might have any effect is (emphasis added):

which is understood as the schema defined elsewhere in the same document without needing to resolve the fragment against the base URI.

So given this schema, which modifies the 9.2.1 example to have two nested "items", the outer of which sets its own absolute URI $id:

{
    "$id": "http://example.net/root.json",
    "items": {
        "$id": "http://example.net/other.json",
        "items": {
            "type": "array",
            "items": { "$ref": "#item" }
        }
    },
    "definitions": {
        "single": {
            "$id": "#item",
            "type": "integer"
        }
    }
}

If we go by pure RFC 3986 rules, then the $ref resolves to http://example.net/other.json#item, which does not appear to exist, as the fully resolved URI of the schema at #/definitions/single is http://example.net/root.json#item.
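
As a quick check of the two URIs above, here is a minimal sketch using Python's standard `urllib.parse.urljoin`, which implements RFC 3986 reference resolution. The variable names are illustrative only and are not taken from the spec or from any validator:

```python
from urllib.parse import urljoin

# Base URI in effect where the "$ref" appears: the nearest enclosing absolute "$id".
ref_base = "http://example.net/other.json"
# Base URI in effect at #/definitions/single: only the root "$id" applies there.
id_base = "http://example.net/root.json"

ref_target = urljoin(ref_base, "#item")  # what the "$ref" points at
id_target = urljoin(id_base, "#item")    # what the "$id" actually names

print(ref_target)
print(id_target)
print(ref_target == id_target)
```

Under RFC 3986 alone the two strings differ, so the reference has no target in this schema.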

But the "in the same file" wording could be read to mean "forget base URIs: if there's a plain name fragment in the same physical file, you MUST ignore RFC 3986 and resolve to that fragment, even if the fully resolved URIs of the $ref and $id in question are different."

So my questions are:

  1. Is this difference in behavior what is intended?
  2. If so, WHY???????
  3. If we don't have a really, really, really compelling reason for the behavior, can we get rid of it and just rely on the usual interpretation of fragments and URI reference resolution per RFC 3986? $id is hard enough for people to get right without special-casing it.

Paging @awwright @Relequestual @johandorland

@handrews handrews added this to the draft-08 milestone Feb 7, 2018
@handrews
Contributor Author

handrews commented Feb 7, 2018

I think the original motivation for this was draft-04 §7.2.4 Inline dereferencing and fragments, which made a point of saying that resolving plain-name fragments defined by id (as $id was then called) was an optional behavior, because "inline dereferencing" was said to be optional while "canonical dereferencing" was required. Why such a fragment wouldn't be usable via canonical dereferencing was never clear to me: it's a fragment, with resolution rules specified by the media type, so it should work just fine.

The closest thing to that concept that we've kept is the "which is understood... without needing" bit in:

which is understood as the schema defined elsewhere in the same document without needing to resolve the fragment against the base URI.

Which sounds like an implementation MAY just use the plain name fragment in the document scope, regardless of base URI. But I don't see anything that forbids implementations from resolving such fragments normally, which means that even if the intention is for the difference in behavior demonstrated above to actually work that way, then there are two valid ways to resolve the $ref:

  • understood to be in the same document without needing to consider base URIs
  • resolving using the standard RFC 3986 approach

and those two ways would produce different outcomes.

That really does not seem like a good idea.
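
To make the divergence concrete, here is a rough sketch of both readings applied to the schema from the first comment. The helper names (`walk`, `resolve_both_ways`) are hypothetical. The traversal is deliberately naive: it treats every nested object as a schema and does not escape `~` or `/` in JSON Pointers, so it is an illustration under those assumptions, not how any particular implementation works.

```python
from urllib.parse import urljoin, urldefrag

SCHEMA = {
    "$id": "http://example.net/root.json",
    "items": {
        "$id": "http://example.net/other.json",
        "items": {"type": "array", "items": {"$ref": "#item"}},
    },
    "definitions": {"single": {"$id": "#item", "type": "integer"}},
}

def walk(node, base, pointer=""):
    """Yield (pointer, object, resolved "$id" URI or None, base URI in effect)."""
    if isinstance(node, dict):
        uri = None
        if isinstance(node.get("$id"), str):
            uri = urljoin(base, node["$id"])
            base = urldefrag(uri).url  # a fragment never becomes part of the base
        yield pointer, node, uri, base
        for key, value in node.items():
            yield from walk(value, base, f"{pointer}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, base, f"{pointer}/{index}")

def resolve_both_ways(schema, root_base=""):
    """For each "$ref", list the targets found under each reading."""
    nodes = list(walk(schema, root_base))
    results = []
    for pointer, node, _, base in nodes:
        ref = node.get("$ref")
        if not isinstance(ref, str):
            continue
        # Reading 1: document scope, match the literal "$id" string, ignore base URIs.
        doc_scope = [p for p, n, _, _ in nodes if n.get("$id") == ref]
        # Reading 2: RFC 3986, compare fully resolved URIs.
        target = urljoin(base, ref)
        rfc3986 = [p for p, _, u, _ in nodes if u == target]
        results.append((pointer, doc_scope, rfc3986))
    return results

print(resolve_both_ways(SCHEMA))
```

For this schema the document-scope reading finds `/definitions/single`, while the RFC 3986 reading finds no target at all.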

@johandorland
Collaborator

For this example it wouldn't make a lot of sense to resolve to http://example.net/root.json#item as it would break consistency with all other uses of URL referencing in the spec. I think the example in 9.2.1 is useful to show how $id and $ref work together, but it should be rephrased in a non-ambiguous way.

The part that confuses me the most is

which is understood as the schema defined elsewhere in the same document without needing to resolve the fragment against the base URI.

Both in the $ref and $id the fragment is being resolved against the base URI to form http://example.net/root.json#item.

The "in the same document" part to me is less confusing, as my first thought would be that it refers to this specific example and not to a general rule, although I can definitely see how someone could read it differently.

@handrews
Contributor Author

handrews commented Feb 8, 2018

@johandorland the idea that the "same document" part you mention here:

The "in the same document" part to me is less confusing, as my first thought would be that it refers to this specific example and not to a general rule, although I can definitely see how someone could read it differently.

refers to a different, document-oriented process is clearer if you slog through all of the older specs and look at how the wording evolved.

At minimum, the language is confusing, as the scope of a "document" and the scope identified by a base URI are not, in general, the same.

@awwright
Member

awwright commented Feb 8, 2018

@handrews The behavior is intended to be the same as HTML, where you can give any element an id="foo" and then link to it, from any document, using <a href="http://example.com/document.html#foo"></a>

I believe the confusion is over two different, non-overlapping uses of "document". Perhaps a different term would be in order; I wrote "elsewhere in the same document" anticipating that the base URI wouldn't usually change inside documents using this feature.

The text is supposed to be non-normative; it's describing a consequence of behavior normatively specified in the section above:

To name subschemas in a JSON Schema document, subschemas can use
"$id" to give themselves a document-local identifier. This is done
by setting "$id" to a URI reference consisting only of a plain name
fragment (not a JSON Pointer fragment). The fragment identifier MUST
begin with a letter ([A-Za-z]), followed by any number of letters,
digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), or
periods (".").
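
The fragment syntax in the paragraph quoted above can be sketched as a regular expression. The name `is_plain_name_id` is hypothetical, and this assumes the "$id" value is exactly `#` followed by the name, with ASCII characters only:

```python
import re

# "#" + a letter, then any number of letters, digits, "-", "_", ":" or ".".
PLAIN_NAME_ID = re.compile(r"#[A-Za-z][A-Za-z0-9_:.-]*")

def is_plain_name_id(value: str) -> bool:
    """True if value is a fragment-only reference using a plain name."""
    return PLAIN_NAME_ID.fullmatch(value) is not None

for candidate in ["#item", "#a-b_c:d.e", "#/definitions/single", "#9lives", "item"]:
    print(candidate, is_plain_name_id(candidate))
```

A JSON Pointer fragment such as `#/definitions/single` fails the check because `/` is not an allowed character and the name must start with a letter.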

So, no, section 9.2.1. Internal References isn't defining an exception, it's illustrating how section 9.2 as a whole works.

@Relequestual
Member

...I wrote "elsewhere in the same document" anticipating that the base URI wouldn't usually change inside documents using this feature.

Not an unreasonable assumption, but indeed this spec can be read both ways and it's unclear which is correct.

Does anyone have the time to look at a number of up to date key implementations and see which way they behave?

@handrews
Contributor Author

handrews commented Feb 8, 2018

Does anyone have the time to look at a number of up to date key implementations and see which way they behave?

I think it's more important to set a sane and consistent behavior than keep ambiguous behavior that was incorrectly implemented.

wrote "elsewhere in the same document" anticipating that the base URI wouldn't usually change inside documents using this feature.

There's no reason that base changing would be any less likely with vs without plain name fragments, so we should explain things accordingly.

where you can give any element an id="foo" and then link to it, from any document

Then why is the section called "Internal References" if it involves references from external documents?

it's describing a consequence of behavior normatively specified in the section above

Unfortunately, the current example and wording make this less clear rather than more clear. I'd like to toss this "internal" vs "external" description. It's not even accurate: as you note, these can be used externally, and the external section is talking about uniqueness with URIs rather than a real distinction between internal and external references.

We should have use case-driven examples and stop trying to make a distinction between internal vs external. It's all just plain old RFC 3986 behavior, so let's emphasize that rather than trying to re-state it.

@awwright
Member

awwright commented Feb 8, 2018

@handrews Maybe a different term for "Internal References" would be suitable (any ideas?), but the target audience is people who want to reference another schema in the same document by name.

A consequence of how URIs work just happens to be you can reference the schema from outside the document, too. (I would suggest producing separate documents, at separate base URIs, for every one that needs to be externally referenced, though.)

@spenced

spenced commented Feb 9, 2018

(I agree that RFC 3986 reference resolution is the right answer.)

From a reader's perspective, the draft is significantly improved by having example schemas with accompanying explanations in sections 9.2 and 9.2.1. In contrast some RFCs are made more compact by not including even one example, at the cost of accessibility.

My approach to breaking schemas into modules is as per @awwright comment above: put them in separate files with a base URI specified only at the root schema (and nowhere else in the same file). I am sure this would be true for many schema authors as it is simple to do and the relationship between URI, file and JSON Schema is then trivial. For this way of working, internal referencing is always within file scope, and in that context 9.2.1 provides a valuable example.

However, I would agree that this example leaves me, and others with the same approach, vulnerable to the trap of forgetting that not all internal references are resolved within file scope (whether developing a JSON Schema implementation, or authoring JSON Schemas). This trap could easily be avoided by: a) assertion that RFC 3986 and only RFC 3986 applies (as proposed in above comments); b) providing an example with nested absolute base URIs and internal $refs, with a mapping of JSON Pointer (where $ref is), $ref string value, resolved URI. To my mind, this example would be similar to the 9.2 example which makes for an excellent test case of normative behaviour.
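
The kind of mapping described in (b) could be generated along these lines. This is only a sketch: `ref_table` and the sample schema are made up for illustration, the traversal treats every nested object as a schema, and JSON Pointer escaping is ignored.

```python
from urllib.parse import urljoin, urldefrag

def ref_table(node, base="", pointer=""):
    """Return (JSON Pointer of the "$ref", "$ref" value, resolved URI) rows."""
    rows = []
    if isinstance(node, dict):
        if isinstance(node.get("$id"), str):
            # "$id" changes the base URI for this object and everything below it.
            base = urldefrag(urljoin(base, node["$id"])).url
        if isinstance(node.get("$ref"), str):
            rows.append((f"{pointer}/$ref", node["$ref"], urljoin(base, node["$ref"])))
        for key, value in node.items():
            rows += ref_table(value, base, f"{pointer}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            rows += ref_table(value, base, f"{pointer}/{index}")
    return rows

# Hypothetical schema: the same "$ref" string under two different base URIs.
sample = {
    "$id": "http://example.net/root.json",
    "properties": {
        "a": {"$ref": "#item"},
        "b": {
            "$id": "http://example.net/other.json",
            "items": {"$ref": "#item"},
        },
    },
}

for row in ref_table(sample):
    print(row)
```

The two rows carry the same "$ref" string but different resolved URIs, which is exactly the trap of assuming file scope.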

Removing the current 9.2.1 example from the draft would feel like a retrograde step, as this simpler case is the one that most schemas will inevitably use. It is useful to know that avoiding nested absolute base URIs makes all internal references resolve within file scope. Nested absolute base URIs are the corner case (even if they define the normative behaviour).

This issue has raised a question for me. What is the term for a non-root JSON Schema with an absolute base URI? Is it just a JSON Schema, or does JSON Schema document also apply? Is there a special term for this thing?

@handrews
Contributor Author

handrews commented Feb 9, 2018

@spenced We definitely don't want to remove the examples. If we remove 9.2.1 we would find a different home for the example. I also agree that one with nested absolute base URIs is essential- it comes up regularly.

"Removing" 9.2.1 is more about removing the confusing title and language, not the examples.

This issue has raised a question for me. What is the term for a non-root JSON Schema with an absolute base URI? Is it just a JSON Schema, or does JSON Schema document also apply? Is there a special term for this thing?

It's just another subschema. This is one of the reasons for #514 deciding that $ref should be delegation rather than inclusion. Simply inlining a referenced document into the referencing document cannot always be done due to things that work at document scope. Notably, $schema is restricted to root schemas as it cannot change during processing of a single document. So if $schema is the same, you can safely inline, but if not the separate document-ness must be preserved.

@handrews
Contributor Author

handrews commented Mar 1, 2018

Fixed by #550 and #551 (thanks, @awwright!)

@johandorland and @spenced further feedback on the new wording is most welcome, please just file new issues for anything that is still unclear, or any confusion that has been introduced with the new wording.

@handrews handrews closed this as completed Mar 1, 2018
@Relequestual
Member

The resolution of PR #550 is great. The expanded examples from L:633 which show all possible JSON Pointers are extremely valuable! Super work!

When multiple schemas try to identify with the same URI, validators SHOULD raise an error condition.

Great clarification... This is a clear possibility if people attempt to completely dereference inline (by inclusion) $ref usage, which is a great argument for delegation.
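
A rough sketch of how that collision could be detected after naive inlining. `find_duplicate_ids` and the `inlined` schema are hypothetical, the traversal treats every nested object as a schema, and a real validator may well do this differently:

```python
from collections import defaultdict
from urllib.parse import urljoin, urldefrag

def find_duplicate_ids(schema, base=""):
    """Map each resolved "$id" URI claimed more than once to its JSON Pointers."""
    seen = defaultdict(list)

    def visit(node, base, pointer):
        if isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                uri = urljoin(base, node["$id"])
                seen[uri].append(pointer)
                base = urldefrag(uri).url  # new base for nested subschemas
            for key, value in node.items():
                visit(value, base, f"{pointer}/{key}")
        elif isinstance(node, list):
            for index, value in enumerate(node):
                visit(value, base, f"{pointer}/{index}")

    visit(schema, base, "")
    return {uri: pointers for uri, pointers in seen.items() if len(pointers) > 1}

# Hypothetical result of inlining the same referenced schema in two places.
inlined = {
    "$id": "http://example.net/root.json",
    "properties": {
        "first": {"$id": "http://example.net/item.json", "type": "integer"},
        "second": {"$id": "http://example.net/item.json", "type": "integer"},
    },
}

print(find_duplicate_ids(inlined))
```

Any non-empty result is the situation where the quoted text says a validator SHOULD raise an error.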
