Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make $id conform to RFC 3986 suggestion for base URI elements #729

Closed
handrews opened this issue Mar 10, 2019 · 11 comments · Fixed by #780
Closed

Make $id conform to RFC 3986 suggestion for base URI elements #729

handrews opened this issue Mar 10, 2019 · 11 comments · Fixed by #780

Comments

@handrews
Copy link
Contributor

handrews commented Mar 10, 2019

Per RFC 3986 section 4.3:

Some protocol elements allow only the absolute form of a URI without
a fragment identifier. For example, defining a base URI for later
use by relative references calls for an absolute-URI syntax rule that
does not allow a fragment.

Let's please follow this advice and require $id to resolve to an absolute URI, without a fragment. By this I mean that it cannot contain a fragment, but MAY be a relative URI reference (resolved in the same way that it is currently). If it makes things easier, rather than "MUST NOT contain a fragment", we can say that "any fragment MUST be ignored." Either option makes it absolutely clear that fragments in $id have no useful effect.

Currently, there are three behavioral cases of fragments in $id:

  • "$id": "#foo": Plain name fragment definitions. We can replace with "$anchor": "foo", analogous to fragment definition keywords in several other media types. Very easy to explain, clearly separate from any concerns over base URIs or embedded documents.
  • "$id": "https://example.com/root#": Empty fragment in root schema object $ids. This has absolutely no effect compared to the same thing without a fragment at all. It primarily exists as an explicitly supported case because older meta-schema ids were written this way, and we've never bothered to change that.
  • ALL other cases have undefined behavior!

Per section 8.2.3 of the current draft

The effect of using a fragment in "$id" that isn't blank or doesn't
follow the plain name syntax is undefined. [[CREF3: How should an
"$id" URI reference containing a fragment with other components be
interpreted? There are two cases: when the other components match
the current base URI and when they change the base URI. ]]

This note explicitly makes the behavior of JSON Pointer-fragment-only $ids undefined, and notes that we have no idea how fragment-plus-other-component $ids behave, meaning that their behavior is also undefined in the current spec.


So there are several problems with the current behavior which this addresses:

  1. We are currently going against the best practice for base URI elements, which complicates explaining how base URIs work to those that don't already understand them.
  2. By having one form of fragment-only URI references perform a useful function, we give the impression that fragments in $id as a whole should work. However, this is incorrect- every other non-trivial usage has undefined behavior. Although some of it looks like it should work, at least to some people.
  3. Because the two well-defined behaviors (setting the self identifier which is also the base URI vs defining a plane name fragment) are, due to the rules around base URIs and fragments, apparently completely separate from each other, schema authors and implementors find $id confusing. It requires a fairly deep understanding of RFC 3986 to understand why these behaviors can possibly work in the same keyword at all.

Removing fragments from $id (and using $anchor for plain name fragment definition instead) solves these problems.

  • We would be in line with the informal RFC 3986 suggestion regarding base URI elements
  • Saying that fragments in $id MUST be ignored matches how base URIs are computed in RFC 3986
  • We would (not coincidentally) now have a self/base-setting element that behaves analogous to such elements in other media types, where fragments are either forbidden or clearly have no purpose
  • To the relative layperson, there would no longer appear to be two disjoint use cases for $id

Some of the fragment-in-$id cases that technically have undefined behavior happen to be effective no-ops, as with the the empty fragment case. (Recall that the empty fragment is technically a JSON Pointer for the entire document).

So sometimes we see schemas like this generated by tools:

{
    "definitions": {
        "foo": {
            "$id": "#/definitions/foo"
        }
    }
}

which is basically a no-op (note that the $id fragment matches the actual position). So if we say that fragments in $id MUST be ignored, its's still a no-op and that's fine. I would prefer an implementation to raise an error in this case going forward, but I don't feel strongly about that.

My point here is that the only other fragment usage I've seen in the wild doesn't actually do anything.


@awwright if I'm reading #719 (comment) you are more or less OK with this idea? I agree that the names could be better, but keeping $id for the self+base form is less disruptive.

@jdesrosiers you are encouraged to comment on the merits of this if you'd like, but please do not discuss your approach from #724 here, as we have already determined (in #179) that mixing the two discussions confuses everyone.


Everyone: fair warning, I feel very strongly about this. Unlike #726 (Eliminate base URI shadowing), which I think we should do but if we don't that's fine, I'm incredibly fed up with the problems with $id and feel like many of them come from having such a wide rage of syntactically valid values that have undefined semantics.

If you want to keep the current situation of exactly one use of fragments in $id having defined semantics, and all other possible uses having undefined semantics, please make an active case for why such a thing is desirable.

Yes, changing "$id" to "$anchor" is a breaking change. That is unfortunate. But I feel like $id as it stands is so problematic that this is worthwhile. And importantly, this just forbids one part of $id's current syntax, and leaves the rest of $id's well-defined behavior alone.

Although perhaps @Julian can speak to how difficult this would be in his implementation, as I think he found the prior change of id to $id particularly disruptive.

@Julian
Copy link
Member

Julian commented Mar 11, 2019

Although perhaps @Julian can speak to how difficult this would be in his implementation, as I think he found the prior change of id to $id particularly disruptive.

The painful bit is supporting multiple ways to say id across drafts.

That price has been paid, so in some sense the implementation cost is over :)

(Cognitive cost for users not withstanding, though I assume that's the point of this ticket)

@handrews
Copy link
Contributor Author

handrews commented Mar 11, 2019

@Julian yeah, I think having one clear purpose for $id which is in line with documented best practices will be lower cognitive cost for users than trying to convince them that the two usages both flow from the nature of URIs. Which is technically true but not intuitive for the vast majority of people.

@jdesrosiers
Copy link
Member

I think this is a great incremental improvement to $id. I think that the side effect of forcing $anchor to be split off is a positive one.

@ucarion
Copy link
Contributor

ucarion commented Mar 11, 2019

To clarify, this would basically entail changing:

valid <xref target="RFC3986">URI-reference</xref>.

from URI-reference to absolute-URI?

I'm definitely 👍 on this change.

@handrews
Copy link
Contributor Author

@ucarion close!

My choice of wording of "MUST resolve to an absolute URI" leaves room for some kinds of relative references, although we can discuss further changes to that in a separate issue. As you no doubt noticed, I've been filing issues on various minimal steps we can take with $id since the larger issues always founder.

With this particular issue, in terms of the RFC 3986 ABNF rules, the value would be:

$id-value  = absolute-URI / ( relative-part [ "?" query ] )

That appears to be the least awkward way to specify it in the ABNF, as there is no relative-ref-no-fragment production.

The $id in the root schema object would still say that it SHOULD be an absolute-URI (and we could reference the formal production rather than the slightly awkward we we do it now to allow for an empty fragment). It would also be updated to either say that it MUST NOT contain a fragment, or that if a fragment is present, it MUST be ignored.

If you would like to discuss restricting $id to strictly always be absolute-URI, feel free to file that as a separate issue. I think relative references without fragments have some clear and important use cases, but I'd rather not get into that here. I'd be happy to discuss it in a separate issue, though. And if we accept this issue, that would not preclude accepting further modifications.

@ucarion
Copy link
Contributor

ucarion commented Mar 12, 2019

Gotcha, makes sense!

As a nit then, I'm not sure that RFC 3986 URI resolution leaves open the possibility of resolving to an absolute-URI. The spec says:

   This section defines the process of resolving a URI reference within
   a context that allows relative references so that the result is a
   string matching the <URI> syntax rule of Section 3.

I think a better phrasing might be to say that, after resolving the $id, the fragment part MUST be ignored, and SHOULD be empty?

@handrews
Copy link
Contributor Author

handrews commented Mar 12, 2019

@ucarion if we start from restricting the value of $id to the ABNF productions I quoted above, then the result of resolving such values MUST be an absolute-URI. Because resolved references have a schema, and the ABNF forbids a fragment.

The paragraph you are quoting is discussing the general case. But constraining the input constrains the output, so we're fine. Fragments do not magically appear during resolution if they were not there before.

@handrews
Copy link
Contributor Author

@ucarion if, instead of restricting the values to those ABNF productions, we say that the value MAY contain a fragment which MUST be ignored, then you are correct, the resolved URI will in such cases end up having a fragment, and we will need to say that it MUST be ignored.

Either way, it's just a question of getting the wording right once there's something resembling a consensus on forbidding fragments vs requiring them to be ignored.

@handrews
Copy link
Contributor Author

handrews commented Apr 1, 2019

I'm going to write a PR for this, possibly in combination with other $id things as we have several that all affect each other fairly directly.

@handrews
Copy link
Contributor Author

I don't think I have it in me to get this through in draft-08

@handrews
Copy link
Contributor Author

OK, due to it coming up repeatedly over the last month, I'm giving this and #726 a shot in draft-08.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants