

Consolidate and clarify language around duplicate IRIs #1272

Merged
4 commits merged Sep 6, 2022


handrews
Contributor

Fixes issue #1271 and addresses one point mentioned in #1059 (comment) (but please do not drag the rest of that issue into this PR, thanks!)

The only actual change in requirements is forbidding "$id": "#" and "$id": "", which are confusing and either pointless (in a document root, where they resolve to the retrieval IRI, exactly as if $id were not present) or problematic (in an embedded resource root, where they produce duplicate IRIs for the embedded and containing resources).
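A minimal sketch of why both values collapse onto an existing IRI. It assumes Python's urllib.parse as a stand-in for RFC 3986 reference resolution (it handles ASCII URIs, not full IRIs), and the example IRIs and the resolve_id helper are hypothetical. Resolving "" or "#" against a base returns that base once the empty fragment is dropped:

```python
from urllib.parse import urldefrag, urljoin


def resolve_id(base: str, ref: str) -> str:
    """Resolve an "$id" value against a base IRI and drop any fragment.

    Hypothetical helper: urljoin performs RFC 3986 resolution for ASCII
    URIs; non-ASCII IRIs would need converting to URIs first.
    """
    return urldefrag(urljoin(base, ref)).url


# Hypothetical IRIs for illustration only.
retrieval_iri = "https://example.com/schemas/root.json"
containing_iri = "https://example.com/schemas/container.json"

# Document root: both values yield the retrieval IRI, as if "$id" were absent.
for ref in ("", "#"):
    print(repr(ref), "->", resolve_id(retrieval_iri, ref))

# Embedded resource root: both values yield the containing resource's IRI,
# so the embedded resource and its container end up with the same identifier.
for ref in ("", "#"):
    print(repr(ref), "->", resolve_id(containing_iri, ref))
```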

This also consolidates the two different places where duplicate IRIs were addressed. Since the more general of the two paragraphs stated that this SHOULD be an error, I kept the SHOULD rather than the MAY that addressed only a subset of the cases. The SHOULD technically covered the subset as well anyway.
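To make "SHOULD raise an error" concrete, here is a rough sketch of duplicate-IRI detection within a single schema document. The function names are hypothetical, resolution again leans on Python's urllib.parse (ASCII URIs only), and the walk is deliberately naive: it treats every nested object as a subschema, whereas a real implementation would only descend into keywords that take subschemas (so that, say, an "$id" member inside a "const" value is not mistaken for an identifier).

```python
from urllib.parse import urldefrag, urljoin


def resolve_id(base, ref):
    # RFC 3986 resolution via urllib (ASCII URIs only); fragment discarded.
    return urldefrag(urljoin(base, ref)).url


def find_duplicate_ids(schema, retrieval_iri):
    """Return (iri, first_pointer, duplicate_pointer) for each repeated IRI.

    Naive sketch: every nested object is treated as a subschema.
    """
    seen = {}  # resource IRI -> JSON Pointer of the first resource using it
    duplicates = []

    def visit(node, base, pointer, is_root):
        if isinstance(node, dict):
            iri = None
            if isinstance(node.get("$id"), str):
                iri = resolve_id(base, node["$id"])
            elif is_root:
                # A root without "$id" is identified by its retrieval IRI.
                iri = resolve_id(base, "")
            if iri is not None:
                if iri in seen:
                    duplicates.append((iri, seen[iri], pointer))
                else:
                    seen[iri] = pointer
                base = iri  # nested relative "$id" values resolve against this
            for key, value in node.items():
                # Escape "~" and "/" per JSON Pointer (RFC 6901).
                token = key.replace("~", "~0").replace("/", "~1")
                visit(value, base, f"{pointer}/{token}", False)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                visit(item, base, f"{pointer}/{index}", False)

    visit(schema, retrieval_iri, "", True)
    return duplicates


schema = {
    "$id": "https://example.com/schemas/root.json",
    "$defs": {
        "ok": {"$id": "item.json"},
        "dup": {"$id": "#"},
    },
}
print(find_duplicate_ids(schema, "https://example.com/schemas/root.json"))
# [('https://example.com/schemas/root.json', '', '/$defs/dup')]
```

Here "item.json" resolves to a distinct IRI, while "#" in the embedded resource reuses the root's IRI and is reported.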

I have emphasized that "undefined" here means that the result, whatever it is, will not be interoperable (most likely, implementations will just return the last-seen schema under that IRI, but who knows).

Note that we could continue to allow "$id": "#" and "$id": "" and just assume that the SHOULD regarding duplicate IRIs is enough to discourage that. If that is the consensus then I can remove the commit that handles that part of the change.

In a document root schema, "$id": "#" is a no-op, and in
an embedded schema resource root, it results in the embedded
resource having the same URI as its context resource.  Given
that the specification says implementations SHOULD raise an error
for duplicate IRIs, there is no reason to allow this form.
Comment on lines 1356 to 1357
valid <xref target="RFC3987">IRI-reference</xref> with a non-empty non-fragment
component. This IRI-reference SHOULD be normalized, and MUST resolve to an
Member


I don't think this change is necessary. The two valid options are presented as an absolute-IRI or an IRI with an empty fragment. Both absolute-IRI and an IRI are by definition non-relative, so we don't need to say "non-empty". It also says it "MUST resolve to", which also indicates a non-relative result.

Contributor

@notEthan Aug 17, 2022


@jdesrosiers: This doesn't sound right. $id can be a relative IRI. If that relative IRI is empty (apart from fragment), it resolves to the same absolute IRI as the parent schema (or the same as its retrieval IRI if it's at the root, but that part isn't a problem). Then a subschema has the same id as its parent. Forbidding the empty non-fragment portion disallows this id collision.

@handrews: The wording "with a non-empty non-fragment component" is awkward. Maybe a new sentence would be better:

If present, the value for this keyword MUST be a string, and MUST represent a valid IRI-reference. [unchanged]
This IRI-reference MUST NOT be empty or consist only of an empty fragment.
[or]
This IRI-reference MUST NOT have an empty non-fragment component.

This IRI-reference SHOULD be normalized ...
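The two candidate phrasings are close but not identical. A small sketch with hypothetical helper names (Python, urllib.parse) shows where they diverge: the first rejects only "" and "#", while a literal reading of "empty non-fragment component" also rejects fragment-only values such as "#foo".

```python
from urllib.parse import urldefrag


def empty_or_empty_fragment(ref: str) -> bool:
    # Reading 1: "MUST NOT be empty or consist only of an empty fragment".
    return ref in ("", "#")


def empty_non_fragment(ref: str) -> bool:
    # Reading 2: "MUST NOT have an empty non-fragment component".
    # urldefrag splits the reference at the first "#".
    return urldefrag(ref).url == ""


for ref in ("", "#", "#foo", "foo/bar", "https://example.com/s#"):
    print(repr(ref), empty_or_empty_fragment(ref), empty_non_fragment(ref))
```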

Contributor Author

@handrews Aug 17, 2022


@jdesrosiers I'm not sure I understand. "$id": "foo/bar" is valid, both before and after this change. It is a path-only relative IRI-reference that, resolved against a base IRI (say, https://example.com/schemas/base) produces an absolute-IRI (https://example.com/schemas/foo/bar).
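The resolution in that example can be checked with Python's urllib.parse.urljoin, used here as an assumed stand-in for RFC 3986 resolution (ASCII URIs only). The base IRI is the one from the comment; the "../other" case is an extra illustration:

```python
from urllib.parse import urljoin

base = "https://example.com/schemas/base"

# A path-only relative reference replaces the last segment of the base path.
print(urljoin(base, "foo/bar"))   # https://example.com/schemas/foo/bar

# Dot segments are removed during resolution.
print(urljoin(base, "../other"))  # https://example.com/other
```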

@notEthan Thanks, I think I like "This IRI-reference MUST NOT be empty or consist only of an empty fragment." I checked RFC 3986 and it uses the term "empty" to describe relative references in §4.4 Same-Document Reference.

Member


I wasn't saying that $id can't be relative. I was talking about the result of resolving the $id against the base IRI. I thought this change was trying to protect against case where you don't have a baseURI and therefore the IRI can't be fully resolved. I was pointing out that the current wording already protects against that case.

If that relative IRI is empty (apart from fragment), it resolves to the same absolute IRI as the parent schema

Oh, ok, now I understand where this change is coming from. But, I still don't think it's necessary because we talk about not allowing duplicate identifiers more generally elsewhere. IMO, it's already covered and doesn't need to be repeated here.

Contributor Author

@handrews Aug 17, 2022


I thought this change was trying to protect against case where you don't have a baseURI and therefore the IRI can't be fully resolved. I was pointing out that the current wording already protects against that case.

Ah, I see. Yeah, that is already handled and not improved (or made worse) by this change.

we talk about not allowing duplicate identifiers more generally elsewhere. IMO, it's already covered and doesn't need to be repeated here.

Cool, yeah, that's the main question (for one of the commits in this PR; the rest of it remains valid, as it clarifies the "already covered" part that, as currently written, kinda contradicts itself in a confusing way).

I'd like to leave this open for probably another week and see if we get any more opinions from other @json-schema-org/spec-team folks (or anyone else, I just can't tag the whole universe). I don't feel strongly on this point, so if no one advocates for the change I'll strip it out and we can go with just the clean-up of the duplicate IRI stuff.

Contributor Author


One option might be to note that, while syntactically allowed, "" and "#" effectively produce duplicate IRIs which are considered errors per §whatever:

Note that while the empty IRI and the empty fragment-only IRI are legal values, they produce duplicate IRIs when used in an embedded resource, which is considered an error per 8.2.3.

That's probably enough to discourage the values without requiring people to detect and block them. I'm currently leaning towards this option.

Contributor


It might be a little odd to include a MUST NOT with the intent of heading off a route to duplicate IRIs, when the language on duplicate IRIs itself isn't a MUST NOT ("implementations SHOULD raise an error condition").

But do highlight that they are errors in the section where
that is likely to be the most relevant.
@handrews changed the title from "Consolidate and clarify language around duplicate IRIs, forbid "$id" values that produce them without any benefit" to "Consolidate and clarify language around duplicate IRIs" Aug 21, 2022
@handrews
Contributor Author

Based on feedback so far I have removed the forbidding of certain $id values (and changed the title of this PR to match that). I did add extra wording in the section on subschemas with $id about how empty IRIs are problematic, as it seems useful to do that there. It links to the new section on duplicates.

@handrews linked an issue Aug 22, 2022 that may be closed by this pull request
@handrews merged commit c99fb84 into json-schema-org:main Sep 6, 2022
@handrews deleted the dup-ids branch September 9, 2022 06:28
@gregsdennis added the clarification label (Items that need to be clarified in the specification) and removed the Type: Bug label Jul 17, 2024
Labels
clarification (Items that need to be clarified in the specification), core
Development

Successfully merging this pull request may close these issues.

Consolidate language around duplicate schema IRIs
4 participants