
WARC 1.1: Introduce record-id BNF grammar rule for consistency with examples #24

Closed
wants to merge 1 commit into from


@ato (Member) commented Sep 17, 2015

In the examples and in all popular implementations, URIs in the WARC-Target-URI and WARC-Profile fields are not surrounded by "<" and ">" characters. This change makes the grammar consistent with practice by removing "<" and ">" from the basic `uri` rule and introducing a new `record-id` rule for the fields WARC-Record-ID, WARC-Concurrent-To, WARC-Refers-To, WARC-Warcinfo-ID and WARC-Segment-Origin-ID.

Fixes #23
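To illustrate the split this PR proposes, here is a minimal Python sketch of how a writer might render URI-valued header fields: the five record-id fields keep the angle brackets, while WARC-Target-URI and WARC-Profile use the bare `uri` rule. The helper `format_field` and the two field sets are hypothetical names for illustration; the sets cover only the fields named in this PR, not every URI-valued field in the specification.

```python
# Fields that, per this PR, use the new record-id rule: "<" uri ">".
RECORD_ID_FIELDS = {
    "WARC-Record-ID",
    "WARC-Concurrent-To",
    "WARC-Refers-To",
    "WARC-Warcinfo-ID",
    "WARC-Segment-Origin-ID",
}

# Fields named in this PR that use the plain uri rule (no angle brackets).
URI_FIELDS = {"WARC-Target-URI", "WARC-Profile"}


def format_field(name: str, uri: str) -> str:
    """Render one WARC header line for a URI-valued field (hypothetical helper)."""
    if name in RECORD_ID_FIELDS:
        value = f"<{uri}>"  # record-id rule keeps the brackets
    elif name in URI_FIELDS:
        value = uri  # basic uri rule: bare URI
    else:
        raise ValueError(f"not a URI-valued field covered here: {name}")
    return f"{name}: {value}"


print(format_field("WARC-Record-ID", "urn:uuid:1234"))
print(format_field("WARC-Target-URI", "http://example.com/"))
```

Keeping the bracketing decision in one place keyed on the field name mirrors the grammar change: the brackets belong to the `record-id` rule, not to `uri` itself.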

@kris-sigur (Member)

Makes sense to me.

I wonder if it is appropriate to include some kind of "errata" as well to address how this was mishandled in the previous standard?

@anjackson (Member)

I added a Document History section with this kind of thing in mind, but maybe a dedicated Errata bit would be better?

https://github.com/iipc/warc-specifications/blob/gh-pages/specifications/warc-format/warc-1.1/index.md#document-history

@ato (Member, Author) commented Sep 18, 2015

My experience in this area is very limited, but in most of the standards I have read, the errata are a separate document associated with the version containing the error, e.g. #25.

Revisions I've seen note changes that raise compatibility concerns either in a "Changes since 1.0" section or inline where the relevant item is discussed. For example:

> In version 1.0 of the WARC standard, the `uri` grammar rule was defined inconsistently with the examples in the specification and with common implementations. For compatibility, implementations may choose to accept, but should never emit, URIs surrounded by '<' and '>' in the WARC-Target-URI and WARC-Profile fields.
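The "accept but never emit" policy in the suggested wording could look like the following minimal Python sketch: a reader strips a WARC 1.0-style "<...>" wrapper from WARC-Target-URI or WARC-Profile values and hands back the bare URI, so a writer that re-serializes it will not reproduce the brackets. The function name `read_uri_field` is a hypothetical illustration, not part of any existing WARC library.

```python
# Fields where bracketed values are tolerated on input only.
URI_FIELDS = {"WARC-Target-URI", "WARC-Profile"}


def read_uri_field(name: str, raw: str) -> str:
    """Return the bare URI from a WARC-Target-URI or WARC-Profile value.

    Hypothetical helper: accepts the WARC 1.0-style "<...>" wrapping for
    compatibility, but always returns the unwrapped URI so that writers
    never emit the brackets again.
    """
    if name not in URI_FIELDS:
        raise ValueError(f"unexpected field: {name}")
    value = raw.strip()
    # Tolerate (but discard) a single pair of surrounding angle brackets.
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return value


print(read_uri_field("WARC-Target-URI", "<http://example.com/>"))
print(read_uri_field("WARC-Target-URI", "http://example.com/"))
```

Normalizing at read time means the rest of the tool only ever sees the WARC 1.1 form, which keeps the lenient behaviour confined to the parser.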

@anjackson, should I add a document history entry to this pull request? I'd be happy to do so. I wasn't sure if it would cause problems when merging and whether the date should refer to now or the date of merging.

@ato changed the title from "Introduce record-id BNF grammar rule for consistency with examples" to "WARC 1.1: Introduce record-id BNF grammar rule for consistency with examples" on Sep 18, 2015
@anjackson modified the milestone: The WARC Format 1.1 (Oct 20, 2015)
@saraaubry

The following changes were integrated into the revised ISO draft during the ISO working group meeting on November 16-17, 2015:

In section 4 (File and record model), change the definition of `uri` and add a note:

`uri = <'URI' per RFC3986>`

NOTE: In the WARC 1.0 standard (ISO 28500:2009), `uri` was defined as `"<" <'URI' per RFC3986> ">"`. This rule has been changed to meet requests from implementers.
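As a rough illustration of the difference between the two definitions of `uri`, this Python sketch checks only the angle-bracket part of each rule: WARC 1.0 requires the wrapper, WARC 1.1 forbids it. It does not validate RFC 3986 syntax; it relies only on the fact that raw "<" and ">" are not permitted inside an RFC 3986 URI. The function `matches_uri_rule` is a hypothetical name used for illustration.

```python
def matches_uri_rule(value: str, version: str) -> bool:
    """Check only the angle-bracket aspect of the `uri` rule.

    Hypothetical helper; full RFC 3986 validation is out of scope.
    WARC 1.0: "<" <'URI' per RFC3986> ">"
    WARC 1.1: <'URI' per RFC3986>
    """
    if version == "1.0":
        # The 1.0 rule requires exactly one surrounding pair of brackets.
        if not (len(value) >= 2 and value[0] == "<" and value[-1] == ">"):
            return False
        inner = value[1:-1]
    elif version == "1.1":
        inner = value
    else:
        raise ValueError(f"unknown WARC version: {version}")
    # RFC 3986 does not allow raw "<" or ">" inside a URI.
    return inner != "" and "<" not in inner and ">" not in inner


for v in ("<http://example.com/>", "http://example.com/"):
    print(v, matches_uri_rule(v, "1.0"), matches_uri_rule(v, "1.1"))
```

A bracketed value therefore satisfies the old rule but not the new one, which is why the note above flags this as a deliberate change from ISO 28500:2009.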

@saraaubry

Included in WARC 1.1
