Added duration to the LinkedResource object type #421
@wareid it would be good to add duration values to the flatland example (Example 69). Do you have those values? Maybe @HadrienGardeur has them? |
@mattgarrish I am not sure this PR should be merged before or after the draft is republished as /TR. Per our resolution this could/should be done at any time... |
@iherman here are the duration of the tracks for the Flatland example in another syntax: https://github.com/readium/webpub-manifest/blob/master/examples/Flatland/manifest.json |
@wareid I have hit a problem/question. I have taken over the text on the value of duration from the audiobook draft, namely:
(More exactly, I have added the reference to the RFC explicitly, but otherwise it is the same.) However, I am not sure I understand. If I look at the RFC, I see that it allows things like The schema.org text refers to the ISO format only, without any reference to NPT. It is also what it uses in the examples. So… isn't it true that the correct specification text should say, for the value of the term:
|
Yes, the Schema.org definition of Hadrien gave a link that shows the ReadiumWebPubManifest If I remember correctly, a reference to RFC 7826 ( https://tools.ietf.org/html/rfc7826 ), as mentioned by Ivan, was introduced because Media Fragments references RFC 2326 ( https://tools.ietf.org/html/rfc2326 ), which RFC 7826 explicitly supersedes. All that being said, we've got to pick one or the other :)
|
Thanks @danielweck...
I agree we must do this. My vote would be to go with the first approach. We have tried to align, as much as we can, with schema.org, we even refer to https://schema.org/duration, which tells me that we do not really have a choice there. I do realize the syntax is more clumsy, but… |
Publication is done now, i.e., if the other issues are solved, and the PR is approved, we can also merge. |
@iherman the choice of a number of seconds (decimal figures allowed) was a pragmatic one made by the Readium community: everybody understands such a format, and media durations are never in the range of years, months or days (which is part of what the ISO 8601 format targets). We had this discussion before in #307 and the conclusion was "after discussion with schema, we will use NPT (Media Fragments) until they confirm or inform us otherwise". Note that we should be more precise and state that we use npt-sec (not npt-hhmmss). |
@llemeurfr, if that is the WG's final decision, then that is what we will do. However, in this case, we should not refer to schema.org, nor should we use the same term, to avoid problems with JSON-LD processors (which default unqualified terms to schema.org). |
Why? I am a bit wary of this direction. If we use NPT from an RFC, then we should use NPT as a whole. Cherry-picking part of a standard is a source of problems later, imho. |
After all, why not ... the syntax is such that a parser can easily tell the two forms apart, and for expressing the expected duration of a textual ebook, hours + minutes make sense. `npt-sec = 1*DIGIT [ "." *DIGIT ]` examples: |
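To illustrate the point that a parser can tell the two NPT forms apart, here is a minimal Python sketch. It is an assumption-laden simplification, not a conforming implementation: the regular expressions are modelled on the npt-sec and npt-hhmmss forms discussed in this thread, the digit limits of RFC 7826 are not enforced, and the function name `npt_to_seconds` is hypothetical.

```python
import re

# Simplified patterns modelled on the NPT forms discussed in this thread.
# Assumption: the exact digit limits of RFC 7826 are deliberately not enforced.
_NPT_SEC = re.compile(r"^\d+(?:\.\d*)?$")
_NPT_HHMMSS = re.compile(r"^(\d+):([0-5]\d):([0-5]\d)(?:\.(\d*))?$")


def npt_to_seconds(value: str) -> float:
    """Convert an npt-sec or npt-hhmmss string to a number of seconds."""
    if _NPT_SEC.match(value):
        # npt-sec: a plain (possibly fractional) number of seconds.
        return float(value)
    match = _NPT_HHMMSS.match(value)
    if match:
        hours, minutes, seconds, fraction = match.groups()
        total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if fraction:
            total += float("0." + fraction)
        return float(total)
    raise ValueError(f"not an NPT value: {value!r}")


print(npt_to_seconds("130.5"))
print(npt_to_seconds("12:05:35.3"))
```

The presence of a colon is enough to separate the two forms, which is why a single property could in principle accept both.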
Also, the conclusion of #307 was that we would request schema.org to allow this NPT notation as an alternate syntax on Duration, which would then allow us to use the schema.org property (either |
I.e., it must be done on the schema.org mailing list. But there has to be a good argument for it, e.g., what is wrong with the ISO formalism, evidence that it does not work, etc. It would make the case stronger if we had evidence of widespread usage of NPT on the Web as opposed to the ISO version. My gut feeling is that it will be a difficult call to convince schema.org to allow for a dual syntax. |
Well, W3C Media Fragments URI is a web specification that references the RFC, not the ISO duration format, so even without usage metrics the Schema.org choice is arguably less "Webby" :) |
...and the W3C SMIL standard also utilises NPT notation for clock values. |
For the sake of completeness, here is the spec for the SMIL clock values syntax: https://www.w3.org/TR/SMIL3/smil-timing.html#q22 ...and note that ISO 8601 is actually also used in SMIL, albeit for wallclock values, which are much less common (and not used in EPUB3 Media Overlays, for example): https://www.w3.org/TR/SMIL3/smil-timing.html#Timing-WallclockSyncValueSyntax |
(Trying to move forward...) I would propose the following steps of actions
@llemeurfr @danielweck @wareid @GarthConboy @TzviyaSiegman @mattgarrish : WDYT? |
SGTM |
Looks good to me. I will make the corresponding changes to the audio spec.
But... before I merge, we need an alternative name to |
some proposals : timeLength, contentLength, consumptionTime, escapeTime, TimeToEnd |
|
@danielweck beat me to it, but |
@llemeurfr I was wondering about that, but, e.g., "12:05:35.3" is a valid NPT value, and that is not a number... What we could do is to say that the user MAY use a number instead of a string (and, e.g., the canonicalization would turn it into a string). This can easily be done, but I am not sure it is a good idea to complicate things (implementations may accept a number under the motto "be liberal in what you accept"...) |
@iherman, you're right, I was forgetting the npt-hhmmss variant. So string be it ... |
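The "number MAY be used, canonicalization turns it into a string" idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `canonical_duration`, the property name `duration`, and the sample resource are hypothetical, and the thread had not settled on any of them at this point.

```python
import json
import math
from typing import Union


def canonical_duration(value: Union[int, float, str]) -> str:
    """Return the string form of a duration given as a JSON number or string."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("duration must be a number or a string")
    if isinstance(value, (int, float)):
        if not math.isfinite(value) or value < 0:
            raise ValueError("duration must be a finite, non-negative number")
        number = float(value)
        # Whole numbers are written without a trailing ".0".
        return str(int(number)) if number.is_integer() else repr(number)
    if isinstance(value, str):
        # Strings such as "12:05:35.3" are passed through unchanged.
        return value
    raise TypeError("duration must be a number or a string")


# Hypothetical resource entry; the key names are assumptions for illustration.
resource = json.loads('{"url": "track1.mp3", "duration": 130.5}')
print(canonical_duration(resource["duration"]))  # prints 130.5
```

A liberal reader of this kind costs little, but it does mean two serializations of the same value exist in the wild, which is the complication raised above.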
@iherman Late to the party, and this is a long thread. What is the objection to simply stating the duration in seconds? Why would we need or want to have a formatted string? |
At the moment, the value is defined to be NPT as defined by rfc7826. If we do refer to a standard in this area, then we must take it fully, i.e., we should allow for "12:05:35.3" as well, which is a string. From an editor's perspective, I can see two different possibilities
I am not an expert in audio books; my gut feeling is that allowing (1) may be more comfortable for authors, hence I would prefer to go down that way. If we are able to harmonize the duration value with schema.org, it looks like schema.org processors accept a number where they expect a string with a number (e.g., for copyrightYear), so (1) is not a problem in this sense. (I ran some tests on the Structured Data Testing Tool.) Cc: @mattgarrish |
B.t.w. the diff file can be extracted here: https://tinyurl.com/y3suyhcm (It takes a while, though...) |
@iherman https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/duration
https://docs.microsoft.com/en-us/windows/desktop/wmp/media-duration
https://developer.android.com/reference/java/time/Duration
We use the word |
@iherman there is some overlap between this ticket and #420, and I apologize for contributing to that. I think there is a distinction between It may be more appropriate to format |
I think it is a false equivalency to compare (1) types used in APIs to read and/or write values with (2) syntaxes used in authoring and interchange formats. They do not serve the same purpose. Developers of processing / playback engines for timed media (e.g. audio/video, SVG animations, SMIL, etc.) are indeed very likely to normalize clock values into internal, implementation-specific data types that lend themselves to efficient calculations (such as a number of seconds with a decimal point, subject to the constraints of floating-point arithmetic). There is prior art at W3C (e.g. SMIL, Media Fragments URI) which borrows from other RFC and ISO specifications. I think WebPub should support such well-established, expressive syntactical constructs based on strings of characters, whilst also allowing JSON numbers (floating point) to directly represent units of seconds. The net advantage of the latter is of course performance (no parsing / serialization costs), and, if I understand correctly, the normative definition of "canonical" WebPub can choose this. References: SMIL: Media Fragment URIs: And just for fun, here are HTTP links to a 3-minute-long MP3 audio file, starting at 2 minutes 30 seconds and 500 milliseconds, using equivalent Media Fragment URI syntaxes:
(works in all modern web browsers) |
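The equivalence between the temporal fragment syntaxes can be sketched in Python as below. This is an illustration only: the base URL `https://example.org/audio.mp3` is a placeholder (the original links are not reproduced here), the function name is hypothetical, and the three forms shown (bare seconds, `npt:`-prefixed seconds, and hh:mm:ss) are the ones I am confident the Media Fragments URI syntax allows.

```python
from typing import List


def media_fragment_variants(base_url: str, start_seconds: float) -> List[str]:
    """Build equivalent temporal Media Fragment URIs for one start time."""
    # Work in whole milliseconds to avoid floating-point formatting surprises.
    total_ms = round(start_seconds * 1000)
    whole, frac_ms = divmod(total_ms, 1000)
    fraction = f".{frac_ms:03d}".rstrip("0").rstrip(".")
    hours, remainder = divmod(whole, 3600)
    minutes, seconds = divmod(remainder, 60)
    sec_form = f"{whole}{fraction}"
    hhmmss_form = f"{hours:02d}:{minutes:02d}:{seconds:02d}{fraction}"
    return [
        f"{base_url}#t={sec_form}",      # seconds, NPT implied
        f"{base_url}#t=npt:{sec_form}",  # seconds, explicit npt: prefix
        f"{base_url}#t={hhmmss_form}",   # hours:minutes:seconds
    ]


# Placeholder URL; 150.5 s is 2 minutes 30 seconds and 500 milliseconds.
for uri in media_fragment_variants("https://example.org/audio.mp3", 150.5):
    print(uri)
```

All three URIs address the same start offset, which is the sense in which the string syntaxes and the plain number of seconds are interchangeable.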
@danielweck I think we have different interpretations of the use and purpose of this My understanding is this:
*update: It occurs to me that maybe the value in the TOC should be |
OK. I am finally on the same page here. As commented on #420 (comment) the ISO 8601 "duration" format would be the most appealing to me, if it allows for seconds-based expressions such as My primary concern/objection would be the need to format seconds into I think the media fragments discussion is more appropriate to a ToC/readingOrder discussion ** Update: |
I am not an expert, i.e., I will follow the choice between NPT and ISO as agreed by the Working Group. |
@geoffjukes let's imagine that you have to explain to the whole publishing industry, including audiobook studios, that media duration (a track in audio speak) is not expressed as 130.5 (in seconds) but as "PT130.5S" and that is normal because ... ISO. How many raised eyebrows will you face? |
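For readers weighing the two notations, here is a small Python sketch of the conversion between a number of seconds and an ISO 8601 duration string. It is a simplification under stated assumptions: only the time-only subset (`PT…H…M…S`) is handled, output is always expressed in seconds with at most millisecond precision, and both function names are hypothetical.

```python
import re

# Assumption: only the time-only subset PT[nH][nM][n(.n)S] is supported;
# years, months, weeks and days are out of scope for this sketch.
_ISO_TIME_ONLY = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def seconds_to_iso8601(seconds: float) -> str:
    """Format a non-negative number of seconds as, e.g., 'PT130.5S'."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    text = f"{seconds:.3f}".rstrip("0").rstrip(".")
    return f"PT{text}S"


def iso8601_to_seconds(value: str) -> float:
    """Parse a time-only ISO 8601 duration into a number of seconds."""
    match = _ISO_TIME_ONLY.match(value)
    if not match or value == "PT":
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


print(seconds_to_iso8601(130.5))         # prints PT130.5S
print(iso8601_to_seconds("PT1H2M3.5S"))  # prints 3723.5
```

The conversion itself is mechanical; the disagreement in the thread is about which form authors should have to write, not about whether tools can translate between them.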
@llemeurfr from https://www.ietf.org/rfc/rfc2326.txt:
It doesn't read like it is intended for how you are proposing to use it. I'll go with what the majority decides (now I know I can comfortably express seconds as a decimal in both NPT and ISO 8601) but my preference would be ISO, and conformance with schema.org |
This issue was discussed in a meeting.
Transcript: duration

Wendy Reid: Issue: #420
Wendy Reid: Open Pull Request: #421
Wendy Reid: should duration be required?
… duration will become part of the core spec
Ivan Herman: there are 2 issues
… one is, if we take duration for one resource, the question that came up was what is the format for the value?
… one possibility would be to use the ISO 8601 Value, which is used by schema
… or we could use RFC 7826
… which is used in the media world
… the majority seem to favor the RFC value, as it’s more readable than ISO
… we had an issue #307 a while ago where that was decided, but it wasn’t clean in the doc
… we can reinforce this decision
… this is one part of it
Wendy Reid: I’ve tried to talk to danbri about this, no response yet
… the RFC value fits better with what we want to do, especially if we also want to reference media fragments
… can we merge the PR and close this issue?
Ivan Herman: do we want to make a new resolution, or the already decided one?
Wendy Reid: let’s stick with the NPT?
Geoff Jukes: I’m confused about the intent
… specifying the duration of the resource
… it’s the file, effectively
… that duration is only specified in seconds
… never anything else
… my concern is that putting in media fragments at the resource level doesn’t make sense
… if the intent is to conform to schema.org, and we should just use ISO
… so why NPT instead of using a double?
… I don’t know why it’s a consideration
Wendy Reid: media fragments will be a thing, although maybe more in TOC etc than in resources
Geoff Jukes: that’s not describing a resource, but metadata
Wendy Reid: we don’t want two different formats for these things
Geoff Jukes: I disagree
… and I’m having trouble with these very long discussions
… I think of a media fragment as a different thing than a resource
Deborah Kaplan: geoffjukes: +1 for calling out our confusing conversations as confusing. Thanks.
Ivan Herman: the NPT format is defined in a way that it can have only a number, which is seconds
… the author may choose to use raw seconds
Tzviya Siegman: i missed last week. is it possible to summarize the discussion?
Ivan Herman: in a way we jumped ahead
… we have make a choice between ISO and RFC
… then during the discussion a third option became possible, just taking the number of seconds
… those are the three options
… so the question is which of the three?
Geoff Jukes: in addition to that, what is the desire to conform to schema? Is that a design principle?
Ivan Herman: we want the contents of the manifest to be accessible to the knowledge graph
… it’s mostly important for bibliographic metadata
Geoff Jukes: the desire to conform to schema is high, so we can obtain cross-vendor parsing capability
… is that correct?
Ivan Herman: yes
Laurent Le Meur: here we are speaking on duration of resource, not duration of audiobook. It’s not a property of a book.
… so it’s not tied to a need to express audiobook metadata for schema
… so we could use seconds, with a name other than duration (like runtime or length)
… and the audiobook industry would be happy with that
Geoff Jukes: I’d be happy with a new thing called length or whatever, that’s just a double
… it’s what we already do
Ivan Herman: to be clear I am just a messenger
… whatever the group decides is fine
Tim Cole: the decision could be made, that in this community we would use duration but constrain the value of seconds
… i think this is OK
… it could be enforced via a context document
… we could also define our own property, and connect it to duration
… there are ways to express constrained versions of other properties
Ivan Herman: I don’t think that works
… schema uses the ISO format, and it doesn’t allow a simple number
… a number can be a subset of RFC, but not of ISO
Wendy Reid: the reason we were leaning on RFC it has only two ways to express time, including only seconds
Ivan Herman: I would propose to move on
… we define that property to have a value being a float consisting of the number of seconds
… with a new term like length

Proposed resolution: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource. (Wendy Reid)
Ivan Herman: +1
Laurent Le Meur: +1
Dave Cramer: +1
Franco Alvarado: +1
Tim Cole: +1
Geoff Jukes: +1
Joshua Pyle: +1
Marisa DeMeglio: 0
Avneesh Singh: +-0
Avneesh Singh: no strong opinion :)
… waiting for feedback from media sync people
Wendy Reid: does this impact sync media
Marisa DeMeglio: I don’t think so… this is just properties of resources
Ben Schroeter: +1
Brady Duga: Abstain (don’t plan to use the value)
Marisa DeMeglio: this issue doesn’t need to get more complicated

Resolution #2: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource.

Ivan Herman: the other issue that came up is more controversial
… there may be a notion of duration of the whole audiobook
… it turned out that having that as book-level metadata is something that implementors may ignore
… they may deduce that from the individual resources
… but it may be helpful as a hint to the user, as a value in the catalog etc
… what I did, mostly to generate discussion, was to
… put a global property there, with the same format
… primarily defined for a user interface
… do we need this, or should we remove it from the PR doc?
Laurent Le Meur: I would say that the audiobook schema.org object supports the duration with ISO 8601
… it’s there and it’s optional, and it is what we want
… it’s descriptive metadata
… we could just adopt this and move on
Ivan Herman: +1 to laurent
Geoff Jukes: +1 to laurent
Wendy Reid: simple descriptive metadata

Proposed resolution: Schema.org’s Duration will be a required metadata descriptor for audiobooks (Wendy Reid)
Ivan Herman: +1
Ben Schroeter: +1
Laurent Le Meur: I thought the idea was to keep it optional, as in schema.org

Proposed resolution: Schema.org’s Duration will be a recommended metadata descriptor for audiobooks (Wendy Reid)
Laurent Le Meur: and it’s a ‘duration’ property (of type ‘Duration’)
Ivan Herman: +1
Laurent Le Meur: +1
Marisa DeMeglio: is this a different property?
Ivan Herman: yes
Marisa DeMeglio: -1
Deborah Kaplan: +1
Tim Cole: +1
Joshua Pyle: +1
Wendy Reid: this is schema.org duration descriptor for audio book, the length of the entire work, the sum of all the parts
Laurent Le Meur: see https://schema.org/Audiobook
Laurent Le Meur: and https://schema.org/duration
Geoff Jukes: it’s not the same concept
… it might be the sum of all resources, but it might be different, for example if there’s non-book audio resources
Ivan Herman: +1 to geoffjukes
Geoff Jukes: +1
Geoff Jukes: so it’s ok to have a different name and format, and it’s good for it to be in schema.org so it’s universally digestable
Wendy Reid: it would be called duration, it would be the total length of the book, provided by the publisher
Deborah Kaplan: are we voting on making this required?
Ivan Herman: there is a mess-up
… there are 2 things here
… one is, what is the global descriptive metadata, and what value it takes
… and the only resolution we are proposing is to use duration with ISO as in schema.org
… and then there’s the question of whether this metadata item is required
Geoff Jukes: I would happy for it to be required
… we have to send it to our publishers/distributors
Wendy Reid: when I said required I meant for the audiobook profile
Laurent Le Meur: q for geoffjukes. Why is it required?
Geoff Jukes: when we send ONIX we include runtime
… and they like to cross-reference to check they make sure they got the right book
… if it’s not required we’ll supply it anyway
Tzviya Siegman: +1 to limited metadata!
Dave Cramer: The web platform requires very little metadata, we should require the important things (title, author), this does not seem like required metadata
… I suggest we make it optional
Ivan Herman: +1 to dauwhe
Laurent Le Meur: +1 to dauwhe
Wendy Reid: for an audiobook it’s almost as important as title
… for a user to understand what they’re getting into
… to find out if it’s abridged or unabridged
… or if my phone will keep it
… I think it should be required
Ivan Herman: there is no requirement to provide metadata for the number of book pages
… but the same argument applies, ish
… I agree it is recommended
… but “must” is too far
Tzviya Siegman: I hate to prolong this discussion
… when we were deciding on EPUB metadata, lots of people said that title should be required
… but then you get into lots of nuance with what titles means, but most systems don’t pay attention
… we should look into how systems work with information about length
… and how this will play out in the real world
… maybe the implementors can tell us more about this information is used
Dave Cramer: It strikes me as many of the arguments for the utility of the information is about file size not chronological duration, this information can be useful, but requiring them is not traditionally how the web works
… we run into issues of validation
… are we then going to get to a point where validators takes the values and compares them
… requiring this is complicated
Laurent Le Meur: I agree on principle we shouldn’t require descriptive metadata
… and we should keep properties required for user agent functioning or content identification
… so we should recommend this, underlining all the advantages of using this
Brady Duga: when considering required metadata, we should ask if it’s impossible to create a book without this metadata.
… if it’s not impossible, we shouldn’t require it
Wendy Reid: I’m OK with recommended, even though y’all are completely wrong :)
Bill Kasdorf: vendors can still require it
Ivan Herman: we have 2 resolutions to take
… we never closed the previous resolution

Proposed resolution: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional. (Ivan Herman)
Ivan Herman: +1
Tim Cole: +1
Laurent Le Meur: +1
Bill Kasdorf: +1
Deborah Kaplan: +1
Geoff Jukes: +1
Ben Schroeter: +1
Dave Cramer: +1
Brady Duga: +1
Garth Conboy: +1
George Kerscher: +1

Resolution #3: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional.
Joshua Pyle: +1

Proposed resolution: Schema.org duration value is recommended metadata for the audiobooks profile (Wendy Reid)
Ivan Herman: +1
Laurent Le Meur: +1
Marisa DeMeglio: +1
Tim Cole: +1
Bill Kasdorf: +1
Ben Schroeter: +1
Deborah Kaplan: +1

Resolution #4: Schema.org duration value is _recommended_ metadata for the audiobooks profile

Wendy Reid: can we move on and never speak of this again?
Ivan Herman: no
… I will make the edits according to the resolutions, is it OK to then merge?
Wendy Reid: +1
Ivan Herman: and then the PR and the issue can be closed then?
everyone: YES |
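The two levels resolved in the meeting (a float number of seconds per resource, and an optional ISO 8601 `duration` for the whole publication) could look roughly like the Python sketch below. Everything specific here is an assumption for illustration: the resource-level property was still unnamed at the time, so `length` is a hypothetical placeholder, and the file names and values are invented sample data.

```python
import json

# Hypothetical manifest fragment. "length" stands in for the still-unnamed
# resource-level property; "duration" is the publication-level ISO 8601 value.
MANIFEST = """
{
  "duration": "PT3000.5S",
  "readingOrder": [
    {"url": "chapter1.mp3", "length": 1800.5},
    {"url": "chapter2.mp3", "length": 1200}
  ]
}
"""


def total_resource_seconds(manifest: dict) -> float:
    """Sum the per-resource lengths (in seconds) found in the reading order."""
    return float(sum(item.get("length", 0) for item in manifest.get("readingOrder", [])))


def as_iso8601_seconds(seconds: float) -> str:
    """Express a number of seconds as an ISO 8601 duration such as 'PT3000.5S'."""
    text = f"{seconds:.3f}".rstrip("0").rstrip(".")
    return f"PT{text}S"


manifest = json.loads(MANIFEST)
total = total_resource_seconds(manifest)
print(total)                      # prints 3000.5
print(as_iso8601_seconds(total))  # prints PT3000.5S
print(manifest["duration"])       # prints PT3000.5S
```

As noted in the discussion, the publication-level value is descriptive metadata and need not equal the sum of the resources (for example when non-book audio is included), so a comparison like this is at most a sanity check, not a validation rule.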
@mattgarrish @wareid @TzviyaSiegman @GarthConboy I have made the changes we discussed yesterday on the call.
If you guys agree, we should merge asap; there are already merge conflicts because, in the meantime, @mattgarrish also made some changes. (@mattgarrish, can you take care of those conflicts? I am not sure I understand, the conflict is on a WebIDL; I tried to take over the latest version to this document, but it still sees a conflict...)
|
Any particular reason for changing the URLs in the quick reference? I thought we were leaving those until the naming was resolved, or did I miss somewhere that it has been? The main reason I ask is that we seem to be falling into the same trap we did with the epub structure vocab, with inconsistent naming. The schema.org properties use camelcased names, but we also have a mix of hyphenated names (accessibility-report, privacy-policy) and lowercase (pagelist). It would be good to standardize on camelcasing. |
|
Because these are relationships that aren't yet standardized, not properties, so they were defined as URLs. I believe they all have notes in their respective definitions that we are looking to have them integrated in the appropriate vocabulary so the URLs don't have to be used (i.e., into IANA). If you have to use the URL in the The bigger question might be why do we intermingle properties and relationships as though they're the same? It might be something to rethink. |
I didn't notice anything with the actual inclusion of duration, so let's take up the properties/relationships in the other issue I just opened.
This is the implementation of the relevant resolution from the meeting of the 4th of April.
This PR only adds the resource level duration. The book level term is not yet decided, see #420, more exactly the proposed solution. If that is also decided, I am happy to add a book level value, too.