Alternate serialization for media overlays #88
Interesting. So you are proposing 2 solutions to point to a Media Overlay "node" from the publication manifest:
In the first solution, why would we select a given resource type (text here) as primary rather than another (e.g. audio)? Publisher's choice?
The issue with this format is that Media Overlay nodes are split into N different, small JSON files.
I don't have a strong opinion on publication level vs reading order mostly because I'm not enough of an expert on media overlays.
Is there really a primary resource though? I don't think so. If the text is displayed and I can hear it at the same time, it feels pretty equal to me.
I really dislike that idea, sorry... It can already get pretty messy with the fact that a Link Object can contain arrays of Link Objects. What you're proposing is much, much worse IMO and completely disconnected from the concept of a Link Object (since a collection can represent pretty much anything).
In your example:

{
  "metadata": {
    "role": ["chapter"]
  }
}

What would this role correspond to?
I don't think there's anything in schema.org that would be a good fit for `role`. In the JSON Schema, this would be a string + an enum with all known values for roles.
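As a sketch, such a JSON Schema definition for `role` might look like this (the enum lists only the role values that appear in the examples in this thread; the real list of known roles would be longer):

```json
{
  "role": {
    "type": "array",
    "items": {
      "type": "string",
      "enum": ["body", "chapter"]
    }
  }
}
```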
In many commercial mainstream EPUB3 Media Overlays, the SMIL files are indeed tiny (typically: illustrated fixed-layout children's "read aloud" books, with minimal amounts of synchronized text/audio). In Readium1 these SMIL files are parsed eagerly (ahead of rendering time) into their JSON equivalent (there are C++ and JS parsers, depending on the target platforms). These generated payloads are used to populate an in-memory JavaScript data model that represents the state of the publication at runtime.

There is a real, concrete benefit in having the entirety of the timing tree (i.e. aggregated SMIL trees) loaded in memory. This is used in the Readium1 implementation to support a linear timeline bar that the user can drag from 0-100% (actually, zero-time to the total duration indicated by the top-level publication metadata, or alternatively by the sum of all reading-order SMILs). The MO engine then maps this linear time representation to a structural position inside the SMIL timing tree, simply by scanning the loaded MO model. This also makes it easier to handle skippability and escapability during playback.

The obvious drawback is some upfront parsing cost and increased memory consumption (this latter point is not such a big deal on modern devices though... MO is not really meant for low-end e-ink devices). In my opinion, the benefits far outweigh the drawbacks.

That being said, what differentiates Readium2 is the clear architectural facet of backend/server-side state (where the MO models are populated just as in Readium1), combined with the client-side runtime, which may load SMIL timing trees just-in-time (i.e. an additional HTTP request to MO links for individual HTML spine items / chapters). In that Readium2 case, a MO playback engine implementation will probably want to load the timing tree for the entire publication anyway, for the reasons I mentioned before.

Therefore, the top-level MO link in the publication manifest will have to deliver a data model similar to the one generated by Readium1 (i.e. a simple array-like aggregation of contiguous parsed SMIL trees, no need to be smart by somehow merging timing trees).
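The timeline mapping described above (dragging a linear bar and resolving the corresponding node by scanning the loaded model) can be sketched roughly as follows. This is a hypothetical illustration, not the actual Readium data model: the `MONode` shape and the `clipBegin`/`clipEnd` fields (in seconds) are assumptions.

```typescript
// Hypothetical sketch: map a global timeline position (0..totalDuration,
// in seconds) to a leaf "par"-like node in a fully loaded, aggregated
// MO timing tree. Names and shapes are assumptions for illustration.

interface MONode {
  id?: string;
  clipBegin?: number;   // audio clip start, in seconds (leaves only)
  clipEnd?: number;     // audio clip end, in seconds (leaves only)
  children?: MONode[];  // "seq"-like containers
}

// Total duration = sum of leaf clip durations across the whole tree.
function totalDuration(nodes: MONode[]): number {
  let sum = 0;
  for (const n of nodes) {
    if (n.children) sum += totalDuration(n.children);
    else if (n.clipBegin !== undefined && n.clipEnd !== undefined)
      sum += n.clipEnd - n.clipBegin;
  }
  return sum;
}

// Linear scan: walk leaves in document order, accumulating durations
// until the requested global time falls inside a leaf's clip.
function nodeAtTime(nodes: MONode[], t: number): MONode | undefined {
  let elapsed = 0;
  const walk = (ns: MONode[]): MONode | undefined => {
    for (const n of ns) {
      if (n.children) {
        const hit = walk(n.children);
        if (hit) return hit;
      } else if (n.clipBegin !== undefined && n.clipEnd !== undefined) {
        const d = n.clipEnd - n.clipBegin;
        if (t < elapsed + d) return n;
        elapsed += d;
      }
    }
    return undefined;
  };
  return walk(nodes);
}
```

Dragging the bar to 50% would then resolve to `nodeAtTime(tree, 0.5 * totalDuration(tree))`.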
@danielweck so you're suggesting that streamers and publication servers should merge all the SMIL together and only serve a publication-level link to the media overlay JSON document? The initial discussion is slightly different since the focus was on syntax and (potentially) authoring.
No, I am quite happy continuing to serve individual SMIL "chapters" (i.e. mirroring exactly the logical organisation of the EPUB3 Media Overlays), in addition to the full-spine aggregated SMILs. I am however suggesting that the JSON syntax of the full-spine multi-SMIL consists simply of an array-like combination of each individual referenced SMIL in the original EPUB3 Media Overlays (no attempt to make a smart merge of the SMIL timing trees).

The potential downside of this approach is that the edge-case mapping "multiple contiguous HTML spine items => single SMIL" can produce redundant data, unless some smart trimming of the timing tree is performed beforehand at the parser level (but note that extracting SMIL timing containers from a SMIL tree is a bit like attempting to split CSS style definitions... the context is important and easy to break, due to semantically/structurally-meaningful nested sequence containers).
As discussed at the conference call:
The SMIL …
Let's try a deeper (more typical) SMIL timing tree, with intermediary … SORRY, POSTED TOO QUICKLY (WILL PUBLISH AGAIN)
Here is the example from the specification:
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
<body>
<!-- a chapter -->
<seq id="id1" epub:textref="chapter1.xhtml#s01" epub:type="chapter">
<!-- the section title -->
<par id="id2">
<text src="chapter1.xhtml#section1_title"/>
<audio src="chapter1_audio.mp3"
clipBegin="0:23:23.84"
clipEnd="0:23:34.221"/>
</par>
<!-- some sentences in the chapter -->
<par id="id3">
<text src="chapter1.xhtml#text1"/>
<audio src="chapter1_audio.mp3"
clipBegin="0:23:34.221"
clipEnd="0:23:59.003"/>
</par>
<par id="id4">
<text src="chapter1.xhtml#text2"/>
<audio src="chapter1_audio.mp3"
clipBegin="0:23:59.003"
clipEnd="0:24:15.000"/>
</par>
<!-- a figure -->
<seq id="id7" epub:textref="chapter1.xhtml#figure">
<par id="id8">
<text src="chapter1.xhtml#photo"/>
<audio src="chapter1_audio.mp3"
clipBegin="0:24:18.123"
clipEnd="0:24:28.764"/>
</par>
<par id="id9">
<text src="chapter1.xhtml#caption"/>
<audio src="chapter1_audio.mp3"
clipBegin="0:24:28.764"
clipEnd="0:24:50.010"/>
</par>
</seq>
<!-- more sentences in the chapter (outside the figure) -->
<par id="id12">
<text src="chapter1.xhtml#text3"/>
<audio src="chapter1_audio.mp3"
clipBegin="0:25:45.515"
clipEnd="0:26:30.203"/>
</par>
<par id="id13">
<text src="chapter1.xhtml#text4"/>
<audio src="chapter1_audio.mp3"
clipBegin="0:26:30.203"
clipEnd="0:27:15.000"/>
</par>
</seq>
</body>
</smil>
And here is the corresponding XHTML Content Document (chapter1.xhtml):

<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:epub="http://www.idpf.org/2007/ops"
xml:lang="en"
lang="en">
<head>
<title>Media Overlays Example of EPUB Content Document</title>
</head>
<body id="sec1">
<section id="sectionstart" epub:type="chapter">
<h1 id="section1_title">The Section Title</h1>
<p id="text1">The first phrase of the main text body.</p>
<p id="text2">The second phrase of the main text body.</p>
<figure id="figure">
<img id="photo"
src="photo.png"
alt="a photograph for which there is a caption" />
<figcaption id="caption">The photo caption</figcaption>
</figure>
<p id="text3">The third phrase of the main text body.</p>
<p id="text4">The fourth phrase of the main text body.</p>
</section>
</body>
</html>
This is the resulting JSON (UPDATED to enclose …):

{
"metadata": {
"duration": "0:27:15.000",
"role": [
"body"
]
},
"links": [
{
"href": "chapter1.xhtml",
"type": "application/xhtml+xml"
}
],
"children": [
{
"metadata": {
"id": "id1",
"role": [
"chapter"
]
},
"links": [
{
"href": "chapter1.xhtml#s01",
"type": "application/xhtml+xml"
}
],
"children": [
{
"metadata": {
"id": "id2"
},
"links": [
{
"href": "chapter1.xhtml#section1_title",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:23:23.84,0:23:34.221",
"type": "audio/mpeg"
}
]
},
{
"metadata": {
"id": "id3"
},
"links": [
{
"href": "chapter1.xhtml#text1",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:23:34.221,0:23:59.003",
"type": "audio/mpeg"
}
]
},
{
"metadata": {
"id": "id4"
},
"links": [
{
"href": "chapter1.xhtml#text2",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:23:59.003,0:24:15.000",
"type": "audio/mpeg"
}
]
},
{
"metadata": {
"id": "id7"
},
"links": [
{
"href": "chapter1.xhtml#figure",
"type": "application/xhtml+xml"
}
],
"children": [
{
"metadata": {
"id": "id8"
},
"links": [
{
"href": "chapter1.xhtml#photo",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:24:18.123,0:24:28.764",
"type": "audio/mpeg"
}
]
},
{
"metadata": {
"id": "id9"
},
"links": [
{
"href": "chapter1.xhtml#caption",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:24:28.764,0:24:50.010",
"type": "audio/mpeg"
}
]
}
]
},
{
"metadata": {
"id": "id12"
},
"links": [
{
"href": "chapter1.xhtml#text3",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:25:45.515,0:26:30.203",
"type": "audio/mpeg"
}
]
},
{
"metadata": {
"id": "id13"
},
"links": [
{
"href": "chapter1.xhtml#text4",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:26:30.203,0:27:15.000",
"type": "audio/mpeg"
}
]
}
]
}
]
}
Note that the above conversion does not translate the time/clock values to seconds (so we can easily cross-reference with the original EPUB3 SMIL).
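If an implementation did want to normalize those values, SMIL clock values would need to be parsed into seconds. A minimal sketch (not the spec's full grammar; the function name is hypothetical):

```typescript
// Sketch: convert a SMIL clock value such as "0:23:23.84" (full),
// "23:34.221" (partial) or "12.5s" (timecount) into seconds, so that
// "#t=..." media fragments could be emitted in seconds instead.

function clockToSeconds(value: string): number {
  const v = value.trim();
  // Timecount values: "3h", "45min", "30s", "500ms", or a bare number
  // of seconds like "12.5".
  const m = v.match(/^(\d+(?:\.\d+)?)(h|min|s|ms)?$/);
  if (m) {
    const n = parseFloat(m[1]);
    switch (m[2]) {
      case "h": return n * 3600;
      case "min": return n * 60;
      case "ms": return n / 1000;
      default: return n; // "s" or no metric
    }
  }
  // Full (hh:mm:ss.f) or partial (mm:ss.f) clock values.
  const parts = v.split(":").map(parseFloat);
  if (parts.some(isNaN)) throw new Error(`invalid clock value: ${value}`);
  return parts.reduce((acc, p) => acc * 60 + p, 0);
}
```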
Note that the above JSON introduces the …
Note that …
Note that the …
Note that the initial …
This alternative syntax proposal purely relies on the "link" object's `children` recursion. UPDATE: added empty `href` values ("#"):

{
"metadata": {
"duration": "0:27:15.000"
},
"links": [
{
"role": [
"body"
],
"href": "chapter1.xhtml",
"type": "application/xhtml+xml",
"children": [
{
"id": "id1",
"role": [
"chapter"
],
"href": "chapter1.xhtml#s01",
"type": "application/xhtml+xml",
"children": [
{
"id": "id2",
"href": "#",
"children": [
{
"href": "chapter1.xhtml#section1_title",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:23:23.84,0:23:34.221",
"type": "audio/mpeg"
}
]
},
{
"id": "id3",
"href": "#",
"children": [
{
"href": "chapter1.xhtml#text1",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:23:34.221,0:23:59.003",
"type": "audio/mpeg"
}
]
},
{
"id": "id4",
"href": "#",
"children": [
{
"href": "chapter1.xhtml#text2",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:23:59.003,0:24:15.000",
"type": "audio/mpeg"
}
]
},
{
"id": "id7",
"href": "chapter1.xhtml#figure",
"type": "application/xhtml+xml",
"children": [
{
"id": "id8",
"href": "#",
"children": [
{
"href": "chapter1.xhtml#photo",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:24:18.123,0:24:28.764",
"type": "audio/mpeg"
}
]
},
{
"id": "id9",
"href": "#",
"children": [
{
"href": "chapter1.xhtml#caption",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:24:28.764,0:24:50.010",
"type": "audio/mpeg"
}
]
}
]
},
{
"id": "id12",
"href": "#",
"children": [
{
"href": "chapter1.xhtml#text3",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:25:45.515,0:26:30.203",
"type": "audio/mpeg"
}
]
},
{
"id": "id13",
"href": "#",
"children": [
{
"href": "chapter1.xhtml#text4",
"type": "application/xhtml+xml"
},
{
"href": "chapter1_audio.mp3#t=0:26:30.203,0:27:15.000",
"type": "audio/mpeg"
}
]
}
]
}
]
}
]
}
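The structural difference between the two serializations above can be illustrated with a small converter. This is a hypothetical sketch with assumed shapes (`Collection`, `LinkNode`); it only handles inner `seq`/`par` nodes, not the top-level `duration` metadata:

```typescript
// Sketch: convert a node of the first serialization
// ({ metadata, links, children }) into the alternative "Link Object only"
// form, where id/role are hoisted onto the link and href defaults to "#".

interface Collection {
  metadata?: { id?: string; role?: string[] };
  links?: { href: string; type?: string }[];
  children?: Collection[];
}

interface LinkNode {
  id?: string;
  role?: string[];
  href: string;
  type?: string;
  children?: LinkNode[];
}

function toLinkNode(node: Collection): LinkNode {
  const out: LinkNode = { href: "#" };
  if (node.metadata?.id) out.id = node.metadata.id;
  if (node.metadata?.role) out.role = node.metadata.role;
  if (node.children) {
    // seq-like node: its single link becomes the node's own href/type,
    // and its children are converted recursively.
    const self = node.links?.[0];
    if (self) {
      out.href = self.href;
      if (self.type) out.type = self.type;
    }
    out.children = node.children.map(toLinkNode);
  } else {
    // par-like leaf: no href of its own ("#"); its text/audio links
    // become its children.
    out.children = node.links ?? [];
  }
  return out;
}
```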
Thanks for these examples @danielweck.
@HadrienGardeur …
There are existing publications that use a valid EPUB3 "design" / authoring pattern in the NavDoc in order to partition / categorize TOC links (for example, Children's Literature). This results in a hierarchy of …
An alternative to that would be a default …
I added empty `href` values ("#") …
I enclosed …
Based on our recent discussions, I think we can close this issue. We need to keep an eye on the W3C CG and make sure that we align with it. Can you open an issue specifically for that, @danielweck?
Follow-up: #109
We've iceboxed the work on media overlays for some time, but I'd like to restart discussions on this by proposing a new serialization for them.

Instead of having a separate syntax for MO, I'd like to explore the ability to represent them using our existing model for RWPM, which means:

- the same hierarchy (`metadata`, `links` and subcollections)
- instead of `textref` and `audioref`, we would use the Link Object, which opens the door to a lot of things (text + audio + video, or two text references in different languages)

Here's an example in this proposed syntax where the text is paired with both audio and a video:

These media overlays could either be referenced directly at a publication level:

But they could also be referenced as `alternate` resources in the `readingOrder`:

Any thoughts on this? cc @danielweck @llemeurfr
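For illustration, the two referencing options mentioned above might look something like this in a manifest. The `rel` value, the media type, and the overlay filenames are placeholders (nothing here is defined by a spec), so treat this purely as a sketch:

```json
{
  "links": [
    {
      "rel": "mediaOverlay",
      "href": "overlay.json",
      "type": "application/vnd.example.mo+json"
    }
  ],
  "readingOrder": [
    {
      "href": "chapter1.xhtml",
      "type": "application/xhtml+xml",
      "alternate": [
        {
          "href": "chapter1-overlay.json",
          "type": "application/vnd.example.mo+json"
        }
      ]
    }
  ]
}
```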