Alternate serialization for media overlays #88

Closed
HadrienGardeur opened this issue Mar 19, 2019 · 27 comments
Comments

@HadrienGardeur
Contributor

HadrienGardeur commented Mar 19, 2019

We've iceboxed the work on media overlays for some time but I'd like to re-start discussions on this by proposing a new serialization for them.

Instead of having a separate syntax for MO, I'd like to explore the ability to represent them using our existing model for RWPM, which means:

  • each media overlay node is a full collection (with potentially metadata, links and subcollections)
  • instead of specialized elements (textref and audioref), we would use the Link Object which opens the door to a lot of things (text + audio + video or two text references in different languages)

Here's an example in this proposed syntax where the text is paired with both audio and a video:

{
  "metadata": {
    "role": ["chapter"]
  },
  "links": [
    {
      "href": "chapter1.html",
      "type": "text/html"
    }
  ],
  "children": [
    {
      "links": [
        {
          "href": "chapter1.html#par1",
          "type": "text/html"
        }, 
        {
          "href": "chapter1.mp3#t=0,20",
          "type": "audio/mpeg"
        },
        {
          "href": "chapter1.webm#t=0,26",
          "type": "video/webm"
        }
      ]
    },
    {
      "links": [
        {
          "href": "chapter1.html#par2",
          "type": "text/html"
        }, 
        {
          "href": "chapter1.mp3#t=20,28",
          "type": "audio/mpeg"
        },
        {
          "href": "chapter1.webm#t=26,37",
          "type": "video/webm"
        }
      ]
    }
  ]
}
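As a rough illustration of how a processing agent might consume such a node, the sketch below groups the links of one node by their top-level media type. The helper name `links_by_media_type` is hypothetical and not part of any Readium API; it also assumes every Link Object carries a `type`.

```python
from collections import defaultdict


def links_by_media_type(node):
    """Group the hrefs of one overlay node by top-level media type."""
    grouped = defaultdict(list)
    for link in node.get("links", []):
        # "audio/mpeg" -> "audio"; a missing type ends up under "".
        top_level = link.get("type", "").split("/", 1)[0]
        grouped[top_level].append(link["href"])
    return dict(grouped)


node = {
    "links": [
        {"href": "chapter1.html#par1", "type": "text/html"},
        {"href": "chapter1.mp3#t=0,20", "type": "audio/mpeg"},
        {"href": "chapter1.webm#t=0,26", "type": "video/webm"},
    ]
}
grouped = links_by_media_type(node)
print(grouped)
```

Note that this naive grouping would file `application/xhtml+xml` under `application` rather than `text`, so a real agent would need a proper mapping from media types to text, audio and video.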

These media overlays could either be referenced directly at a publication level:

"links": [
  {
    "rel": "alternate",
    "href": "overlay.json",
    "type": "application/media-overlay+json"
  }
]

But they could also be referenced as alternate resources in the readingOrder:

{
  "href": "chapter1.html",
  "type": "text/html",
  "alternate": [
    {
      "href": "overlay1.json",
      "type": "application/media-overlay+json"
    }
  ]
}

Any thoughts on this? cc @danielweck @llemeurfr

@llemeurfr
Contributor

llemeurfr commented Mar 20, 2019

Interesting. So you are proposing 2 solutions to point to a Media Overlay "node" from the publication manifest:

  • specify each Media Overlay node (one JSON file) as the href of each item in the reading order, or
  • choose a "primary" resource type (chapter1.html in your example) and specify it in each item of the reading order, plus add a Media Overlay node (one JSON file) as an "alternate" property.

In the first solution, why would we select a given resource type (text here) as primary rather than another (e.g. audio)? publisher's choice?

@llemeurfr
Contributor

The issue with this format is that Media Overlay nodes are split into N different, small JSON files.
It would be much more compact if the reading order was able to handle not only Link Objects, but also "composite" objects like ... a collection.

@HadrienGardeur
Contributor Author

HadrienGardeur commented Mar 20, 2019

See w3c/pwpub#44 (comment)

I don't have a strong opinion on publication level vs reading order mostly because I'm not enough of an expert on media overlays.

In the first solution, why would we select a given resource type (text here) as primary rather than another (e.g. audio)?

Is there really a primary resource though? I don't think so. If the text is displayed and I can hear it at the same time, it feels pretty equal to me.

It would be much more compact if the reading order was able to handle not only Link Objects, but also "composite" objects like ... a collection.

I really dislike that idea, sorry... It can already get pretty messy with the fact that a Link Object can contain arrays of Link Objects through alternate or children.

What you're proposing is much much worse IMO and completely disconnected from the concept of a Link Object (since a collection can represent pretty much anything).

@danielweck
Member

In your example:

{
  "metadata": {
      "role": ["chapter"]
  }
}

What would role map to? A new JSON Schema type?
https://github.com/readium/webpub-manifest/tree/master/schema

@HadrienGardeur
Contributor Author

I don't think there's anything in schema.org that would be a good fit for role in this context, so it would be mapped to a URL of our own.

In the JSON Schema, this would be a string + an enum with all known values for roles.

@danielweck
Member

In many commercial mainstream EPUB3 Media Overlays, the SMIL files are indeed tiny (typically: illustrated fixed layout children's "read aloud" books, with minimal amounts of synchronized text/audio).
However, reflowable publications converted from DAISY Digital Talking Books (or, nowadays, natively authored as EPUB3) usually consist of large sentence-level SMIL files, with many hours of audio playback.
There are rare edge cases, but the vast majority of MO content is authored using the 1-to-1 mapping of HTML documents, SMIL and audio files ("one spine item in the reading order => one SMIL file => one audio file"). Occasionally "several SMIL files => a single audio file", but that is just an implementation detail that does not affect the model we are discussing here. In principle there may be "several contiguous HTML files in the reading order => a single SMIL" but frankly I have personally not come across this authoring practice in the real world.

In Readium1 these SMIL files are parsed eagerly (ahead of rendering time) into their JSON equivalent (there are C++ and JS parsers, depending on the target platform). These generated payloads are used to populate an in-memory JavaScript data model that represents the state of the publication at runtime.

There is a concrete benefit in having the entirety of the timing tree (i.e. the aggregated SMIL trees) loaded in memory. This is used in the Readium1 implementation to support a linear timeline bar that the user can drag from 0 to 100% (actually, from zero to the total duration indicated by the top-level publication metadata, or alternatively by the sum of all reading-order SMILs). The MO engine then maps this linear time representation to a structural position inside the SMIL timing tree, simply by scanning the loaded MO model. This also makes it easier to handle skippability and escapability during playback.
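The mapping from a linear timeline position to a position in the timing tree can be sketched as follows. This is not the Readium1 code: it assumes the tree has already been flattened into an ordered list of (text href, duration in seconds) leaves, and it uses a binary search over cumulative offsets instead of the plain scan described above. The function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_timeline(clips):
    """clips: list of (text_href, duration_seconds) in playback order.

    Returns the cumulative end offset of each clip.
    """
    return list(accumulate(duration for _, duration in clips))


def locate(clips, ends, fraction):
    """Map a slider position in [0, 1] to (clip index, offset inside that clip)."""
    total = ends[-1]
    t = min(max(fraction, 0.0), 1.0) * total
    # A time that falls exactly on a boundary belongs to the next clip,
    # except at the very end, where it is clamped to the last clip.
    index = min(bisect_right(ends, t), len(clips) - 1)
    start = ends[index - 1] if index > 0 else 0.0
    return index, t - start


clips = [("chapter1.xhtml#text1", 10.0),
         ("chapter1.xhtml#text2", 20.0),
         ("chapter2.xhtml#text1", 10.0)]
ends = build_timeline(clips)
print(locate(clips, ends, 0.5))
```

Keeping the cumulative offsets in memory is what makes the drag interaction cheap, which is the trade-off against upfront parsing cost discussed here.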

The obvious drawback is some upfront parsing cost, and increased memory consumption (this latter point is not such a big deal on modern devices though ... MO is not really meant for low-end e-ink devices). In my opinion, the benefits far outweigh the drawbacks.

That being said, what differentiates Readium2 is the clear architectural facet of backend/server side state (where the MO models are populated just as in Readium1), combined with the client side runtime which may load SMIL timing trees just-in-time (i.e. the additional HTTP request to MO links for individual HTML spine items / chapters).

In that Readium2 case, a MO playback engine implementation will probably want to load the timing tree for the entire publication anyway, for the reasons I mentioned before. Therefore, the top-level MO link in the publication manifest will have to deliver a data model similar to the one generated by Readium1 (i.e. a simple array-like aggregation of contiguous parsed SMIL trees, no need to be smart by somehow merging timing trees).

@HadrienGardeur
Contributor Author

@danielweck so you're suggesting that streamers and publication servers should merge all the SMIL together and only serve a publication-level link to the media overlay JSON document?

The initial discussion is slightly different since the focus was on syntax and (potentially) authoring.

@danielweck
Member

No, I am quite happy continuing to serve individual SMIL "chapters" (i.e. mirroring exactly the logical organisation of an EPUB3 Media Overlays), in addition to the full-spine aggregated SMILs. I am however suggesting that the JSON syntax of the full-spine multi-SMIL consist simply of an array-like combination of each individual SMIL referenced in the original EPUB3 Media Overlays (no attempt to make a smart merge of the SMIL timing trees). The potential downside of this approach is that the edge-case mapping "multiple contiguous HTML spine items => single SMIL" can produce redundant data, unless some smart trimming of the timing tree is performed beforehand at the parser level (but note that extracting SMIL timing containers from a SMIL tree is a bit like attempting to split CSS style definitions: the context is important and easy to break, due to semantically/structurally meaningful nested sequence containers).

@danielweck
Member

danielweck commented Apr 3, 2019

As discussed at the conference call:

  1. base URL to resolve the link object href which are possibly relative paths (no need for self link, just the originating URL/path for the JSON resource?)

  2. mandatory media / content type for link objects in the links tuple, so that a typical Media Overlay processing agent can discover the "text" and "audio" pairs (equivalent to the leaves in the SMIL timing tree).

  3. example with typical deeply nested media pairs/tuples (par SMIL nodes), descendants of seq SMIL nodes which represent the structure/semantics of the targeted HTML documents. See: https://github.com/readium/webpub-manifest/blob/master/schema/link.schema.json#L57 (the root children property relates to the extensibility mechanism for sub-collections: https://github.com/readium/webpub-manifest/blob/master/schema/publication.schema.json#L70 )
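For the first point, a minimal sketch of resolving relative hrefs against the URL the overlay JSON was fetched from, using Python's standard `urllib.parse.urljoin`. The function name `resolve_links` and the example URL are assumptions made for illustration only.

```python
from urllib.parse import urljoin


def resolve_links(node, base_url):
    """Return a copy of an overlay node with every link href made absolute."""
    resolved = dict(node)
    if "links" in node:
        resolved["links"] = [
            {**link, "href": urljoin(base_url, link["href"])}
            for link in node["links"]
        ]
    if "children" in node:
        resolved["children"] = [
            resolve_links(child, base_url) for child in node["children"]
        ]
    return resolved


overlay = {
    "links": [{"href": "chapter1.xhtml", "type": "application/xhtml+xml"}],
    "children": [
        {"links": [
            {"href": "chapter1.xhtml#text1", "type": "application/xhtml+xml"},
            {"href": "chapter1.mp3#t=0,20", "type": "audio/mpeg"},
        ]}
    ],
}
# Hypothetical originating URL of the overlay JSON resource.
absolute = resolve_links(overlay, "https://example.org/pub/overlay1.json")
print(absolute["children"][0]["links"][1]["href"])
```

Because the base is simply the originating URL of the JSON resource, no self link is needed in the document for this to work.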

@danielweck
Member

The SMIL body is a seq time container root, that can carry its own role and back-reference (textref) to the HTML document it maps to (structural/semantics). So the children array is problematic in that respect.

@danielweck
Member

danielweck commented Apr 3, 2019

Let's try a deeper (more typical) SMIL timing tree, with intermediary seq containers (used to structurally and semantically map with the targeted HTML document) all the way down to the par media pairs / tree leaves (audio, text).

SORRY, POSTED TOO QUICKLY (WILL PUBLISH AGAIN)

@danielweck
Member

Here is the example from the specification:
http://www.idpf.org/epub/31/spec/epub-mediaoverlays.html#sec-media-overlays-structure

chapter1.smil:

<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
    <body>

        <!-- a chapter -->
        <seq id="id1" epub:textref="chapter1.xhtml#s01" epub:type="chapter">

            <!-- the section title -->
            <par id="id2">
                <text src="chapter1.xhtml#section1_title"/>
                <audio src="chapter1_audio.mp3"
                       clipBegin="0:23:23.84"
                       clipEnd="0:23:34.221"/>
            </par>

            <!-- some sentences in the chapter -->
            <par id="id3">
                <text src="chapter1.xhtml#text1"/>
                <audio src="chapter1_audio.mp3"
                       clipBegin="0:23:34.221"
                       clipEnd="0:23:59.003"/>
            </par>
            <par id="id4">
                <text src="chapter1.xhtml#text2"/>
                <audio src="chapter1_audio.mp3"
                       clipBegin="0:23:59.003"
                       clipEnd="0:24:15.000"/>
            </par>

            <!-- a figure -->
            <seq id="id7" epub:textref="chapter1.xhtml#figure">
                <par id="id8">
                    <text src="chapter1.xhtml#photo"/>
                    <audio src="chapter1_audio.mp3"
                           clipBegin="0:24:18.123"
                           clipEnd="0:24:28.764"/>
                </par>
                <par id="id9">
                    <text src="chapter1.xhtml#caption"/>
                    <audio src="chapter1_audio.mp3"
                           clipBegin="0:24:28.764"
                           clipEnd="0:24:50.010"/>
                </par>
            </seq>

            <!-- more sentences in the chapter (outside the figure) -->
            <par id="id12">
                <text src="chapter1.xhtml#text3"/>
                <audio src="chapter1_audio.mp3"
                       clipBegin="0:25:45.515"
                       clipEnd="0:26:30.203"/>
            </par>
            <par id="id13">
                <text src="chapter1.xhtml#text4"/>
                <audio src="chapter1_audio.mp3"
                       clipBegin="0:26:30.203"
                       clipEnd="0:27:15.000"/>
            </par>

        </seq>
    </body>
</smil>

chapter1.xhtml:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops"
      xml:lang="en"
      lang="en">
    <head>
        <title>Media Overlays Example of EPUB Content Document</title>
    </head>
    <body id="sec1">
        <section id="sectionstart" epub:type="chapter">
            <h1 id="section1_title">The Section Title</h1>
            <p id="text1">The first phrase of the main text body.</p>
            <p id="text2">The second phrase of the main text body.</p>
            <figure id="figure">
                <img id="photo"
                     src="photo.png"
                     alt="a photograph for which there is a caption" />
                <figcaption id="caption">The photo caption</figcaption>
            </figure>
            <p id="text3">The third phrase of the main text body.</p>
            <p id="text4">The fourth phrase of the main text body.</p>
        </section>
    </body>
</html>

@danielweck
Member

danielweck commented Apr 3, 2019

This is the resulting JSON:

UPDATED to enclose role and id in metadata.

{
    "metadata": {
        "duration": "0:27:15.000",
        "role": [
            "body"
        ]
    },
    "links": [
        {
            "href": "chapter1.xhtml",
            "type": "application/xhtml+xml"
        }
    ],
    "children": [
        {
            "metadata": {
                "id": "id1",
                "role": [
                    "chapter"
                ]
            },
            "links": [
                {
                    "href": "chapter1.xhtml#s01",
                    "type": "application/xhtml+xml"
                }
            ],
            "children": [
                {
                    "metadata": {
                        "id": "id2"
                    },
                    "links": [
                        {
                            "href": "chapter1.xhtml#section1_title",
                            "type": "application/xhtml+xml"
                        },
                        {
                            "href": "chapter1_audio.mp3#t=0:23:23.84,0:23:34.221",
                            "type": "audio/mpeg"
                        }
                    ]
                },
                {
                    "metadata": {
                        "id": "id3"
                    },
                    "links": [
                        {
                            "href": "chapter1.xhtml#text1",
                            "type": "application/xhtml+xml"
                        },
                        {
                            "href": "chapter1_audio.mp3#t=0:23:34.221,0:23:59.003",
                            "type": "audio/mpeg"
                        }
                    ]
                },
                {
                    "metadata": {
                        "id": "id4"
                    },
                    "links": [
                        {
                            "href": "chapter1.xhtml#text2",
                            "type": "application/xhtml+xml"
                        },
                        {
                            "href": "chapter1_audio.mp3#t=0:23:59.003,0:24:15.000",
                            "type": "audio/mpeg"
                        }
                    ]
                },
                {
                    "metadata": {
                        "id": "id7"
                    },
                    "links": [
                        {
                            "href": "chapter1.xhtml#figure",
                            "type": "application/xhtml+xml"
                        }
                    ],
                    "children": [
                        {
                            "metadata": {
                                "id": "id8"
                            },
                            "links": [
                                {
                                    "href": "chapter1.xhtml#photo",
                                    "type": "application/xhtml+xml"
                                },
                                {
                                    "href": "chapter1_audio.mp3#t=0:24:18.123,0:24:28.764",
                                    "type": "audio/mpeg"
                                }
                            ]
                        },
                        {
                            "metadata": {
                                "id": "id9"
                            },
                            "links": [
                                {
                                    "href": "chapter1.xhtml#caption",
                                    "type": "application/xhtml+xml"
                                },
                                {
                                    "href": "chapter1_audio.mp3#t=0:24:28.764,0:24:50.010",
                                    "type": "audio/mpeg"
                                }
                            ]
                        }
                    ]
                },
                {
                    "metadata": {
                        "id": "id12"
                    },
                    "links": [
                        {
                            "href": "chapter1.xhtml#text3",
                            "type": "application/xhtml+xml"
                        },
                        {
                            "href": "chapter1_audio.mp3#t=0:25:45.515,0:26:30.203",
                            "type": "audio/mpeg"
                        }
                    ]
                },
                {
                    "metadata": {
                        "id": "id13"
                    },
                    "links": [
                        {
                            "href": "chapter1.xhtml#text4",
                            "type": "application/xhtml+xml"
                        },
                        {
                            "href": "chapter1_audio.mp3#t=0:26:30.203,0:27:15.000",
                            "type": "audio/mpeg"
                        }
                    ]
                }
            ]
        }
    ]
}
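For readers who want to reproduce this kind of conversion, here is a rough sketch using Python's standard `xml.etree.ElementTree`. It is not the Readium parser: the `convert` function, the extension-to-media-type table and the fallback type are all assumptions, and it neither adds the top-level duration nor the illustrative body role shown above.

```python
import xml.etree.ElementTree as ET

SMIL = "{http://www.w3.org/ns/SMIL}"
EPUB = "{http://www.idpf.org/2007/ops}"
# Assumption: only the two media types used in this example are mapped.
MEDIA_TYPES = {".xhtml": "application/xhtml+xml", ".mp3": "audio/mpeg"}


def link(href):
    """Build a Link Object, guessing the type from the file extension."""
    path = href.split("#", 1)[0]
    ext = path[path.rfind("."):]
    return {"href": href, "type": MEDIA_TYPES.get(ext, "application/octet-stream")}


def convert(element):
    """Convert a body, seq or par element into a collection-style node."""
    node, metadata = {}, {}
    if element.get("id"):
        metadata["id"] = element.get("id")
    if element.get(EPUB + "type"):
        metadata["role"] = element.get(EPUB + "type").split()
    if metadata:
        node["metadata"] = metadata
    if element.tag == SMIL + "par":
        node["links"] = [link(element.find(SMIL + "text").get("src"))]
        audio = element.find(SMIL + "audio")
        if audio is not None:
            # Assumes clipBegin and clipEnd are both present.
            clip = "#t={},{}".format(audio.get("clipBegin"), audio.get("clipEnd"))
            node["links"].append(link(audio.get("src") + clip))
    else:
        textref = element.get(EPUB + "textref")
        if textref:
            node["links"] = [link(textref)]
        node["children"] = [convert(child) for child in element
                            if child.tag in (SMIL + "seq", SMIL + "par")]
    return node


SAMPLE = """<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
  <body>
    <seq id="id1" epub:textref="chapter1.xhtml#s01" epub:type="chapter">
      <par id="id2">
        <text src="chapter1.xhtml#section1_title"/>
        <audio src="chapter1_audio.mp3" clipBegin="0:23:23.84" clipEnd="0:23:34.221"/>
      </par>
    </seq>
  </body>
</smil>"""

overlay = convert(ET.fromstring(SAMPLE).find(SMIL + "body"))
chapter = overlay["children"][0]
print(chapter["children"][0]["links"][1]["href"])
```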

@danielweck
Member

Note that the above conversion does not translate the time/clock values to seconds (so we can easily cross-reference with the original EPUB3 SMIL).
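Should the clock values be translated later, a sketch along these lines could do it. It assumes the SMIL clock forms used in EPUB3 Media Overlays (full or partial clock values such as `0:23:23.84`, and timecounts with an `h`, `min`, `s` or `ms` suffix); both function names are hypothetical.

```python
def clock_to_seconds(value):
    """Convert a SMIL clock value to seconds."""
    value = value.strip()
    if ":" in value:
        # Full (hh:mm:ss.fff) or partial (mm:ss.fff) clock value.
        seconds = 0.0
        for part in value.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds
    # Timecount values; "ms" must be tested before "s".
    for suffix, factor in (("ms", 0.001), ("min", 60.0), ("h", 3600.0), ("s", 1.0)):
        if value.endswith(suffix):
            return float(value[:-len(suffix)]) * factor
    return float(value)  # a bare number is taken as seconds


def href_in_seconds(href):
    """Rewrite the #t=begin,end fragment of an href using plain seconds."""
    base, sep, fragment = href.partition("#t=")
    if not sep:
        return href
    parts = []
    for clock in fragment.split(","):
        if clock:
            # Millisecond precision, without trailing zeros.
            parts.append(f"{clock_to_seconds(clock):.3f}".rstrip("0").rstrip("."))
        else:
            parts.append("")
    return base + "#t=" + ",".join(parts)


print(href_in_seconds("chapter1_audio.mp3#t=0:23:23.84,0:23:34.221"))
```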

@danielweck
Member

Note that the above JSON introduces the id property, to preserve the SMIL info.

@danielweck
Member

danielweck commented Apr 3, 2019

Note that the metadata duration is not present in the original SMIL. We derive it from the EPUB OPF package metadata and expose it here as a first-class citizen, to avoid levels of indirection when retrieving the Media Overlays JSON documents.

@danielweck
Member

Note that the children JSON keys/properties are not in the "link" object, instead they are part of the extensibility mechanism. I will publish an alternative proposal that leverages the "link" object's own children property. https://github.com/readium/webpub-manifest/blob/6930a12439d7b36f2302d1ef233a6ad41b4854d6/schema/link.schema.json#L57

@danielweck
Member

Note that the initial children array contains only one child (i.e. the root of the SMIL tree). The role = body was added for illustration purposes, but the original SMIL in fact does not explicitly have this epub:type.

@danielweck
Member

danielweck commented Apr 3, 2019

This alternative syntax proposal purely relies on the "link" object's children property to express hierarchy, and maps directly to the SMIL tree of seq (with the initial body) and par leaves:

UPDATE: added empty href (#) to link objects, to pass JSON Schema validation.

{
    "metadata": {
        "duration": "0:27:15.000"
    },
    "links": [
        {
            "role": [
                "body"
            ],
            "href": "chapter1.xhtml",
            "type": "application/xhtml+xml",
            "children": [
                {
                    "id": "id1",
                    "role": [
                        "chapter"
                    ],
                    "href": "chapter1.xhtml#s01",
                    "type": "application/xhtml+xml",
                    "children": [
                        {
                            "id": "id2",
                            "href": "#",
                            "children": [
                                {
                                    "href": "chapter1.xhtml#section1_title",
                                    "type": "application/xhtml+xml"
                                },
                                {
                                    "href": "chapter1_audio.mp3#t=0:23:23.84,0:23:34.221",
                                    "type": "audio/mpeg"
                                }
                            ]
                        },
                        {
                            "id": "id3",
                            "href": "#",
                            "children": [
                                {
                                    "href": "chapter1.xhtml#text1",
                                    "type": "application/xhtml+xml"
                                },
                                {
                                    "href": "chapter1_audio.mp3#t=0:23:34.221,0:23:59.003",
                                    "type": "audio/mpeg"
                                }
                            ]
                        },
                        {
                            "id": "id4",
                            "href": "#",
                            "children": [
                                {
                                    "href": "chapter1.xhtml#text2",
                                    "type": "application/xhtml+xml"
                                },
                                {
                                    "href": "chapter1_audio.mp3#t=0:23:59.003,0:24:15.000",
                                    "type": "audio/mpeg"
                                }
                            ]
                        },
                        {
                            "id": "id7",
                            "href": "chapter1.xhtml#figure",
                            "type": "application/xhtml+xml",
                            "children": [
                                {
                                    "id": "id8",
                                    "href": "#",
                                    "children": [
                                        {
                                            "href": "chapter1.xhtml#photo",
                                            "type": "application/xhtml+xml"
                                        },
                                        {
                                            "href": "chapter1_audio.mp3#t=0:24:18.123,0:24:28.764",
                                            "type": "audio/mpeg"
                                        }
                                    ]
                                },
                                {
                                    "id": "id9",
                                    "href": "#",
                                    "children": [
                                        {
                                            "href": "chapter1.xhtml#caption",
                                            "type": "application/xhtml+xml"
                                        },
                                        {
                                            "href": "chapter1_audio.mp3#t=0:24:28.764,0:24:50.010",
                                            "type": "audio/mpeg"
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "id": "id12",
                            "href": "#",
                            "children": [
                                {
                                    "href": "chapter1.xhtml#text3",
                                    "type": "application/xhtml+xml"
                                },
                                {
                                    "href": "chapter1_audio.mp3#t=0:25:45.515,0:26:30.203",
                                    "type": "audio/mpeg"
                                }
                            ]
                        },
                        {
                            "id": "id13",
                            "href": "#",
                            "children": [
                                {
                                    "href": "chapter1.xhtml#text4",
                                    "type": "application/xhtml+xml"
                                },
                                {
                                    "href": "chapter1_audio.mp3#t=0:26:30.203,0:27:15.000",
                                    "type": "audio/mpeg"
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

@HadrienGardeur
Contributor Author

Thanks for these examples @danielweck.

  1. In your first example, both id and role should be in metadata, otherwise they'll be treated as subcollections by the schema and won't be valid.
  2. In your second example, you introduce id and role in the Link Object. This wouldn't be rejected by the schema but they would also be undefined under our current model. In general we try to minimize such extensions directly in the Link Object through the use of properties instead.
  3. There's another issue with the second example since many Link Objects do not have a href which would be invalid according to our schema.

@danielweck
Member

@HadrienGardeur
Yes, to illustrate the challenge of preserving information from the original SMIL timing containers (body, seq and par), I intentionally used the id and role (epub:type) properties in the link object ... which are unfortunately not supported "natively": https://github.com/readium/webpub-manifest/blob/master/schema/link.schema.json (and rel isn't semantically correct either)

@danielweck
Member

There are existing publications that use a valid EPUB3 "design" / authoring pattern in the NavDoc, in order to partition / categorize TOC links. For example, Children's Literature:
https://github.com/IDPF/epub3-samples/blob/master/30/childrens-literature/EPUB/nav.xhtml#L11

This results in a hierarchy of link objects with intermediary containers that lack the href property. This is handled correctly at rendering time in r2-testapp-js and readium-desktop (aka Thorium). Not sure about other platforms.

@HadrienGardeur
Contributor Author

This results in a hierarchy of link objects with intermediary containers that lack the href property. This is handled correctly at rendering time in r2-testapp-js and readium-desktop (aka Thorium). Not sure about other platforms.

An alternative to that would be a default href, for example #.
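A sketch of what that normalization could look like, assuming a tree of Link Objects nested through their children property. The helper name `with_default_href` is hypothetical.

```python
def with_default_href(link, default="#"):
    """Return a copy of a Link Object tree where containers lacking href get a default."""
    fixed = dict(link)
    fixed.setdefault("href", default)
    if "children" in link:
        fixed["children"] = [
            with_default_href(child, default) for child in link["children"]
        ]
    return fixed


container = {
    "id": "id2",
    "children": [
        {"href": "chapter1.xhtml#section1_title", "type": "application/xhtml+xml"},
        {"href": "chapter1_audio.mp3#t=0,20", "type": "audio/mpeg"},
    ],
}
fixed = with_default_href(container)
print(fixed["href"])
```

Existing href values are left untouched, so the function can be applied to a whole tree without checking which nodes are containers.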

@danielweck
Member

I added empty href (#) to link objects, to pass JSON Schema validation.

@danielweck
Member

I enclosed role and id in metadata.

@HadrienGardeur
Contributor Author

Based on our recent discussions, I think we can close this issue. We need to keep an eye on the W3C CG and make sure that we align with it.

Can you open an issue specifically for that @danielweck ?

@HadrienGardeur HadrienGardeur unpinned this issue May 28, 2019
@danielweck
Member

Follow-up: #109


3 participants