This repository has been archived by the owner on Jun 27, 2023. It is now read-only.

Information content of the abstract manifest #12

Closed
dauwhe opened this issue Jun 27, 2017 · 26 comments

Comments

@dauwhe

dauwhe commented Jun 27, 2017

What information is required for an abstract manifest? [edited to add items from comments]

  1. An identifier for the web publication, which should be a URL
  2. Some way of saying that this URL represents a web publication.
  3. Some way of identifying the constituent resources of the web publication.
  4. Some way of providing a preferred order of (some of) the constituent resources in case there is more than one
  5. Some way of being able to add more complex metadata to a publication. (Not clear to my mind whether we would define a minimally required set of metadata, but the slot should be there.)
  6. Locating table of contents or other navigation structure

What else? I think we should distinguish required information from "nice to have" information.

@GarthConboy

I'd also throw in:

-- Reading order
-- Basic metadata (yes, a can of worms we'll need to open)

Re the #1 and #2 just above in Dave's original issue, it seems they may want to be pre-manifest -- defined before the manifest is found, or be the actual path to the manifest (or to a "first file" that can be rendered, but also somehow points to the manifest).

@iherman
Member

iherman commented Jun 27, 2017

  1. Some way of providing a preferred order of (some of) the constituent resources in case there is more than one
  2. Some way of being able to add more complex metadata to a publication. (Not clear to my mind whether we would define a minimally required set of metadata, but the slot should be there.)

@iherman
Member

iherman commented Jun 27, 2017

(Wow. I just said the same thing as Garth just in other words. I swear we did not conspire...)

@mattgarrish
Member

What is meant by required here? Must always be present or must be accounted for in the design? This is why I wasn't sure at the f2f if navigation constituted a top-level or lower-level consideration.

A standardized means of locating the table of contents seems critical to me, even if it's optional to define and there are no epub-like rules on its construction.

@GarthConboy

The updated #6 in the first panel says "Locating table of contents or other navigation structure"; we should also consider:

-- Do we need such a Nav file (likely yes for A11Y)
-- Should it be in the Manifest or pointed-to by the Manifest (I could see an argument for all eggs in one basket -- though the machine readable or renderable discussion will arise)

@dauwhe
Author

dauwhe commented Jun 28, 2017

Do we need such a Nav file (likely yes for A11Y)

See #14

Should it be in the Manifest or pointed-to by the Manifest

Interesting question. I know Hadrien has proposed including section titles in a JSON manifest, but I have major concerns about possible reader-facing text in JSON (especially given that there's a standard html way to do this stuff).

@HadrienGardeur

I know Hadrien has proposed including section titles in a JSON manifest, but I have major concerns about possible reader-facing text in JSON (especially given that there's a standard html way to do this stuff).

IMO the Navigation Document in EPUB 3 is a failed experiment. Most EPUB 3 documents that I've seen end up including at least two HTML tables of contents:

  • a nice looking one, included in the spine and not marked as being a Navigation Document
  • a basic one, used as the Navigation Document

Most EPUB 3 reading systems do not render these Navigation Documents either; they simply parse them, extract the info, and display things using their own UI.

This is a typical example of "spec purity" (the beauty of the Navigation Document) vs real world usage (no one is rendering these documents and we end up with more redundancy instead of less).

Readium (1, JS and 2) ended up parsing the info in the Navigation Document and providing a JSON output instead, which is much easier for developers to work with.

In the Readium Web Publication Manifest:

  • there is absolutely zero requirement for a table of contents (I strongly believe that we shouldn't force a ToC on single resource publications that won't need one)
  • all the different ToC types that exist in EPUB are parsed (NCX, landmarks in OPF and Navigation Document) and exposed in a consistent way (collections) in the manifest
  • we also keep links to the Navigation Document in spine or resources and identify them as such using a rel value

@HadrienGardeur

To go back to the initial question, in Readium we clearly separate the abstract model from the minimal requirements for a manifest.

The abstract model has three core concepts:

  • metadata (based on JSON-LD)
  • links
  • collections (identified by a role, can aggregate metadata, links and other collections)

For each core concept, we make sure that:

  • the requirements are very basic
  • the model is flexible and powerful enough to allow the expression of complex use cases
  • a number of extensibility points are available and clearly identified

The basic requirements for a manifest are then based on that model:

  • a manifest should at least contain a title in its metadata
  • it should at least contain a link to itself, identified by the self relation
  • it should contain at least one resource in its spine collection, which contains the key resources for a publication in reading order
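
A minimal sketch of how these three requirements could be checked, assuming the manifest has already been parsed from JSON into a dict with `metadata`, `links`, and `spine` keys, and that each link's `rel` is a single string. The function name `check_minimal_manifest` is hypothetical and not part of any Readium API.

```python
def check_minimal_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means the minimal requirements are met."""
    problems = []

    # Requirement 1: a title in the metadata.
    metadata = manifest.get("metadata", {})
    if not metadata.get("title"):
        problems.append("metadata.title is missing")

    # Requirement 2: a link to the manifest itself, identified by rel="self".
    # Assumption: "rel" is a single string, not a list of relations.
    links = manifest.get("links", [])
    if not any(link.get("rel") == "self" and link.get("href") for link in links):
        problems.append("no link with rel=self")

    # Requirement 3: at least one resource in the spine.
    spine = manifest.get("spine", [])
    if not any(item.get("href") for item in spine):
        problems.append("spine has no resources")

    return problems
```

Returning a list of problems rather than a boolean keeps the check usable for both validation tools and lenient readers that only want to log what is missing.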

@llemeurfr

llemeurfr commented Jul 3, 2017

An identifier for the web publication, which should be a URL

Better, an IRI, because a) it may be a URN (that is up to the publisher to choose; the Web doesn't care) and b) i18n is important. A URL to the origin is also important, but it should be a separate property.

@WSchindler
Contributor

I would like to add:
7. language(s) used in the WP - the plural is needed because we will have publications such as parallel texts (original + one or more translations) and bilingual dictionaries, which contain 1-n languages. The language used also has implications for rendering (e.g. "ltr" vs "rtl", vertical layout)

@HadrienGardeur

Language and direction (ltr vs rtl) should be two separate metadata. Agree that we need to allow more than one language.
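
As a rough illustration of keeping the two values separate, the sketch below reads them from a parsed metadata object. The key names `language` and `direction` and the function name `read_language_metadata` are assumptions made for this example; nothing in this thread fixes them.

```python
from typing import List, Optional, Tuple


def read_language_metadata(metadata: dict) -> Tuple[List[str], Optional[str]]:
    """Return (languages, direction) from a manifest's metadata object."""
    raw = metadata.get("language", [])
    # Accept either a single language tag or a list of tags,
    # so that publications with more than one language are allowed.
    languages = [raw] if isinstance(raw, str) else list(raw)

    # Direction is read from its own key and is never inferred from the language.
    direction = metadata.get("direction")
    if direction not in ("ltr", "rtl"):
        direction = None
    return languages, direction
```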

@lrosenthol

lrosenthol commented Jul 3, 2017 via email

@llemeurfr

Re. URL vs IRI, after reading https://www.w3.org/International/wiki/IRIStatus, I must admit that this seems like a can of worms. Apart from trying to allow for extended i18n of publication identifiers, there is still the question of whether URNs are allowed as global identifiers. For instance, I spotted that most of @HadrienGardeur's manifest samples use ISBN URNs as identifiers.

@HadrienGardeur

HadrienGardeur commented Jul 5, 2017

@llemeurfr you're mixing up two different concepts regarding the Readium Web Publication Manifest.

Keep in mind that we started this work in the context of BFF and that for Readium-2 we mostly ingest EPUB files.

The only requirement in the draft document for the Readium WebPub Manifest is to always provide a self link. In the context of a Web Publication it makes perfect sense: if a publication lives on the Web, we need a URL that can point to its manifest.

Here's a basic example using the Readium WebPub Manifest model:

{
  "@context": "http://readium.org/webpub/default.jsonld",
  "metadata": {
    "title": "The Master and Margarita"
  },
  "links": [
    {"rel": "self", "href": "http://example.com/manifest.json", "type": "application/webpub+json"}
  ],
  "spine": [
    {"href": "http://example.com/chapter1", "type": "text/html"}
  ]
}

If the publication has an additional identifier, this can be provided in its metadata:

"metadata": {
  "title": "The Master and Margarita",
  "identifier": "urn:isbn:9780141180144"
}

That second identifier is not a requirement in the Readium model, and we can't expect all Web Publications to have such an identifier either.

The reason why most of our current samples have URNs (mostly for ISBNs or UUIDs) is because we ingest EPUB files or provide samples for books where ISBNs are very common.
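
To make the distinction concrete, here is a small sketch that reads the two values independently: the manifest's own address from the `self` link, and the scheme of the optional `identifier` in the metadata. The helper names `manifest_url` and `identifier_scheme` are hypothetical, and the code assumes each link's `rel` is a single string.

```python
from typing import Optional
from urllib.parse import urlsplit


def manifest_url(manifest: dict) -> Optional[str]:
    """Return the href of the first rel=self link, or None if there is none."""
    for link in manifest.get("links", []):
        if link.get("rel") == "self":
            return link.get("href")
    return None


def identifier_scheme(manifest: dict) -> Optional[str]:
    """Return the URI scheme of the optional metadata identifier, or None."""
    identifier = manifest.get("metadata", {}).get("identifier")
    if not identifier:
        # The additional identifier is not required, so its absence is not an error.
        return None
    return urlsplit(identifier).scheme or None
```

With the samples above, `manifest_url` yields the `http://example.com/manifest.json` address while `identifier_scheme` reports `urn` for the ISBN, which shows that the two answer different questions.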

@dauwhe
Author

dauwhe commented Jul 5, 2017

I would like to add:
7. language(s) used in the WP - the plural is needed because we will have publications such as parallel texts (original + one or more translations) and bilingual dictionaries, which contain 1-n languages. The language used also has implications for rendering (e.g. "ltr" vs "rtl", vertical layout)

My only concern is that HTML already has mechanisms for describing the language(s) of content. What happens when a user agent opens an HTML page declared with language A, finds a rel=manifest link, follows it, and sees language B declared?

@HadrienGardeur

My only concern is that HTML already has mechanisms for describing the language(s) of content. What happens when a user agent opens an HTML page declared with language A, finds a rel=manifest link, follows it, and sees language B declared?

The manifest declares the language for the publication, while HTML is meant to declare the language for that resource.
The UA would simply set the default to language B but override that option with language A as it displays or interacts with that HTML page.
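
A sketch of that fallback, under the assumption that the UA has at most one language from the manifest and at most one from the HTML resource; the function name `effective_language` is made up for illustration, and this is only one possible UA behavior.

```python
from typing import Optional


def effective_language(manifest_language: Optional[str],
                       resource_language: Optional[str]) -> Optional[str]:
    """Language to use for one resource: the resource's own declaration wins."""
    # The HTML resource's language (A) overrides the publication default (B);
    # the manifest value is used only when the resource declares nothing.
    return resource_language or manifest_language
```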

@llemeurfr

you're mixing up two different concepts regarding the Readium Web Publication Manifest.

That's right. If a Web publication is copied to another website, this value will not be modified. Therefore a possible definition of the self link is "The original location of the Web Publication", which can be aligned with Requirement 8 for Web Publications: "There should be a way to uniquely identify a Web Publication."

@HadrienGardeur

From RFC5988:

o Relation Name: self
o Description: Conveys an identifier for the link's context.
o Reference: [RFC4287]

@WSchindler
Contributor

It's of course true that via @lang or @xml:lang, you may define the language(s) used in your HTML. I still think that the point of entry for a UA consuming a WP would be the manifest, where it would be helpful to find information on the languages used in the WP. If you have a Chinese-English dictionary, it is IMO no trivial task to prepare the rendering.

@lrosenthol

lrosenthol commented Jul 5, 2017 via email

@lrosenthol

lrosenthol commented Jul 5, 2017 via email

@HadrienGardeur

Actually, I would expect the UA to completely ignore the language settings (A, in this case) in the manifest - and only concern itself with the actual resource being processed/rendered (B, in this case). The language (or languages) in the manifest have no bearing on the actual content - they are (IMO) informational only.

While rendering content, sure I fully agree. But a UA can provide additional services on top of it, for example dictionaries or search. The publication metadata can be useful in that regard.

@mattgarrish
Member

I would expect the UA to completely ignore the language settings (A, in this case) in the manifest

I agree it's informative and must not be used for rendering content (or metadata), but the same question about value has been raised in epub revisions and the case has been made that it does have uses (e.g., pre-loading tts languages, offering access to dictionaries, etc.).

@lrosenthol

lrosenthol commented Jul 5, 2017 via email

@HadrienGardeur

It could indeed be useful - and whether a UA chooses to use it for that or not is (IMO) out of scope for our work.

Defining the UA behavior is out of scope, but making sure that it has relevant info needed is definitely within scope.

@dauwhe
Author

dauwhe commented Jul 5, 2017

This issue was moved to w3c/wpub#6

@dauwhe dauwhe closed this as completed Jul 5, 2017

8 participants