What does a WPub URL resolve to? #94
Do we need to require a TOC here, or do we only need to require that the link resolve to a resource that is considered the primary entry point for the publication and that must include a link to the manifest? It sounds like we're mandating the presence of a table of contents, when not every publication will need one. Using the table of contents, or ensuring the landing document has a clear link to one, is perhaps only best practice for multi-document publications?
Yeah, this is one of my fundamental questions. If I point my browser at the URL of a WP, what happens?
Strongly agree that the URL must resolve to an HTML document, and that this HTML document must contain a link to the manifest (or contain the manifest; I think that's still an open question). Like @mattgarrish, I'm a little less certain about requiring this document to contain a TOC. Perhaps if it doesn't contain a TOC, and it is a multiple-document publication, it must link to a TOC? Sadly Do we require that this be the first document in the default reading order? I think yes. In retrospect, the whole "begin reading" thing in EPUB felt like an evasion. Saying that [1] front matter is so important that it must come at the front of the book, and [2] that it is so unimportant that we don't want the reader to actually see it, is just avoiding responsibility for designing your own content appropriately.
To repeat some arguments, just for the record, against option 2:
My conclusion on option 2 is that it should not be adopted.
[edited] Ivan, are you talking about option 2? That seems to be the one where some publication metadata is expressed in HTML.
Oh bugger, I wanted to say Option 2! I will edit the comment... (Lesson: never say anything serious before breakfast!)
There's a precedent, however, in web apps, where HTML’s
Regarding the TOC (or not) in option 3: indeed, maybe the TOC is not the ideal solution, although in many cases it looks like the natural fit. The current draft lists three required information items: title, list of web publication resources, and a default reading order. I would not want the title to appear in the landing page (at least not as the information item) for the same reasons as in my comment. I'm not sure whether the list of resources and the reading order should appear there. Saying 'the landing page is the first document in the reading order' may not be good either: the content may not be in HTML. We discussed publications consisting only of drawings or audio files at the TPAC F2F. For those cases, the TOC is helpful for non-WPUB-aware browsers. I would say: the landing page contains the TOC (through a
(The landing page may still include other information that the publisher wishes to provide. This is not for us to define.)
Maybe there are such apps in the wild, but I do not think what they do is correct. Maybe, but only maybe, the Web Platform WG will, at some point in the distant future, pave that particular cowpath (I cannot judge how wide this usage is), but I do not think this WG should adopt an approach that violates the specifications as they stand today.
(Just to make it clear: my personal vote goes firmly for option 3...)
I'm kind of torn on this, as we're already doing something similar by allowing certain information to be harvested from the content (title, language, etc.). It's (arguably) useful information for legacy browsers that don't support the publication. But I agree to the extent that we should not confuse people into thinking it's either/or. Maintaining metadata in multiple places is always a disaster waiting to happen. WAM gives precedence to manifest metadata and we should do the same. I prefer Ivan's approach, but can live with what we've done. At a minimum, we can never stop people from expressing whatever metadata they want wherever they want.
Considering the four scenarios, I think option 3 is the best solution for accommodating all types of clients. From a user perspective, we would always need some kind of landing page for the WP. Typically, it should contain a TOC (via the
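A minimal Python sketch of the precedence rule mentioned above: manifest values win, and values harvested from the HTML (e.g. the `<title>` or `lang` attribute) only fill gaps. It assumes metadata has already been extracted into two plain dicts; the key names and example values are hypothetical, and treating `None` as "not set in the manifest" is an assumption of this sketch, not something the draft specifies.

```python
def merge_metadata(manifest: dict, html_harvested: dict) -> dict:
    """Combine publication metadata, giving the manifest precedence.

    Values harvested from HTML are used only for items the manifest
    does not define (None is treated as "not defined").
    """
    merged = dict(html_harvested)  # start with the fallback values
    for key, value in manifest.items():
        if value is not None:  # the manifest is authoritative when it sets a value
            merged[key] = value
    return merged


# Hypothetical example: the manifest sets a title, the HTML supplies only the language.
manifest = {"title": "Moby-Dick", "language": None}
html_meta = {"title": "Chapter 1 - Loomings", "language": "en"}
print(merge_metadata(manifest, html_meta))
# {'title': 'Moby-Dick', 'language': 'en'}
```

Keeping a single merge point like this is one way to avoid the "which one has priority?" ambiguity raised in this thread: there is exactly one rule, applied in one place.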
Ah, this helps clarify that I have a concern here. If this HTML document is not the first document in the default reading order, then a WP-aware user agent will present an entirely different resource to the user than a non-WP-aware user agent. Quick example: if my WP has this (very common) order:
I might set things up so that the URL for the WP resolves to
I agree that having the URL resolve to a TOC is ideal. You can always hide it visually. I would also argue that something in the WP has to be HTML (or something that supports links). I don't see how you make a JSON manifest plus audio/image files alone work in option 3, because otherwise a non-aware UA cannot access the content through the WP URL.
But why enforce this? What does it accomplish, really? What if I want my landing page to be the cover page and clicking on the cover page takes you to the table of contents?
Why would the UA change the document you've navigated to? If the reading order forces you away from the resource you want to view, that's a very bad thing.
I'm only saying I don't think that using
That said, my vote goes to option 3. I'm just thinking that it doesn't necessarily preclude mixing it with option 2 to widen UA support, like web apps do: have an authoritative external manifest, but also fall back to HTML metadata when relevant.
I suppose that's what I'm wondering, but perhaps this is more a lifecycle & implementation question. A WP-aware UA opens the HTML resource at the WP URL and processes the manifest. What is it obligated to do then? I suppose it could present a "begin reading" button that would cause a navigation. Maybe it just creates a navigation/personalization overlay (as Readium Cloud Viewer does). Maybe my point is that it would be silly to have the URL point to something other than what you want the reader to see first. Even RFC 6919 doesn't have formal language for that :)
Right, this is where we (probably) won't find consensus. I'm of the mind that the user agent shouldn't change the resource you're on when it initiates the reading experience. But if it does, I also don't have any problem with that, as users will decide the fate of such a feature. I just don't think such a change should happen without prompting, in which case both the vanilla and enhanced experiences are ultimately the same; there's just an extra opt-in from the user that changes one scenario. I think it would also help to make clear somewhere that we're trying to accommodate many reading scenarios (browser, app (in browser), polyfilled publication, publication as app), but let's not discuss that in this thread.
What I think we should avoid is the same information item being expressible in different places. I am afraid of the confusion that would result from that (what if it is defined in different places? Which one has priority? What if the values conflict? etc.) To be more specific: we have a set of information items that must be serialized. My mental model is:
In other words, there is no ambiguity about where a specific information item is defined. If this model works, then we must choose which information item is covered by (1). The TOC seems to be the best choice.
I agree and disagree. There is one piece of information that must appear on the landing page: the link to the manifest. It can appear elsewhere, but we keep changing course on whether it has to (I preferred our original recommendation, so I don't object to the subsequent PR that's appeared.) My fears may be overblown, granted, but I think when we stray into mandating content we're going down a potentially restrictive and unwanted path. I'm not going to lie down in the road over this, though.
So do we have rough consensus that:
I am not sure we need to express metadata-level information as meta elements, as I agree with Ivan that this seems like appropriating file-level metadata. But there is also the option of simply encoding all the metadata in HTML, by (ab)using classes (or another mechanism). That gives us full markup in the metadata, as well as CSS styling. And it means the content can be suppressed for display via CSS. So the landing page could have the text of the title, list of authors, etc. that are also displayed directly to the user without duplication (there isn't an author in the metadata and another author on the landing page). [edit] Though, that said, I could live with either 2 or 3.
Oops, I meant to acknowledge @bduga - apologies!
From the cheap seats at the back of #W3CTPAC ... (over several sessions) ... I agree with @mattgarrish. In my experience with the development of the IIIF manifest spec, the link to the JSON is of course the most important. Other useful information is the position at which to enter the document's structure, if not the beginning, allowing multiple entry points with different behavior.
What we have in the infoset is that you can specify the TOC as a property or you can reference the HTML nav element that contains it. I think that's optimal, as it doesn't require the TOC to be in any particular place and doesn't necessitate duplication. It also allows different TOCs for content and RS if you so desire.
At this point I believe we indeed have consensus as described in Dave's comment, but we do not fully agree on what the landing page would really contain. I would propose, just for the sake of moving ahead, to put that into the document (in the form of a PR) and open a separate issue on, specifically, "MUST the TOC appear on the landing page", which we will have to cover at some point.
I completely agree with @mattgarrish. Further, we are not writing a spec for purchasing publications. As long as we don't hinder the sales and distribution process, which we are not (perhaps we are even helping it), I think we are fine.
If we require returning a page that contains:
... then sure, this could be a problem. My point is that this won't always be the case, and, as you've pointed out, there will be a certain number of indirect steps before we actually get to the content itself.
But we don't require anything to be returned. All we say is that at that address there is an HTML page of some kind with a link element pointing to the manifest. There's no guarantee it's world-readable. I agree that we shouldn't try to over-prescribe what content has to be at that location. Advice about having navigation aids is still sound, as I could be following a link into the publication on a user agent that doesn't support web publications, but haven't we moved away from saying any particular content should/must be present?
Are we sure that the manifest itself won't be behind a redirect or 4xx code of its own?
We don't need to re-specify all the various HTTP status code use cases. What we MUST (hehe) specify is what comes back when the URL for a Web Publication is requested. If we want this to work in the browsers of today--which don't know what a Web Publication is or what to do with it--then the URL for a Web Publication MUST resolve to an HTML document. If we can agree to that much, then we can file issues for all the other things that have been mentioned in here--but let's do that after we make some progress. So...
I fear we are going around in circles here, and I don't think we'll come to something useful, as there are multiple ways to create a paywall (most are client-side ways to hide content, but some are based on server-side redirects, or so it seems). How about:
No, it doesn't mean that. The URL is the https://abonnes.lemonde.fr/ and, after some turnaround, the server will reply to that request with the content of that WP's landing/starting/whatever page. (B.t.w., the example shows that it is perfectly possible to set up a paywall around a WP. The fact that, maybe, some of today's setups do not function that way is not relevant.)
First of all, we agreed that it is not required to put content from the publication on that page, only the link to the manifest. And I am not convinced it is a problem. (I am not talking about a never-on-the-Web PWP; we have discussed that before and it is for a later discussion.) Hadrien, let me turn things around (yes, I am sneaky). What is your answer to the question of the original issue, and to the problem statement of @prototypo?
@llemeurfr, I guess
should say
With that change I agree with what you say but, as @BigBlueHat said:
Yes, this is the same rabbit hole we went down looking at the lifecycle. The more we try to solve specific scenarios, the more will pop up, or the more exceptions we'll think of.
I think we need to remember what we are documenting here. We are not writing a spec for a bookstore. We are writing a spec for Web Publications. As long as we do not disrupt the supply chain, I don't think discussions of paywalls etc. are relevant, and I think we can close this issue with @mattgarrish #101.
I'm about to push a change that swaps "landing page" for "entry page"; thoughts welcome on that prose.
👍 to pushing ahead with #101 and making that the foundation going forward. Once #101 is merged, I'd consider this particular issue closed and new ones (about All The Things) can be filed separately. Great work as ever @mattgarrish!
@TzviyaSiegman that's not what we've been discussing; these questions are all relevant to what can and cannot be present in such a page.
At this point, I feel that the only thing we can say is that the WPUB URL points to an HTML document. This document:
But these are all MAYs; I don't think any of these items can be a MUST.
@HadrienGardeur let's file those as separate issues and discuss them there. Knowing what we get back from the URL that identifies a Web Publication is a necessary foundational step for any of those.
@BigBlueHat identification and discovery are two different concepts; mashing them together is tricky and can lead to various problems. IMO the only thing that can identify that a given resource is within a WP is the manifest itself.
@HadrienGardeur that's not what's being discussed here. We're discussing what one gets back when requesting the identifying URL of the Web Publication. That has to be HTML if we're going to teach browsers new tricks.
@BigBlueHat and this is the one thing that we agree on.
@HadrienGardeur excellent! 😁 That's all this issue was meant to be about afaict. Everything else deserves its own discussion space. Let's ship #101 and go file those other issues. 😄
@BigBlueHat #101 goes beyond that and I've made a comment stating exactly why.
update landing page to match issue discussion in #94
Let's keep sight of the fact that these are not final decisions for a specification that is about to ship. Accepting this only puts it on the radar for comments in the FPWD, which is what we need to get out shortly. That's why I didn't use the "c" word and only cited numbers. If this is the wrong decision, I have faith in the crowd. If you open new issue(s) stating why you believe these requirements are wrong, I'll also add a pointer to the section, Hadrien.
FWIW, I would amend @HadrienGardeur's list above
with only a three-character change... the final MAY should be a MUST, as we must be able to find the Publication (resource list, et al.) from said Publication's landing page.
It strikes me as wrong at some fundamental level that a page that doesn't belong to any publication can initiate any publication, which is what anything less than a MUST seems to suggest. It just feels like there's a trust problem inherent in such a scenario. Am I getting spoofed? Will that page appear in the publication until I navigate somewhere else?
How is that any different from the way RSS/Atom or PWA work? I could look at a presentation page for a Web App that won't be in the Web App itself but will still trigger the Chrome install banner, for example.
Installing an app isn't the same thing as instantiating the publication, though. There's a scope on the web app, so any resource outside that scope couldn't pretend to belong to the app when the app is launched. But what I find confusing about where I see this going is that sometimes a web page that links to a manifest is part of the publication, and when you initiate the publication you initiate that page. But at other times you have a link to initiate the publication but the page you're on is not part of the publication. And you seemingly can't rely on the resource list to check this, since it has no requirement for completeness of resources or even of top-level resources. So how does a user agent know when to initiate the current page, or when it's an indirection to something else? That's the hole that this opens in my mind, perhaps more than the security issue, and that hole exists to some extent independently of this issue. But once you have that hole, and because a page on one domain can instantiate a web publication on another, when does the user agent doubt the legitimacy of initiating the page?
We need to stop overusing this issue. I'm not cutting off discussion by closing it, but please start a new one so we can focus the discussion.
Problem Statement
We have decided that a WPub will be identified with a URL. What should that URL resolve to?
This issue suggests four scenarios and evaluates their ramifications in the hopes of driving toward consensus. A conclusion section at the bottom of this post suggests an answer to the question posed.
Scenarios
The working group has considered several possible answers to this question. They mostly fall into one of four scenarios:
Each of these scenarios is considered in turn as it would be viewed by four types of clients:
Scenario 1: JSON Manifest
If a WPub URL resolves to a JSON manifest file, the four clients may be expected to act like this:
Scenario 1 seems inconvenient for both old and new clients, with the exception of new WPub-aware user agents.
Scenario 2: HTML TOC & Metadata
If a WPub URL resolves to an HTML file that contains both a table of contents (TOC) and other metadata, the four clients may be expected to act like this:
Scenario 2 would be handled cleanly for old and new clients, but could cause difficulties when old clients are presented with metadata they cannot understand. There is also some danger of overloading or stretching the use of HTML to define the necessary metadata within an HTML document. New clients would need to parse the metadata out of the HTML to operate upon it.
Scenario 3: HTML TOC & JSON Manifest
If a WPub URL resolves to an HTML file containing a TOC and which links to a JSON manifest file, the four clients may be expected to act like this:
Scenario 3 seems to cleanly handle old and new clients in appropriate ways. Old clients could follow their noses to the components of a WPub, and new clients could easily load the JSON object to efficiently access metadata.
Scenario 4: Binary WPub
If a WPub URL resolves to a binary file, the four clients may be expected to act like this:
Scenario 4 could lead to confusion for both old and new clients unless a new MIME type is registered.
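To illustrate how a client might tell these scenarios apart before parsing anything, here is a rough Python sketch that branches on the Content-Type returned for a WPub URL. The media types are assumptions: `application/manifest+json` is the Web App Manifest type, and since no media type has been registered for a binary WPub, anything unrecognised falls through to scenario 4.

```python
def classify_wpub_response(content_type: str) -> str:
    """Map the Content-Type of a WPub URL response to one of the four scenarios."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("application/json", "application/manifest+json"):
        return "scenario 1: JSON manifest"
    if media_type in ("text/html", "application/xhtml+xml"):
        # Scenarios 2 and 3 look identical at this level; the client has to
        # inspect the HTML (embedded metadata vs. a link to a manifest).
        return "scenario 2 or 3: HTML entry page"
    # Without a registered media type, a binary WPub cannot be told apart
    # from any other download.
    return "scenario 4 (or unknown): opaque binary"
```

Note that scenarios 2 and 3 cannot be distinguished by media type alone; only the content of the HTML tells them apart.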
My Conclusion
Given the various pros and cons, Scenario 3 (the URL to a WPub resolves to an HTML TOC document, which in turn links to a JSON manifest) seems to most easily interoperate with the existing Web and provides a clean upgrade path to WPub-aware clients.
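A minimal sketch of what a WPub-aware client would do under Scenario 3: parse the HTML entry page and resolve its link to the JSON manifest. It uses only the Python standard library. The link relation `publication` is an assumption (this issue does not name one), and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ManifestLinkFinder(HTMLParser):
    """Records the href of the first <link> whose rel contains the target relation."""

    def __init__(self, rel: str = "publication"):
        super().__init__()
        self.rel = rel.lower()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; self-closing <link/> also lands here.
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        if self.rel in rels and attributes.get("href"):
            self.href = attributes["href"]


def find_manifest_url(page_url: str, html: str, rel: str = "publication"):
    """Return the absolute manifest URL linked from an HTML entry page, or None."""
    finder = ManifestLinkFinder(rel)
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # Resolve relative hrefs against the WPub URL.
    return urljoin(page_url, finder.href)
```

For example, `find_manifest_url("https://example.org/moby-dick/", '<link rel="publication" href="manifest.json">')` yields `https://example.org/moby-dick/manifest.json`. A browser that is not WPub-aware ignores a link relation it doesn't know and simply renders the page (and its TOC), which is the fallback Scenario 3 relies on.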
Acknowledgements
Thanks to @BigBlueHat @TzviyaSiegman @iherman @tcole3 @bdugas @GarthConboy and others from the Publishing WG for discussions which led to this issue and its discussion.