What does a WPub URL resolve to? #94

Closed
prototypo opened this issue Nov 8, 2017 · 92 comments

Comments

@prototypo
Contributor

Problem Statement

We have decided that a WPub will be identified with a URL. What should that URL resolve to?

This issue suggests four scenarios and evaluates their ramifications in the hopes of driving toward consensus. A conclusion section at the bottom of this post suggests an answer to the question posed.

Scenarios

The working group has considered several possible answers to this question. They mostly fall into one of four scenarios:

  1. A WPub URL resolves to a JSON manifest file
  2. A WPub URL resolves to an HTML file that contains both a table of contents (TOC) and other metadata
  3. A WPub URL resolves to an HTML file containing a TOC and which links to a JSON manifest file
  4. A WPub URL resolves to a binary package (e.g. a ZIP file or SQLite database file)

Each of these scenarios is considered in turn as it would be viewed by four types of clients:

  • An existing search engine bot
  • A WPub aware search engine bot
  • A reader using an existing user agent
  • A reader using a WPub aware user agent

Scenario 1: JSON Manifest

If a WPub URL resolves to a JSON manifest file, the four clients may be expected to act like this:

  • An existing search engine bot (path 1A in the image below) will not index JSON files.
  • A WPub aware search engine bot (path 1B in the image below) will need to index all JSON files on the Web to determine which ones represent WPubs.
  • A reader using an existing user agent (path 1C in the image below) will be presented with a raw JSON representation.
  • A reader using a WPub aware user agent (path 1D in the image below) will be able to see and operate upon the WPub as intended.

Scenario 1 seems inconvenient for both old and new clients with the exception of new WPub aware user agents.

[Image: wpub-canonicalresourcejson, showing paths 1A to 1D]

Scenario 2: HTML TOC & Metadata

If a WPub URL resolves to an HTML file that contains both a table of contents (TOC) and other metadata, the four clients may be expected to act like this:

  • An existing search engine bot (path 2A in the image below) will encounter and index the HTML page without modification.
  • A WPub aware search engine bot (path 2B in the image below) will be able to determine that the HTML page represents a WPub, but will need to parse the entire file in a WPub-specified manner to extract metadata.
  • A reader using an existing user agent (path 2C in the image below) will see a TOC containing links to the components of the WPub.
  • A reader using a WPub aware user agent (path 2D in the image below) will be able to see and operate upon the WPub as intended.

[Image: wpub-canonicalresourcehtml, showing paths 2A to 2D]

Scenario 2 would be handled cleanly for old and new clients, but could cause difficulties when old clients are presented with metadata they cannot understand. There is also some danger of overloading or stretching the use of HTML to define the necessary metadata within an HTML document. New clients would need to parse the metadata out of the HTML to operate upon it.

Scenario 3: HTML TOC & JSON Manifest

If a WPub URL resolves to an HTML file containing a TOC and which links to a JSON manifest file, the four clients may be expected to act like this:

  • An existing search engine bot (path 3A in the image below) will encounter and index the HTML page without modification.
  • A WPub aware search engine bot (path 3B in the image below) will be able to determine that the HTML page represents a WPub, and follow the provided link to the metadata if it wishes.
  • A reader using an existing user agent (path 3C in the image below) will see a TOC containing links to the components of the WPub.
  • A reader using a WPub aware user agent (path 3D in the image below) will be able to see and operate upon the WPub as intended.

[Image: wpub-canonicalresourcehtmljson, showing paths 3A to 3D]

Scenario 3 seems to cleanly handle old and new clients in appropriate ways. Old clients could follow their noses to the components of a WPub, and new clients could easily load the JSON object to efficiently access metadata.
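To make Scenario 3 concrete, here is a minimal sketch in Python of what a WPub-aware client might do with such a landing page. It is only an illustration: the `rel="publication"` value, the file names, and the `LandingPageParser` class are assumptions made for this example and are not taken from any draft.

```python
from html.parser import HTMLParser

# Hypothetical landing page. The rel value "publication" and the file
# names are assumptions for illustration only.
LANDING_PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Moby Dick</title>
  <link rel="publication" href="manifest.json">
</head>
<body>
  <nav>
    <ol>
      <li><a href="chapter-1.html">Chapter 1</a></li>
      <li><a href="chapter-2.html">Chapter 2</a></li>
    </ol>
  </nav>
</body>
</html>"""


class LandingPageParser(HTMLParser):
    """Collects the manifest link and the TOC links from a landing page."""

    def __init__(self):
        super().__init__()
        self.manifest_href = None
        self.toc_links = []
        self._nav_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        if tag == "link" and "publication" in rel_values:
            # A WPub-aware client follows this link to the JSON manifest.
            self.manifest_href = attrs.get("href")
        elif tag == "nav":
            self._nav_depth += 1
        elif tag == "a" and self._nav_depth > 0 and attrs.get("href"):
            # An existing client simply shows these links to the reader.
            self.toc_links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "nav" and self._nav_depth > 0:
            self._nav_depth -= 1


parser = LandingPageParser()
parser.feed(LANDING_PAGE)
print(parser.manifest_href)
print(parser.toc_links)
```

An existing browser or search engine bot ignores the unknown `rel` value and still gets an ordinary HTML page with followable links, which is the upgrade path described above.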

Scenario 4: Binary WPub

If a WPub URL resolves to a binary file, the four clients may be expected to act like this:

  • An existing search engine bot (path 4A in the image below) will either be aware of the MIME type for the binary file and attempt to handle it in an inappropriate manner, or it will need to be modified to handle a new MIME type.
  • A WPub aware search engine bot (path 4B in the image below) will either be confused by the overloading of an existing MIME type, or handle a new MIME type correctly.
  • A reader using an existing user agent (path 4C in the image below) will probably be asked to download the binary.
  • A reader using a WPub aware user agent (path 4D in the image below) will either be confused by the overloading of an existing MIME type, or handle a new MIME type correctly.

Scenario 4 could lead to confusion for both old and new clients unless a new MIME type is registered.

[Image: wpub-canonicalresourcebinary, showing paths 4A to 4D]

My Conclusion

Given the various pros and cons, Scenario 3 (the URL to a WPub resolves to an HTML TOC document, which in turn links to a JSON manifest) seems to most easily interoperate with the existing Web and provides a clean upgrade path to WPub-aware clients.

Acknowledgements

Thanks to @BigBlueHat @TzviyaSiegman @iherman @tcole3 @bdugas @GarthConboy and others from the Publishing WG for discussions which led to this issue and its discussion.

@mattgarrish
Member

Do we need to require toc in this, or do we only need to require that the link resolve to a resource that is considered the primary entry point for the publication, and that must include a link to the manifest?

It sounds like we're mandating the presence of a table of contents, when not every publication will need one. Using the table of contents, or ensuring the landing document has a clear link to one, is perhaps only best practice in the case of multi-document publications?

@dauwhe
Contributor

dauwhe commented Nov 8, 2017

Yeah, this is one of my fundamental questions. If I point my browser at the URL of a WP, what happens?

> GET /MobyDick/ HTTP/1.1
> Host: www.example.com
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41
> Accept: */*

Strongly agree that the URL must resolve to an HTML document, and that this HTML document must contain a link to the manifest (or contain the manifest; I think that's still an open question).

Like @mattgarrish, I'm a little less certain about requiring this document to contain a TOC. Perhaps if it doesn't contain a TOC, and it is a multiple-document publication, it must link to a TOC? Sadly rel=contents isn't super-standard, but rel=index might be appropriate? Or is it enough that the manifest include a link to a TOC?

Do we require that this be the first document in the default reading order? I think yes. In retrospect, the whole "begin reading" thing in EPUB felt like an evasion. Saying that [1] front matter is so important that it must come at the front of the book, and [2] that it is so unimportant that we don't want the reader to actually see it, is just avoiding responsibility for designing your own content appropriately.

@iherman
Member

iherman commented Nov 8, 2017

To repeat some arguments, just for the record, against option 2:

  • If the metadata used HTML elements like <meta>, <link>, or even <title>, that would be, in my view, in violation of the HTML spec. That spec clearly says "The meta element can represent document-level metadata". I.e., using that element for the Publication, instead of for the containing document, would be misusing it for a different purpose. We should not do that.
  • It would be possible to include the whole metadata in a <script> with the type set for javascript but, in practice, this may be a stretch. For example, no HTML authoring tool that I know of would make that easy to do. Also, the JSON metadata should then be made in a way (using JSON-LD tricks) that the subject for the metadata in question is the Publication and not the enclosing document itself.
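As a rough illustration of the second point, the sketch below builds an embedded JSON-LD block whose subject is named explicitly through `@id`, so that the metadata describes the publication rather than the page that carries it. The URL, the schema.org vocabulary, and the `application/ld+json` script type are assumptions chosen for the example, not something this thread has agreed on.

```python
import json

# Hypothetical publication URL, used only for illustration.
PUBLICATION_URL = "https://www.example.com/MobyDick/"

metadata = {
    "@context": "https://schema.org",
    # An explicit @id makes the publication, not the enclosing HTML
    # document, the node that the properties below describe.
    "@id": PUBLICATION_URL,
    "@type": "Book",
    "name": "Moby Dick",
}

script_element = (
    '<script type="application/ld+json">\n'
    + json.dumps(metadata, indent=2)
    + "\n</script>"
)
print(script_element)
```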

My conclusion on option 2 is that it should not be adopted.

@dauwhe
Contributor

dauwhe commented Nov 8, 2017

[edited] Ivan, are you talking about option 2? That seems to be the one where some publication metadata is expressed in HTML.

@iherman
Member

iherman commented Nov 8, 2017

Oh bugger, I wanted to say Option 2! I will edit the comment...

(Lesson: never say anything serious before breakfast!)

@rdeltour
Member

rdeltour commented Nov 8, 2017

If the metadata used HTML elements like <meta>, <link>, or even <title>, that would be, in my view, in violation of the HTML spec. That spec clearly says "The meta element can represent document-level metadata". I.e., using that element for the Publication, instead of for the containing document, would be misusing it for a different purpose. We should not do that.

There's a precedent however in web apps, where HTML’s meta elements are very frequently used to represent app-level metadata.
So I don't think the "violation of the HTML spec" is very significant. Said spec can be updated to pave the web apps cowpaths.

@iherman
Member

iherman commented Nov 8, 2017

Per the TOC (or not) in option 3: indeed, maybe the TOC is not the ideal solution, although, in many cases it looks like the natural fit.

The current draft lists three required information items: title, list of web publication resources, and a default reading order. I would not want the title appearing in the landing page (at least not being the information item) for the same reasons as in my comment. Not sure whether the list of resources and the reading order should appear there.

To say 'the landing page is the first document in the reading order' may not be good either: the content may not be in HTML. We had this discussion at the TPAC F2F about publications consisting of drawings or audio files only. For those cases, the TOC is helpful for non-WPUB-aware browsers.

I would say: the landing page contains the TOC (through a <nav> with some predefined attributes of some sort) if it exists for the publication. If it does not, then the landing page is, essentially, empty as far as the Publication is concerned, and there is no TOC in the publication. I would not expect this to be a very frequent situation.

(The landing page may still include other information that publisher wishes to provide. This is not for us to define.)

@iherman
Member

iherman commented Nov 8, 2017

There's a precedent however in web apps, where HTML’s meta elements are very frequently used to represent app-level metadata.
So I don't think the "violation of the HTML spec" is very significant. Said spec can be updated to pave the web apps cowpaths.

Maybe there are such apps in the wild, and I do not think what they do is correct. Maybe, but only maybe, the Web Platform WG will, at some point in the distant future, pave that particular cowpath (I cannot judge how wide this usage is), but I do not think this WG should adopt an approach which is in violation with the specifications as of today.

@iherman
Member

iherman commented Nov 8, 2017

(Just to make it clear: my personal vote goes firmly for option 3...)

@mattgarrish
Member

I do not think this WG should adopt an approach which is in violation with the specifications as of today

I'm kind of torn on this, as we're already doing similar in allowing certain information to be harvested from the content (title, language, etc.). It's (arguably) useful information for legacy browsers that don't support the publication.

But I agree to the extent that we should not be confusing people that it's either/or. Maintaining metadata in multiple places is always a disaster in waiting. WAM gives precedence to manifest metadata and we should do the same.
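A small Python sketch of the precedence rule suggested here, under the assumption that both sources have already been reduced to flat dictionaries. The function name and the keys are hypothetical.

```python
def effective_metadata(manifest: dict, harvested_from_html: dict) -> dict:
    """Merge metadata, letting manifest values win over harvested ones."""
    merged = dict(harvested_from_html)  # fallback values from the content
    merged.update(manifest)             # manifest takes precedence on conflict
    return merged


harvested = {"title": "Moby Dick (draft)", "language": "en"}
manifest = {"title": "Moby Dick"}
print(effective_metadata(manifest, harvested))
```

Values harvested from the content only fill gaps the manifest leaves open, so there is never a question about which source wins.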

I prefer Ivan's approach, but can live with what we've done. At a minimum, we can never stop people from expressing whatever metadata they want wherever they want.

@WSchindler

Considering the 4 scenarios I think option 3 seems the best solution to accommodate all types of clients. From a user perspective, we would always need some kind of landing page for the WP. Typically, it should contain a TOC (via the <nav> element) to enable the user to visually access the different parts of the WP - i.e. those that are part of the reading order - either sequentially or in a random order. If it doesn't contain a TOC and has more than one constituent document, we would still need some starting point.
Would you agree that any resource listed in the reading order should have a link to the one and only manifest of a WP which contains all the information for a WP-aware user agent to properly consume and render a WP?

@dauwhe
Contributor

dauwhe commented Nov 8, 2017

To say 'the landing page is the first document in the reading order' may not be good either: the content may not be in HTML. We had this discussion at the TPAC F2F about publications consisting of drawings or audio files only. For those cases, the TOC is helpful for non-WPUB-aware browsers.

Ah, this helps clarify that I have a concern here. If this HTML document is not the first document in the default reading order, then a WP-aware user agent will present an entirely different resource to the user than a non-WP-aware user agent. Quick example: if my WP has this (very common) order:

cover.html (embedded in html)
title-page.html
toc.html
chapter-1.html
chapter-2.html

I might set things up so that the URL for the WP resolves to cover.html and that contains the link to the manifest. But what if the manifest says that the first item of the default reading order is chapter-1.html? Then a non-aware UA would present the cover, and a WP-aware UA would present ch1? I find that very confusing.
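The mismatch described here can be checked mechanically. The following Python sketch assumes the reading order is available as a list of URLs relative to the manifest; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urljoin


def landing_differs_from_start(landing_url, manifest_url, reading_order):
    """Return True when the landing page is not the first reading-order item."""
    if not reading_order:
        return False  # nothing to compare against
    # Reading-order entries are assumed to be relative to the manifest URL.
    first = urljoin(manifest_url, reading_order[0])
    return first != landing_url


base = "https://www.example.com/MobyDick/"
print(landing_differs_from_start(
    base + "cover.html",
    base + "manifest.json",
    ["chapter-1.html", "chapter-2.html"],
))
```

With the example above, a non-aware UA would show cover.html while the manifest starts at chapter-1.html, so the check reports a mismatch.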

I agree that having the URL resolve to a TOC is ideal. You can always hide it visually.

I would also argue that something in the WP has to be HTML (or something that supports links). I don't see how you make a JSON manifest + audio/image files only work in option 3, because otherwise a non-aware UA cannot access the content through the WP URL.

@mattgarrish
Member

the landing page contains the TOC

But why enforce this? What does it accomplish, really?

What if I want my landing page to be the cover page and clicking on the cover page takes you to the table of contents?

@mattgarrish
Member

a WP-aware UA would present ch1

Why would the UA change the document you've navigated to? If the reading order forces you away from the resource you want to view, that's a very bad thing.

@rdeltour
Member

rdeltour commented Nov 8, 2017

Maybe there are such apps in the wild, and I do not think what they do is correct. (...) I do not think this WG should adopt an approach which is in violation with the specifications as of today.

I'm only saying I don't think that using meta would necessarily be a violation. I mean, application-name is part of the standard metadata names defined in the HTML spec, and can hardly be taken for document-level metadata.
It's a blurry line, you can always argue that a piece of metadata applies to the document when it gives info on the larger object (app, publication) this document belongs to.

That said, my vote goes to option 3. I'm just thinking that it doesn't necessarily preclude mixing it with option 2 to widen UA support, like web apps do: have an authoritative external manifest, but also fall back to HTML metadata when relevant.

@dauwhe
Contributor

dauwhe commented Nov 8, 2017

Why would the UA change the document you've navigated to? If the reading order forces you away from the resource you want to view, that's a very bad thing.

I suppose that's what I'm wondering, but perhaps this is more a lifecycle & implementation question. A WP-aware UA opens the HTML resource at the WP URL, and processes the manifest. What is it obligated to do then? I suppose it could present a "begin reading" button that would cause a navigation. Maybe it just creates a navigation/personalization overlay (as Readium Cloud Viewer does).

Maybe my point is that it would be silly to have the URL point to something other than what you want the reader to see first. Even RFC 6919 doesn't have formal language for that :)

@mattgarrish
Member

A WP-aware UA opens the HTML resource at the WP URL, and processes the manifest. What is it obligated to do then?

Right, this is where we (probably) won't find consensus. I'm of the mind that the user agent shouldn't change the resource you're on when it initiates the reading experience. But if it does, I also don't have any problem with that as users will decide the fate of such a feature. I just don't think such a change should happen without prompting, in which case both the vanilla and enhanced experiences are ultimately the same, there's just an extra opt-in from the user that changes one scenario.

I think it would also help to make clear that we're trying to accommodate many reading scenarios somewhere (browser, app (in browser), polyfilled publication, publication as app), but let's not discuss that in this thread.

@iherman
Member

iherman commented Nov 8, 2017

the landing page contains the TOC

But why enforce this? What does it accomplish, really?

What I think we should avoid is that the same information item could be expressed in different places. I am afraid of the possible confusion that would result from that (what if they are defined in different places? Which one has priority? What if the information is conflicting? etc.)

To be more specific: we have a set of information items that must be serialized. My mental model is:

  1. One, and only one, of the information items appears in the landing page, and the landing page only
  2. All other information items are serialized in JSON (as we decided) and are, therefore, part of the manifest (waybill?) file that is linked from the landing page.

In other words, there is no ambiguity on where a specific information item is defined.

If this model works then we must choose which information item is covered by (1). The TOC seems to be the best choice.

@mattgarrish
Member

One, and only one, of the information items appears in the landing page, and the landing page only

I agree and disagree. There is one piece of information that must appear on the landing page: the link to the manifest. It can appear elsewhere, but whether it has to we keep changing course on (I preferred our original recommendation, so don't object to the subsequent PR that's appeared.)

My fears may be overblown, granted, but I think when we stray into mandating content we're going down a potentially restrictive and unwanted path. I'm not going to lie down in the road over this, though.

@dauwhe
Contributor

dauwhe commented Nov 8, 2017

So do we have rough consensus that:

  1. The URL of a WP must resolve to a resource (the "landing page") which
  2. must contain a link to the manifest and
  3. should contain a <nav>?
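The three points above could be turned into a simple conformance check. This is only a sketch in Python: the `rel="publication"` value used to recognise the manifest link is an assumption, and `check_landing_page` is a hypothetical helper, not part of any specification.

```python
from html.parser import HTMLParser


class _LandingPageScan(HTMLParser):
    """Records whether a manifest link and a <nav> element were seen."""

    def __init__(self):
        super().__init__()
        self.has_manifest_link = False
        self.has_nav = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").split()
        # Assumption: the manifest link is marked with rel="publication".
        if tag == "link" and "publication" in rel_values and attrs.get("href"):
            self.has_manifest_link = True
        elif tag == "nav":
            self.has_nav = True


def check_landing_page(html):
    """Return (errors, warnings) for the 'must' and 'should' above."""
    scan = _LandingPageScan()
    scan.feed(html)
    errors, warnings = [], []
    if not scan.has_manifest_link:
        errors.append("landing page must contain a link to the manifest")
    if not scan.has_nav:
        warnings.append("landing page should contain a <nav>")
    return errors, warnings


page = '<html><head><link rel="publication" href="m.json"></head><body></body></html>'
print(check_landing_page(page))
```

The missing manifest link is reported as an error and the missing <nav> only as a warning, mirroring the "must" and "should" wording.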

@bduga

bduga commented Nov 8, 2017

I am not sure we need to express metadata level information as meta elements, as I agree with Ivan that that seems like appropriating file level metadata. But there is also the option of simply encoding all the metadata in HTML, by (ab)using classes (or another mechanism). That gives us full markup in the metadata, as well as CSS styling. And it means the content can be suppressed for display via CSS. So the landing page could have the text of the title, list of authors, etc. that are also displayed directly to the user without duplication (there isn't an author in the metadata and an author on the landing page).

[edit] Though, that said, I could live with either 2 or 3.

@prototypo
Contributor Author

Oops, I meant to acknowledge @bduga - apologies!

@azaroth42

From the cheap seats at the back of #W3CTPAC ... (over several sessions) ...

I agree with @mattgarrish. In my experience with the development of the IIIF manifest spec, the link to the JSON is of course the most important. Other useful information is the position to enter the document's structure, if not the beginning, allowing multiple entry points with different behavior.

@GarthConboy
Contributor

GarthConboy commented Nov 10, 2017

I'm (also) okay with options #2 and #3. Per Dave's "should contain a <nav>?" above, if a "<nav>" is a MUST somewhere, it seems it's better to be a MUST in a particular place such that we don't introduce opportunities for it to be in multiple places and having the various instances fight.

@mattgarrish
Member

What we have in the infoset is that you can specify the toc as a property or you can reference the html nav element that contains it. I think that's optimal, as it doesn't require the toc to be in any particular place, and doesn't necessitate duplication. It also allows different tocs for content and rs if you so desire.

@iherman
Member

iherman commented Nov 10, 2017

At this point I believe we indeed have a consensus as described in Dave's comment, and we do not fully agree what the landing page would really contain. I would propose, just for the sake of moving ahead, to put that into the document (in the form of a PR) and open a separate issue on, specifically, "MUST the TOC appear on the landing page" which we will have to cover at some point.

@TzviyaSiegman
Contributor

These are problems that the publisher has to solve if they want to lock up their content, and that there are probably better solutions for than the address.

I completely agree with @mattgarrish. Further, we are not writing a spec for purchasing publications. As long as we don't hinder the sales and distribution process, which we are not (perhaps even helping it), I think we are fine.

@HadrienGardeur

But is it all that big a problem? If publishers set the primary entry point, which is what this allows, it enables them to put in place the necessary prompts for obtaining the content for those who don't have access.

If we require to return a page that contains:

  • content from the publication (first resource in reading order and/or TOC)
  • and a link to the manifest

... then sure, this could be a problem. My point is that this won't always be the case, and as you've pointed out that there will be a certain number of indirect steps before we actually get to the content itself.

@mattgarrish
Member

But we don't require anything to be returned. All we say is that at the address URL is an html page of some kind with a link element pointing to the manifest. There's no guarantee it's world readable.

I agree that we shouldn't try to over-prescribe what content has to be at that location. Advice about having navigation aids is still sound, as I could be following a link in to the publication on a user agent that doesn't support web publications, but haven't we moved away from saying any certain content should/must be present?

@HadrienGardeur

But we don't require anything to be returned. All we say is that at the address URL is an html page of some kind with a link element pointing to the manifest. There's no guarantee it's world readable.

Are we sure that the manifest itself won't be behind a redirect or 4xx code of its own?
Even if it isn't, do we want to link to a manifest if all of its resources return a 401?

@BigBlueHat
Member

We don't need to re-specify all the various HTTP status code use cases. What we MUST (hehe) specify is what comes back when the URL for a Web Publication is returned.

If we want this to work in the browsers of today--which don't know what a Web Publication is or what it should do with it--then the URL for a Web Publication MUST be an HTML document.

If we can agree to that much, then we can file issues for all the other things that have been mentioned in here--but let's do that after we make some progress.

So...

  • A Web Publication MUST be identifiable by a URL unique to the Web Publication (not one that also points to a constituent part)
  • A request of that URL MUST return an HTML document (because if we don't do that, we're not building within the extensible Web, but for non-Web things like ebook readers only)
  • ...and here's where we all start to disagree and where more things need to be determined. 😸
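The second requirement is easy to test from the response headers alone. Below is a minimal Python sketch that inspects a `Content-Type` header value; accepting only `text/html` (and not, say, `application/xhtml+xml`) is an assumption of this example rather than something decided in this thread, and `is_html_media_type` is a hypothetical helper.

```python
def is_html_media_type(content_type_header: str) -> bool:
    """Return True if a Content-Type header value denotes text/html."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


print(is_html_media_type("text/html; charset=utf-8"))
print(is_html_media_type("application/json"))
```

Under this check, the responses of Scenario 1 (JSON) and Scenario 4 (a binary package) would both fail, while those of Scenarios 2 and 3 would pass.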

@llemeurfr
Contributor

llemeurfr commented Nov 14, 2017

I fear we are going in circles here, and I don't think we'll come to something useful as there are multiple ways to create a paywall (most are client side ways to hide content, but some are based on server side redirects, or so it seems).

How about:
A WP has a Start page (ok for start), with a link to the JSON manifest. It is listed in WP resources, but may or may not be in "reading order". There may be some authorization needed (paywall, authentication) to see its (full) content. If such authorization has not been acquired, it would be logical that the access to the WP manifest is also unauthorized (403); and even if it is authorized for any reason, access to the resources of the publication should still be unauthorized (if not, why build a protected access?). It is up to the web server to enforce such protection.

@iherman
Member

iherman commented Nov 14, 2017

@HadrienGardeur

I do not see anything in this scheme that would contradict referring to https://abonnes.lemonde.fr/ as the address for that specific WP, and for which the lemonde.fr server will return (eventually, possibly after the login process) that landing page (for lack of a better word for now).

@iherman, but this means that following the WPub URL won't directly open the publication itself and/or provide a link to the manifest.

No, it doesn't mean that. The URL is https://abonnes.lemonde.fr/ and, after some turnaround, the server will reply to that request with the content of that WP's landing/starting/whatever page.

(B.t.w., the example shows that it is perfectly possible to set up a paywall around a WP. The fact that, maybe, some of today's setup does not function that way is not relevant.)

If we require to return a page that contains:

  • content from the publication (first resource in reading order and/or TOC)
  • and a link to the manifest

... then sure, this could be a problem.

First of all, we agreed that it is not required to put content from the publication there. Only the link to the manifest. And I am not convinced it is a problem. (I am not talking about a never-on-the-Web-PWP, we have discussed that before and it is for a later discussion.)

Hadrien, let me turn things around (yes, I am sneaky). What is your answer to the question of the original issue, and the problem statement of @prototypo?

@iherman
Member

iherman commented Nov 14, 2017

@llemeurfr, I guess

if it is authorized for any reason, access to the resources of the publication should be un-authorized

should say

if it is unauthorized for any reason, access to the resources of the publication should be un-authorized

With that change I agree with what you say but, as @BigBlueHat said:

We don't need to re-specify all the various HTTP status code use cases. What we MUST (hehe) specify is what comes back when the URL for a Web Publication is returned.

@mattgarrish
Member

With that change I agree with what you say but, as @BigBlueHat said:

We don't need to re-specify all the various HTTP status code use cases. What we MUST (hehe) specify is what comes back when the URL for a Web Publication is returned.

Yes, this is the same rabbit hole we went down looking at the lifecycle. The more we try to solve specific scenarios the more will pop up, or more exceptions we'll think of.

@TzviyaSiegman
Contributor

I think we need to remember what we are documenting here. We are not writing a spec for a bookstore. We are writing a spec for Web Publications. As long as we do not disrupt the supply chain, I don't think discussions of pay walls etc are relevant, and I think we can close this issue with @mattgarrish #101.

@mattgarrish
Member

I'm about to push a change that swaps landing page for entry page, but thoughts welcome on that prose.

@BigBlueHat
Member

👍 to pushing ahead with #101 and making that the foundation going forward.

Once #101 is merged, I'd consider this particular issue closed and new ones (about All The Things) can be filed separately.

Great work as ever @mattgarrish!

@HadrienGardeur

@TzviyaSiegman that's not what we've been discussing, these questions are all relevant to what can and cannot be present in such a page.

Hadrien, let me turn things around (yes, I am sneaky). What is your answer to the question of the original issue, and the problem statement of @prototypo?

At this point, I feel that the only thing we can say is that the WPUB URL points to an HTML document.

This document:

  • MAY be within the boundaries of the publication
  • MAY also be in the reading order
  • MAY as well contain a link to the manifest

But these are all MAY, I don't think any of these items can be a MUST.

@BigBlueHat
Member

@HadrienGardeur let's file those as separate issues and discuss them there. Knowing what we get back from the URL that identifies a Web Publication is a necessary foundational step to any of those.

@HadrienGardeur

@BigBlueHat identification and discovery are two different concepts, mashing them together is tricky and can lead to various problems.

IMO the only thing that can identify that a given resource is within a WP is the manifest itself.

@BigBlueHat
Member

IMO the only thing that can identify that a given resource is within a WP is the manifest itself.

@HadrienGardeur that's not what's being discussed here. We're discussing what one gets back when requesting the identifying URL of the Web Publication. That has to be HTML if we're going to teach browsers new tricks.

@HadrienGardeur

@BigBlueHat and this is the one thing that we agree on.

@BigBlueHat
Member

@HadrienGardeur excellent! 😁 That's all this issue was meant to be about afaict. Everything else deserves its own discussion space.

Let's ship #101 and go file those other issues. 😄

@HadrienGardeur

@BigBlueHat #101 goes beyond that and I've made a comment stating exactly why.

mattgarrish added a commit that referenced this issue Nov 14, 2017
update landing page to match issue discussion in #94
@mattgarrish
Member

Let's keep sight of the fact that these are not final decisions for a specification that is about to ship. Accepting this only puts it on the radar for comments in the FPWD, which is what we need to get out shortly. That's why I didn't use the "c" word and only cited numbers. If this is the wrong decision, I have faith in the crowd.

If you open new issue(s) stating why you believe these requirements are wrong, I'll also add a pointer to the section, Hadrien.

@GarthConboy
Contributor

FWIW, I would amend @HadrienGardeur's list above

MAY be within the boundaries of the publication
MAY also be in the reading order
MAY as well contain a link to the manifest

with only a three character change... the final MAY should be a MUST, as we must be able to find the Publication (resource list, et al) from said Publication's landing page.

@mattgarrish
Member

It strikes me as wrong at some fundamental level that a page that doesn't belong to any publication can initiate any publication, which is what anything less than a must seems to suggest. It just feels like there's a trust problem inherent in such a scenario. Am I getting spoofed? Will that page appear in the publication until I navigate somewhere else?

@HadrienGardeur

How is that any different from the way RSS/Atom or PWA work?

I could look at a presentation page for a Web App that won't be in the Web App itself that will trigger the Chrome install banner for example.
There are other requirements (@dauwhe has listed some of them, but each browser has different expectations), but this is not fundamentally any different.

@mattgarrish
Member

mattgarrish commented Nov 15, 2017

Installing an app isn't the same thing as instantiating the publication, though. There's a scope on the web app, so any resource outside that scope couldn't pretend to belong to the app when the app is launched.

But what I find confusing in where I see this going is that sometimes a web page that links to a manifest is part of the publication and when you initiate the publication you initiate that page. But then some other times you have a link to initiate the publication but the page you're on is not part of the publication. And you seemingly can't rely on the resource list to check this, since it has no requirement for completeness of resources or even top-level resources.

So how does a user agent know when to initiate the current page, or when it's an indirection to something else? That's the hole that this opens in my mind, perhaps more than the security, and that hole exists to some extent independent of this issue.

But once you have that hole, and because a page on one domain can instantiate a web publication on another, when does the user agent doubt the legitimacy of initiating the page?

@mattgarrish
Member

We need to stop overusing this issue, so not cutting off discussion by closing, but please start a new one so we can focus the discussion.
