Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define target top-level origin for a navigation request #5491

Merged
merged 5 commits into from
Jun 5, 2020

Conversation

shivanigithub
Copy link
Contributor

@shivanigithub shivanigithub commented Apr 27, 2020

This defines the top-level origin for a navigation request based on its target browsing context. This will then be used for cache partitioning changes in whatwg/fetch#943


/browsers.html ( diff )
/browsing-the-web.html ( diff )
/webappapis.html ( diff )
/window-object.html ( diff )
/workers.html ( diff )

@shivanigithub
Copy link
Contributor Author

@annevk , PTAL, thanks!

@domenic
Copy link
Member

domenic commented Apr 27, 2020

Since environment settings object is a subclass of environment, I think we also need to delete the field from environment settings objects, right?

@shivanigithub
Copy link
Contributor Author

Since environment settings object is a subclass of environment, I think we also need to delete the field from environment settings objects, right?

The top-level origin in the environment settings object is initialized based on the final origin after navigation commits. It is also different in definition: "...at the time this settings object was set up" vs "...at the time this environment object was set up".
Can we let it be there and behave like an overridden method since it changes its value and definition? (Not too sure of spec rules for inheritance)

@domenic
Copy link
Member

domenic commented Apr 27, 2020

I think that's pretty confusing. Maybe we could use different names for the two fields?

@shivanigithub
Copy link
Contributor Author

A different name "target top-level origin" in the environment class, sounds good. Will update.

@shivanigithub
Copy link
Contributor Author

Update the name to "target top-level origin" in the environment class

@annevk
Copy link
Member

annevk commented Apr 28, 2020

Will they be different though? It seems problematic if they can be different.

@shivanigithub
Copy link
Contributor Author

Will they be different though? It seems problematic if they can be different.

I think the only difference would be in the case of top-level navigation request the origin is created using url's origin while in all other cases (subframes and subresources), it is created using determine the origin (which also includes sandbox flags)

@annevk
Copy link
Member

annevk commented Apr 28, 2020

Thanks! So if you navigate to https://site-a.example you would use that as the top-level site for the HTTP cache, despite the response perhaps indicating that it uses an opaque origin. This means that sandboxed content can poison the HTTP cache (and other things).

This might well be reasonable, but we should document this tidbit.

If we already use that as the key for the top-level document, I'm not sure it makes sense to use a different key for embedded documents though. Or am I missing something else?

@annevk annevk added the security/privacy There are security or privacy implications label Apr 28, 2020
@shivanigithub
Copy link
Contributor Author

Thanks! So if you navigate to https://site-a.example you would use that as the top-level site for the HTTP cache, despite the response perhaps indicating that it uses an opaque origin. This means that sandboxed content can poison the HTTP cache (and other things).

That's right that the navigation request itself does not use the opaque origin as the key. But I don't see how sandboxed content can poison the HTTP cache.

This might well be reasonable, but we should document this tidbit.
Sure, I will try to document it in this change.

If we already use that as the key for the top-level document, I'm not sure it makes sense to use a different key for embedded documents though. Or am I missing something else?
I think after document commit, since it is known that the origin is opaque, we might as well use that information for the network partition to respect the privacy/security boundaries implied by opaqueness. (Additionally, Chromium's implementation does not persist opaque origin partitions in the http cache, since they won't be able to be reused anyways)

@shivanigithub
Copy link
Contributor Author

(Replying the 2nd part of the earlier response again for better formatting)

This might well be reasonable, but we should document this tidbit.

Sure, I will try to document it in this change.

If we already use that as the key for the top-level document, I'm not sure it makes sense to use a different key for embedded documents though. Or am I missing something else?

I think after document commit, since it is known that the origin is opaque, we might as well use that information for the network partition to respect the privacy/security boundaries implied by opaqueness. (Additionally, Chromium's implementation does not persist opaque origin partitions in the http cache, since they won't be able to be reused anyways)

@annevk
Copy link
Member

annevk commented Apr 29, 2020

So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?

What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.

@shivanigithub
Copy link
Contributor Author

So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?

That's the idea. Implementation wise, Chromium code currently uses the initial origin (without the opaque information that came in the response), but work is ongoing to be able to use the final origin.

What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.

Agree that it is not avoidable.

@annevk
Copy link
Member

annevk commented Apr 29, 2020

Okay, that does seem like a better model and ensures better isolation, even if it has some setup weirdness and comes at the cost of more complexity.

@hober @johnwilander @englehardt any feedback on this (see #5491 (comment) for an example)?

@shivanigithub
Copy link
Contributor Author

Added the target top-level origin comment

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, modulo nits below, but I'd really like @hober @johnwilander @englehardt to weigh in before we merge this as this is a pretty significant architecture decision.

source Outdated
data-dfn-for="environment">target top-level origin</dfn></dt>
<dd><p>The <span>origin</span> of the <span
data-x="concept-environment-target-browsing-context">target browsing context</span>'s
<span>top-level browsing context</span> at the time this environment object was set up.</p></dd>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't describe the nuance of sometimes using creation URL. It also suggests browsing contexts have an origin, but that's not true and since it's a common misconception we should not add instances of non-normative descriptions suggesting that I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also add a warning here that this is almost always the wrong field to be using. I think it's only needed for networking really. (And maybe service workers, but it seems weird for service workers to create documents that are not same-origin with the service worker.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added that this is only to be used for navigation requests. Let me know if you thing something more needs to be added in terms of the warning.

source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
@shivanigithub
Copy link
Contributor Author

Addressed feedback, PTAL, thanks!

@shivanigithub
Copy link
Contributor Author

@annevk : PTAL at the latest patch, thanks!
@hober @johnwilander @englehardt : PTAL as per comment , thanks!

@johnwilander
Copy link

So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?

What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.

A few questions so I understand the example:

  1. Does "you navigate" imply any user activity here is it just "a navigation occurs?"
  2. Are we talking about site as in SameSite? If so, the protocol is not included afaik. It's just the registrable domain or eTLD+1. We may want to partition on protocol too but then we should have a name for it or say "site and protocol."
  3. If we're going with an example URL, I'd prefer to include a subdomain and whether it is included in the partition or not. Which is it here?
  4. I assume the top level site being sandboxed is for child windows opened from a sandboxed, opaque origin. True? Are there more cases? When it comes to top level navigations, most think of regular navigations and not window.open(). Is there use in calling out how top frames can become sandboxed?

@shivanigithub
Copy link
Contributor Author

So you navigate to https://example.com/sandboxed. This uses ("https", "example.com") as top-level site. But then once you get a response you realize the top-level site is an opaque origin instead. Does this mean you would start using new connections and such for subsequent requests?
What I mean by poisoned is that /sandboxed will be in the ("https", "example.com") cache, but that doesn't really seem avoidable.

A few questions so I understand the example:

  1. Does "you navigate" imply any user activity here is it just "a navigation occurs?"

It implies an actual navigation.

  1. Are we talking about site as in SameSite? If so, the protocol is not included afaik. It's just the registrable domain or eTLD+1. We may want to partition on protocol too but then we should have a name for it or say "site and protocol."

This change only computes the top-level origin and its consumers will derive a site from it, if needed. For instance, the http cache partitioning changes here makes use of "obtain a site" to derive scheme://eTLD+1, given the top-level origin.

  1. If we're going with an example URL, I'd prefer to include a subdomain and whether it is included in the partition or not. Which is it here?

As mentioned above, here the complete origin will be computed including any subdomain.

  1. I assume the top level site being sandboxed is for child windows opened from a sandboxed, opaque origin. True? Are there more cases? When it comes to top level navigations, most think of regular navigations and not window.open(). Is there use in calling out how top frames can become sandboxed?

Have called out in line 86125 that origin is a function of the response and thus not completely created when we are just dealing with the request. I'm ok with either going with this general statement or going into the details if there is an already existing spec section that can be linked to for this special case. @annevk wdyt?

@annevk
Copy link
Member

annevk commented May 14, 2020

We may want to partition on protocol too but then we should have a name for it or say "site and protocol."

There's been a recent set of changes to standards (HTML and URL primarily) where we decided to name that concept site (defined at https://html.spec.whatwg.org/#site).

We still have "schemelessly same site" to do a comparison that ignores the scheme, but no standalone concept that matches what is commonly understood as eTLD+1 (registrable domain or origin). I strongly suspect we won't need it going forward, but if a case presents itself we could add it.

If we're going with an example URL, I'd prefer to include a subdomain and whether it is included in the partition or not. Which is it here?

As @shivanigithub mentions this defines the origin. We can then derive a key from that for usage in partitioning efforts as desired, most likely using "obtain a site" on that top-level origin. But by having the concept itself be the origin we can also allow certain features to have a stricter boundary as might be needed for security.

Taking sandboxing into account would ultimately yield a different site, hence me raising this with you all.

I assume the top level site being sandboxed is for child windows opened from a sandboxed, opaque origin. True? Are there more cases?

A document can sandbox itself using CSP's sandbox functionality. And yes, a framed sandboxed document would typically popup sandboxed documents as well. To answer @shivanigithub's question, I think it's probably useful to note in the note that sandboxing is the reason the two top-level origins can diverge. I think it's the only reason for now, but who knows what the future brings.

@shivanigithub
Copy link
Contributor Author

Added sandboxing as the reason for the two top-level origins to diverge.

@annevk
Copy link
Member

annevk commented May 26, 2020

I pushed a commit that I think captures the intent of the design I outlined in #5558. It was a lot more work than I anticipated however.

source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
@annevk
Copy link
Member

annevk commented May 27, 2020

I tried to clean up the wording a bit to be more similar to what already existed. And also made it more explicit about the null cases.

I think what it does not handle well are the cases where you navigate an embedded browsing context to a response (rather than a request) as in that case there reportedly isn't any reserved environment to copy top-level state from. I'm not sure how accurate that is however as it would mean it would bypass service workers and such, as I understand it. I filed #5577 on that.

@annevk annevk requested a review from domenic May 27, 2020 12:02
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
@annevk
Copy link
Member

annevk commented Jun 3, 2020

I'm going to squash this and then rebase on top of #5583 as that's landed. And will then address remaining comments.

An environment's top-level origin is null during the initial navigation (before the response arrived) and otherwise represents the origin of the top-level document. It is currently implementation-defined for non-dedicated workers, but hopefully that can be sorted soon.

An environment's top-level creation URL is the URL of the top-level document. It is null for workers as they do not need the concept.

Needed for whatwg/fetch#943 and whatwg#5558.
@domenic
Copy link
Member

domenic commented Jun 3, 2020

Note to self: after this is done, I can try to tackle appropriately creating Windows before Documents and tying them together.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing but then realized some of my previous comments weren't yet addressed; sorry for jumping the gun. But I did have one question.

source Show resolved Hide resolved
@annevk annevk requested a review from domenic June 4, 2020 14:56
@domenic
Copy link
Member

domenic commented Jun 4, 2020

@annevk this doesn't build, so it's a bit hard to review; could you fix that?

@annevk
Copy link
Member

annevk commented Jun 4, 2020

Sorry, I had missed that Travis was down or some such. It's good now according to PR Preview.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the description of top-level origin still being a bit confusing. Aside from that, everything flows really beautifully now; great stuff.

Also I'm kind of lost on why it's OK for top-level creation URL to be null for workers and worklets? Is that just because we're not planning on using it for those cases? It seems a bit weird from a theoretical perspective.

source Show resolved Hide resolved
@annevk
Copy link
Member

annevk commented Jun 4, 2020

To restate the design in #5558 from memory, to ensure it still makes sense to me, you all, and is accurate:

For documents we need top-level creation URL for a) determining whether we're in a secure context in any document and b) when we haven't created the top-level document yet (we're in the process of fetching the top-level document which allocates all kinds of network caches).

These use cases don't apply to workers. And for non-dedicated workers it would be a racy value, even more so if there's no partitioning as then the origin of that value could be different too.

Top-level origin we need in all environments (including workers) for fetching and deciding upon network caches. It's null in one scenario, which is where we use top-level creation URL instead (b above).

@domenic
Copy link
Member

domenic commented Jun 4, 2020

Thanks, that's helpful and makes sense. Maybe we should put something like that in the spec, although we could certainly wait until all the pieces fall into place to actually use the values, and maybe also until we figure out the double-keying strategy for shared/service workers.

@annevk
Copy link
Member

annevk commented Jun 5, 2020

@shivanigithub could you take a look as well?

@shivanigithub
Copy link
Contributor Author

Thanks @annevk , this looks good to me.

@domenic
Copy link
Member

domenic commented Jun 5, 2020

LGTM too, let's merge this!! (Will let you do the squashing and commit message.)

@annevk annevk merged commit 31b264a into whatwg:master Jun 5, 2020
@annevk
Copy link
Member

annevk commented Jun 5, 2020

I should add here for the record that Mozilla supports a distinct key for sandboxed origins (i.e., the switch that is possible between the initial top-level navigation and the moment where the response arrives).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
security/privacy There are security or privacy implications
Development

Successfully merging this pull request may close these issues.

4 participants