A way to specify a pattern of destination URLs to hit/skip SW #1454

Open
kinu opened this issue Jul 31, 2019 · 23 comments

Comments

@kinu
Contributor

kinu commented Jul 31, 2019

We often hear a demand for specifying a certain set of destination URLs to be intercepted by the SW (e.g. #1026, #1373). This is yet another (lightweight) proposal to address a subset of these using HTTP response headers (therefore it's ephemeral and scoped only to the current navigation).

Proposal: Service-Worker-Fetch-Scope header

  • Allows website to return Service-Worker-Fetch-Scope: <url-scope> HTTP response header to main resource requests, so that they can express what URLs should be hit by their SW for the service worker client that is instantiated by the main resource. <url-scope> can be either “none” or a scope URL.
  • When this is specified, the service-workers mode of all subresource fetch requests from the service worker client that do NOT prefix-match the given scope
    is set to ”none”, i.e. this makes only the subresource requests that match the scope hit the Service Worker.
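The matching rule in the two bullets above could be sketched roughly as follows. This is only an illustration, not part of the proposal: the helper name shouldHitServiceWorker and the example URLs are made up, and it assumes the scope is resolved against the document URL and compared as a plain string prefix.

```javascript
// Hypothetical helper: decides whether a subresource request would still be
// routed to the service worker under a given Service-Worker-Fetch-Scope value.
// Assumption: the scope is resolved relative to the document URL.
function shouldHitServiceWorker(requestUrl, fetchScope, documentUrl) {
  // No header: current behaviour, every subresource request hits the SW.
  if (fetchScope === undefined) return true;
  // "none": the service-workers mode of every subresource request is "none".
  if (fetchScope === 'none') return false;

  const scopeHref = new URL(fetchScope, documentUrl).href;
  const requestHref = new URL(requestUrl, documentUrl).href;
  // Plain prefix match on the resolved URLs.
  return requestHref.startsWith(scopeHref);
}

const page = 'https://example.com/app/index.html';
console.log(shouldHitServiceWorker('/sw_cache/logo.png', '/sw_cache/', page)); // true
console.log(shouldHitServiceWorker('/api/user', '/sw_cache/', page)); // false
console.log(shouldHitServiceWorker('/sw_cache/logo.png', 'none', page)); // false
```

Because both URLs are resolved to absolute form before comparing, the same check would also cover a full cross-origin scope such as https://cdn.example/assets/.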

Examples:

  • Example 1: A site returns Service-Worker-Fetch-Scope: ‘/sw_cache/’ to document requests so that only the subresources whose URLs start with /sw_cache/ are intercepted by the SW.
  • Example 2: A site starts to return Service-Worker-Fetch-Scope: “none” after a certain period so that it can skip SW after that. (This could be slightly more efficient than scripting this in SWs as UA can skip routing requests entirely if UA can see it in NavPreload response before/during starting a SW) [EDIT: This use case is a bit hand-wavy, maybe we should focus on the first use-case only]

An alternative could be a header that works the opposite way, e.g. Service-Worker-Ignore-Fetch or something, though I've heard some sentiment that allowlisting URLs could be handier.

Technically this can be a subset of the Declarative Routing proposal (#1373) and can potentially be subsumed by that proposal if we decide to implement it. The motivation for this proposal is to see if we can have a smaller, incremental iteration that is easier to experiment with.

@kinu
Contributor Author

kinu commented Jul 31, 2019

/cc @jakearchibald @n8schloss @wanderview @mattto This is the potential alternative approach I mentioned in the other thread. I'd like to know whether it could be useful for experimentation.

@n8schloss

This sounds really good to me! I think it will fit our use case and allow for rapid testing.

@asutherland

Is this just about navigation preload or are there use-cases where the ServiceWorker would intentionally not respondWith() to the initial navigation (non-subresource) request?

@n8schloss

This is about subresource requests. Even if the SW is just going to respondWith the subresource request, putting the SW on the critical path for all these resources introduces a bottleneck and can slow things down.

@asutherland

My confusion is about when the web server gets an opportunity to provide the "Service-Worker-Fetch-Scope" in a way that impacts a given registration and the expected use cases around it.

If the ServiceWorker invokes respondWith() on the navigation fetch and there's no navigation preload, then there's no network request for the server to send overrides. If there's navigation preload, that does provide an opportunity for the server to impact things. It also works if the browser is offline.

If the ServiceWorker doesn't respondWith to the navigation request, there's also an opportunity to tie the headers to the registration, but the website is now broken if the user is offline because the page didn't load. I'm wondering if this is a use-case that's envisioned or if it's really just about the navigation preload scenario used by mega-sites.

@mkruisselbrink
Collaborator

If the ServiceWorker invokes respondWith() on the navigation fetch and there's no navigation preload, then there's no network request for the server to send overrides.

I'm not sure how that follows? The response it passes to respondWith can have this header added, either because the server added it when the resource the SW returns was cached/fetched from the server, or because the SW explicitly adds the header to the response it is returning?

@asutherland

I'm parsing bullet 1's reference of "main resource request" to be a non-subresource/navigation request. Do I have that backwards?

@mkruisselbrink
Collaborator

I think that is right. The non-subresource/navigation request is presumably intercepted by a service worker, because if the page weren't controlled, subresources wouldn't be intercepted either. And the response to that request can set this header to influence which subresource requests will bypass the controlling service worker and instead go directly to the network?

@asakusuma

I interpreted the proposal the same way @mkruisselbrink did.

In other words, either the service worker code itself can add the Service-Worker-Fetch-Scope response header, or the server can add the header when providing a page to be cached.

I like the proposal. It would be nice if there was a similar lightweight API that allowed bypassing the service worker for navigation requests too.

@kinu
Contributor Author

kinu commented Aug 1, 2019

Yeah, the intention is that the response header (on the response to a non-subresource/navigation request) can be set by the service worker or by the network / server (it can also be cached in the cache storage), and it affects all the subresource requests for the client afterwards.

@asutherland

Okay, so as I understand it, the general use cases we're trying to satisfy are those discussed in #1195 and #1026, where a (mega)site is using the ServiceWorker for latency optimization. The sites have no interest in involving the ServiceWorker if it doesn't make things faster.

  • Sometimes this means bypassing some fetches, but not all fetches. Which is why being able to specify a prefix for which the ServiceWorker will be involved is very useful.
  • Sometimes this means disabling the ServiceWorker in its entirety until the ServiceWorker has brought its local cache up-to-date.
  • These mechanisms avoid needing to register new ServiceWorkers and skipWaiting() and claim() and all of the complexities those entail.

From an implementation perspective, there are 2 key things going on:

  1. Adds a hidden state variable to the registration that is effectively either special value "all" (current behavior), a string prefix, or a special value "none". This state variable impacts the "handle fetch" algorithm.
  2. Adds a processing step to "fetch" somewhere after HTTP fetch step 5 that looks for the header "Service-Worker-Fetch-Scope" and enqueues a task to update the new registration variable for the controlling registration of a client, or if the fetch was from a ServiceWorker, to the registration to which the ServiceWorker belongs.

The 2nd part seems fairly problematic. It adds complexity to fetch and would seem to result in a lot of potential for ordering races, plus the ability for a ServiceWorker to accidentally disable itself by caching a magic Response with the "Service-Worker-Fetch-Scope" header and serving it up. And it's not clear that it adds much more beyond exposing an API to the ServiceWorkers to manipulate the new hidden prefix.

In particular, for the case where a ServiceWorker needs to bring itself up-to-date, it seems like the ServiceWorker should know and be able to decide when it's up-to-date. And it also seems like the ServiceWorker would need to know it's not up-to-date in order to start updating itself. So why not just have the server tell the SW it's out-of-date in the navigation preload response, have the ServiceWorker use an API to manipulate the state variable at that point, and then set it back when it's updated?

It does sound reasonable to think about exposing such a mechanism via API (so the 1st half of the implementation plus API exposure instead of response headers) if we aren't able to gain traction on static routing at TPAC.

@kinu
Contributor Author

kinu commented Aug 5, 2019

Yes, this is trying to solve problems that are the same as or similar to those of #1195 and #1126.

For implementation I was imagining that we'd add a hidden state variable to the service worker client (e.g. window or workers) but not to the registrations. The state should be determined and fixed when the client is instantiated (e.g. when a navigation commits for a frame, in Chrome's implementation of the window case), and therefore should not race with SW registration updates. Could something like this make sense to you?

Regarding the risk of a SW accidentally caching the magic Response with the header: we could probably make a minor modification to the proposal so that the response header is discarded / ignored when the response is stored in CacheStorage. Then the header would only ever affect the current response (which might come from the server or might be modified by the service worker on the fly).

Regarding the possibility of a similar mechanism via an API: I agree that it'd be reasonable to also explore a potential API surface for this kind of mechanism.

@wanderview
Member

wanderview commented Aug 5, 2019

Just want to note that while all the examples here use a path only, we also need to be sure to support full URL entries since subresources can be cross-origin. This is a difference from the existing SW scope concept.

@kinu
Contributor Author

kinu commented Aug 7, 2019

@wanderview yep that's right, URL can be cross-origin and full URL entries should be supported. Thanks for clarifying!

@jakearchibald
Contributor

@asutherland if I'm understanding the proposal correctly, the state would sit with the client, not the registration.

https://fetch.spec.whatwg.org/#http-fetch - before step 3, if it's a subresource request, we'd look at the request's client, which would contain data about the URLs the service worker should handle.

@jakearchibald
Contributor

jakearchibald commented Aug 9, 2019

@mattto and I chatted about this, so here's a lump of thoughts:


I assume that URLs are resolved relative to the page?


Since the value can be a URL or a 'special value' like "none", we need a way of differentiating between the two. We could probably use the same rules as module specifiers. As in, we treat it as a URL if it's one of the following:

  • A full non-relative URL. As in, it doesn't throw an error when put through new URL(url).
  • Starts with /.
  • Starts with ./.
  • Starts with ../.

Otherwise we treat it as an enum, eg "none".
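A rough sketch of that classification rule, to make the cases concrete. The function name classifyScopeValue and the returned labels are hypothetical; what to do with an unrecognised keyword is left to the caller, since that is still an open question.

```javascript
// Hypothetical classifier following the module-specifier-like rule above:
// a value is a URL if it parses as an absolute URL or starts with "/", "./"
// or "../"; anything else is treated as a keyword such as "none".
function classifyScopeValue(value) {
  const looksRelative =
    value.startsWith('/') || value.startsWith('./') || value.startsWith('../');
  if (looksRelative) return 'url';
  try {
    new URL(value); // Throws a TypeError when the string is not an absolute URL.
    return 'url';
  } catch {
    return 'keyword'; // e.g. "none"; unknown keywords are the caller's problem.
  }
}

console.log(classifyScopeValue('https://cdn.example/assets/')); // "url"
console.log(classifyScopeValue('/sw_cache/')); // "url"
console.log(classifyScopeValue('../img/')); // "url"
console.log(classifyScopeValue('none')); // "keyword"
console.log(classifyScopeValue('foo')); // "keyword"
```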


What happens with:

Service-Worker-Fetch-Scope: foo

Is this discarded as an unknown value, or does it activate the feature with no matching URLs (same as 'none')?


Is this allowed?

Service-Worker-Fetch-Scope: /imgs/, /script/

…and will fire fetch events for subresources starting /imgs/ or /script/, or will it treat the whole thing as one URL /imgs/, /script/?


We probably need to think of a name that doesn't include 'scope', as it may be confused with service worker scope. But meh bikeshedding.


We need to make sure this header is processed before any headers that trigger subresource fetches, eg Link.


I assume that this would only work as a genuine header, not some <meta> equivalent.


I guess this will work for other client types like workers?


It's difficult to express "bypass the service worker for urls starting /video/". I guess you could support Ignore-Fetch and Fetch-Scope, but if both headers are used in the same request it could get pretty complicated.


In cases where we inherit the controller of the parent document (eg about:blank, srcdoc etc) does it also inherit the fetch scope rules?


Can this be feature detected in any way?


The ergonomics of adding a header to a response from the cache or network aren't totally friendly:

addEventListener('fetch', event => {
  event.respondWith((async function() {
    if (event.request.mode === 'navigate') {
      const response = await fetch(event.request);
      const responseCopy = new Response(response.body, response);
      responseCopy.headers.set('Service-Worker-Fetch-Scope', '/profile/');
      return responseCopy;
    }
    
    return fetch(event.request);
  })());
});

With the declarative routes proposal, I tied the state to the service worker. This means the same thing that specifies the routes, also specifies the handling:

addEventListener('install', event => {
  event.router.add({ url: { startsWith: '/video/' } }, 'network');
});

addEventListener('fetch', event => {
  // You will never see a request for /video/* here
});

The header-based proposal doesn't give the same guarantees, and I'm worried this will create some unexpected gotchas:

addEventListener('fetch', event => {
  event.respondWith((async function() {
    if (event.request.mode === 'navigate') {
      const response = await fetch(event.request);
      const responseCopy = new Response(response.body, response);
      responseCopy.headers.set('Service-Worker-Fetch-Scope', '/profile/');
      return responseCopy;
    }
    
    // Will you see subresource requests for /profile/* here?
  })());
});

In this example it looks like I'm forcing all controlled pages to have a fetch scope of /profile/, and I can imagine developers assuming they won't have to handle subresource requests to anywhere else. However, this isn't the case.

The controlled page may have been served by an earlier version of the service worker (due to skipWaiting), no service worker, or another registration (due to clients.claim). So there's no certainty around the rules the client is following.


If you end up with items in the cache with the Service-Worker-Fetch-Scope header, you might end up applying rules unintentionally.

@kinu
Contributor Author

kinu commented Aug 13, 2019

Thanks, all good feedback. I felt that the following points in particular might need more discussion:

  • feature detection
  • potential race with Service Worker state
  • potential undesirable policy application with cached headers

More comments inline:

I assume that URLs are resolved relative to the page?

Yes that's my current thinking.

Since the value can be a URL or a 'special value' like "none", we need a way of differentiating between the two. We could probably use the same rules as module specifiers. As in, we treat it as a URL if it's one of the following:

  • A full non-relative URL. As in, it doesn't throw an error when put through new URL(url).
  • Starts with /.
  • Starts with ./.
  • Starts with ../.

Otherwise we treat it as an enum, eg "none".

What happens with:

Service-Worker-Fetch-Scope: foo

Is this discarded as an unknown value, or does it activate the feature with no matching URLs (same as 'none')?

I'd vote for discarding but can be cool with either.

Is this allowed?

Service-Worker-Fetch-Scope: /imgs/, /script/

…and will fire fetch events for subresources starting /imgs/ or /script/, or will it treat the whole thing as one URL /imgs/, /script/?

I didn't mention this in the initial post as I didn't have a strong opinion. (I could be open to either.)

We probably need to think of a name that doesn't include 'scope', as it may be confused with service worker scope. But meh bikeshedding.

We need to make sure this header is processed before any headers that trigger subresource fetches, eg Link.

I assume that this would only work as a genuine header, not some <meta> equivalent.

I guess this will work for other client types like workers?

It's difficult to express "bypass the service worker for urls starting /video/". I guess you could support Ignore-Fetch and Fetch-Scope, but if both headers are used in the same request it could get pretty complicated.

Error out and ignore if both are given?

In cases where we inherit the controller of the parent document (eg about:blank, srcdoc etc) does it also inherit the fetch scope rules?

That sounds reasonable.

Can this be feature detected in any way?

Good question. Maybe add some special request header to indicate that?

The ergonomics of adding a header to a response from the cache or network aren't totally friendly:

addEventListener('fetch', event => {
  event.respondWith((async function() {
    if (event.request.mode === 'navigate') {
      const response = await fetch(event.request);
      const responseCopy = new Response(response.body, response);
      responseCopy.headers.set('Service-Worker-Fetch-Scope', '/profile/');
      return responseCopy;
    }
    
    return fetch(event.request);
  })());
});

With the declarative routes proposal, I tied the state to the service worker. This means the same thing that specifies the routes, also specifies the handling:

addEventListener('install', event => {
  event.router.add({ url: { startsWith: '/video/' } }, 'network');
});

addEventListener('fetch', event => {
  // You will never see a request for /video/* here
});

The header-based proposal doesn't give the same guarantees, and I'm worried this will create some unexpected gotchas:

addEventListener('fetch', event => {
  event.respondWith((async function() {
    if (event.request.mode === 'navigate') {
      const response = await fetch(event.request);
      const responseCopy = new Response(response.body, response);
      responseCopy.headers.set('Service-Worker-Fetch-Scope', '/profile/');
      return responseCopy;
    }
    
    // Will you see subresource requests for /profile/* here?
  })());
});

In this example it looks like I'm forcing all controlled pages to have a fetch scope of /profile/, and I can imagine developers assuming they won't have to handle subresource requests to anywhere else. However, this isn't the case.

The controlled page may have been served by an earlier version of the service worker (due to skipWaiting), no service worker, or another registration (due to clients.claim). So there's no certainty around the rules the client is following.

Yep, you're right that there could be a race. My impression of skipWaiting has been that it inherently adds some race, and therefore it could probably be okay to have a race like this, but maybe not. I'm interested in learning how concerning/critical this race looks to you (and everyone)!

If you end up with items in the cache with the Service-Worker-Fetch-Scope header, you might end up applying rules unintentionally.

I'm wondering if making this header always stripped away when cached could introduce more or less confusion.

@jakearchibald
Contributor

Is this discarded as an unknown value, or does it activate the feature with no matching URLs (same as 'none')?

I'd vote for discarding but can be cool with either.

Yeah, discarding seems good.

Is this allowed?

Service-Worker-Fetch-Scope: /imgs/, /script/

…and will fire fetch events for subresources starting /imgs/ or /script/, or will it treat the whole thing as one URL /imgs/, /script/?

I didn't mention this in the initial post as I didn't have a strong opinion. (I could be open to either.)

We should treat:

Service-Worker-Fetch-Scope: /imgs/, /script/

…and…

Service-Worker-Fetch-Scope: /imgs/
Service-Worker-Fetch-Scope: /script/

…the same (due to how headers work). So I guess it would fire fetch events for subresources starting /imgs/ or /script/.
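As a sketch of what "the same" could mean in practice: repeated field lines are combined into one comma-separated value, which is then split into entries. The function name parseFetchScopeLines is hypothetical, and the input is a plain array of field-line values rather than a real Headers object so the snippet can run outside a browser.

```javascript
// Hypothetical parser: takes every Service-Worker-Fetch-Scope field line of a
// response and returns the flat list of scope entries.
function parseFetchScopeLines(fieldLines) {
  return fieldLines
    .join(', ') // repeated field lines combine into one comma-separated value
    .split(',')
    .map(entry => entry.trim())
    .filter(entry => entry.length > 0);
}

const single = parseFetchScopeLines(['/imgs/, /script/']);
const repeated = parseFetchScopeLines(['/imgs/', '/script/']);
console.log(single); // [ '/imgs/', '/script/' ]
console.log(repeated); // [ '/imgs/', '/script/' ]
```

A comma is also a legal character inside a URL, so a real definition would need quoting or escaping rules; this sketch ignores that.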

It's difficult to express "bypass the service worker for urls starting /video/". I guess you could support Ignore-Fetch and Fetch-Scope, but if both headers are used in the same request it could get pretty complicated.

Error out and ignore if both are given?

Maybe. Depends if we'd want to support:

Service-Worker-Fetch-Scope: /profile/
Service-Worker-Bypass-Scope: /profile/imgs/

You could say longest match wins. Not sure what happens with:

Service-Worker-Fetch-Scope: /profile/
Service-Worker-Bypass-Scope: /profile/

…though.
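One way to picture "longest match wins", including the unresolved tie, is the sketch below. Everything here is an assumption for illustration: the function name routeForRequest, the result labels, and the choice to report a tie as "ambiguous" and an unmatched request as "no-match" rather than picking a behaviour.

```javascript
// Hypothetical resolver for the "longest match wins" idea.
// Both lists hold absolute URL prefixes taken from the two headers.
function routeForRequest(requestHref, fetchScopes, bypassScopes) {
  // Length of the longest prefix in the list that matches, or -1 if none does.
  const longest = prefixes =>
    prefixes
      .filter(prefix => requestHref.startsWith(prefix))
      .reduce((best, prefix) => Math.max(best, prefix.length), -1);

  const fetchLen = longest(fetchScopes);
  const bypassLen = longest(bypassScopes);

  if (fetchLen === -1 && bypassLen === -1) return 'no-match';
  if (fetchLen === bypassLen) return 'ambiguous'; // the open question above
  return fetchLen > bypassLen ? 'service-worker' : 'network';
}

const fetchScopes = ['https://example.com/profile/'];
const bypassScopes = ['https://example.com/profile/imgs/'];
console.log(routeForRequest('https://example.com/profile/me', fetchScopes, bypassScopes)); // "service-worker"
console.log(routeForRequest('https://example.com/profile/imgs/a.png', fetchScopes, bypassScopes)); // "network"
console.log(routeForRequest('https://example.com/profile/me', fetchScopes, fetchScopes)); // "ambiguous"
```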

The controlled page may have been served by an earlier version of the service worker (due to skipWaiting), no service worker, or another registration (due to clients.claim). So there's no certainty around the rules the client is following.

Yep, you're right that there could be a race. My impression of skipWaiting has been that it inherently adds some race, and therefore it could probably be okay to have a race like this, but maybe not. I'm interested in learning how concerning/critical this race looks to you (and everyone)!

It's more that you're taking over a page that may have some fetch behaviour dictated by an older service worker. This is certain to happen if you use skipWaiting. I dunno if that means it's a race or not.

Maybe it doesn't matter because:

  • If you use skipWaiting, you're now controlling a page that was loaded via an old service worker, so this isn't totally new.
  • We already split fetch control between the service worker and page headers (eg CSP).

But I think it's much easier to understand if the fetch event handler and "when to use the fetch event" have the same lifetime.

The benefit of tying it to the client is you can give different clients different instructions. Is that useful @n8schloss?

If you end up with items in the cache with the Service-Worker-Fetch-Scope header, you might end up applying rules unintentionally.

I'm wondering if making this header always stripped away when cached could introduce more or less confusion.

It's definitely a different confusion! 😀

@wanderview
Member

I would vote for not magically stripping headers when in cache. I don't think we have anything else like that in the platform...

Can someone clarify for me which response the headers are evaluated on? Is it:

  1. The client navigation response (doc html, worker top script, etc)?
  2. The service worker script?
  3. Something else?

Sorry if that's defined somewhere in here, but the thread has got a bit long.

If the answer is (1), then I assume the service worker FetchEvent handler can manually add/remove these headers before returning a Response back to respondWith(), correct?

@jakearchibald
Contributor

If the answer is (1),

That's the proposal, yeah.

then I assume the service worker FetchEvent handler can manually add/remove these headers before returning a Response back to respondWith(), correct?

Yeah, see the examples in #1454 (comment)

@makotoshimazu

If we think the header state sticks to clients, would exposing the allowlist and the blocklist through the Clients API make sense?
For example, if we could get/set the values via clients, the service worker could do something like this:

self.addEventListener('activate', e => {
  // waitUntil() needs a promise, so invoke the async function immediately.
  e.waitUntil((async () => {
    // The fetchEvent*Scopes members below are the proposed API, not an existing one.
    for (const client of await self.clients.matchAll()) {
      // client.fetchEventAllowedScopes == ['/previously-set-allowed-scope']
      // client.fetchEventDisallowedScopes == ['/previously-set-disallowed-scope']
      await Promise.all([
        client.setFetchEventAllowedScopes(['/foo/', '/bar/']),
        client.setFetchEventDisallowedScopes(['/foo/posts/']),
      ]);
      // client.fetchEventAllowedScopes == ['/foo/', '/bar/']
      // client.fetchEventDisallowedScopes == ['/foo/posts/']
    }
  })());
});

That said, I feel it might look more like dynamic routing and be a big hammer.

@jakearchibald
Contributor

jakearchibald commented Sep 15, 2019

Thoughts for TPAC:

  • I'm not as against this as I was initially. CSP is an interesting parallel of fetch behaviour being tied to a client. There may be gotchas with skipWaiting and mixed behaviour, but that isn't new.
  • Does this cause a race with any fetches? Particularly, ones caused by headers.
  • If a page is limiting fetches to /foo/, what happens if the request is to /bar/ but redirects to /foo/?
  • Does this need to be dynamic? (see comment above)
  • Could #1456 ("spec says cache should return responses with immutable headers, but no browser does that") give us better ergonomics here?
  • I'm really not sure how this works with the cache API.

@jakearchibald
Contributor

TPAC resolution:

  • Would prefer something like fetchEvent.setSubresourceRoutes (name can be changed) where the allow/blocklist can be set.
  • Would be nice if it can be updated later, either on clients API or in document/worker.
  • Chrome can origin trial with whatever it wants, as long as it doesn't ship untrialed like that.
