Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service worker stuck in busy state when SSE request hits the cache #2692

Closed
andyrichardson opened this issue Dec 4, 2020 · 15 comments
Closed
Assignees
Labels
Needs More Info Waiting on additional information from the community. workbox-routing

Comments

@andyrichardson
Copy link

andyrichardson commented Dec 4, 2020

Library Affected:
workbox-routing

Browser & Platform:
all browsers

Issue or Feature Request Description:
Bug report:

Consider the following service worker

import { clientsClaim, skipWaiting } from 'workbox-core';
import { registerRoute } from **'workbox-routing';
import { StaleWhileRevalidate } from 'workbox-strategies';

skipWaiting();
clientsClaim();

registerRoute(
  ({ request, url }) => request.url.origin === self.location.origin && /(?<!service-worker)\.js$/.test(url.pathname),
  new StaleWhileRevalidate({
    cacheName: 'app',
  }),
);

When a new service worker exists and installation of the newer worker begins, it would be expected that the use of skipWaiting would activate the new service worker.

This isn't the case however because the registerRoute function instantiates a fetch handler that never calls waitUntil.

The result of this is that the updated service worker will never activate (stuck in installed state) until the user's browser session is closed.

@jeffposnick
Copy link
Contributor

Hello @andyrichardson! I am not sure that your diagnosis of what's going on is accurate.

Calling respondWith(responsePromise) on a FetchEvent already extends the lifetime of the service worker until responsePromise settles. C.f. the note in step 4 of the respondWith() description in the spec:

Note: event.respondWith(r) extends the lifetime of the event by default as if event.waitUntil(r) is called.

Calling skipWaiting() does three things in parallel:

  • sets a flag indicating that if the current service worker enters the installed state, it should immediately transition to the activating state.
  • tells the service worker registration associated with the current service worker that if there's a service worker that's already waiting, it should transition to activating.
  • returns a promise that resolves immediately with undefined (not waiting for the other two steps to complete).

If you're trying to debug why a service worker is waiting in the installed state instead of activating, it might be because you're calling skipWaiting() in the incorrect service worker (the old one, not the new one that's actually waiting), or because your code is more complex than the example you included above and you're running afoul of something like #2525. Because of that issue, we've stopped wrapping skipWaiting() in an install event handler in Workbox v6, and encourage folks to just call self.skipWaiting() instead.

@jeffposnick jeffposnick self-assigned this Dec 5, 2020
@jeffposnick jeffposnick added Needs More Info Waiting on additional information from the community. workbox-routing labels Dec 5, 2020
@andyrichardson
Copy link
Author

andyrichardson commented Dec 7, 2020

Hey @jeffposnick, thanks for the response.

Calling respondWith(responsePromise) on a FetchEvent already extends the lifetime of the service worker until responsePromise settles.

Totally with you on this - even without checking the spec, I was under a similar assumption that respondWith would remove the need for waitUntil.

I've managed to consistently reproduce this issue on Chrome and Firefox and the one line change in the PR fixed the problem. I'm not totally sure why this would be the case.

I'll spend some time this week creating a reproduction so you can try it out for yourself!

Just to confirm, have you tried creating a service worker which calls registerRoute and then subsequently try and update that service worker? This has consistently caused the new service worker on to get stuck in the installed state on Chrome and Firefox on my end.

@jeffposnick
Copy link
Contributor

Before you spend too much more time debugging this, can you try a quick swap from using workbox-core's skipWaiting() to using self.skipWaiting() instead, and see if that resolves things? (You could alternatively update to Workbox v6. Following #2525, the two are equivalent.)

The previous implement of workbox-core's skipWaiting() wrapped the call to self.skipWaiting() in an install event handler, and that ended up introducing edge cases. That's really the only thing that comes to mind as a possible cause of what you're describing.

@andyrichardson
Copy link
Author

I had tried calling self.skipWaiting:

  • On js load
  • On install event
  • On message from browser (w/ message added to browser)

But no dice!

The current project I'm on uses whatever is bundled with CRA but I'll try a standalone example without CRA and see if this is still a concern.

@jeffposnick
Copy link
Contributor

That's very strange.

What happens if you go to Chrome's DevTools and manually click on the "skipWaiting" link that you should see next to the waiting service worker?

77b5eaa130252f7e

@andyrichardson
Copy link
Author

That's very strange.

What happens if you go to Chrome's DevTools and manually click on the "skipWaiting" link that you should see next to the waiting service worker?

Manually clicking skipWaiting or update does nothing (no error or visual indication of a change). I have to either hit stop or unregister to get the new service worker activated.

Kapture 2020-12-08 at 18 20 15

I got around to creating a simple reproduction (with v5 and v6 package versions) but I'm unable to reproduce the issue in a more basic environment so I'm not sure what the cause is.


If I'm not mistaken, this nesting implies that the service worker has spawned a worker thread - that might be what is keeping the service worker from unloading.

Screen Shot 2020-12-08 at 18 28 39

That spawned worker thread looks to have some relation to buffering

Screen Shot 2020-12-08 at 18 31 18

@andyrichardson
Copy link
Author

andyrichardson commented Dec 8, 2020

Jeeeeeezzz

Okay I found the cause of the issue.

registerRoute(
  ({ url }) => /^somefeatureflagservice/.test(url.host),
  new StaleWhileRevalidate({
    cacheName: `flags`,
  }),
);

In my case, I was caching a feature flagging network request for offline support.

This request was a websocket Server-sent event (SSE) request.

I guess that explains why adding waitUntil along with respondWith fixed the issue.

respondWith doesn't flag the event as closed with a streaming connection (multiple responses are possible) while waitUntil(someProm) does (once someProm resolves I'm guessing it assumes a connection close).

@andyrichardson
Copy link
Author

andyrichardson commented Dec 8, 2020

Not sure where we go from here - it would be nice to have that waitUntil addition to cover situations where websocket requests are being cached.

Edit: Both via the cache and network, SSE requests look to get infinitely stuck in the pending state when using StaleWhileRevalidate

Is workbox's cache intended to work with SSE/websocket/streaming connections?

If not, do we want to add in a warning / skip the cache altogether so we don't keep a fetch event open indefinitely?

@jeffposnick
Copy link
Contributor

jeffposnick commented Dec 8, 2020

Ah, okay, glad that there's an explanation!

Service workers don't intercept WebSocket requests, but they can be used with the Streams API, and they also explicitly do intercept SSE, as per this thread that @wanderview participated in back in the day. That thread discussed potential edge cases and had an overall consensus that using service workers with SSE was likely to be uncommon.

I'm not ruling out making a change to Workbox if needed, but I'm wondering if there's an underlying issue that would better be addressed by the browser(s). You mentioned both Chrome and Firefox exhibited that behavior—how about Safari?

I'd like to summarize the issue and then pull in some folks for their opinion. Please correct me if I'm wrong about any of this:

  • You have a service worker, using Workbox, whose fetch handler calls event.respondWith(responsePromise) to generate a response for a SSE. (I'm not clear on whether this was a mistake since you didn't realize it was a SSE, or whether you actually meant to do this.)

  • Doing so keeps that service worker alive for an extended period of time. (Presumably until the server closes the connection, though maybe there's a hard time limit imposed by the browser?)

  • If, during the time that the active service worker is being kept alive, a newly installed service worker calls self.skipWaiting(), this has no effect. The new service worker remains in the installed state indefinitely, and things appear to be "stuck."

  • But, if the active service worker had called event.waitUntil(responsePromise) inside its fetch handler, immediately prior to calling event.respondWith(responsePromise), then calling self.skipWaiting() in the newly installed service worker behaves as intended, and things are not "stuck." (This is what Fix skipWaiting not working #2693 added.)

  • The service worker spec states that "event.respondWith(r) extends the lifetime of the event by default as if event.waitUntil(r) is called.", so I find it unexpected that explicitly calling event.waitUntil() changes the behavior.

@wanderview
Copy link

wanderview commented Dec 8, 2020

I would expect the service worker to be killed after 5 minutes of being held open by respondWith() processing an SSE response. (Unless another FetchEvent or something occurred since then of course...)

Note, if you open devtools then chrome will not terminate the service worker due to this timeout.

@jeffposnick
Copy link
Contributor

jeffposnick commented Dec 8, 2020

Thanks, @wanderview. Any thoughts on those final two points, where adding in an explicit event.waitUntil() prior to the event.respondWith() leads to different behavior? Does that sound like a browser bug, or am I misinterpreting the service worker spec?

@wanderview
Copy link

Doesn't make sense from the browser perspective. I don't really know what workbox is doing around where it calls those, though. Maybe its an incorrect association and the real issue was whether devtools had been opened or not, etc.

@jeffposnick
Copy link
Contributor

Gotcha. If the browser technically shouldn't require that explicit event.waitUntil(), then I'd rather not add it in to Workbox to work around the reported behavior.

@andyrichardson, what if we added additional logging that's enabled in the workbox-routing development builds which checks for fetchEvent.request.headers.get('accept') === 'text/event-stream' and warns when workbox-router attempts to respond to such an event?

I'll wait to hear back from you, but my understanding is that you didn't actually want to have the service worker intercept that SSE, and it's not clear how a StaleWhileRevalidate strategy could properly handle it anyway—though a different workbox-strategy, like NetworkOnly, presumably would be able to handle it.

@andyrichardson
Copy link
Author

andyrichardson commented Dec 8, 2020

Hey folks! Thanks for the discussion on this!

You have a service worker, using Workbox, whose fetch handler calls event.respondWith(responsePromise) to generate a response for a SSE. (I'm not clear on whether this was a mistake since you didn't realize it was a SSE, or whether you actually meant to do this.)

Yes that's pretty much it

registerRoute(
  ({ url }) => /^somefeatureflagservice/.test(url.host),
  new StaleWhileRevalidate({
    cacheName: `flags`,
  }),
);

what if we added additional logging that's enabled in the workbox-routing development builds which checks for fetchEvent.request.headers.get('accept') === 'text/event-stream' and warns when workbox-router attempts to respond to such an event?

In my case, I wasn't even aware that SSE was present on the app - that request was being made by a third party library. I think the sketchy part here is that once we get into that SEE w/ workbox caching state - it's impossible to push a new service-worker (or app in the case of precaching) and you're at the mercy of when the user exits their browser session.

If it's possible to prevent that locked-in condition whether that be by ignoring text/event-stream requests or by calling waitUntil (which I would guess would cover all streaming protocols) I think that's the safest option - it doesn't look like SEE can ever work with workbox's cache anyhow.

If not, maybe documentation on filtering out text/event-stream be the second best option.

Side note - CRA for better or for worse only enables service workers in production builds. Obviously there's more to web dev than just CRA but it's worth noting that a lot of users could encounter this issue and not be aware until it hits production.

@andyrichardson andyrichardson changed the title skipWaiting breaks when using registerRoute Service worker stuck in busy state when SSE request hits the cache Dec 9, 2020
@tomayac
Copy link
Member

tomayac commented Apr 25, 2024

Hi there,

Workbox is moving to a new engineering team within Google. As part of this move, we're declaring a partial bug bankruptcy to allow the new team to start fresh. We realize this isn't optimal, but realistically, this is the only way we see it working. For transparency, here're the criteria we applied:

Thanks, and we hope for your understanding!
The Workbox team

@tomayac tomayac closed this as completed Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs More Info Waiting on additional information from the community. workbox-routing
Projects
None yet
Development

No branches or pull requests

4 participants