Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve service worker script caching and update #1283

Merged
merged 4 commits into from
Mar 13, 2018
Merged

Conversation

jungkees
Copy link
Collaborator

@jungkees jungkees commented Mar 5, 2018

This change includes/considers the following:

This change basically makes it check out if the main script resource is
identical to the existing resource. If so, it returns; otherwise, it
creates a new service worker and evalute it to check out if any imported
scripts are changed. It continues with Install only when any of the
resources has been changed. With the change, importScripts() returns
resources from the cache for any duplicated requests including the
request for the main script.

Fixes #1041, #1212, #1023.


Preview | Diff

This change includes/considers the following:
 - Include imported scripts to byte-check (for classic scripts).
 - Compare responses' body instead of source text as per whatwg/html#3316.
 - Handle duplicate importScripts() as per #1041.
 - Replace *imported scripts updated flag* referenced in importScripts()
   by using service worker's state item.
 - Have Update's perform the fetch steps cover module scripts.
 - Avoid dobule-download of imported scripts pointed out in #1023 (comment).

This change basically makes it check out if the main script resource is
identical to the existing resource. If so, it returns; otherwise, it
creates a new service worker and evalute it to check out if any imported
scripts are changed. It continues with Install only when any of the
resources has been changed. With the change, importScripts() returns
resources from the cache for any duplicated requests including the
request for the main script.

Fixes #1041, #1212, #1023.
@jungkees
Copy link
Collaborator Author

jungkees commented Mar 5, 2018

@jakearchibald, this obsoletes #1023 and covers #1041 and #1212. Please take a look.

@domenic, please review if the perform the fetch steps correctly covers module scripts, and also if comparing reponse's body would be correct.

/cc @matto @wanderview @aliams @cdumez

@domenic
Copy link
Contributor

domenic commented Mar 6, 2018

The comparison of response body looks perfect. That is much simpler than the work I was planning in whatwg/html#3316, and I think in a good way. And I believe since you updated the "perform the fetch" in both places (importScripts() and fetch a module worker script graph), it should all work great for module scripts too.

Thanks for doing this despite my continual slowness in helping support. I think your version is better than mine anyway, so it all worked out :)

@jungkees
Copy link
Collaborator Author

jungkees commented Mar 6, 2018

@domenic, thanks for reviewing and having helped out to solve this. I'll work on the rest with other SW reviewers.

docs/index.bs Outdated
1. If |response|'s <a>unsafe response</a>'s [=response/type=] is not "<code>error</code>", and |response|'s [=response/status=] is an <a>ok status</a>, then:
1. Let |newestWorker| be the result of running [=Get Newest Worker=] with |registration|.
1. Let |resource| be null.
1. If |newestWorker| is not null, set |resource| to |newestWorker|'s [=service worker/script resource map=][|request|'s [=request/url=]], or null if it does not [=map/exist=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this, it makes it look like importScripts() in one service worker (serviceWorker) is affecting the state of another service worker (newestWorker), which I thought #1041 decided was wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a case where serviceWorker doesn't have any stored response and hasn't gone to the installed state. (Cases other than that return early in the step 2 and 3.)

My intention here is to compare the response retrieved from the network (or HTTP cache depending on the cache mode) with the registration's newest worker's stored response, if any. I think we'd want to make it continue with Install only when importScripts() for serviceWorker has byte-differnce against newestWorker's stored resources. The service worker's include updated resources flag is used to mark this condition that is later checked in Update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm understanding more.

I thought when we discussed this a while back (#839 (comment)), we decided the "update" service worker shouldn't run at all if the main script and imported scripts of the newest installed worker haven't changed. I.e., the update check should first fetch the main script and importScripts of the newest worker, if those haven't changed you halt the update. If there is a change you launch the new service worker. It'd be wasteful to spawn a worker just to do the importScripts then discover there was no change, and then terminate the worker before installation. But maybe we should talk about this one of the GitHub issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm understanding more.

I thought when we discussed this a while back (#839 (comment)), we decided the "update" service worker shouldn't run at all if the main script and imported scripts of the newest installed worker haven't changed. I.e., the update check should first fetch the main script and importScripts of the newest worker, if those haven't changed you halt the update. If there is a change you launch the new service worker. It'd be wasteful to spawn a worker just to do the importScripts then discover there was no change, and then terminate the worker before installation. But maybe we should talk about this one of the GitHub issues.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see.. We had already discussed this a while back ;). But I was worried about the comment in https://github.com/w3c/ServiceWorker/pull/1023/files#r92201798, too. And what should we do if any of the imported scripts were updated, later start a worker with them in the cache, and they contain errors. We couldn't avoid starting a work in this case anyway.

But I'm open to this option or any other smarter ways to improve performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to spec it in a way that doesn't do a double download...

Currently in the common case a navigation most likely will not cause a separate service worker to run for the update check. This spec change will change it so that most likely one will, so the browser will often start two service workers for a navigation. That would really impact memory/performance for us.

I wouldn't worry too much about the uncommon error case causing an extra service worker to be started.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. I think we can also avoid a double download by populating the cache before the first execution of the main script.

docs/index.bs Outdated
* |resource| is null.
* |resource|'s [=response/body=] is not byte-for-byte identical with |response|'s [=response/body=].
1. [=map/Set=] [=service worker/script resource map=][|request|'s [=request/url=]] to |response|.
1. Set |serviceWorker|'s [=classic scripts imported flag=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I'm a bit confused what about the meaning of include updated resources flag and classic scripts imported flag. Could you help me understand what they are doing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The classic scripts imported flag is used to indicate the main script has some imported scripts so Update must not return early before actually checking the imported scripts even if the main scripts are byte identical. I named it classic .. because module scripts don't support importScripts() in the first place.

The job's potentially include updated resources flag is used to check if we can return early even without evaluating the script. That is, the script received from the network is byte identical and doesn't import any script. For module scripts, it checks all the responses gotten during fetching the module worker script graph.

The service worker's include updated resources flag is checked when Update couldn't determined from the earlier steps whether it could stop early.

docs/index.bs Outdated
* |newestWorker| is null.
* |resource| is null.
* |resource|'s [=response/body=] is not byte-for-byte identical with |response|'s [=response/body=].
1. [=map/Set=] [=service worker/script resource map=][|request|'s [=request/url=]] to |response|.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be "Set serviceWorker's script resource map", right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

docs/index.bs Outdated
1. If |response|’s <a for="response" href="https://github.com/whatwg/fetch/issues/376">cache state</a> is not "<code>local</code>", set |registration|’s [=service worker registration/last update check time=] to the current time.
1. [=Extract a MIME type=] from the |response|'s [=unsafe response=]'s [=response/header list=]. If this MIME type (ignoring parameters) is not a [=JavaScript MIME type=], return a [=network error=].
1. If |response|'s <a>unsafe response</a>'s [=response/type=] is not "<code>error</code>", and |response|'s [=response/status=] is an <a>ok status</a>, then:
1. Let |newestWorker| be the result of running [=Get Newest Worker=] with |registration|.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't newestWorker just be serviceWorker, it's the new installing worker?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the importScripts() is called during the evaluation in Update, serviceWorker ins't in the installing state yet. newestWorker here would be one of the workers of the registration. So, we're comparing the new pre-installing worker's resources with the current newest worker's resources here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, I think I saw the Run Service Worker algorithm being called in Install and thought that was the first time we are running the worker.

docs/index.bs Outdated
1. If |response|'s <a>unsafe response</a>'s [=response/type=] is not "<code>error</code>", and |response|'s [=response/status=] is an <a>ok status</a>, then:
1. Let |newestWorker| be the result of running [=Get Newest Worker=] with |registration|.
1. Let |resource| be null.
1. If |newestWorker| is not null, set |resource| to |newestWorker|'s [=service worker/script resource map=][|request|'s [=request/url=]], or null if it does not [=map/exist=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm understanding more.

I thought when we discussed this a while back (#839 (comment)), we decided the "update" service worker shouldn't run at all if the main script and imported scripts of the newest installed worker haven't changed. I.e., the update check should first fetch the main script and importScripts of the newest worker, if those haven't changed you halt the update. If there is a change you launch the new service worker. It'd be wasteful to spawn a worker just to do the importScripts then discover there was no change, and then terminate the worker before installation. But maybe we should talk about this one of the GitHub issues.

docs/index.bs Outdated
1. If <a>script resource map</a>[|url|] [=map/exists=], return <a>script resource map</a>[|url|].
1. Else, return a <a>network error</a>.
1. If |serviceWorker|'s [=service worker/script resource map=][|request|'s [=request/url=]] [=map/exists=], return [=service worker/script resource map=][|request|'s [=request/url=]].
1. If |serviceWorker|'s [=state=] is *installed*, *activating*, *activated*, or *redundant*, return a [=network error=].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would it be better to say "Not parsed or installing"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Addressed.

docs/index.bs Outdated
A [=/service worker=] has an associated <dfn export id="dfn-imported-scripts-updated-flag">imported scripts updated flag</dfn>. It is initially unset.
A [=/service worker=] has an associated <dfn export id="dfn-classic-scripts-imported-flag">classic scripts imported flag</dfn>. It is initially unset.

A [=/service worker=] has an associated <dfn export id="dfn-include-updated-resources-flag">include updated resources flag</dfn>. It is initially unset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "includes updated resources flag"? "Include" sounds it an action, as in something will be included, whereas "includes" sounds more like a statement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this flag as part of the new snapshot.

docs/index.bs Outdated
@@ -2147,8 +2155,12 @@ spec: webappsec-referrer-policy; urlPrefix: https://w3c.github.io/webappsec-refe

A <a>job</a> has a <dfn id="dfn-job-worker-type">worker type</dfn> ("<code>classic</code>" or "<code>module</code>").

A [=job=] has a <dfn export id="dfn-job-script-resource-map">script resource map</dfn> which is an <a>ordered map</a> where the keys are [=/URLs=] and the values are [=/responses=].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only used in the update algorithm. Unless I'm missing something, it could just be a variable within the update algorithm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I made it a variable.

docs/index.bs Outdated
A <a>job</a> has an <dfn id="dfn-job-update-via-cache-mode">update via cache mode</dfn>, which is "`imports`", "`all`", or "`none`".

A [=job=] has a <dfn id="dfn-job-potentially-include-updated-resources-flag">potentially include updated resources flag</dfn>. It is initially unset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is only used in the update algorithm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I made it a variable.

This addresses #1283 (comment).

This makes fetching the imported classic scripts done before evaluating
the main script such that it does not start the service worker if no
scripts (including imported scripts) were updated.

This replaces unnecessary flags with local variables according to
comments.
@jungkees
Copy link
Collaborator Author

jungkees commented Mar 8, 2018

@mattto, @jakearchibald, I uploaded a new snapshot. Please take a look.

@jakearchibald
Copy link
Contributor

👍Thanks for doing this.

The job stuff is making things increasingly difficult to read IMO. I wonder if parallel queues would let us achieve the same without a state object being passed around.

Copy link
Member

@mfalken mfalken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my concern about startup.

docs/index.bs Outdated
1. If |response|'s [=response/type=] is not "`error`", and |response|'s [=response/status=] is an <a>ok status</a>, then:
1. [=map/Set=] |serviceWorker|'s [=service worker/script resource map=][|request|'s [=request/url=]] to |response|.
1. Set |serviceWorker|'s [=classic scripts imported flag=].
1. Return |response|.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To aid my understanding, what exactly happens in the error response case? Does it result in an error so the installation/update fails, or does it mean we basically treat it like the importScripts() call never happened?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error response makes it throw a "NetworkError" DOMException in fetch a classic worker-imported script step 4. That again makes Update reject the job promise in https://github.com/w3c/ServiceWorker/pull/1283/files#diff-27b79860afe28f01aed4f1f6228367faR2467. This particular behavior hasn't been changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. That means if you do try { importScripts('fail.js'); } and catch the exception, you can later try to do importScripts('fail.js') and it'll retry again. I think that matches our implementation too.

(Acknowledged that this hasn't changed, just confirming my understanding.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you catch the exception, it just installs it without populating the cache. When the service worker starts up later, importScript('fail.js') will get a network error again that throws and be caught again.

@jungkees
Copy link
Collaborator Author

jungkees commented Mar 9, 2018

@jakearchibald,

The job stuff is making things increasingly difficult to read IMO. I wonder if parallel queues would let us achieve the same without a state object being passed around.

I think we should figure out to what extent we could refactor it. I tried some bits to make the related algorithm instances serialized using parallel queues here: #1229. But it wasn't that I intended to replace the whole job queue structure with the parallel queue. What concerns me about changing it now is the implementations have already implemented the job queues as in the spec. Let's take a look what we can do.

@jungkees
Copy link
Collaborator Author

jungkees commented Mar 9, 2018

BTW, I think we need some additional tests for this change. Does anyone have bandwidth to help with writing tests?

Copy link
Member

@mfalken mfalken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this looks good me. The commit description looks outdated though, in particular: "If so, it returns; otherwise, it
creates a new service worker and evalute it to check out if any imported scripts are changed."

@jungkees
Copy link
Collaborator Author

Thanks for reviewing. I'll update the commit description when merging the final snapshot.

I also checked out the tests and found they already cover the behavior this PR addresses:

@jungkees
Copy link
Collaborator Author

I'll merge this PR. @ylafon is helping out with the travis.yml settings for later works.

@jungkees jungkees merged commit 8e25c26 into master Mar 13, 2018
@jungkees jungkees deleted the sw-script-cache branch March 13, 2018 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants