Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include imported scripts to byte-check #1023

Closed
wants to merge 1 commit into from

Conversation

jungkees
Copy link
Collaborator

This changes the behavior of the service worker script resource
comparison. Before this, only the main service worker script was
compared to a new version. With this change, all the imported scripts
stored in the imported scripts map as well as the main script are
inspected against the corresponding network resources (based on the
urls.)

Note:

  • Service worker's script resource map has been renamed and moved to
    service worker's script resource's imported scritps map.
  • registration's last update check time's always updated whenever the
    response is fetched from the network (regardless it's a main script or
    an imported script.)

Fixes #839.

@jungkees
Copy link
Collaborator Author

/cc @wanderview @mattto @annevk

: Output
:: True or false, a boolean

1. If |sourceScript| and |targetScript| are not both [=classic scripts=] or not both [=module scripts=], return false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow good catch

1. If |targetResponse|'s [=response/type=] is "<code>error</code>", or |targetResponse|'s [=response/status=] is not an [=ok status=], return false.
1. If the result of [=UTF-8 decoding=] |response|'s [=response/body=] is not a byte-for-byte match with the result of [=UTF-8 decoding=] |targetResponse|'s [=response/body=], return false.
1. If |sourceScript| is a [=module script=], then:
1. If |sourceScript|’s [=module script/module record=]'s \[[ECMAScriptCode]] is not a byte-for-byte match with |targetScript|’s [=module script/module record=]'s \[[ECMAScriptCode]], return false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check imported scripts too?

Copy link
Collaborator Author

@jungkees jungkees Dec 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importScripts() is not supported in module scripts: https://html.spec.whatwg.org/#import-scripts-into-worker-global-scope step 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I meant imported as in import whatever from "./foo.js"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought those urls are fetched, parsed and stored as part of the module record, but maybe I'm wrong. I noticed an environment settings object has a module map which is referenced throughout traversing the module script graph. I'll check the algorithms again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh does it flatten it somehow? I couldn't figure it out 😞

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakearchibald, I think you're right. I can't find any step that flattens the entire imported assets. It seems each module script has module record that has the parsed source text and info to the required modules (record's [[RequestedModules]] in ECMAScript language.) Getting to ModuleEvaluation() when running a module script, it evaluates all the related module scripts recursively finding the dependencies.

And I think we should compare all the module scripts stored in an environment settings object's module map. I'd want to confirm this with @domenic before updating this part.

@domenic, SW intends to compare all the script assets including the main script and the imported scripts for both classic and module scripts. For classic scripts, SW has it's own imported scripts map that's used for this byte-by-byte check. But for module scripts, I think I misunderstood that the main script's module record has a flattened asset that include all the imported scripts in the graph. I'd like to check with you whether this is wrong, and if so, iterating an environment setting's object's module map and compare the source texts of all the module scripts in the map would be a right way for our purpose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, [[ECMAScriptCode]] appears to be a "parse result" according to the spec, but the spec doesn't define what that is 😢 . But since it's the result of parsing it doesn't feel appropriate for a byte-for-byte check.

The way ModuleDeclarationInstantiation performs "For each ImportEntry" makes me think that there isn't any flattening going on. Maybe HTML's module map is an easier way in, since it's at least a flattened view of the imported scripts - although this may contain dynamically imported items that we're not interested in.

If I'm right that [[ECMAScriptCode]] isn't the source text, and source text is only available on classic scripts, we need to store this ourselves.

"Perform the fetch" is called for the top level script, but also each module fetch, so we could use this hook to store our own collection of source texts for byte-by-byte comparison.

@domenic does the above sound right at all? Our intent is to refetch the top-level script and all its imports, and see if any are byte-different to the original.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jungkees whoa, I replied before seeing your message. Weird how similar they are!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, there is no flattened representation of the graph. I think checking everything in the module map is more correct.

I guess [[ECMAScriptCode]] is indeed not correct :-/. However, what we can do is update HTML to move "source text" from classic scripts only to the base script type. We'd probably want to add a note saying something like "source text is only used for classic scripts in this specification, but the Service Workers specification needs to store it for reasons". I don't think you all should create a separate parallel registry :).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. Moving the source text to the base script type will help us use the source text of scripts for both classic and module script cases. I'll spec it to checking everything in the module map with that change.

Another thing that needs a solution is the lifetime of the scripts and the map for SW. With other web workers, the lifetime of the scripts is tied to the WorkerGlobalScope object, so it doesn't store the main script and only stores the module scripts in the module map on the global. But with SW, the lifetime of the scripts outlives those globals, so it stores the main script and the imported scripts on a service worker concept rather than on the SW global. Because of this difference, I think we need to discuss around how to make the fetch a classic worker script and the fetch a module worker script graph work more correctly with SW. There's an opened issue for this: #1013.

1. Let |sourceMap| be |sourceScript|'s [=script resource/imported scripts map=].
1. [=map/For each=] |url| → |response| of |sourceMap|:
1. Let |request| be a new [=/request=] whose [=request/url=] is |url|, [=request/client=] is |job|'s [=job/client=], [=request/type=] is "<code>script</code>", [=request/destination=] is "<code>script</code>", [=request/parser metadata=] is "<code>not parser-inserted</code>", [=request/synchronous flag=] is set, and whose [=request/use-URL-credentials flag=] is set.
1. Set |request|'s [=request/cache mode=] to "<code>no-cache</code>" if any of the following are true:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause a double-download. I'll explain this in another comment…

@jakearchibald
Copy link
Contributor

When the imported scripts are checked, we're doing it with no-cache, but then later when we run the script it'll execute importScripts and once again request the resources no-cache.

I hadn't thought about this until reading this PR, but I think it's something we need to solve.

"Check If Service Worker Resources Are Identical" could return a boolean as it does now, but also a map of all the scripts it fetched. Then, importScripts could check this cache before going to the network. This cache can be deleted once install completes - or whenever we set the "imported scripts updated flag" flag, which discards imports that were used in the previous version, but not this one.

@jungkees
Copy link
Collaborator Author

When the imported scripts are checked, we're doing it with no-cache, but then later when we run the script it'll execute importScripts and once again request the resources no-cache.

Right. I concerned about this too, but it was still better than invoking Run Service Worker for every navigation.

"Check If Service Worker Resources Are Identical" could return a boolean as it does now, but also a map of all the scripts it fetched.

It seems a bit irony that we should introduce this kind of temporary cache to support bypassing HTTP cache. But yeah I think this is definitely a way to avoid the double-fetch of the seemingly identical resource (unless the content is dynamically changing in the server.) I'd like to also hear opinions from @wanderview and @mattto.

@jakearchibald
Copy link
Contributor

It seems a bit irony that we should introduce this kind of temporary cache to support bypassing HTTP cache

Hah, yeah. But right now we fetch the main script, do the comparison, then reuse that script we fetched. It seems weird to reuse for the main script but refetch imported scripts. Also, this double-fetch means the scripts we byte-check could be different to the ones we end up running, which seems weird.

@wanderview
Copy link
Member

When the imported scripts are checked, we're doing it with no-cache, but then later when we run the script it'll execute importScripts and once again request the resources no-cache.

How about:

  1. The byte-for-byte check could populate a "script resource map" like importScripts() does now.
  2. Then the that "script resource map" can be set as the SW's map.
  3. The importScripts() algorithm could then check the map before going to network/http cache.
  4. After script eval or install would need something to remove resources put in the map in step (1), but no longer used by the new service worker script. This is annoying, but not too with some book keeping.

Note, step (3) would also fix any issues where we go to network/http cache multiple times if script eval calls importScripts() multiple times.

@jungkees
Copy link
Collaborator Author

jungkees commented Dec 15, 2016

I think @jakearchibald and @wanderview largely suggested the same thing. Basically byte-check will populate a temporary map if the fetched resource is new. importScripts() will consult this map (to populate SW's script resource's imported scripts map*) before fetching through HTTP cache/network until the imported scripts updated flag is set. Once the flag is set, importScripts() always returns the results from the imported scripts map.

I think if we define and maintain the temporary map on a service worker registration, we can just add and update entries throughout the registration's lifetime.

I'll add this change to the PR.

(*) This is the SW's script resource map in the current spec. I suggested changing the name and the location of the concept as such through this PR for clarity and a better interface to other algorithms.

This changes the behavior of the service worker script resource
comparison. Before this, only the main service worker script was
compared to a new version. With this change, all the imported scripts
stored in the imported scripts map as well as the main script are
inspected against the corresponding network resources (based on the
urls.)

Note:
 - Service worker's script resource map has been renamed and moved to
 service worker's script resource's imported scritps map.
 - registration's last update check time's always updated whenever the
 response is fetched from the network (regardless it's a main script or
 an imported script.)

Fixes #839.
@jungkees jungkees force-pushed the include-imported-scripts-for-byte-check branch from 7cd7668 to 548e59a Compare December 15, 2016 05:38
jungkees added a commit that referenced this pull request Mar 5, 2018
This change includes/considers the following:
 - Include imported scripts to byte-check (for classic scripts).
 - Compare responses' body instead of source text as per whatwg/html#3316.
 - Handle duplicate importScripts() as per #1041.
 - Replace *imported scripts updated flag* referenced in importScripts()
   by using service worker's state item.
 - Have Update's perform the fetch steps cover module scripts.
 - Avoid dobule-download of imported scripts pointed out in #1023 (comment).

This change basically makes it check out if the main script resource is
identical to the existing resource. If so, it returns; otherwise, it
creates a new service worker and evalute it to check out if any imported
scripts are changed. It continues with Install only when any of the
resources has been changed. With the change, importScripts() returns
resources from the cache for any duplicated requests including the
request for the main script.

Fixes #1041, #1212, #1023.
jungkees added a commit that referenced this pull request Mar 13, 2018
This change includes/considers the following:

 - Include imported scripts to byte-check (for classic scripts).
 - Compare responses' body instead of source text as per whatwg/html#3316.
 - Handle duplicate importScripts() as per #1041.
 - Remove imported scripts updated flag referenced in importScripts() by
   using service worker's state item instead.
 - Have Update's perform the fetch steps cover module scripts.
 - Confirm dobule-download of imported scripts pointed out in #1023 (comment)
   never happens; importing a script again always gets the result from
   the cache.

This change basically gets the imported scripts included to the
byte-check for updates. Update continues with Install if any of the
(main or imported) resources has been changed. This change also makes
any duplicated requests (including the main script) from importScripts()
return the result from the cache.

Fixes #839, #1041, #1212, #1023.
@jungkees
Copy link
Collaborator Author

#1283 obsoleted this PR.

@jungkees jungkees closed this Mar 13, 2018
@jungkees jungkees deleted the include-imported-scripts-for-byte-check branch March 13, 2018 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants