Define the "Preload Cache" #590

yoavweiss · 2017-08-30T11:20:58Z

We need to define the preload cache, as currently it is not defined and different implementations are doing observably different things.

This issue was opened on the Preload spec, but as the cache would mostly sit inside Fetch and it's not clear what action would be needed on the preload spec side, I'm "moving" it here.

Related previous discussion here was at #354 where @jakearchibald made a proposal to address this. I open this as a separate issue as I think the H2 push cache and the preload cache are inherently different things in different layers.

I think that anything we define here should, to some extent, look at how the different implementations tackle this today. The logic that they apply for this (e.g. in Blink or in WebKit) seems a bit complex, but was created for the "memory cache" case. We need to think if it can be safely simplified for the preload case. (and if memory cache itself should be standardized)

wanderview · 2017-12-18T17:22:30Z

What preload cache behavior is desired when document.open() is called? It seems the document is preserved, but "reset" in this case. The spec and Firefox also create a new global. Should the preload cache be cleared or preserved across document.open()?

annevk · 2017-12-18T18:17:53Z

I think we should just pick something and make sure it's tested. If we can get away with not creating a new global for document.open() (which would simplify matters quite a bit, as a document can then only ever be associated with a single global) it would probably be most natural that the cache is kept.

igrigorik · 2018-01-15T16:47:41Z

> I open this as a separate issue as I think the H2 push cache and the preload cache are inherently different things in different layers.

From a developer ergonomics perspective, it would be nice if that wasn't the case. As it stands, there are subtle (potentially) observable differences in behavior between pushing and preloading a resource — one ends up in h2 push cache, other in renderer's memory cache, at least for Chrome. It would be nice to normalize all this, for everyone's sanity.. if we can.

> (and if memory cache itself should be standardized)

Given that, afaik, every implementation has one. Yes? Which is what we're after here.. :)

wanderview · 2018-01-16T15:23:36Z

> Given that, afaik, every implementation has one. Yes? Which is what we're after here.. :)

Gecko at least does not have a memory cache like chromium. We have things like image cache, but that is defined in the html spec.

From my perspective the thing I want to know about these caches is:

Do they sit above or below the network stack where service worker interception occurs?

SW sits above the http cache and below the image cache in the spec today.

kinu · 2018-04-11T06:44:02Z

Wanted to know if people expect that the preload cache we're trying to define here also covers the prefetch case or not. From an impl perspective they sit at different layers and behave slightly differently (i.e. they have different lifetimes), but from a developer's perspective the behavior may look pretty similar.

To give more context, Chromium's current impl is like this: the preload cache sits next to the memory cache, i.e. it's above the network stack and also above SW. Prefetch basically just puts things in the HTTP cache (and never puts them in the memory cache or a preload-cache equivalent in Chromium). Both preload and prefetch allow subsequent fetches to succeed regardless of Cache-Control up to a certain period, but that effect doesn't stick across different navigations / different fetch groups for preload, while it does for prefetch.

sleevi · 2018-04-12T21:18:17Z

@wanderview wrote:

> Do they sit above or below the network stack where service worker interception occurs?
> SW sits above the http cache and below the image cache in the spec today.

The H/2 PUSH cache sits above the network and above the HTTP cache, below SW.
The Preload cache sits above SW and below/beside the image cache.

yoavweiss · 2018-04-24T10:15:56Z

> The H/2 PUSH cache sits above the network and above the HTTP cache, below SW.

I think it's conceptually below the HTTP cache at least as implemented today (as resources don't get committed to the HTTP cache until they are "claimed" from the H2 push cache)

yoavweiss · 2018-04-24T12:49:55Z

Looking back at this, I think this work was simplified by @yutakahirano's recent(ish) refactoring of Chromium's implementation, which split the preload cache from the memory cache.
I'll try to find the time in the next few days to document what the current Chromium logic is.

annevk · 2020-06-04T07:28:29Z

Oh, I missed that the memory cache and the preload cache got separated. It seemed kinda preferable to me if all these mechanisms would end up in the same cache. I think at least that in Firefox that's what we're planning to use for rel=preload.

cc @smaug---- @emilio @mayhemer @ddragana

youennf · 2020-06-04T10:55:17Z

> I think at least that in Firefox that's what we're planning to use for rel=preload.

I believe WebKit does the same thing.

annevk · 2020-06-05T13:00:19Z

whatwg/html#154 is relevant here.

yutakahirano · 2020-06-05T14:29:06Z

The memory cache in blink is shared across documents but I think the preload cache should be per-document. I expect more predictability for the preload cache than the memory cache, and that's why we have the separate caches.

domfarolino · 2020-06-05T14:31:23Z

Yutaka makes a good point. Our memory cache is a map of weak references, while our preload cache has strong references. This seems pretty reasonable to me.

mayhemer · 2020-06-24T16:42:28Z

> It seemed kinda preferable to me if all these mechanisms would end up in the same cache. I think at least that in Firefox that's what we're planning to use for rel=preload.

I think separation is better.

In Gecko, there is a concept of resource memory cache, which each resource loader implements on its own (= Memory Cache). There is also a concept of sharing e.g. stylesheets among documents, this is also part of this Memory Cache concept. This is independent of preloads.

Then, each DOM document instance keeps a strong map of preloads that consuming tags can look for and consume (and remove from the map). In reality, a preload creates an entry in the Memory Cache (in resource loaders), because preload in Gecko is nothing else than a speculative load with just a flag for higher priority. The map in the document (= Preload Cache) is there to have a central spot to look at when <link rel=preload> HTML tag is added to the tree to be source of event notifications and also simplifies few other things regarding implementation, e.g. for fetch and font types, and few other details.

annevk · 2020-06-25T11:25:14Z

Are they fully separate? If you have <link rel=stylesheet href=x>...<link rel=preload href=x as=style>, what happens? How many fetches does the service worker see?

annevk · 2020-06-26T10:10:39Z

@yutakahirano does per-document mean that a worker (whatever the type) does not have access to preloaded resources?

yutakahirano · 2020-06-29T03:19:03Z

Yes.

In Blink, there is one memory cache in a renderer process, and only the main thread can use it. That means workers don't have access to the memory cache. There is one preload cache for each environment settings object. That means a preloaded resource is only matched in the preload cache with a request initiated by the same document. Please note that the preloaded resource can be matched in the memory cache with a request initiated by a different document, but the matching criteria (e.g., cache-control: no-store) and the matching status (e.g., whether we show the unused preload warning) are different from the preload cache case.
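
The scoping described above can be modelled roughly as follows. This is only a sketch, not Blink's code: the class names, the string bodies, and the `noStore` flag are hypothetical, and it assumes that a `no-store` response is matchable in the preload cache but is never placed in the shared memory cache. Worker access and unused-preload warnings are left out.

```typescript
type Source = "preload-cache" | "memory-cache" | "network";

// Hypothetical: one memory cache shared by all documents in a renderer process.
class RendererProcess {
  memoryCache = new Map<string, string>();
}

// Hypothetical: each document owns its own preload cache.
class DocumentContext {
  preloadCache = new Map<string, string>();
  renderer: RendererProcess;

  constructor(renderer: RendererProcess) {
    this.renderer = renderer;
  }

  // A preload always lands in this document's preload cache.
  // Assumption: only responses without no-store also enter the memory cache.
  preload(url: string, body: string, noStore: boolean): void {
    this.preloadCache.set(url, body);
    if (!noStore) {
      this.renderer.memoryCache.set(url, body);
    }
  }

  // Reports where a request from this document would be served from.
  request(url: string): Source {
    if (this.preloadCache.has(url)) return "preload-cache";
    if (this.renderer.memoryCache.has(url)) return "memory-cache";
    return "network";
  }
}
```

With two `DocumentContext` instances on one `RendererProcess`, a resource preloaded by the first document is a preload-cache match only for that document; the second document can at best find it in the memory cache.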

noamr · 2021-09-22T18:59:27Z

I would like to suggest that "preload" is not (or shouldn't be) defined as a cache at all, or at least not a cache in the way we usually use the word: an ephemeral storage with a size limit and something like an LRU mechanism.

Rather, proposing this alternative, which seems roughly equivalent to how @mayhemer describes the Gecko implementation here:

- The document should hold a list of responses, 1:1 mapped to the <link rel="preload" /> elements currently in the head.
- If a link is modified, the response gets re-fetched, and if it's removed the response is discarded.
- When requesting a preloadable resource (anything in the as list), the corresponding spec would first look for the URL in that preloaded response list (provided that it also has the same crossorigin characteristics etc), and use that response before trying to call fetch.

It puts the preload "cache" above the HTTP and SW caches, but below the resource-specific memory caches.

This would make it easier to clarify how preloads should behave when the link element changes or gets removed, or if there are strange scenarios (e.g. an image is preloaded, then used, then removed, then loaded again): as long as a preload link is currently in the document, its response is still accessible.
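
A minimal sketch of the link-mapped list proposed above, with a plain object standing in for a `<link rel="preload">` element. `PreloadLink`, `LinkMappedPreloads`, the `Fetcher` type, and the string response bodies are hypothetical names for illustration, and matching on `href`, `as`, and `crossOrigin` only is an assumption about what "same characteristics" would cover.

```typescript
// Hypothetical stand-in for a <link rel="preload"> element.
interface PreloadLink {
  href: string;
  as: string;
  crossOrigin: string | null;
}

// Hypothetical fetcher: returns a response body for a URL.
type Fetcher = (url: string) => string;

class LinkMappedPreloads {
  // One response per link currently in the document (1:1 mapping).
  private entries = new Map<PreloadLink, string>();
  private fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // Link inserted or modified: the response is (re-)fetched.
  linkInsertedOrModified(link: PreloadLink): void {
    this.entries.set(link, this.fetcher(link.href));
  }

  // Link removed: its response is discarded.
  linkRemoved(link: PreloadLink): void {
    this.entries.delete(link);
  }

  // Consulted by a resource loader before it falls back to fetch.
  lookup(href: string, as: string, crossOrigin: string | null): string | undefined {
    for (const [link, body] of Array.from(this.entries)) {
      if (link.href === href && link.as === as && link.crossOrigin === crossOrigin) {
        return body;
      }
    }
    return undefined;
  }
}
```

In this model a lookup keeps succeeding for as long as the link stays in the document, which is the property the scenario with the repeatedly loaded image relies on.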

yoavweiss · 2021-09-23T08:36:09Z

I agree that the "preload cache" doesn't need an eviction policy, size limits, etc.

I'm not sure I see the benefits of defining that cache above Fetch. It seems like it would significantly increase the room for mistakes on the part of the different Fetch callers.

I'm similarly not sure we need to evict preloads from the cache if their corresponding <link> is removed. I don't think this is what current implementations do. What's the use case this will enable?

Also, I think it's worthwhile to also think of generalizing the "list of available images" from HTML as a generic cache for all resource types, which is how it's implemented in at least 2 engines. While I wouldn't want to couple both efforts, it'd be good to have a holistic high-level design to how they'd both work.

noamr · 2021-09-23T08:51:07Z

> I agree that the "preload cache" doesn't need an eviction policy, size limits, etc.
>
> I'm not sure I see the benefits of defining that cache above Fetch. It seems like it would significantly increase the room for mistakes on the part of the different Fetch callers.

I meant above fetch conceptually, like above what fetch currently does.

Though I think we anyway need to add a layer that's equivalent to the browser engines' "resource loader" concepts, which sits between the individual resources and the current fetch, and does things like report resource timing. I think that preload handling should be done at that layer, as unlike fetch it has a concept of the document, and it would be easier to create 1:1 mappings with a link element there rather than create an elaborate new API and storage in Fetch.

> I'm similarly not sure we need to evict preloads from the cache if their corresponding <link> is removed. I don't think this is what current implementations do. What's the use case this will enable?

Allowing applications to free memory if they preloaded a lot of resources that are no longer needed. But it's sort of a "side" case.

Also, forcing the browser to make another request. If a resource has no-cache headers but you still want to preload it, you can preload it with <link rel="preload" /> and then decide for yourself when to "invalidate" that preloaded version.
It's straightforward enough to write predictable WPT tests for, I believe.

> Also, I think it's worthwhile to also think of generalizing the "list of available images" from HTML as a generic cache for all resource types, which is how it's implemented in at least 2 engines. While I wouldn't want to couple both efforts, it'd be good to have a holistic high-level design to how they'd both work.

I like the idea, but I think it's totally separated from the preload list. Preload (IMO) should be roughly equivalent to asking the document whether it currently has a <link rel="preload"> in the DOM and querying that element for its response, and this is separate from any cache mechanism.

jakearchibald · 2021-09-23T11:02:40Z

Yeah, that's the question really: Do preload fetches go via the preload cache? The answer isn't clear to me.

noamr · 2021-09-23T11:34:29Z

> Yeah, that's the question really: Do preload fetches go via the preload cache? The answer isn't clear to me.

True, maybe the simplest would be to define it as reading from the same list.

The trouble with this whole approach is that it makes preloads cancel their URLs' pragma: no-cache behavior, so if a no-cache URL is accessed many times in the document it would only access the network once. I guess that's why in Chrome only the first load uses the preloaded version and later it relies on HTTP/SW caching behavior. Is that the current Firefox behavior?

yoavweiss · 2021-09-23T11:36:34Z

I'd love to get the opinions of @pmeenan, @yutakahirano, @ddragana, @achristensen07 and @cdumez on this. I suspect it'd require some implementation alignment, so it'd be great to know there's willingness for that.

pmeenan · 2021-09-23T16:19:53Z

I'm happy to help work on aligning implementations to make it easier to reason about and more predictable. The no-cache/no-store semantics and multiple accesses in particular feel like the hairiest part to get right (multiple preloads of the same URL, and whether or not they create new references, complicate the logic somewhat). The choice is between treating it like a one-time key with a strong reference to the link element and a key-value cache that de-dupes multiple references.

noamr · 2021-09-26T15:55:51Z

One thing that might make sense here is to make use of cache-control: immutable (already shipped in Safari/Firefox, bug open in Chrome).

Something like:

- Preloaded responses without immutable go to the regular cache, equivalent to <script>fetch(link.href)</script> with some high priority.
- Preloaded responses with immutable are kept in memory throughout the lifetime of their link element and served from there for eligible clients.

This allows fine-grained control over the cache both from the server and from the client.
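
A minimal sketch of that split, assuming the decision is made from the response's `Cache-Control` header value alone. The function names and the two destination labels are hypothetical; this only illustrates the idea as floated here, not any shipped behavior.

```typescript
type PreloadDestination = "held-by-link" | "regular-cache";

// True if the Cache-Control header value lists the `immutable` directive.
function hasImmutable(cacheControl: string | null): boolean {
  if (cacheControl === null) return false;
  return cacheControl
    .split(",")
    .map((directive) => directive.trim().toLowerCase())
    .indexOf("immutable") !== -1;
}

// Immutable responses stay in memory for the lifetime of the link element;
// everything else goes through the regular cache like an ordinary fetch.
function destinationForPreload(cacheControl: string | null): PreloadDestination {
  return hasImmutable(cacheControl) ? "held-by-link" : "regular-cache";
}
```

For example, `destinationForPreload("max-age=31536000, immutable")` yields `"held-by-link"`, while a `no-cache` response or a response without the header yields `"regular-cache"`.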

yoavweiss · 2021-09-27T05:59:31Z

I don't think we should tie preload to any specific cache headers. That doesn't seem web compatible, nor what we'd want.
At the same time, I think that what you proposed above would work for one-time use, which is the majority case.

noamr · 2021-09-27T06:23:53Z

> I don't think we should tie preload to any specific cache headers. That doesn't seem web compatible, nor what we'd want.
> At the same time, I think that what you proposed above would work for one-time use, which is the majority case.

In a way current preload is not compatible with existing cache headers - preloading something that has no-cache which might return a different value when actually fetched.

I believe that if we don't tie preload to cache headers at all, then we should say that it trumps cache headers - not just for the "first load" - meaning that if you preloaded something it stays in the preload list regardless of whether its content on the server might have changed, and is treated as if it was immutable (during the lifetime of the link element). I wonder how compatible it is with the intent of something like fetch Request cache mode.

This test is meant to support the discussion in whatwg/fetch#590. The test accesses a URL that returns an integer that gets incremented with each request (starting with 0). The test preloads it once, and then loads it twice. In Firefox, the test returns 0,0. In Chrome, the test returns 0,1 (preload is used for one request). In Safari, the test returns 1,2 (preload makes an unused request).

noamr · 2021-09-27T14:14:03Z

I added a test that shows the problem here.

For a response that returns an integer that increments with each request, fetch (after preload) would return different values for the second request in Firefox (0), Chrome (1) and Safari (2).

I believe that one of the goals of this effort is to make sure we can set the expected results for that test (which are currently not specified) :)
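
The three outcomes can be reproduced with a toy model: a counter "server" that increments on every request, one preload, then two loads. This is only a sketch that mirrors the observed results; it says nothing about how each engine arrives at them, and the policy labels and the `simulate` function are hypothetical.

```typescript
type ReusePolicy = "multiple" | "once" | "none";

// Simulates: preload the counter URL once, then load it twice.
// Returns the two values seen by the loads.
function simulate(policy: ReusePolicy): number[] {
  let counter = 0;
  // The server returns the current count and increments it per request.
  const network = (): number => counter++;

  // The preload itself always goes to the network.
  let preloaded: number | null = network();

  const results: number[] = [];
  for (let i = 0; i < 2; i++) {
    if (policy !== "none" && preloaded !== null) {
      results.push(preloaded);
      // Under "once", the first load consumes the preloaded response.
      if (policy === "once") preloaded = null;
    } else {
      results.push(network());
    }
  }
  return results;
}
```

`simulate("multiple")` gives `[0, 0]` (the Firefox result), `simulate("once")` gives `[0, 1]` (Chrome), and `simulate("none")` gives `[1, 2]` (Safari).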

noamr · 2021-09-28T10:51:46Z

I created a table summarizing the current browser behavior, based on this test.

The test fetches different resources with different response types (Cache enabled/disabled or 404 error).
The values in the table mean the following:

- None: preloaded resource is not reused.
- Multiple: preloaded resource is used for subsequent requests.
- Once: preloaded resource is used for one subsequent request, the next one will re-fetch the resource.

It's a bit difficult to tell which of these come from the behavior of preload vs. the behavior of the HTTP cache, but that difficulty is currently passed on to web developers: when browsers deal with cache so differently for simple use cases, it's difficult to understand how to optimize a website.

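| Resource | Response | Chrome (Load) | Chrome (Preload) | Firefox (Load) | Firefox (Preload) | Safari (Load) | Safari (Preload) |
|---|---|---|---|---|---|---|---|
| Fetch | Cache | Multiple | | None | | Varying | Multiple |
| | No Cache | None | | | | Once | None |
| | Error | None | | | | Once | None |
| Fetch with `force-cache` | Cache | Multiple | | | | | |
| | No Cache | None | | | | Varying | None |
| | Error | Multiple | | | | Varying | None |
| Valid image | Cache | Multiple | | | | | |
| | No Cache | Once | | Multiple | | | |
| | Error | Multiple | | Once | | Multiple | |
| Invalid image | Cache | Multiple | | Once | | Multiple | |
| | No Cache | Once | | | | | None |
| | Error | Once | | | | | None |
| Script | Cache | Multiple | | None | Once | Multiple | |
| | No Cache | Once | | None | Once | Once | Multiple |
| | Error | Once | None | | | Once | Multiple |
| Style | Cache | Multiple | | | | | |
| | No Cache | Once | | Multiple | | Once | Multiple |
| | Error | Once | None | Once | None | Once | Multiple |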

pmeenan · 2021-09-28T13:49:34Z

I think I'm most surprised by the "Once" result for Chrome for a valid image with cache enabled (and somewhat surprised that different content types behave differently given the fetch path in Chrome). I'd have expected the resource to land in the disk cache and be re-used (though I guess that's the point of this discussion, to get the behaviors to make sense).

The "multiple" results from no-cache feel like they will likely break developer expectations.

emilio · 2021-09-28T14:19:51Z

I think the "multiple" results for no-cache for images at least come from https://html.spec.whatwg.org/#the-list-of-available-images, and I think generally they are needed for compat (e.g., different CSS image loads from a style change should result in the same image).

For stylesheets in the same document it's also long-standing behavior of Gecko at least.

noamr · 2021-09-28T14:35:59Z

> I think the "multiple" results for no-cache for images at least come from https://html.spec.whatwg.org/#the-list-of-available-images, and I think generally they are needed for compat (e.g., different CSS image loads from a style change should result in the same image).
>
> For stylesheets in the same document it's also long-standing behavior of Gecko at least.

Yea, that makes sense to me. I'm surprised by the "once" behavior in Chrome though, seems like preload does something extra that counteracts that memory cache.

Note also that the tests activate many moving parts, so there could always be some fragile mistake in them (those fragile mistakes unfortunately also happen in websites that try to take advantage of preload).

noamr · 2021-09-29T06:19:05Z

I updated the table to include a comparison between load and preload, and also fixed some issues with the test.

noamr · 2021-10-26T05:31:19Z

After the TPAC conversation, this is the rough definition of preload cache I propose:

- A document has a store of preloaded responses.
- A preload link (tag or header) fetches and keeps the response in that store (regardless of cache or errors).
- The next subsequent fetch that matches that response will receive that response and remove it from the store (though it might still be kept in other caches, like the resource/memory cache or HTTP cache).
- The load event clears the store and reports what preloads were unused.

This definition totally separates preloads from the different resource caches or any type-specific behavior, though implementations are welcome to further optimize.

Loading once and before the load event reduces issues with cache headers and reloading due to errors, and focuses preload on "loading something before it's used".

The tests will ensure that there are no extra network fetches (e.g., in case of invalid images), but they will allow implementations to have fewer fetches.
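
As a rough illustration of this definition, here is a sketch of a per-document store with consume-once semantics that is cleared at load. `PreloadEntry`, `DocumentPreloadStore`, and the string responses are hypothetical, and matching on URL plus destination is an assumption standing in for whatever request parameters end up being relevant.

```typescript
interface PreloadEntry {
  url: string;
  destination: string; // the `as` value of the preload
  response: string;    // stand-in for a response; error responses are kept too
}

class DocumentPreloadStore {
  private entries: PreloadEntry[] = [];

  // A preload link (tag or header) keeps its response in the store,
  // regardless of cache headers or errors.
  add(entry: PreloadEntry): void {
    this.entries.push(entry);
  }

  // The next matching fetch receives the response and removes it from the store.
  // Returns null when nothing matches, so the caller performs a normal fetch.
  consume(url: string, destination: string): string | null {
    for (let i = 0; i < this.entries.length; i++) {
      const entry = this.entries[i];
      if (entry.url === url && entry.destination === destination) {
        this.entries.splice(i, 1);
        return entry.response;
      }
    }
    return null;
  }

  // On the load event: clear the store and report which preloads went unused.
  onLoad(): string[] {
    const unused = this.entries.map((entry) => entry.url);
    this.entries = [];
    return unused;
  }
}
```

A second `consume` for the same URL returns `null` here, so any further reuse would come from the HTTP cache or a resource-specific cache rather than from the preload store.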

A document keeps a list of preloaded resources, with a request and response for each. A preloaded resource is a result of <link rel=preload>. When consumed (from the FETCH algorithm), the response is reused if the request matches all relevant parameters, and removed from the store. When the document is fully loaded ("load" event) the store is cleared. See whatwg/fetch#590.

A document keeps a list of preloaded resources, each with relevant parameters from the request, and the response once available. Once a <link rel=preload> element starts fetching a resource, that entry is added, and once the response is fully loaded, the fetch consuming the resource receives the response. See whatwg/fetch#590.

Before any particular fetch steps are performed, see if there is a matching request already in the preload store and consume it. This is called from the main fetch to avoid race conditions. Depends on whatwg/html#7260, and together they fix #590. Tests: web-platform-tests/wpt#31539.

yoavweiss mentioned this issue Aug 30, 2017

Define the "Preload Cache" w3c/preload#97

Closed

jyasskin mentioned this issue Apr 4, 2018

Prefetch vs Service Workers w3c/resource-hints#78

Closed

kinu mentioned this issue Apr 11, 2018

Sketch loading and caching in the explainer based on Kinuko's description WICG/webpackage#173

Merged

jyasskin mentioned this issue Jul 18, 2018

Reuse of responses httpwg/http-core#52

Closed

annevk mentioned this issue Jun 4, 2020

Should the list of available images key on referrer policy too? whatwg/html#5541

Closed

annevk mentioned this issue Jun 9, 2020

Specify speculative HTML parsing (preload scanner) whatwg/html#5624

Closed

mnot mentioned this issue Sep 3, 2020

Some browsers are reusing 'no-cache' cached images without revalidation #1088

Closed

domenic mentioned this issue Oct 27, 2020

Memory cache/list of available images interop meta-issue whatwg/html#6110

Open

yoavweiss mentioned this issue Jan 18, 2021

4.6 Integrate the processing model with Fetch w3c/resource-timing#252

Closed

noamr mentioned this issue Sep 23, 2021

Create resource timing entries for HTML resources whatwg/html#6542

Closed

noamr mentioned this issue Sep 26, 2021

Multi-spec non-feature TODOs for web performance WG w3c/web-performance#38

Closed

noamr mentioned this issue Sep 27, 2021

WIP: Preload browser differences (for discussion, not for merging) web-platform-tests/wpt#30981

Closed

This was referenced Oct 26, 2021

Define behavior of <link rel=preload> in detail whatwg/html#7260

Merged

Integration with preload #1342

Merged

annevk mentioned this issue Feb 11, 2022

Define the "memory cache" #1400

Open

annevk closed this as completed in #1342 Feb 11, 2022
