
Cache-Control: immutable #148

Closed
Malvoz opened this issue Apr 21, 2018 · 19 comments · Fixed by #325
Labels: enhancement (New feature or request)
Milestone: v4.0.0

Comments

@Malvoz
Contributor

Malvoz commented Apr 21, 2018

The Cache-Control header (which takes precedence over Expires when both are present) has been asked about before in #85 and #73.

I would like to raise this again because the header provides finer control than expires. Also with the addition of the immutable directive (see blog posts 1, 2, 3), we get a performance benefit: browsers skip revalidating long-lived (max-age) resources while they are still fresh, even on reload.
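To make the precedence rule concrete, here is a minimal Python sketch (an illustration, not part of H5BP) of how a browser-style private cache might derive a freshness lifetime: max-age wins over Expires, and a directive it does not act on, such as immutable, is simply skipped. It deliberately ignores s-maxage, heuristic freshness, and invalid Expires dates; the function names are hypothetical.

```python
from __future__ import annotations

from email.utils import parsedate_to_datetime


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(headers: dict[str, str]) -> int | None:
    """Seconds a browser cache may reuse a response without revalidating.

    Simplified private-cache view: max-age wins over Expires, and Expires
    (measured against Date) is only a fallback. Directives this function does
    not act on, such as immutable, are ignored, much like an older browser.
    """
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    max_age = directives.get("max-age")
    if max_age is not None:
        return int(max_age)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


headers = {
    "Date": "Sat, 21 Apr 2018 00:00:00 GMT",
    "Expires": "Sun, 22 Apr 2018 00:00:00 GMT",
    "Cache-Control": "max-age=31536000, immutable",
}
print(freshness_lifetime(headers))  # max-age wins over the one-day Expires
```

Dropping the Cache-Control entry from headers makes the function fall back to the one-day Expires window.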

LeoColomb added the enhancement (New feature or request) label on Apr 21, 2018
@LeoColomb
Member

Thanks for your suggestion, @Malvoz!

I would like to raise this again because the header provides finer control than expires.

Cache-Control is already set by Apache through the mod_expires Expires* directives (e.g. ExpiresByType).

Also with the addition of the immutable directive

Generally I'm in favor to follow new standards, but here I'm concerned by the potential downsides of the immutable directive:

  • It's not handled by native Apache conf yet, i.e. we need to use Header, which comes with a room for bad configurations.
  • Real infinite caching without revalidation must be used with care:
    • Should not be used without SSL/TLS level.
    • Must be used with really-definitive or really-well-managed files, the user must know what it means. This is not so trivial.

@Malvoz
Contributor Author

Malvoz commented Apr 22, 2018

Real infinite caching without revalidation must be used with care:
Should not be used without SSL/TLS level.
Must be used with really-definitive or really-well-managed files, the user must know what it means. This is not so trivial.

Yes, good catch; it could be commented out, with notes on TLS/SSL. The web is moving towards "HTTPS first", and there are other HTTP header fields that indeed require HTTPS. I would be surprised if H5BP does not move to an HTTPS-first approach in the future, with HTTP-only configurations commented out instead.

immutable is relatively new, and support for it does not yet cover all major browsers. But the fact that Cache-Control keeps gaining new directives speaks in its favor, IMO.

Are there equivalent approaches of expires to all cache-control's directives?

@LeoColomb
Member

Are there equivalent approaches of expires to all cache-control's directives?

No, but the Expires header is added automatically by Apache (via mod_expires), for backward compatibility only.

@creopard
Contributor

Here's also a good read about the Expires header vs. Cache-Control, and why the Expires header is largely unnecessary when Cache-Control is present:
https://www.fastly.com/blog/headers-we-dont-want

@LeoColomb
Member

Just to be clear here: Cache-Control is already the preference in the config. Expires is added by Apache, not explicitly by the config.

LeoColomb changed the title from "Cache-Control" to "Cache-Control: immutable" on May 16, 2018
@creopard
Contributor

I guess I was confused by "Expires" and the "ExpiresActive" setting...

@Malvoz
Contributor Author

Malvoz commented Jun 21, 2018

@LeoColomb

Generally I'm in favor to follow new standards, but here I'm concerned by the potential downsides of the immutable directive:

... we need to use Header, which comes with a room for bad configurations.

The only benefit I see using mod_expires is that you can ExpiresByType <media type> which seems impossible using cache-control? Instead you need to FilesMatch every potential file which may be error prone. Is that what you are referring to?

... Must be used with really-definitive or really-well-managed files, the user must know what it means. This is not so trivial.

So, unless I'm missing something (I realize there is a note on this), this is already an issue with:

ExpiresByType text/css "access plus 1 year"

ExpiresByType application/javascript "access plus 1 year"
ExpiresByType application/x-javascript "access plus 1 year"
ExpiresByType text/javascript "access plus 1 year"
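For reference, "access plus 1 year" works out to the same one-year lifetime as the max-age=31536000 used later in this thread. A small Python sketch of the arithmetic, assuming (as mod_expires is understood to do) that a year counts as 365 days and a month as 30; access_plus is a hypothetical helper, not an Apache API:

```python
# Assumption: like mod_expires, count a "year" as 365 days and a "month" as 30 days.
SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
    "month": 30 * 24 * 60 * 60,
    "year": 365 * 24 * 60 * 60,
}


def access_plus(amount: int, unit: str) -> str:
    """Cache-Control value equivalent to "access plus <amount> <unit>"."""
    # Accept both singular and plural units ("year" / "years").
    seconds = amount * SECONDS[unit.lower().rstrip("s")]
    return f"max-age={seconds}"


print(access_plus(1, "year"))
```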

@Malvoz
Contributor Author

Malvoz commented Jun 21, 2018

Self quote:

The only benefit I see using mod_expires is that you can ExpiresByType <media type> which seems impossible using cache-control? Instead you need to FilesMatch every potential file which may be error prone.

Maybe you could do something like:
Header set Cache-Control "<VALUE>" "expr=%{CONTENT_TYPE} =~ m#<MEDIA TYPE>|<MEDIA TYPE>#"
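A rough Python model of how such an expr condition behaves, assuming ap_expr's =~ performs an unanchored regex match (so re.search, not re.fullmatch). The media types in LONG_LIVED_TYPES are placeholders for <MEDIA TYPE>|<MEDIA TYPE>, and cache_control_for is a hypothetical name:

```python
from __future__ import annotations

import re

# Hypothetical media-type pattern standing in for m#<MEDIA TYPE>|<MEDIA TYPE>#.
LONG_LIVED_TYPES = re.compile(r"text/css|application/javascript|text/javascript")


def cache_control_for(content_type: str, value: str = "max-age=31536000") -> str | None:
    """Model `Header set Cache-Control value "expr=%{CONTENT_TYPE} =~ m#...#"`.

    =~ is treated as an unanchored match, so a parameter such as
    "; charset=utf-8" does not prevent a match. Returns None when the
    directive would leave the header alone.
    """
    return value if LONG_LIVED_TYPES.search(content_type) else None
```

Matching on the media type sidesteps the need to enumerate file extensions in FilesMatch, which is the trade-off discussed above.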

@Malvoz
Contributor Author

Malvoz commented Jun 7, 2019

Now that this issue is about immutable: I've been looking into filename-based_cache_busting.conf, and there are a few things I suggest addressing:

  • The snippet currently does not set any caching directives, even though long-term caching is kind of the point of versioning files. The optimal strategy would be to serve these files with both immutable and a long max-age as a fallback for browsers that don't understand the immutable directive.

  • This advice is quite outdated:

# To understand why this is important and even a better solution than
# using something like `*.css?v231`, please see:
# https://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring/

In 2008 Steve Souders wrote about Squid not caching resources with query string parameters. But it's been around 10 years since Squid changed that behavior:
http://www.squid-cache.org/Versions/v2/2.7/RELEASENOTES.html#s1

The default rules to not cache dynamic content from cgi-bin and query URLs have been altered. Previously, the "cache" ACL was used to mark requests as non-cachable - this is enforced even on dynamic content which returns cachability information. This has changed in Squid-2.7 to use the default refresh pattern. Dynamic content is now cached if it is marked as cachable [...]

@Malvoz
Contributor Author

Malvoz commented Jul 7, 2019

Friendly bump :)

The immutable directive is really beneficial in terms of performance. More info on that here:

And it's backwards compatible: browsers that don't understand it simply ignore it and use max-age instead.

Perhaps we can set an environment variable at:

RewriteRule ^(.+)\.(\w+)\.(bmp|css|cur|gif|ico|jpe?g|m?js|png|svgz?|webp|webmanifest)$ $1.$3 [L]

and, for requests where that variable is set, respond with:

<IfModule mod_headers.c>
  Header merge Cache-Control "immutable, max-age=31536000"
</IfModule>

Now, I'm not comfortable with Apache environment variables, so if you agree with this, feel free to open a PR or help me set it up :)
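Before hooking an environment variable (or a header) onto this rule, it may help to see which paths its pattern actually catches. A Python sketch that applies the same regex and the $1.$3 substitution, assuming the .htaccess case where the path is matched without its leading slash; rewrite_target is a hypothetical helper:

```python
from __future__ import annotations

import re

# The RewriteRule pattern quoted above. In .htaccess, Apache matches it against
# the path relative to the directory, i.e. without a leading slash.
CACHE_BUSTED = re.compile(
    r"^(.+)\.(\w+)\.(bmp|css|cur|gif|ico|jpe?g|m?js|png|svgz?|webp|webmanifest)$"
)


def rewrite_target(path: str) -> str | None:
    """Return the `$1.$3` substitution, or None if the rule does not match."""
    match = CACHE_BUSTED.match(path)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(3)}"
```

The jquery.min.js case is worth noting: any name with two dots before a listed extension matches too, so whatever is attached to this rule would also apply to files that were never versioned.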

@LeoColomb
Member

Thanks @Malvoz.
I'm ready to go. Thoughts @XhmikosR?

@LeoColomb
Member

OK, we can start thinking of an implementation.

Webhint suggests the following:

    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    # Where needed add `immutable` value to the `Cache-Control` header

    <IfModule mod_headers.c>

        # Because `mod_headers` cannot match based on the content-type,
        # the following workaround needs to be done.

        # 1) Add the `immutable` value to the `Cache-Control` header
        #    to all resources.

        Header merge Cache-Control immutable

        # 2) Remove the value for all resources that shouldn't have it.

        <FilesMatch "\.(appcache|cur|geojson|ico|json(ld)?|x?html?|topojson|xml)$">
            Header edit Cache-Control immutable ""
        </FilesMatch>

    </IfModule>
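To see what the two steps do to an actual header value, here is a simplified Python model of Header merge and Header edit (edit replaces only the first match; edit* would replace all of them). It ignores when exactly mod_expires adds its own Cache-Control value, and the function names are hypothetical:

```python
import re

# The FilesMatch pattern from the webhint snippet above.
NOT_IMMUTABLE = re.compile(r"\.(appcache|cur|geojson|ico|json(ld)?|x?html?|topojson|xml)$")


def header_merge(current: str, value: str) -> str:
    """Simplified `Header merge`: append value unless it is already a listed token."""
    tokens = [token.strip() for token in current.split(",") if token.strip()]
    if value in tokens:
        return current
    return f"{current}, {value}" if current else value


def header_edit(current: str, pattern: str, replacement: str) -> str:
    """Simplified `Header edit`: replace the first regex match only."""
    return re.sub(pattern, replacement, current, count=1)


def webhint_cache_control(filename: str, current: str) -> str:
    value = header_merge(current, "immutable")  # step 1: every resource
    if NOT_IMMUTABLE.search(filename):  # step 2: excluded extensions
        value = header_edit(value, "immutable", "")
    return value
```

For an HTML file the result is "max-age=0, ": deleting the token with Header edit leaves a dangling separator behind, so the replacement pattern would also need to consume the comma.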

As we already do with other conditional headers, we may use MIME-type expressions instead (as you suggested):

Header merge Cache-Control "immutable" "expr=%{CONTENT_TYPE} =~ m#<MEDIA TYPE>|<MEDIA TYPE>#"

Perhaps we can set an environment variable at

I don't feel comfortable adding environment variables: it's hard to understand when they are evaluated, and hard to debug.

  • This advice [cache-busting with hash in filenames] is quite outdated

This is a different issue, but you are right.
That said it can be hard to have a strong configuration on proxies or CDN when using query string.
To be honest, I don't have a firm opinion on this, except that webpack still uses the hash-in-filename template by default, if I'm correct.

LeoColomb added this to the v4.0.0 milestone on Jul 31, 2019
@Malvoz
Contributor Author

Malvoz commented Aug 2, 2019

Webhint suggests the following:

As an aside, I've already opened an issue at webhint about Apache's ability to match based on content type. Their example also seems to have syntax errors, and it should use a long max-age as a fallback too; I can take these things up with them.


In the following example, I'm matching every response that is not text/html and has v= in its query string.

Header set Cache-Control "max-age=31536000, immutable" "expr=%{QUERY_STRING} =~ m#v\=#i && %{CONTENT_TYPE} !~ m#text/html#i"

This would match e.g. /app.css?v=1.0.0.

To meet your requirement of filename-based matching, could we then just apply a regex to %{REQUEST_FILENAME} instead of %{QUERY_STRING} in the example above?
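A Python model of the QUERY_STRING/CONTENT_TYPE condition, assuming both ap_expr regexes are unanchored and that the trailing i means case-insensitive; wants_long_cache is a hypothetical name:

```python
import re


def wants_long_cache(query_string: str, content_type: str) -> bool:
    r"""Model of: %{QUERY_STRING} =~ m#v\=#i && %{CONTENT_TYPE} !~ m#text/html#i

    Both regexes are unanchored and case-insensitive; `\=` is just a literal "=".
    """
    has_version = re.search(r"v=", query_string, re.IGNORECASE) is not None
    is_html = re.search(r"text/html", content_type, re.IGNORECASE) is not None
    return has_version and not is_html
```

Because the query-string regex is unanchored, nav=home matches as well; a pattern such as (^|&)v= would limit it to an actual v parameter.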

@Malvoz
Contributor Author

Malvoz commented Aug 2, 2019

This advice [cache-busting with hash in filenames] is quite outdated

This is a different issue, but you are right.
That said it can be hard to have a strong configuration on proxies or CDN when using query string.

I have yet to find any up-to-date sources confirming that proxies/CDNs have issues with query strings on the modern web (again, Squid started caching query-string URLs by default around 2008). But perhaps I haven't searched hard enough. ^^

@LeoColomb
Member

In the following example

Let's start with MIME-type only first. We'll see cache busting later.

And I think we should prefer merging over setting Cache-Control header to add the immutable attribute.


But perhaps I haven't searched hard enough.

Lack of feature or correctness is never documented. 😆

@Malvoz
Contributor Author

Malvoz commented Aug 9, 2019

I think we should prefer merging over setting Cache-Control header to add the immutable attribute.

I overlooked that in the example. However, I don't think merge is good enough either. From section 2.1 of RFC 8246:

[...] proxies SHOULD skip conditionally revalidating fresh responses containing the immutable extension unless there is a signal from the client that a validation is necessary (e.g., a no-cache Cache-Control request directive defined in Section 5.2.1.4 of [RFC7234]).

I don't know why a developer would do this, but if a developer uses no-cache or no-store on versioned files, merging keeps those directives next to immutable, and immutable (and max-age) would then effectively be ignored.
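The difference between set and merge on a response that already carries no-cache can be sketched in Python. header_merge below is a simplified model of mod_headers (it treats the whole new value as a single token), and immutable_defeated is a hypothetical check:

```python
LONG_LIVED = "max-age=31536000, immutable"


def header_set(current: str, value: str) -> str:
    """`Header set`: the existing value is replaced outright."""
    return value


def header_merge(current: str, value: str) -> str:
    """Simplified `Header merge`: append unless the value is already listed."""
    tokens = [token.strip() for token in current.split(",") if token.strip()]
    if value in tokens:
        return current
    return f"{current}, {value}" if current else value


def immutable_defeated(cache_control: str) -> bool:
    """True if no-cache or no-store is present; either stops the browser from
    reusing its copy without contacting the server, so immutable has no effect."""
    tokens = {token.strip().lower() for token in cache_control.split(",")}
    return bool(tokens & {"no-cache", "no-store"})


merged = header_merge("no-cache", LONG_LIVED)
replaced = header_set("no-cache", LONG_LIVED)
```

merged keeps no-cache alongside immutable, while replaced drops it, which is the argument for set over merge on versioned files.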

@Malvoz
Contributor Author

Malvoz commented Nov 10, 2019

Revisiting this: reusing the same file extensions as filename-based_cache_busting.conf (except .webmanifest, since it shouldn't be versioned) to match the same cache-busting pattern:

<IfModule mod_headers.c>
  Header set Cache-Control "max-age=31536000, immutable" "expr=%{REQUEST_URI} =~ m#^(.+)\.(\w+)\.(bmp|css|cur|gif|ico|jpe?g|m?js|a?png|svgz?|webp)$#i"
</IfModule>

/cc @LeoColomb
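A quick way to sanity-check the REQUEST_URI expression is to run the same regex in Python, with re.IGNORECASE standing in for the trailing i and assuming REQUEST_URI is the path without its query string; cache_control_for_uri is hypothetical:

```python
from __future__ import annotations

import re

# The expression from the Header directive above; the trailing `i` in m#...#i
# maps to re.IGNORECASE.
VERSIONED_URI = re.compile(
    r"^(.+)\.(\w+)\.(bmp|css|cur|gif|ico|jpe?g|m?js|a?png|svgz?|webp)$",
    re.IGNORECASE,
)


def cache_control_for_uri(request_uri: str) -> str | None:
    """Value the directive would set, or None if the header is left untouched."""
    if VERSIONED_URI.search(request_uri):
        return "max-age=31536000, immutable"
    return None
```

As with the rewrite rule, an unversioned but doubly dotted name such as /js/jquery.min.js also gets the one-year immutable header.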

@Malvoz
Contributor Author

Malvoz commented Nov 12, 2019

A reminder to myself to look into this more: while the example above makes sure that other directives (such as no-cache and no-store) are overridden for versioned files matched by the regex, which is necessary to preserve the behavior of long max-age and immutable (as described in #148 (comment)), it would also override no-transform, which it shouldn't...

Q: Do transcoding intermediaries (proxies and others) only look at Cache-Control on the document itself (text/html)? If so, this is not an issue, since immutable shouldn't be specified for HTML resources (and the proposed regex doesn't match HTML).

Not sure whether the answer lies somewhere in:
https://www.w3.org/TR/ct-landscape/
https://www.w3.org/TR/ct-guidelines/

https://support.google.com/webmasters/answer/6211428?hl=en says (emphasis mine):

Opting out of Web Light
If you do not want your pages to be transcoded, set the HTTP header "Cache-Control: no-transform" in your page response. If Googlebot sees this header, your page will not be transcoded.


Edit: I guess this could be solved by proper ordering in .htaccess, i.e. placing the Header merge of Cache-Control: no-transform after the directive that sets immutable... @LeoColomb, is relying on the ordering of config snippets a bad idea? Does H5BP do that already?
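Assuming Header directives in the same context are applied in the order they appear (which is what the question above hinges on), a small Python model shows why the ordering matters; header_set, header_merge, and run are hypothetical helpers, not Apache APIs:

```python
from __future__ import annotations

from typing import Callable

# A directive takes the current Cache-Control value and returns the new one.
Directive = Callable[[str], str]


def header_set(value: str) -> Directive:
    """Simplified `Header set`: discard the current value."""
    return lambda _current: value


def header_merge(value: str) -> Directive:
    """Simplified `Header merge`: append unless already a listed token."""
    def apply(current: str) -> str:
        tokens = [token.strip() for token in current.split(",") if token.strip()]
        if value in tokens:
            return current
        return f"{current}, {value}" if current else value
    return apply


def run(directives: list[Directive], initial: str = "") -> str:
    """Apply directives in order, as they would appear in the config."""
    value = initial
    for directive in directives:
        value = directive(value)
    return value


immutable_first = run([header_set("max-age=31536000, immutable"), header_merge("no-transform")])
no_transform_first = run([header_merge("no-transform"), header_set("max-age=31536000, immutable")])
```

Only immutable_first keeps no-transform; in the other order, Header set wipes it out.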

@LeoColomb
Member

Does H5BP do that already?

In a way, yes, to get things working, but a perfect order is mostly impossible.
Anyway, we can review the order if it helps.


3 participants