Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GCP CDN caching assets with max-age=0 #314

Merged

Conversation

allantatter
Copy link
Contributor

@allantatter allantatter commented Sep 9, 2022

Problem

GCP's (Google Cloud Platform) Cloud CDN caches responses with Cache-Control: max-age=0 and serves it from stale cache. However the application's intent is not to have those responses cached at all.

Solution

To overcome this, we can remove forcing the max-age=0 header. This also gives the underlying application the ability to set the Cache-Control header itself. The application might have endpoints that should not be cached (Cache-Control: no-cache, private) or want to control depending on the response (endpoint?) for how long it should be cached (e.g. Cache-Control: max-age=60).

@LeoColomb
Copy link
Member

Thanks a lot for opening this PR, @allantatter!

GCP's (Google Cloud Platform) Cloud CDN caches responses with Cache-Control: max-age=0

Interesting! Could you provide any documentation regarding this behavior?

This also gives the underlying application

Unfortunately this is where the suggested solution is reaching this boilerplate boundaries. H5BP configs template targets static files first (i.e. no application behind), before application scenarii.
I believe (I might be wrong) when an application is behind, the server is going to require additional configuration anyway, and then you are free to customize the values.

I hope my point is clear enough! 😅

@allantatter
Copy link
Contributor Author

Unfortunately I did not find the reason why GCP Cloud CDN behaves like that in their documentation:

However, that would not be the only reason why to change Expires from 0 to off. I would think the reason, why currently H5BP set Exipres to 0 is distinguish the data responses (JSON, XML, etc) and not to put the long-living caching headers to those responses. But the side-effect of forcing the Expires to be 0 is that it also overrides the Cache-Control headers set by the underlying application itself (both Cache-Control: no-cache and Cache-Control: max-age=60 would become Cache-Control: max-age=0). I would think the current H5BP solution oversteps its own boundaries and should not mutate the dynamic data requests :) It should just ignore those.

I know that I can change the H5BP config with my custom configuration but I'd like to stick to the default setup as much as possible so it would be easier to update my projects when any new releases are done here.

@LeoColomb
Copy link
Member

Hey @Malvoz, are you around? I'm curious if you have an opinion on this 😊

t-drabold
t-drabold previously approved these changes Oct 20, 2022
@StrangePeanut
Copy link

Wow! Why is an absurd request like this being approved or even considered? I'm concerned for this project.

Wouldn't off stop NGINX from sending headers at all basically letting the browser decide what to do? This would bring back the unexpected/inconsistent behaviour which 0 was meant to prevent.

Why couldn't OP use a configuration of their own? Wasn't this project a baseline rather than one that caters to one person's specific needs (questionable needs at that as the described behaviour apparently isn't even documented)?

What is next? Getting rid of cache headers altogether because someone doesn't want their JavaScript cached?!

@allantatter
Copy link
Contributor Author

allantatter commented Oct 20, 2022

@StrangePeanut What are you talking about? :D

Wouldn't off stop NGINX from sending headers at all basically letting the browser decide what to do?

Nope, off means that it won't interfere cache-control headers and lets the application decide if and how it should be cached.

Wasn't this project a baseline rather than one that caters to one person's specific needs (questionable needs at that as the described behaviour apparently isn't even documented)?

"behaviour apparently isn't even documented" - documented was not the symptom however the main issue is not related to GCP caching. And why do you think I want to solve only my needs? I would then have my custom logic within my own project. I made the PR to fix issues for everyone here.

What is next?

Get less emotional and let's talk again after that.

@StrangePeanut
Copy link

off means that it won't interfere cache-control headers

That might be if the app includes them, else it would be completely up to the browser. This is bad practice and the reason why it is currently set to 0, ie to provide a decent baseline.

documented was not the symptom however the main issue is not related to GCP caching

Your initial comment says otherwise. Typically you would want NGINX to serve resources directly (eg cached) whenever possible. In which case letting NGINX provide the appropriate headers is expected if not, necessary.

I made the PR to fix issues for everyone here.

Your use case simply does not apply to everyone, and merging this PR will inevitably cause more issues than it solves.

@allantatter
Copy link
Contributor Author

allantatter commented Oct 21, 2022

I agree that breaking changes should not be merged hastily. Although bugs should be fixed. So it depends whether we think this is a bug or just one way it could behave (and my PR would just be another approach to it).

The reason why I thought it is a bug came from this behaviour a saw with these 2 scenarios:

Screenshot 2022-10-21 at 09 16 34

If there is a scenario where it makes sense that by default NGINX always overrides Cache-Control value to max-age=0 then I agree that my PR would not necessarily improve the default configuration but just change it. It would make a lot of sense that if no Cache-Control header is present, then NGINX could introduce some default values. But can it do that conditionally so it would not override Cache-Control if it's already set? Then it may be applicable for everyone.

Your initial comment says otherwise.

I know yep that the problem surfaced for me because of GCP platform and this was misleading. I afterwards noticed the main underlying pain points.

@jchimene
Copy link

This seems to be configuring h5bp for your particular use case (and perhaps others). Not every particular configuration change should result in a PR to upstream. Perhaps a wiki entry would be more appropriate contribution?

@allantatter
Copy link
Contributor Author

@jchimene So you think it's the expected behaviour that you see on my graph? I don't think it's rare case that the backend frameworks add Cache-Control headers into the response.

@jchimene
Copy link

@allantatter It's not a rare case, but neither is it pathological. To me, it's appropriate for a particular installation to configure this behavior for their app's behavior. I would not look for such configuration to be pushed upstream. To reiterate: this is not a nominal configuration issue as this PR suggests. I'd prefer the default configuration to "fail safe".

@StrangePeanut
Copy link

Having NGINX serve resources directly in one way or another to offload the backend is very common today. Providing no Cache Control headers in such a scenario would reintroduce the unexpected/inconsistent behaviour which a value of 0 was meant to prevent.

While this could probably be addressed through the use of conditions, it goes beyond the scope of providing a baseline configuration. Where does one draw the line? I can already think of several similar conditions that could be added. At the end of the day, it would bloat the project for the sake of a small number of users. As far as I know, this repository never offered a set-and-forget configuration; it already comes with several supplementary configurations that are not enabled/included by default for this exact reason.

@Dreamsorcerer
Copy link

Wow! Why is an absurd request like this being approved or even considered? I'm concerned for this project.

Literally anyone can "approve" a PR on Github. Check the user account, it's a new user that has nothing to do with this project and no activity at all prior to that action.

Unless you see an approval from @LeoColomb it is meaningless...

@LeoColomb
Copy link
Member

LeoColomb commented Oct 23, 2022

Literally anyone can "approve" a PR on Github.

Not only this is true, but my comments above are quite far from an approval.
So please @StrangePeanut, let's try to keep the discussion healthy and put rude words aside, as it's hurting everyone here: the reporter, the maintainers, and the contributors.

Now, on the concrete side of the discussion, and as initially answered, I share the concerns of changing the configuration to what this PR is suggesting.
In short, based on everyone replies:

  • The root goal of H5BP server configs is static files;
  • This default configuration should stay “fail-safe”;
  • off as default value can raise significant issues for perf and security;
    • Can be seen as a regression;
  • 0 as default value might raise issues as well;
    • This where we reach Nginx "weakness" on Cache-control configuration, or at least if we want to keep the config files healthy for maximum coverage.

Are we all aligned on this?

I'd also add that I think backend frameworks that do a proper work on Cache-control header are quite rare, or at least quite modern.
By this extends, I suppose the situation where power user can disable Nginx overwriting when they are confident about their backend is better than the opposite.

Right?

@StrangePeanut
Copy link

I apologise, I commented with the assumption that this PR had been approved by someone with write permissions. Only now do I realise that this was not the case. Sorry.


Since OP's original issue is that content gets cached undesirably if served with a cache-control value of max-age=0, how about changing it to -1 instead? This introduces the following changes:

Before

cache-control: max-age=0
expires: <current date and time>

After

cache-control: no-cache
expires: Thu, 01 Jan 1970 00:00:01 GMT

-1 ensures that content is absolutely never to be cached which is what we want. Therefore the change from 0 to -1 could be considered a general improvement while potentially addressing OP's issue at the same time.

@allantatter
Copy link
Contributor Author

@StrangePeanut I like your proposed solution, at least it fixes the initial issue. And if I need to add dynamic Cache-Control headers then I can do that by adding custom configurations. If everyone (or most) agree then I can make the changes in the PR.

@LeoColomb
Copy link
Member

This proposition looks good to me as well.
I'd appreciate a third approval before moving forward.
What do you think about the suggestion @jchimene? (@Dreamsorcerer?)

@jchimene
Copy link

+1 It has the virtues of clarity and self-documentation.

@StrangePeanut
Copy link

After putting it to the test, it appears that a value of -1 affects the cache control header as expected but not the expires header which may lead to unexpected behaviour again.

We really should be using a value of epoch to produce the desired results which is in line with the documentation, https://nginx.org/en/docs/http/ngx_http_headers_module.html#expires

Test Results

expires 0;

cache-control: max-age=0
expires: <current date and time>

expires -1;

cache-control: no-cache
expires: <current date and time>

expires epoch;

cache-control: no-cache
expires: Thu, 01 Jan 1970 00:00:01 GMT

@allantatter
Copy link
Contributor Author

expires epoch could be a better option, agreed. Will make that change.

@Dreamsorcerer
Copy link

My thoughts:

GCP should probable have a bug filed against it. Reading around, the expected HTTP behaviour is that it should only serve a stale resource if it has lost connection to the upstream server. With a value of 0, it would always be stale.

However, the recommended approach to always serve the latest version is to use no-cache, so this looks like a sensible change.

One thing missing here though, and I couldn't find anything when searching the project, is the use of private, to indicate that things should not be cached by a proxy, but should be cached by the browser.

In my nginx.conf, I currently have this:

        map $sent_http_content_type $cache_control {
                default                 "private, must-revalidate";
                application/atom+xml    "public";
                application/rss+xml     "public";
                text/css                "public";
                application/javascript  "public";
                text/x-component        "public";
                ~audio/                 "public";
                ~video/                 "public";
                ~image/                 "public";
                ~font/                  "public, immutable";
        }
        add_header Cache-Control $cache_control;

Though with a little more reading, I've realised that it's not recommended to include public in some of these cases, so that should probable be tweaked.

@Dreamsorcerer
Copy link

After putting it to the test, it appears that a value of -1 affects the cache control header as expected but not the expires header which may lead to unexpected behaviour again.

Worth noting that the Expires header appears to be worthless, not sure why it's still being added these days. As far as I can see, Cache-Control max-age replaced Expires in HTTP/1.1, so I doubt there's anything worth supporting that still requires the Expires header.

@StrangePeanut
Copy link

@Dreamsorcerer Spot on. Your approach is superior.

LeoColomb added a commit to h5bp/server-configs-test that referenced this pull request Oct 29, 2022
LeoColomb added a commit to h5bp/server-configs-test that referenced this pull request Oct 29, 2022
@LeoColomb
Copy link
Member

@allantatter Could you please rebase? Or allow maintainer write permissions?
I could not merge without a rebase.

@allantatter allantatter force-pushed the feature/fix-cache-expiration-for-gcp-cdn branch from c53189c to 3dbeed2 Compare November 1, 2022 06:08
@allantatter
Copy link
Contributor Author

@LeoColomb Done. However won't the test fail now as you changed the expected result of Cache-Control from max-age=0 to not being there instead of no-cache?

@LeoColomb LeoColomb enabled auto-merge (rebase) November 1, 2022 14:56
@LeoColomb LeoColomb merged commit b83e71b into h5bp:main Nov 1, 2022
@emansom
Copy link

emansom commented Nov 1, 2022

One thing missing here though, and I couldn't find anything when searching the project, is the use of private, to indicate that things should not be cached by a proxy, but should be cached by the browser.

In my nginx.conf, I currently have this:

        map $sent_http_content_type $cache_control {
                default                 "private, must-revalidate";
                application/atom+xml    "public";
                application/rss+xml     "public";
                text/css                "public";
                application/javascript  "public";
                text/x-component        "public";
                ~audio/                 "public";
                ~video/                 "public";
                ~image/                 "public";
                ~font/                  "public, immutable";
        }
        add_header Cache-Control $cache_control;

Though with a little more reading, I've realised that it's not recommended to include public in some of these cases, so that should probable be tweaked.

@Dreamsorcerer Would very much appreciate a smarter content_transformation.conf, please create a PR for this as well. 👍🏻

@Dreamsorcerer
Copy link

Sorry, I've got no time, feel free to take that template and make a PR.

@LeoColomb
Copy link
Member

LeoColomb commented Nov 1, 2022

@emansom That is actually a really good idea.
As replied by @Dreamsorcerer, don't hesitate to open that PR.
I/we will help to make it acceptable if needed -- btw, thanks everyone for your inputs on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants