Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document HTTP cache config #2593

Merged
merged 1 commit into from
Jun 5, 2024
Merged

Document HTTP cache config #2593

merged 1 commit into from
Jun 5, 2024

Conversation

bep
Copy link
Member

@bep bep commented May 23, 2024

No description provided.

@bep
Copy link
Member Author

bep commented Jun 3, 2024

@jmooring could I borrow your eyes for a moment.

doc: https://deploy-preview-2593--gohugoio.netlify.app/getting-started/configuration/#configure-http-cache
pr: gohugoio/hugo#12523

There are some test failures, but other than that this works pretty good in my tests, but I'm struggling with deciding what the config defaults should be. People use resources.GetRemote for lots of "things" and this is mostly valuable for content sources that changes (e.g. CMS, WordPress ...). Below is what's in the PR, but I'm now leaning towards just disabling everything by default. What do you think?

Also note that not every HTTP server out there implements the HTTP spec properly (Netlify is doing a poor job, see https://answers.netlify.com/t/server-sometimes-responds-with-http-200-even-when-if-none-match-matches-etag/37852/33?u=bep).

[HTTPCache]
  [HTTPCache.cache]
    [HTTPCache.cache.for]
      excludes = ['**{.jpg,.jpeg,.png,.webp,.gif,.ttf}']
      includes = ['**']
  [[HTTPCache.polls]]
    disable = true
    high = '0s'
    low = '0s'
    [HTTPCache.polls.for]
      includes = ['**{.jpg,.jpeg,.png,.webp,.gif,.ttf}', 'https://*.{twitter,x,facebook,instagram}.com/**']
  [[HTTPCache.polls]]
    disable = false
    high = '30m0s'
    low = '10m0s'
    [HTTPCache.polls.for]
      includes = ['https://*.{github}.com/**']
  [[HTTPCache.polls]]
    disable = false
    high = '30s'
    low = '1s'
    [HTTPCache.polls.for]
      includes = ['**']

@jmooring
Copy link
Member

jmooring commented Jun 3, 2024

@bep You've obviously given this almost infinitely more thought than I have, but I think disabling by default would be safer.

Reasoning:

  1. This will, in some cases, change existing behavior.
  2. I am concerned that someone may unexpectedly hit a rate limit where they didn't before because, by default, the caches.getresource never expires.

@jmooring
Copy link
Member

jmooring commented Jun 3, 2024

I also think we need to craft a example that users can easily replicate to demonstrate the power of this feature, and why in some cases it's better than setting a unique cache expiration for each resources.GetRemote call. Maybe this is integration with something like Google forms for event registrations, a commenting system, an Instagram feed, etc.

This might be best in a tips and tricks article... not sure.

@bep
Copy link
Member Author

bep commented Jun 3, 2024

but I think disabling by default would be safer.

Agree. That was my conclusion as well, I guess.

and why in some cases it's better than setting a unique cache expiration for each resources.GetRemote call.

Well, there's 2 aspects of this new feature:

  1. Proper handling of etag and/or last-modified headers.
  2. Polling for changes and trigger partial rebuilds.

Obviously 2) works better with HTTP servers with sensible HTTP cache implementations (avoid having to download 50mb JSON files if nothing changed), but it would work fine with the {{ $cacheKey := print $url (now.Format "2006-01-02") }} strategy.

@bep bep force-pushed the future/httpcache branch 2 times, most recently from 465a4d7 to 6bb7615 Compare June 3, 2024 18:57
@bep
Copy link
Member Author

bep commented Jun 4, 2024

OK, I will merge this PR as part of 0.127.0, should be OK. I will write some technical notes here if someone wants to elaborate:

  • With polling enabled for a resource when the server is running, rebuilds will only trigger on actual changes. This is controlled with eTag changes (if the server does not provide one, Hugo generates one from a MD5 hash of the response body).
  • For a given resource, if polling is enabled and HTTP cache disabled, we will not check the remote for changes until the TTL of the file in the file cache expires (so having a maxAge set to 10h and a poll interval of 1s would not make much sense in this setup).
  • For a given resource, if both polling and HTTP cache is enabled, we will check for changes even outside any TTL in the file cache. In short, we pass any cached etag in a if-none-match request header and/or any cached last-modified in the if-modified-since request header and return the cached response on HTTP 304. For more details, see https://github.com/gohugoio/httpcache.

@bep bep marked this pull request as ready for review June 5, 2024 10:47
@bep bep merged commit 927c905 into master Jun 5, 2024
6 checks passed
@jmooring jmooring deleted the future/httpcache branch June 6, 2024 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants