Cloudflare can index cached resources by tag, which allows those resources to be purged by tag. However, this feature is only available to Enterprise customers.
Despite this limitation, an index can be built using Workers, D1, and Queues.
This application is broken up into three Workers, three Queues, and one D1 database.
The Watcher worker watches requests to the Cloudflare Cache / Origin, captures the tags, and sends them to the Controller to be persisted.
Important: By the time a response from an origin reaches a Worker, Cloudflare has already swallowed the Cache-Tag header and it is no longer available. To get around this, the worker reads the custom X-Cache-Tag header instead.
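As a rough illustration of the capture step, the sketch below parses an X-Cache-Tag value into a list of tags. It assumes the header uses the same comma-separated format as the standard Cache-Tag header; `parseCacheTags` is a hypothetical helper name, not something taken from this project.

```typescript
// Hypothetical helper (not taken from the project): turn the value of the
// custom X-Cache-Tag header into a clean list of tags.
function parseCacheTags(headerValue: string | null): string[] {
  if (headerValue === null) {
    return [];
  }
  const tags = headerValue
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
  // Drop duplicates while keeping first-seen order.
  return tags.filter((tag, index) => tags.indexOf(tag) === index);
}

// In a Worker this would typically be called as:
//   const tags = parseCacheTags(response.headers.get("X-Cache-Tag"));
```

Returning an empty list for a missing header lets the caller skip the capture entirely for untagged responses.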
The worker also exposes a /.cloudflare/purge endpoint that allows tags to be purged. This endpoint matches the interface of the Cloudflare endpoint, but only allows tags. The tags that are purged are also scoped to the zone in which the request is made. For example, a purge request to https://example.com/.cloudflare/purge would only purge resources from the example.com zone.
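One way the "only allows tags" restriction could be enforced is sketched below. It assumes the request body follows the shape of Cloudflare's purge endpoint, `{ "tags": [...] }`, and rejects every other field; `validatePurgeBody` and its error messages are hypothetical.

```typescript
// The only body shape this sketch accepts: { "tags": ["a", "b"] }.
interface PurgeByTagBody {
  tags: string[];
}

// Hypothetical validator: throws if the parsed JSON body contains anything
// other than a non-empty "tags" array of non-empty strings.
function validatePurgeBody(body: unknown): PurgeByTagBody {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    throw new Error("Body must be a JSON object");
  }
  const record = body as Record<string, unknown>;
  const keys = Object.keys(record);
  if (keys.length !== 1 || keys[0] !== "tags") {
    throw new Error('Only the "tags" field is supported');
  }
  const tags = record["tags"];
  if (!Array.isArray(tags) || tags.length === 0) {
    throw new Error('"tags" must be a non-empty array');
  }
  for (const tag of tags) {
    if (typeof tag !== "string" || tag.trim().length === 0) {
      throw new Error("Every tag must be a non-empty string");
    }
  }
  return { tags: tags as string[] };
}
```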
A Worker is an account-level resource, but Cache is a zone-level resource. Because of this, there is no way to know what zone a resource is being cached in from a Worker.
To mitigate this problem, we can leverage the CF-Worker header which gets added to outbound requests from a Worker. Unfortunately, this header does not exist when using Service Bindings. The only way to retrieve the header is by making a request to the worker on the provided workers.dev subdomain.
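A minimal sketch of reading the zone from the CF-Worker header on the receiving side follows. `HeaderReader` and `zoneFromHeaders` are hypothetical names; the interface is only a structural stand-in for the standard `Headers` object so the example stays self-contained. Lower-casing the value is an assumption made here for consistent lookups.

```typescript
// Structural stand-in for the standard Headers object (assumption: only
// get() is needed here).
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical helper: returns the zone named by the CF-Worker header, or
// null when the header is missing or blank.
function zoneFromHeaders(headers: HeaderReader): string | null {
  const value = headers.get("CF-Worker");
  if (value === null) {
    return null;
  }
  const zone = value.trim().toLowerCase();
  return zone.length > 0 ? zone : null;
}

// In a Worker this would typically be called as:
//   const zone = zoneFromHeaders(request.headers);
```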
The Controller exists primarily as an intermediary between Watcher and Handler to collect zone information. It is not included as a part of Handler in order to ensure that the worker is collocated in the same data center as Watcher.
The worker also exposes a /purge endpoint that allows tags to be purged. This endpoint matches the interface of the Cloudflare endpoint, but only allows tags. If no zone information is provided (via the CF-Worker header), matching resources from all zones will be purged.
After receiving and validating requests to either the /capture or /purge endpoints, the worker adds the requests to the cache-capture and cache-purge-tag queues respectively.
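The routing described above could look roughly like the sketch below. The paths and queue names come from this README; the message shape (`PurgeTagMessage`, one message per tag, `zone: null` meaning all zones) and the function names are assumptions made for illustration.

```typescript
type QueueName = "cache-capture" | "cache-purge-tag";

// Map a Controller endpoint to the queue that receives its requests.
function queueForPath(pathname: string): QueueName | null {
  switch (pathname) {
    case "/capture":
      return "cache-capture";
    case "/purge":
      return "cache-purge-tag";
    default:
      return null;
  }
}

// Hypothetical message shape: zone is null when no CF-Worker header was
// present, which this sketch treats as "purge in all zones".
interface PurgeTagMessage {
  tag: string;
  zone: string | null;
}

// Build one queue message per tag.
function toPurgeTagMessages(tags: string[], zone: string | null): PurgeTagMessage[] {
  return tags.map((tag) => ({ tag: tag, zone: zone }));
}

// In a Worker, each message would then be handed to a queue producer
// binding, for example: await env.CACHE_PURGE_TAG.send(message);
// (CACHE_PURGE_TAG is a hypothetical binding name).
```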
The Handler worker listens to all three queues and handles their messages.
When a message is received from Controller in the cache-capture queue, the URL, zone, and tags are stored in the D1 database.
A message received from Controller in the cache-purge-tag queue results in the URLs for the provided tag being looked up in the D1 database and re-queued by adding each one to the cache-purge-url queue. Since this will eventually remove the resource from the cache, the URL and all tags associated with it are removed from the D1 database.
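To make the capture and purge-by-tag steps concrete, here is a small in-memory model of the index. It is not the project's D1 schema, which is not shown here; the `IndexRow` shape (one row per URL and tag pair) and both function names are assumptions.

```typescript
// Assumed row shape: one row per (url, zone, tag) combination.
interface IndexRow {
  url: string;
  zone: string;
  tag: string;
}

// cache-capture: record every tag for a URL, skipping rows that already exist.
function captureRows(rows: IndexRow[], url: string, zone: string, tags: string[]): IndexRow[] {
  const next = rows.slice();
  for (const tag of tags) {
    const exists = next.some((row) => row.url === url && row.zone === zone && row.tag === tag);
    if (!exists) {
      next.push({ url: url, zone: zone, tag: tag });
    }
  }
  return next;
}

// cache-purge-tag: find the URLs carrying the tag (optionally limited to one
// zone), then drop every row for those URLs, including their other tags.
function purgeTag(
  rows: IndexRow[],
  tag: string,
  zone: string | null
): { urls: string[]; remaining: IndexRow[] } {
  const urls: string[] = [];
  for (const row of rows) {
    const zoneMatches = zone === null || row.zone === zone;
    if (row.tag === tag && zoneMatches && urls.indexOf(row.url) === -1) {
      urls.push(row.url);
    }
  }
  const remaining = rows.filter((row) => urls.indexOf(row.url) === -1);
  return { urls: urls, remaining: remaining };
}
```

The returned `urls` are what would be added to the cache-purge-url queue, and `remaining` is the state of the index afterwards.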
Finally, when a message is received from the cache-purge-url queue, the URLs are purged with Cloudflare's API.
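The final step could be sketched as building purge-by-URL requests for Cloudflare's `purge_cache` endpoint, which takes a `files` array. This sketch assumes the zone ID has already been resolved from the stored zone, and it leaves the batch size to the caller because the per-request URL limit depends on the account; `buildUrlPurgeRequests` and `PurgeRequest` are hypothetical names.

```typescript
// Description of one HTTP call to Cloudflare's purge_cache endpoint.
interface PurgeRequest {
  endpoint: string;
  body: string;
}

// Hypothetical helper: split the URLs into batches and build one request
// body per batch. batchSize is an assumption supplied by the caller.
function buildUrlPurgeRequests(zoneId: string, urls: string[], batchSize: number): PurgeRequest[] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const endpoint = "https://api.cloudflare.com/client/v4/zones/" + zoneId + "/purge_cache";
  const requests: PurgeRequest[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const files = urls.slice(i, i + batchSize);
    requests.push({ endpoint: endpoint, body: JSON.stringify({ files: files }) });
  }
  return requests;
}

// Each request would be sent as a POST with an
// "Authorization: Bearer <API_TOKEN>" header.
```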
I am not aware of a good way to distribute this application for your own use other than forking and modifying it. It is licensed under the AGPL-3.0 license, so you are free to modify it under the terms of that license. I thought about using Terraform to make it easier for others to deploy on their own, but it seemed like overkill for my purposes. I'm happy to accept PRs that make life easier.
In addition to running the suite of Cloudflare Workers, there is a bit of work that needs to be done on the origin server. Thankfully, this is effectively the same as the setup for the standard cache tag purging:
- On cacheable responses, add an X-Cache-Tag header in the same format as the standard Cache-Tag header.
- When a change occurs, use the /.cloudflare/purge endpoint on Watcher (or, for all zones, the /purge endpoint on Controller) to purge by tag.
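The two origin-side steps above can be sketched as follows. The header name and endpoint path come from this README; sending the token as a Bearer `Authorization` header is an assumption based on the endpoint matching Cloudflare's interface, and the helper names are hypothetical.

```typescript
// Step 1: build the X-Cache-Tag value for a cacheable response
// (comma-separated, like the standard Cache-Tag header).
function buildCacheTagHeader(tags: string[]): string {
  return tags.join(",");
}

// Plain description of the purge call so the sketch stays runtime-agnostic.
interface OriginPurgeRequest {
  url: string;
  method: string;
  headers: { [name: string]: string };
  body: string;
}

// Step 2: describe a purge-by-tag request to Watcher for one site.
// Assumption: the token is presented as a Bearer token.
function buildTagPurgeRequest(origin: string, tags: string[], apiToken: string): OriginPurgeRequest {
  return {
    url: origin + "/.cloudflare/purge",
    method: "POST",
    headers: {
      "Authorization": "Bearer " + apiToken,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tags: tags }),
  };
}
```

For example, `buildTagPurgeRequest("https://example.com", ["node:1"], token)` describes a request that would purge only resources tagged `node:1` in the example.com zone.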
If you are using Drupal, you can install and configure the Cloudflare Worker Purge module and these steps will be done for you.
Finally, there is some setup in Cloudflare that is identical to the setup for the standard cache tag purging:
- Ensure that the origin is proxied through Cloudflare.
- Create a Cache Rule and ensure that the appropriate resources are cached.
I chose to use the API_TOKEN secret for authentication/authorization to the Controller and to use the same token to make requests to the Cloudflare API. This simplifies the approach: the worker needs only a single secret, and that secret is shared with the origin server. This allows the origin to make requests to the Cloudflare API or the Worker seamlessly.
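A minimal sketch of checking an incoming request against the API_TOKEN secret is shown below. It assumes the token arrives as a Bearer `Authorization` header, which this README does not state explicitly; `isAuthorized` is a hypothetical helper.

```typescript
// Hypothetical check: compare the presented Bearer token with the
// configured API_TOKEN without returning early on the first mismatch.
function isAuthorized(authorizationHeader: string | null, apiToken: string): boolean {
  const prefix = "Bearer ";
  if (apiToken.length === 0 || authorizationHeader === null) {
    return false;
  }
  if (authorizationHeader.indexOf(prefix) !== 0) {
    return false;
  }
  const presented = authorizationHeader.slice(prefix.length);
  if (presented.length !== apiToken.length) {
    return false;
  }
  let diff = 0;
  for (let i = 0; i < apiToken.length; i++) {
    // Accumulate differences across every character.
    diff |= presented.charCodeAt(i) ^ apiToken.charCodeAt(i);
  }
  return diff === 0;
}

// In a Worker this would typically be called as:
//   isAuthorized(request.headers.get("Authorization"), env.API_TOKEN)
```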
The minimum API Token permissions needed are: