-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation #1641
Conversation
619ac59
to
124b9ba
Compare
This PR is not currently accounting for prompting page caches to flush the cache for archive pages in which there is no queried object, that is, the home page (which isn't a static front page). Author and term archive pages should be accounted for with Pantheon Advanced Cache since there is a queried object and Pantheon Advanced Cache uses Nevertheless, sometimes page cache plugins will automatically flush the homepage cache when a post is updated:
What this means is that even if the loaded URL doesn't have a singular "queried object", if we instead opt to flush the cache for the first post that appears in the loop, then this should generally result in the page cache(s) being flushed for the desired archive pages, or at very least the non-static homepage. |
* @return string HMAC. | ||
*/ | ||
function od_get_url_metrics_storage_hmac( string $slug, string $url ): string { | ||
$action = "store_url_metric:$slug:$url"; | ||
function od_get_url_metrics_storage_hmac( string $slug, string $url, ?string $queried_object_type = null, ?int $queried_object_id = null ): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on what we previously discussed. I think it would be great to abstract the URL representation into a custom type. For example a class like URL_Entity
or URL_Content
or so, which would be able to support different ways to identify content (e.g. not everything has a "queried object" or a numeric ID, some URLs need other kinds of identifiers).
Maybe have that class implement __toString()
, which would be a perfect fit for the usage here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, potentially. However, that may make more sense for #1466. In particular, I'm wary to not expose too much of the internals of how the page is constructed, as these arguments have to be passed to the frontend. The exclusive purpose of this parameter here is to communicate the endpoint which object should have its cache cleaned. Since it won't always be the "queried object" in WP parlance in the case of many archives but rather could be simply the first post in the loop. In that way, there should always be some object to pass here with an integer ID. Otherwise, if there is no object, then this would normally be is_404()
case which is not optimized anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wary to not expose too much of the internals of how the page is constructed, as these arguments have to be passed to the frontend.
I don't think any of this would be sensitive - most of it is already surfacing publicly anyway, via body
classes that WordPress Core adds.
Since it won't always be the "queried object" in WP parlance in the case of many archives but rather could be simply the first post in the loop.
For that though, you would need to know which loop. For example, if there was a custom post type archive "type", it would still be missing an identifier for which type, so that you could fetch the first post for that post type archive, not any post type archive.
All of this IMO would make more sense to be abstracted away in a class. The class could also have it built-in to return the relevant object ID in case of an archive. It would make passing around this data more intuitive too (e.g. in JS it would just be an object with the necessary data), since all you need to know would be to pass around one object (e.g. urlEntity
or whatever we'd call it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think subqueries are relevant here, are they? Why would we need to know which loop the post was in?
Most caching plugins would have no idea which URL to flush when provided the post in a subquery since there is nothing connecting the top URL query vars to the arbitrary query vars passed into the subquery. More sophisticated caching plugins would be able to connect these dots if they're using surrogate keys and if they do output buffering of the page and accumulate a new key for each the_post
action.
Pantheon Advanced Page Cache only computes surrogate keys for the posts in the main loop: https://github.com/pantheon-systems/pantheon-advanced-page-cache/blob/e3b5552b0cb9268d9b696cb200af56cc044920d9/inc/class-emitter.php#L211-L322
LightSpeed Cache's surrogate key computation is quite similar: https://github.com/litespeedtech/lscache_wp/blob/7c707469b3c88b4f45d9955593b92f9aeaed54c3/src/tag.cls.php#L239-L313
By default it is only computing surrogate keys for posts in the main loop. It does expose actions which plugins can use to add surrogate keys for subqueries, which seems to only be used in a WooCommerce integration for its related products:https://wpdirectory.net/search/01JCKMEYKEX35RM0JVJ4TE8RV3
For our purposes here, we just need to communicate to the endpoint which object should be cleaned from the cache. Thinking about this some more, there's no need for the queriedObject
top-level key in the URL Metric schema. We just need the endpoint to accept two parameters alongside the url
and the slug
(in addition to the hmac
which validates the parameters):
- The type of object which should be cleaned from the cache.
- The ID of the object which should be cleaned from the cache.
Pantheon Advanced Page Cache does the right thing to flush term and author archives when the term or user is updated. But it seems unique in this regard. I don't see that LightSpeed Cache has any hooks to flush caches in response to anything bot a post edit (or a comment change). This being the case, perhaps using the queried object as the basis for flushing the cache isn't ideal anyway and that we should only ever pass along the ID for the post that should have its cache cleaned. This would handle the cases for:
- Singular post/page in which the queried object is a
post
. - Non-static homepage in which case there is no queried object but there is an initial
post
in the loop. - Author/term/date archives in which there is also an initial post in the loop.
So yeah, I think this can be further simplified so that a cache_purge_post_id
parameter is added to the endpoint, and when the endpoint receives it, it schedules an event in WP Cron to then fire the hooks to simulate an edit to that post in order for page caches to invalidate their cache for the URL(s). In reality, this will result in more than the intended URL being purged from the cache (e.g. the post permalink but also the homepage and maybe some archive as well), but there's no way to avoid that given the lack of any WordPress APIs for page caching. It would be really nice if WordPress had a standard API for page caching plugins to be told to purge the cache for a given URL!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure what you're referring to with "subqueries", so I only understand parts of your response.
I wouldn't tie our approach too much to what cache plugins are actually doing (particularly when there is more than they could do). That said, the approach of always clearing a specific post makes sense to me, since every URL contains at least one post in some capacity (except 404s).
What I was referring to before was the idea to pass the context around (e.g. type etc.) so that then the receiving endpoint would figure out which post to clear the cache for. I think what you're saying now is that instead OD would already determine that for the URL initially and then pass around the resulting post ID. So I think we were more or less talking about the same end goal, but with different approaches. Passing around the post ID to clear the cache for makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixarntz Sorry, I misinterpreted what you meant by "which loop". My mind went to adding additional WP_Query
loops inside of the main loop.
I think your idea makes sense as a future enhancement to give caching plugins more say over how cache invalidation is handled. But for this first iteration, I think sending along more context would be useful eventually.
Aside: What if the query vars were sent as-is without being pre-hashed into a slug
? Having access to these query vars would be useful for caching plugins to determine what cache should be cleared as well. They would also be useful for debugging a URL Metric. I was hesitant to expose the query vars, thinking that perhaps there could be some private data located in there, but maybe not since these query vars would all be exposed if pretty permalinks aren't enabled? Maybe I'm being overly cautious. Probably something for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping the query vars hashed for now is best. I'm cautious of exposing more if we don't have to, so let's discuss that separately.
I think your idea makes sense as a future enhancement to give caching plugins more say over how cache invalidation is handled. But for this first iteration, I think sending along more context would be useful eventually.
Which idea are you referring to? Based on my understanding, it was your idea from the earlier comment to pass around only a post ID that needs to have its cache invalidated when metrics for the URL change, instead of passing around queried object type and queried object ID. And that makes sense to me, based on the idea that the logic would determine that post ID...
- either based on the queried object ID
- or if there's no queried object ID, the first post in the current loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is my idea, as that is what I have found to be most reliable to invalidate caches for caching plugins. However some caching plugins, like Advanced Pantheon Page Cache, would be able to handle a more targeted cache invalidation when the queried object is a term or author. But they'll also work when invalidating the cache for a post that appears in the loop on the archive page, so I think for now what is currently implementing in this PR is a good first step.
f30dbce
to
d8483bb
Compare
const isLCP = | ||
elementIntersection.target === lcpMetric?.entries[ 0 ]?.element; | ||
const element = /** @type {Element|null} */ ( | ||
lcpMetric?.entries[ 0 ]?.element | ||
); | ||
const isLCP = elementIntersection.target === element; | ||
|
||
/** @type {ElementData} */ | ||
const elementData = { | ||
isLCP, | ||
isLCPCandidate: !! lcpMetricCandidates.find( | ||
( lcpMetricCandidate ) => | ||
lcpMetricCandidate.entries[ 0 ]?.element === | ||
elementIntersection.target | ||
( lcpMetricCandidate ) => { | ||
const candidateElement = /** @type {Element|null} */ ( | ||
lcpMetricCandidate.entries[ 0 ]?.element | ||
); | ||
return candidateElement === elementIntersection.target; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to work around a static analysis issue I get complained about in my IDE. It's because web-vitals defines the Element
interface but it is not the same Element
defined in the IDE's DOM types.
4158f7a
to
28a6491
Compare
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter This looks great. Just left some minor feedback for your consideration.
* @since n.e.x.t | ||
* @access private | ||
* | ||
* @return int|null Post ID or null if none found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit-pick, but why not stick with int
and return 0
if none found? 0
can never be an ID anyway and is false-y too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. But null
seems to be more universally semantic to indicate nothing was found. If null
then you can check that case with either is_null()
or !is_int()
. And null
is also false-y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My counter-argument is that int
is a single type, and the check can simply be made with ! $post_id
either way. Not a big deal either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except not with our strict PHPStan rules in place 😄
current_user_can( 'customize' ) || | ||
// Page caching plugins can only reliably be told to invalidate a cached page when a post is available to trigger | ||
// the relevant actions on. | ||
null !== od_get_cache_purge_post_id() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
* @return string HMAC. | ||
*/ | ||
function od_get_url_metrics_storage_hmac( string $slug, string $url ): string { | ||
$action = "store_url_metric:$slug:$url"; | ||
function od_get_url_metrics_storage_hmac( string $slug, string $url, ?int $cache_purge_post_id = null ): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
Also, should this maybe be a required parameter? That would emphasize that it's needed to optimize responses anyway, as you already check above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should this maybe be a required parameter? That would emphasize that it's needed to optimize responses anyway, as you already check above.
The check I added means that by default it will not offer to optimize responses when od_get_cache_purge_post_id()
returns null
. But this can be overridden with the od_can_optimize_response
filter. So we're not guaranteed that it will always be not-null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be a use-case for that? I'm thinking it might be better to not make that possible. For example, we could make it so that the filter only is applied when there is a post ID. That feels more robust to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes I've made sites where I use author archive pages to create team bio pages, for example. In such a case, there often isn't any blog post actually authored by that user. Nevertheless, if I go to their author page it doesn't actually return a 404. It serves a page with the author's name in the document title and if the theme has an author.php
template which is configured to display the user bio even when there are no posts, then it will work as intended. For example, the Twenty Ten theme shows the user bio on author.php
even when there are no posts by that author:
* @return bool Whether the HMAC is valid. | ||
*/ | ||
function od_verify_url_metrics_storage_hmac( string $hmac, string $slug, string $url ): bool { | ||
return hash_equals( od_get_url_metrics_storage_hmac( $slug, $url ), $hmac ); | ||
function od_verify_url_metrics_storage_hmac( string $hmac, string $slug, string $url, ?int $cache_purge_post_id = null ): bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above. Maybe make this required?
* - LightSpeed Cache <https://github.com/litespeedtech/lscache_wp/blob/7c707469b3c88b4f45d9955593b92f9aeaed54c3/src/purge.cls.php#L68> | ||
*/ | ||
/** This action is documented in wp-includes/post.php. */ | ||
do_action( 'transition_post_status', $post->post_status, $post->post_status, $post ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this works well. I could see how caching plugins compare the before and after value, and this here would be a weird scenario that Core I believe never triggers (the same value in both).
Probably still better to keep this than not have it, but this may not work reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core does actually trigger when it is the same value for both. If you have published a post and then you update the post again, then this same transition_post_status
action is triggered with the same publish
value for both of those arguments. I just tested to confirm this.
And we need to keep this anyway because some caching plugins only listen for this action to decide whether to invalidate their caches.
current_user_can( 'customize' ) || | ||
// Page caching plugins can only reliably be told to invalidate a cached page when a post is available to trigger | ||
// the relevant actions on. | ||
null !== od_get_cache_purge_post_id() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #1496.
This seeks to proactively address the problem of a page caching plugin holding onto an unoptimized cached page even after URL Metrics have been submitted by visitors.
Page caches generally will flush caches for a queried object's URL when the relevant actions fire (e.g.
clean_post_cache
,edit_post
,transition_post_status
). By sending the queried object information with the URL Metric storage request we can trigger these actions so that page caching plugins will flush the page caches in order so that optimizations leveraging the newly-stored URL Metric will promptly be reflected on the frontend.These actions are not triggered at the same moment that the new URL Metric is stored, as this could cause a cache stampede. Instead, a scheduled event is created 10 minutes into the future to trigger the cache invalidation for the post ID. Since the post ID is passed as an argument to the scheduled event, WP Cron will automatically prevent duplicates.