Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation #1641

westonruter · 2024-11-10T06:11:07Z

This seeks to proactively address the problem of a page caching plugin holding onto an unoptimized cached page even after URL Metrics have been submitted by visitors.

Page caches generally will flush caches for a queried object's URL when the relevant actions fire (e.g. clean_post_cache, edit_post, transition_post_status). By sending the queried object information with the URL Metric storage request we can trigger these actions so that page caching plugins will flush the page caches in order so that optimizations leveraging the newly-stored URL Metric will promptly be reflected on the frontend.

These actions are not triggered at the same moment that the new URL Metric is stored, as this could cause a cache stampede. Instead, a scheduled event is created 10 minutes into the future to trigger the cache invalidation for the post ID. Since the post ID is passed as an argument to the scheduled event, WP Cron will automatically prevent duplicates.

westonruter · 2024-11-11T05:20:40Z

This PR is not currently accounting for prompting page caches to flush the cache for archive pages in which there is no queried object, that is, the home page (which isn't a static front page). Author and term archive pages should be accounted for with Pantheon Advanced Cache since there is a queried object and Pantheon Advanced Cache uses clean_term_cache and clean_user_cache actions to flush caches by the relevant surrogate keys. But it seems other caching plugins like WP Super Cache, WP Total Cache, and WP Rocket do not flush page caches for such queried objects.

Nevertheless, sometimes page cache plugins will automatically flush the homepage cache when a post is updated:

Batcache
WP Rocket
WP Super Cache
Pantheon Advanced Cache (if the post appeared on the homepage)
W3 Total Cache

What this means is that even if the loaded URL doesn't have a singular "queried object", if we instead opt to flush the cache for the first post that appears in the loop, then this should generally result in the page cache(s) being flushed for the desired archive pages, or at very least the non-static homepage.

felixarntz · 2024-11-13T01:06:31Z

plugins/optimization-detective/storage/data.php

 * @return string HMAC.
 */
-function od_get_url_metrics_storage_hmac( string $slug, string $url ): string {
-	$action = "store_url_metric:$slug:$url";
+function od_get_url_metrics_storage_hmac( string $slug, string $url, ?string $queried_object_type = null, ?int $queried_object_id = null ): string {


Based on what we previously discussed. I think it would be great to abstract the URL representation into a custom type. For example a class like URL_Entity or URL_Content or so, which would be able to support different ways to identify content (e.g. not everything has a "queried object" or a numeric ID, some URLs need other kinds of identifiers).

Maybe have that class implement __toString(), which would be a perfect fit for the usage here.

Yeah, potentially. However, that may make more sense for #1466. In particular, I'm wary to not expose too much of the internals of how the page is constructed, as these arguments have to be passed to the frontend. The exclusive purpose of this parameter here is to communicate the endpoint which object should have its cache cleaned. Since it won't always be the "queried object" in WP parlance in the case of many archives but rather could be simply the first post in the loop. In that way, there should always be some object to pass here with an integer ID. Otherwise, if there is no object, then this would normally be is_404() case which is not optimized anyway.

I'm wary to not expose too much of the internals of how the page is constructed, as these arguments have to be passed to the frontend.

I don't think any of this would be sensitive - most of it is already surfacing publicly anyway, via body classes that WordPress Core adds.

Since it won't always be the "queried object" in WP parlance in the case of many archives but rather could be simply the first post in the loop.

For that though, you would need to know which loop. For example, if there was a custom post type archive "type", it would still be missing an identifier for which type, so that you could fetch the first post for that post type archive, not any post type archive.

All of this IMO would make more sense to be abstracted away in a class. The class could also have it built-in to return the relevant object ID in case of an archive. It would make passing around this data more intuitive too (e.g. in JS it would just be an object with the necessary data), since all you need to know would be to pass around one object (e.g. urlEntity or whatever we'd call it).

I don't think subqueries are relevant here, are they? Why would we need to know which loop the post was in?

Most caching plugins would have no idea which URL to flush when provided the post in a subquery since there is nothing connecting the top URL query vars to the arbitrary query vars passed into the subquery. More sophisticated caching plugins would be able to connect these dots if they're using surrogate keys and if they do output buffering of the page and accumulate a new key for each the_post action.

Pantheon Advanced Page Cache only computes surrogate keys for the posts in the main loop: https://github.com/pantheon-systems/pantheon-advanced-page-cache/blob/e3b5552b0cb9268d9b696cb200af56cc044920d9/inc/class-emitter.php#L211-L322

LightSpeed Cache's surrogate key computation is quite similar: https://github.com/litespeedtech/lscache_wp/blob/7c707469b3c88b4f45d9955593b92f9aeaed54c3/src/tag.cls.php#L239-L313

By default it is only computing surrogate keys for posts in the main loop. It does expose actions which plugins can use to add surrogate keys for subqueries, which seems to only be used in a WooCommerce integration for its related products:https://wpdirectory.net/search/01JCKMEYKEX35RM0JVJ4TE8RV3

For our purposes here, we just need to communicate to the endpoint which object should be cleaned from the cache. Thinking about this some more, there's no need for the queriedObject top-level key in the URL Metric schema. We just need the endpoint to accept two parameters alongside the url and the slug (in addition to the hmac which validates the parameters):

The type of object which should be cleaned from the cache.

The ID of the object which should be cleaned from the cache.

Pantheon Advanced Page Cache does the right thing to flush term and author archives when the term or user is updated. But it seems unique in this regard. I don't see that LightSpeed Cache has any hooks to flush caches in response to anything bot a post edit (or a comment change). This being the case, perhaps using the queried object as the basis for flushing the cache isn't ideal anyway and that we should only ever pass along the ID for the post that should have its cache cleaned. This would handle the cases for:

Singular post/page in which the queried object is a post.

Non-static homepage in which case there is no queried object but there is an initial post in the loop.

Author/term/date archives in which there is also an initial post in the loop.

So yeah, I think this can be further simplified so that a cache_purge_post_id parameter is added to the endpoint, and when the endpoint receives it, it schedules an event in WP Cron to then fire the hooks to simulate an edit to that post in order for page caches to invalidate their cache for the URL(s). In reality, this will result in more than the intended URL being purged from the cache (e.g. the post permalink but also the homepage and maybe some archive as well), but there's no way to avoid that given the lack of any WordPress APIs for page caching. It would be really nice if WordPress had a standard API for page caching plugins to be told to purge the cache for a given URL!

I'm not entirely sure what you're referring to with "subqueries", so I only understand parts of your response.

I wouldn't tie our approach too much to what cache plugins are actually doing (particularly when there is more than they could do). That said, the approach of always clearing a specific post makes sense to me, since every URL contains at least one post in some capacity (except 404s).

What I was referring to before was the idea to pass the context around (e.g. type etc.) so that then the receiving endpoint would figure out which post to clear the cache for. I think what you're saying now is that instead OD would already determine that for the URL initially and then pass around the resulting post ID. So I think we were more or less talking about the same end goal, but with different approaches. Passing around the post ID to clear the cache for makes sense to me.

@felixarntz Sorry, I misinterpreted what you meant by "which loop". My mind went to adding additional WP_Query loops inside of the main loop.

I think your idea makes sense as a future enhancement to give caching plugins more say over how cache invalidation is handled. But for this first iteration, I think sending along more context would be useful eventually.

Aside: What if the query vars were sent as-is without being pre-hashed into a slug? Having access to these query vars would be useful for caching plugins to determine what cache should be cleared as well. They would also be useful for debugging a URL Metric. I was hesitant to expose the query vars, thinking that perhaps there could be some private data located in there, but maybe not since these query vars would all be exposed if pretty permalinks aren't enabled? Maybe I'm being overly cautious. Probably something for another PR.

I think keeping the query vars hashed for now is best. I'm cautious of exposing more if we don't have to, so let's discuss that separately.

I think your idea makes sense as a future enhancement to give caching plugins more say over how cache invalidation is handled. But for this first iteration, I think sending along more context would be useful eventually.

Which idea are you referring to? Based on my understanding, it was your idea from the earlier comment to pass around only a post ID that needs to have its cache invalidated when metrics for the URL change, instead of passing around queried object type and queried object ID. And that makes sense to me, based on the idea that the logic would determine that post ID...

either based on the queried object ID

or if there's no queried object ID, the first post in the current loop

Yes, that is my idea, as that is what I have found to be most reliable to invalidate caches for caching plugins. However some caching plugins, like Advanced Pantheon Page Cache, would be able to handle a more targeted cache invalidation when the queried object is a term or author. But they'll also work when invalidating the cache for a post that appears in the loop on the archive page, so I think for now what is currently implementing in this PR is a good first step.

… stored

…d post object

…definitions

westonruter · 2024-11-14T17:59:32Z

plugins/optimization-detective/detect.js

-		const isLCP =
-			elementIntersection.target === lcpMetric?.entries[ 0 ]?.element;
+		const element = /** @type {Element|null} */ (
+			lcpMetric?.entries[ 0 ]?.element
+		);
+		const isLCP = elementIntersection.target === element;

 		/** @type {ElementData} */
 		const elementData = {
 			isLCP,
 			isLCPCandidate: !! lcpMetricCandidates.find(
-				( lcpMetricCandidate ) =>
-					lcpMetricCandidate.entries[ 0 ]?.element ===
-					elementIntersection.target
+				( lcpMetricCandidate ) => {
+					const candidateElement = /** @type {Element|null} */ (
+						lcpMetricCandidate.entries[ 0 ]?.element
+					);
+					return candidateElement === elementIntersection.target;
+				}


This is just to work around a static analysis issue I get complained about in my IDE. It's because web-vitals defines the Element interface but it is not the same Element defined in the IDE's DOM types.

plugins/optimization-detective/storage/rest-api.php

github-actions · 2024-11-14T20:48:19Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: westonruter <westonruter@git.wordpress.org>
Co-authored-by: felixarntz <flixos90@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

felixarntz

@westonruter This looks great. Just left some minor feedback for your consideration.

felixarntz · 2024-11-15T21:10:19Z

plugins/optimization-detective/detection.php

+ * @since n.e.x.t
+ * @access private
+ *
+ * @return int|null Post ID or null if none found.


Nit-pick, but why not stick with int and return 0 if none found? 0 can never be an ID anyway and is false-y too.

True. But null seems to be more universally semantic to indicate nothing was found. If null then you can check that case with either is_null() or !is_int(). And null is also false-y.

My counter-argument is that int is a single type, and the check can simply be made with ! $post_id either way. Not a big deal either way.

Except not with our strict PHPStan rules in place 😄

felixarntz · 2024-11-15T21:10:43Z

plugins/optimization-detective/optimization.php

+		current_user_can( 'customize' ) ||
+		// Page caching plugins can only reliably be told to invalidate a cached page when a post is available to trigger
+		// the relevant actions on.
+		null !== od_get_cache_purge_post_id()


felixarntz · 2024-11-15T21:10:51Z

plugins/optimization-detective/storage/data.php

 * @return string HMAC.
 */
-function od_get_url_metrics_storage_hmac( string $slug, string $url ): string {
-	$action = "store_url_metric:$slug:$url";
+function od_get_url_metrics_storage_hmac( string $slug, string $url, ?int $cache_purge_post_id = null ): string {


See above.

Also, should this maybe be a required parameter? That would emphasize that it's needed to optimize responses anyway, as you already check above.

Also, should this maybe be a required parameter? That would emphasize that it's needed to optimize responses anyway, as you already check above.

The check I added means that by default it will not offer to optimize responses when od_get_cache_purge_post_id() returns null. But this can be overridden with the od_can_optimize_response filter. So we're not guaranteed that it will always be not-null.

What would be a use-case for that? I'm thinking it might be better to not make that possible. For example, we could make it so that the filter only is applied when there is a post ID. That feels more robust to me.

Sometimes I've made sites where I use author archive pages to create team bio pages, for example. In such a case, there often isn't any blog post actually authored by that user. Nevertheless, if I go to their author page it doesn't actually return a 404. It serves a page with the author's name in the document title and if the theme has an author.php template which is configured to display the user bio even when there are no posts, then it will work as intended. For example, the Twenty Ten theme shows the user bio on author.php even when there are no posts by that author:

felixarntz · 2024-11-15T21:11:37Z

plugins/optimization-detective/storage/data.php

 * @return bool Whether the HMAC is valid.
 */
-function od_verify_url_metrics_storage_hmac( string $hmac, string $slug, string $url ): bool {
-	return hash_equals( od_get_url_metrics_storage_hmac( $slug, $url ), $hmac );
+function od_verify_url_metrics_storage_hmac( string $hmac, string $slug, string $url, ?int $cache_purge_post_id = null ): bool {


See above. Maybe make this required?

felixarntz · 2024-11-15T21:13:36Z

plugins/optimization-detective/storage/rest-api.php

+	 * - LightSpeed Cache <https://github.com/litespeedtech/lscache_wp/blob/7c707469b3c88b4f45d9955593b92f9aeaed54c3/src/purge.cls.php#L68>
+	 */
+	/** This action is documented in wp-includes/post.php. */
+	do_action( 'transition_post_status', $post->post_status, $post->post_status, $post );


I wonder if this works well. I could see how caching plugins compare the before and after value, and this here would be a weird scenario that Core I believe never triggers (the same value in both).

Probably still better to keep this than not have it, but this may not work reliably.

Core does actually trigger when it is the same value for both. If you have published a post and then you update the post again, then this same transition_post_status action is triggered with the same publish value for both of those arguments. I just tested to confirm this.

And we need to keep this anyway because some caching plugins only listen for this action to decide whether to invalidate their caches.

westonruter · 2024-11-17T04:26:04Z

plugins/optimization-detective/optimization.php

+		current_user_can( 'customize' ) ||
+		// Page caching plugins can only reliably be told to invalidate a cached page when a post is available to trigger
+		// the relevant actions on.
+		null !== od_get_cache_purge_post_id()


This was wrong!!! It should have been:

null === od_get_cache_purge_post_id()

See #1659

westonruter added [Type] Bug An existing feature is broken [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Nov 10, 2024

westonruter added this to the optimization-detective n.e.x.t milestone Nov 10, 2024

westonruter force-pushed the fix/od-with-page-caching-3 branch from 619ac59 to 124b9ba Compare November 10, 2024 06:17

westonruter mentioned this pull request Nov 11, 2024

Detection time window may not allow enough URL Metrics to be collected #1496

Closed

westonruter self-assigned this Nov 11, 2024

felixarntz reviewed Nov 13, 2024

View reviewed changes

Base automatically changed from fix/od-with-page-caching-2 to trunk November 13, 2024 04:57

westonruter added 6 commits November 13, 2024 12:28

Include queried object in URL Metric to clean cache when stored

5163e84

Add missing since tags

c97cc07

Fire action to clean the cache for the queried object when URL Metric…

137246d

… stored

Trigger save_post action for queried object when URL Metric stored

c0b8758

Fire transition_post_status action when URL Metric updated for querie…

faf6882

…d post object

Add reference to LightSpeed Cache

d8483bb

westonruter force-pushed the fix/od-with-page-caching-3 branch from f30dbce to d8483bb Compare November 13, 2024 20:29

westonruter added 2 commits November 14, 2024 09:47

Replace queriedObject with cachePurgePostId

39df98b

Work around minor static analysis complaint due to differing Element …

41a1526

…definitions

westonruter commented Nov 14, 2024

View reviewed changes

plugins/optimization-detective/storage/rest-api.php Outdated Show resolved Hide resolved

westonruter added 2 commits November 14, 2024 12:44

Switch to triggering invalidation of page cache via scheduled event

eee4761

Add test for od_get_cache_purge_post_id()

28a6491

westonruter force-pushed the fix/od-with-page-caching-3 branch from 4158f7a to 28a6491 Compare November 14, 2024 20:44

westonruter changed the title ~~Include queried object in URL Metric to clean cache when stored~~ Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation Nov 14, 2024

westonruter marked this pull request as ready for review November 14, 2024 20:48

Add docs for the od_url_metric_stored action

b76d1d3

felixarntz approved these changes Nov 15, 2024

View reviewed changes

westonruter mentioned this pull request Nov 15, 2024

Optimization Detective should be further hardened to account for high traffic sites #1655

Open

westonruter merged commit 7b57489 into trunk Nov 15, 2024
30 of 33 checks passed

westonruter deleted the fix/od-with-page-caching-3 branch November 15, 2024 23:12

westonruter mentioned this pull request Nov 17, 2024

Fix logic inversion in od_can_optimize_response() #1659

Merged

westonruter commented Nov 17, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation #1641

Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation #1641

westonruter commented Nov 10, 2024 •

edited

Loading

westonruter commented Nov 11, 2024

felixarntz Nov 13, 2024

westonruter Nov 13, 2024

felixarntz Nov 13, 2024 •

edited

Loading

westonruter Nov 13, 2024

felixarntz Nov 15, 2024

westonruter Nov 15, 2024

felixarntz Nov 15, 2024 •

edited

Loading

westonruter Nov 15, 2024

westonruter Nov 14, 2024

github-actions bot commented Nov 14, 2024

felixarntz left a comment

felixarntz Nov 15, 2024

westonruter Nov 15, 2024

felixarntz Nov 15, 2024 •

edited

Loading

westonruter Nov 15, 2024

felixarntz Nov 15, 2024

felixarntz Nov 15, 2024

westonruter Nov 15, 2024

felixarntz Nov 15, 2024

westonruter Nov 15, 2024

felixarntz Nov 15, 2024

felixarntz Nov 15, 2024

westonruter Nov 15, 2024

westonruter Nov 17, 2024

Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation #1641

Send post ID of queried object or first post in loop in URL Metric storage request to schedule page cache validation #1641

Conversation

westonruter commented Nov 10, 2024 • edited Loading

westonruter commented Nov 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

felixarntz Nov 13, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

felixarntz Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Nov 14, 2024

felixarntz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

felixarntz Nov 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

westonruter commented Nov 10, 2024 •

edited

Loading

felixarntz Nov 13, 2024 •

edited

Loading

felixarntz Nov 15, 2024 •

edited

Loading

felixarntz Nov 15, 2024 •

edited

Loading