Internal state of the page (e.g. queried posts or queried object) should factor into whether a stored URL metric is fresh #1466

Open
westonruter opened this issue Aug 13, 2024 · 5 comments
Labels
[Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Enhancement A suggestion for improvement of an existing feature

Comments


westonruter commented Aug 13, 2024

This was originally brought up in #878 (comment).

There is a TODO currently in od_get_normalized_query_vars():

* TODO: For non-singular requests, consider adding the post IDs from The Loop to ensure publishing a new post will invalidate the cache.

Consider a homepage that shows 10 posts, none of which have featured images. URL metrics have been fully populated for the page in this state. Then someone publishes a new post that includes a featured image. Suddenly there is a new image on the page which may be the LCP element, and yet the URL metrics have no record of it. Nevertheless, URL metrics are considered fresh for 1 day by default, which means it could take a day for that new image to start getting optimized.

One way to deal with this would be to include the IDs of the posts in the query among the normalized query vars. This could similarly handle the case where the static front page is changed. For example, in od_get_normalized_query_vars():

if ( is_singular() ) {
    $normalized_query_vars['queried_object_id'] = get_queried_object_id();
} else {
    $normalized_query_vars['queried_post_ids'] = join( ',', wp_list_pluck( $GLOBALS['wp_query']->posts, 'ID' ) );
}

When the query vars change, an entirely separate od_url_metrics post is used, because the normalized query vars are passed into od_get_url_metrics_slug() to obtain an MD5 hash which is used as the post_name.

Nevertheless, this seems somewhat of an abuse of the normalized query vars, since these new "query vars" aren't actually part of the query at all. In reality, a new URL metrics post shouldn't be created; the existing post should be updated instead. If a new post is created, then the old one will stick around until it is garbage-collected after a month.

Instead of creating a new URL metrics post entirely, perhaps there should be some kind of "ETag" which is stored with each URL metric. This also is proposed in #1424 where the list of registered tag visitors should factor into the composition of the ETag for a URL metric so that when a new tag visitor is registered (e.g. after a plugin is activated), then all URL metrics would immediately be considered stale so that new ones can be collected.

This is also related to another TODO:

// TODO: When optimizing above, if we find that there is a stored LCP element but it fails to match, it should perhaps set $needs_detection to true and send the request with an override nonce. However, this would require backtracking and adding the data-od-xpath attributes.

Instead of manually collecting the various parts of page state that may affect the elements stored in URL metrics, we could know by the time we finish iterating over the document whether the elements we saw correspond to the elements stored in the most recent URL metric. If not, then the URL metric should be considered stale and new URL metrics should be collected.

But there is a bit of a chicken-and-egg problem here because the REST API endpoint rejects any requests when the URL metrics are fresh. Since the REST API endpoint doesn't know the result of iterating over the page during optimization, it doesn't know that new URL metrics should in fact be allowed. Also, if the URL metrics were all fresh, then the requisite data-od-xpath attributes aren't added to the output for detect.js to use when collecting the URL metrics. So once the processor reaches the end of iterating over all the tags, it would have to go back over them from the beginning and add these attributes. Or it could add them unconditionally on the first pass and then strip them all out in a second pass when they are not needed (but having to do this with every response seems less than ideal). This is opposed to the current approach, which adds them conditionally:

if ( $did_visit && $needs_detection ) {
    $processor->set_meta_attribute( 'xpath', $processor->get_xpath() );
}
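
The comparison described above (do the elements seen during optimization correspond to the stored ones?) could be sketched roughly as follows. This is only an illustration: example_elements_match() and the sample XPath strings are hypothetical, not part of the plugin, and it does nothing to resolve the chicken-and-egg problem with the REST API endpoint.

```php
<?php
/**
 * Hypothetical helper: checks whether the XPaths seen while iterating over
 * the document are the same set as the XPaths stored in a URL metric.
 *
 * @param string[] $seen_xpaths   XPaths of the visited elements.
 * @param string[] $stored_xpaths XPaths recorded in the most recent URL metric.
 */
function example_elements_match( array $seen_xpaths, array $stored_xpaths ): bool {
    // Compare as sets: drop duplicates, then sort so order does not matter.
    $seen   = array_unique( $seen_xpaths );
    $stored = array_unique( $stored_xpaths );
    sort( $seen );
    sort( $stored );
    return $seen === $stored;
}

// Made-up sample values for illustration only.
$stored = array( '/HTML/BODY/DIV[1]/P[1]' );
$seen   = array( '/HTML/BODY/DIV[1]/P[1]', '/HTML/BODY/DIV[1]/IMG[1]' );

// A new image appeared, so the stored URL metric would be treated as stale.
$needs_new_url_metrics = ! example_elements_match( $seen, $stored );
```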

Several things to ruminate on here.

@westonruter westonruter added [Type] Enhancement A suggestion for improvement of an existing feature [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Aug 13, 2024
@github-project-automation github-project-automation bot moved this to Not Started/Backlog 📆 in WP Performance 2024 Aug 13, 2024
@westonruter westonruter moved this from Not Started/Backlog 📆 to Definition ✏️ in WP Performance 2024 Aug 13, 2024
@westonruter
Member Author

The currently selected template should also be considered as part of the ETag. If a theme is updated so that there is suddenly a category.php template file that overrides the generic archive.php, then this should likely result in the URL metrics being considered stale.

@westonruter
Member Author

Also, the queried object would be considered part of this ETag so that the post_modified is incorporated. This would mean that when a post is updated, its associated URL metrics would immediately become stale and new URL metrics would be collected.

@westonruter
Member Author

A benefit of doing this is we can increase the TTL for how long a URL Metric is considered fresh. If URL Metrics will automatically become stale once a post is changed, then the TTL can be extended from a day to a week or longer.

@westonruter
Member Author

To start with, we can build up an array of relevant information about a page, including the queried object ID, the posts in the loop, the template being served, the active theme, the registered tag visitor IDs, and so on, and then create an MD5 hash of the serialized array. This hash is the ETag.
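
A minimal sketch of that array-then-hash idea, in plain PHP so it runs outside WordPress. The function name example_compute_etag(), the array keys, and the sample values are all assumptions for illustration; in the plugin the values would come from the current query, theme, and tag visitor registry.

```php
<?php
/**
 * Hypothetical helper: derives an ETag from an array describing page state.
 *
 * @param array<string, mixed> $page_state Information about the current page.
 * @return string 32-character MD5 hash.
 */
function example_compute_etag( array $page_state ): string {
    // Sort the top-level keys so the hash does not depend on insertion order.
    ksort( $page_state );
    return md5( serialize( $page_state ) );
}

// Made-up page state for a homepage showing three posts.
$state = array(
    'queried_object_id' => 0,
    'queried_post_ids'  => array( 12, 34, 56 ),
    'template'          => 'index.php',
    'active_theme'      => 'example-theme',
    'tag_visitor_ids'   => array( 'example-plugin/images' ),
);
$etag_before = example_compute_etag( $state );

// Publishing a new post changes the posts in the loop, and thus the ETag.
array_unshift( $state['queried_post_ids'], 78 );
$etag_after = example_compute_etag( $state );
```

Note that serialize() is type-sensitive, so the integer 12 and the string '12' produce different hashes; the values would need to be normalized consistently before hashing.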

Note that the ETag will need to be added to the list of args that go into computing the HMAC, the same way the URL, slug, and cache_purge_post_id are handled.

The ETag will be sent as a REST API arg similar to the slug, cache_purge_post_id, and HMAC.
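
As a rough sketch of why the ETag belongs among the HMAC inputs: if it is signed together with the other args, a client cannot swap in a different ETag without invalidating the HMAC. The example below uses PHP's built-in hash_hmac() and hash_equals() as stand-ins; the function names, the secret, and the way the args are combined are assumptions, not the plugin's actual implementation.

```php
<?php
/**
 * Hypothetical helper: computes an HMAC over the storage request args.
 */
function example_get_storage_hmac( string $secret, string $slug, string $etag, string $url, ?int $cache_purge_post_id ): string {
    // Encode all args into one string so that each of them is covered.
    $action = (string) json_encode( array( $slug, $etag, $url, $cache_purge_post_id ) );
    return hash_hmac( 'sha256', $action, $secret );
}

/**
 * Hypothetical helper: verifies an HMAC received with a storage request.
 */
function example_verify_storage_hmac( string $hmac, string $secret, string $slug, string $etag, string $url, ?int $cache_purge_post_id ): bool {
    $expected = example_get_storage_hmac( $secret, $slug, $etag, $url, $cache_purge_post_id );
    // Constant-time comparison to avoid leaking information via timing.
    return hash_equals( $expected, $hmac );
}

// Made-up values for illustration only.
$secret = 'example-secret';
$slug   = md5( 'example-slug' );
$etag   = md5( 'example-page-state' );
$url    = 'https://example.com/';

$hmac = example_get_storage_hmac( $secret, $slug, $etag, $url, null );

$valid    = example_verify_storage_hmac( $hmac, $secret, $slug, $etag, $url, null );
$tampered = example_verify_storage_hmac( $hmac, $secret, $slug, md5( 'other-state' ), $url, null );
```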


westonruter commented Nov 21, 2024

The ETag will need to be added as a new root level property of the URL Metric schema. It can be optional for now so as to avoid invalidating all current URL Metrics, although in the future we can make it required once all existing URL Metrics will have it populated.

In the current logic which checks for whether a URL Metric is stale based on the timestamp, we'll then also need to check if the URL Metric's ETag matches the ETag for the current page. If they don't match then the URL Metric should also be considered stale.
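
The combined freshness check might look something like the sketch below. It assumes a URL metric is represented as an array with a 'timestamp' key and an optional 'etag' key; the function name and the array shape are hypothetical. Metrics without an ETag fall back to the timestamp-only check, which matches the property being optional for now.

```php
<?php
/**
 * Hypothetical helper: decides whether a stored URL metric is still fresh.
 *
 * @param array{timestamp: float, etag?: string} $url_metric    Stored URL metric.
 * @param string                                 $current_etag  ETag for the current page.
 * @param int                                    $freshness_ttl Freshness TTL in seconds.
 * @param float                                  $now           Current Unix timestamp.
 */
function example_is_url_metric_fresh( array $url_metric, string $current_etag, int $freshness_ttl, float $now ): bool {
    // Existing check: the metric is stale once it is older than the TTL.
    if ( $now - $url_metric['timestamp'] > $freshness_ttl ) {
        return false;
    }
    // New check: a stored ETag that differs from the current one means stale.
    if ( isset( $url_metric['etag'] ) && $url_metric['etag'] !== $current_etag ) {
        return false;
    }
    return true;
}

// Made-up values for illustration only.
$now    = 1732147200.0;
$ttl    = 86400; // One day in seconds.
$metric = array(
    'timestamp' => $now - 3600,
    'etag'      => 'abc',
);

$fresh_same_etag  = example_is_url_metric_fresh( $metric, 'abc', $ttl, $now );
$fresh_other_etag = example_is_url_metric_fresh( $metric, 'def', $ttl, $now );
```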
