Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add getINP to web-vitals.js #213

Merged
merged 1 commit into from
Apr 21, 2022
Merged

Conversation

mmocny
Copy link
Member

@mmocny mmocny commented Apr 3, 2022

I updated to the latest and greatest based on various snippets for calculating a responsiveness metric (Interaction to Next Paint (INP)) using Event Timing API.

So far, I have only tested in Chrome, and have not added tests, so this likely needs more work.

Below are some notes about decisions made:

  • I decided to register with a durationThreshold of 16ms to increase the frequency and accuracy of reports. But it may be a good idea to listen only to long events (104ms default, for example).

  • I decided it was better to include ALL entries which are the INP or worse, rather than just the INP entry

    • ...we already have this list, and metric has an entries[] array, so easy enough. This is useful for tracking all worst-durations.
    • However, I still only report 1 entry per interaction.
    • An alternative would be to report all entries with the same interactionID, or all entries that presented in the same frame, or some combination of all of the above.
  • I only report new INP when the metric value changes. We could change to report whenever entries[] changes.

  • interactionCount is an estimate based on a convention specific to Chrome

  • INP can theoretically change outside of new entries firing, (e.g. interactionCount changes but PO doesn't observe an interaction that was under durationThreshold).

    • Even when interactionCount lands, we won't be able to watch for interactionCount changes.
    • We can at least update the value when page is hidden, but it may be nice to have some to force a calculation (i.e. before a beacon update).
    • We also can't reliably support reportAllChanges for this reason.
    • This may be a signal that we want a new PO entry type for all INP changes, akin to first-input and largest-contentful-paint, and not just use event timing.
  • This version of getINP() uses the web-vitals.js convention of having 1 handler per entry (dictated by the observe() helper).

    • However, sometimes we can have more than one INP metric change per PO callback, because entries are in order of (type?), not duration. This may be the first metric where this is true (others metrics report once per frame, though I'm not sure how often PO reports multiple frames per callback).
    • It feels weird to get multiple getINP callbacks in the same frame.
    • We can fix this by either:
      • just sort entries by timeStamp in the observe() helper, or
      • change the observe or bindReporter() helpers to only report once per PO callback
  • reportAllChanges right now only supports reporting new metric values, every time they change.

    • However, for INP, it could sometimes be useful to record every single interaction, even when the current INP metric value itselfs doesn't change (i.e. to visualize interactions on a timeline). I think CLS may similarly benefit.
    • Some ideas to fix:
      • Report a metric change whenever any new entry is seen. Not much value over just using PO directly.
      • Add a reportAllInteractions(), and then implement getINP() in terms of that. Offers value above using PO directly.

@google-cla
Copy link

google-cla bot commented Apr 3, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@philipwalton
Copy link
Member

Hey @mmocny thanks for working on this! I've just started to take a look at this PR, and I have a few initial questions:

interactionCount is an estimate based on a convention specific to Chrome

Is this more accurate than your previous method based on event counts? Looking at the implementation it seems like this would only work if the first interaction is a long one (because it needs to get minKnownInteractionId from a performance entry, right?). I'm mainly asking because I think we probably want to raise the durationThreshold (see next question), but I'm concerned that would break this estimate. WDYT?

I decided to register with a durationThreshold of 16ms to increase the frequency and accuracy of reports. But it may be a good idea to listen only to long events (104ms default, for example).

What are the performance implications of this? This seems like it would require code to run on almost every interaction, and I don't have a sense for how bad that would be.

What if we picked a number like 32, to catch interactions that took more than 1 frame (and ignore interactions that are 1 frame or faster)?

This also raises the question of what we do about pages with no interactions (or at least no reported interactions). What does Chrome do in these cases? I suspect the use of eventCounts may be more accurate for this case as well?

An alternative would be to report all entries with the same interactionID, or all entries that presented in the same frame, or some combination of all of the above.

I think this would be my preference, and it would be consistent with what we do for CLS (we only report the entries that were part of that session window). Reporting the entries from that interaction seems most similar.

Of course, I'm now realizing that's what we do for CLS is not consistent with what we do for LCP, since with LCP we report all entries that were considered LCP at any point in the navigation.

I think ideally we could report all entries that contribute to the current duration, and then we could also release some debug code to break that down into delay, processing, and presentation. My concern is that if we report more entries than those that directly contribute to the current INP value, it'll make debugging harder.

WDYT? Also, I thought you said you had some tricks to segment out entries by frame, did that pan out?

I only report new INP when the metric value changes. We could change to report whenever entries[] changes ... we also can't reliably support reportAllChanges for this reason.

Good catch, I hadn't thought about this case.

I think it probably does make sense to report whenever the value or entries changes, but I want to think about it a bit more. I don't think this will affect most people, though, because the default will be to just report on hidden.

for INP, it could sometimes be useful to record every single interaction, even when the current INP metric value itselfs doesn't change (i.e. to visualize interactions on a timeline). I think CLS may similarly benefit ... not much value over just using PO directly.

Yeah, I think something like this would be valuable, but at that point I'd probably recommend just using PerformanceObserver directly. The main goal of this library is to enable RUM for these metrics in a way that most closely matches the data in CrUX.

That being said, I'd definitely be open to it if we get feedback that consumers of this library want that type of capability.

@philipwalton
Copy link
Member

Re: my comment:

I'm mainly asking because I think we probably want to raise the durationThreshold (see next question), but I'm concerned that would break this estimate. WDYT?

I took a look at your updated gist and see the comment, "this estimate does well on desktop, poorly on mobile":

function estimateInteractionCount() {
  const drag = performance.eventCounts.get('dragstart');
  const tap = performance.eventCounts.get('pointerup');
  const keyboard = performance.eventCounts.get('keydown');
  // This estimate does well on desktop, poorly on mobile
  // return tap + drag + keyboard;

  // This works well when PO buffering works well
  return (maxKnownInteractionId > 0) ? ((maxKnownInteractionId - minKnownInteractionId) / 7) + 1 : 0;
}

So, if we decide not to use durationThreshold:0 for the main runtime logic, another option would be to create a one-time PO just to get minKnownInteractionId and disconnect it after that (at least until interactionCount lands). WDYT?

@philipwalton philipwalton changed the base branch from main to next April 18, 2022 21:37
@mmocny
Copy link
Member Author

mmocny commented Apr 21, 2022

this estimate does well on desktop, poorly on mobile

I think on Desktop I got it to almost 100% to match, except for:

  • I think there were some occasional issues with drag
  • click is always considered a unique TAP interaction by Chrome, even when it is due to keyboard.

On mobile, there was a big flaw with touch scrolling. It seems that if you register a pointercancel listener, you will get increased eventCount for it-- but if you don't register, you wont. So you count scrolling as taps :(. That's seems unfortunate, but its consistent on Firefox as well, and it seems a showstopper.

I really hoped eventCount would be good enough so we can provide a FF polyfill. Right now, I think it would be better to just stick to Single-Worst-INP instead of HighP if we wanted to do that.

...create a one-time PO just to get minKnownInteractionId and disconnect it after that (at least until interactionCount lands). WDYT?

That was going to be my suggestion as well!

What are the performance implications of this?

TBH I don't really know. The PO callbacks are only called once per frame and the amount of processing is not very large. I think the potential implications are more along the lines of "queueing tasks when there would otherwise be idle time"... while that would only happen where there were actually events, there are a lot of different event types (i.e. hover) that fire and which we skip when looking for interactionIds.

I think ideally we could report all entries that contribute to the current duration, and then we could also release some debug code to break that down into delay, processing, and presentation. My concern is that if we report more entries than those that directly contribute to the current INP value, it'll make debugging harder.

WDYT? Also, I thought you said you had some tricks to segment out entries by frame, did that pan out?

I still find segmenting by frame to be the best option -- and after spending a few weeks on some Responsiveness demos, it seems Annie fully agrees. This file was the last update to that approach, but there are probably a few changes I would add now.

I do think it may be a bit confusing to report entries that don't have an interactionId, or which have a different interactionId... but I agree with your point: we can just release some debug code to go alongside.

@philipwalton
Copy link
Member

Thanks @mmocny! As we discussed earlier today, I'm going to merge this as is to the next branch and then we can iterate on the remaining issues in a separate PR.

@philipwalton philipwalton merged commit 8fddfc8 into GoogleChrome:next Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants