Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing navigationId with getRelevantNavigation() #12

Closed
tunetheweb opened this issue Jan 24, 2023 · 18 comments
Closed

Replacing navigationId with getRelevantNavigation() #12

tunetheweb opened this issue Jan 24, 2023 · 18 comments

Comments

@tunetheweb
Copy link
Contributor

tunetheweb commented Jan 24, 2023

As discussed in #10 the navigation details are needed to get both the startTime of the navigation (so the metrics can be reported relative to the soft navigation time), and the URL (so the events can be used to record analytics for the appropriate URL rather than the current one).

At present this is achieved with the new navigationId attribute of the event, which can then be mapped to the navigation details in one of two ways:

  1. By using the PerformanceEntry API.
  2. By recording the soft-navigation entries via a PerformanceObserver and using this array to look up the event.

The first option is easier, but depends on making certain assumptions on the navigationId (for example >= 2 means it's a soft-navigation) and looking up based on this:

navEntry = performance.getEntries("soft-navigation")[navigationId - 2]
           || performance.getEntries("navigation")[0]

"Magic numbers" are generally to be avoid on the web platform, and this assumption may then be broken in the future if new navigation entries appear.

A perhaps better suggestion is to replace the navigationId with a getRelevantNavigation() method on the events that provides direct access to the associated navigation entry to avoid making any of those assumptions.

If this direction is taken, we may need to provide an overlap period where both navigationId and getRelevantNavigation() both exist to give anyone experimenting with this API time to migrate to this new proposal.

@yoavweiss
Copy link
Collaborator

^^ @clelland @noamr @mmocny

@clelland
Copy link
Collaborator

Agreed that navigationId has some ergonomics issues. You probably shouldn't be assuming that it matches array index here -- it's a counter right now, but there had been talk of making it a UUID instead. Also, with the possibility of bfcache restorations in addition to SPA navigations, you might find a particular navigation id in either buffer.

With navigationId, I think that the future-proof way of getting an associated navigation timeline entry would be to get (or accumulate with a performanceObserver, if it might overflow the buffer) navigation entries of all types, and then scan them for an entry with a matching ID.

var navs = []
new performanceObserver(entries => {
  navs.push(entries.getEntries());
}).observe(types: ['navigation','soft-navigation','back-forward-cache-restoration']);

navEntry = navs.find(el => navigationId == el.navigationId);

getRelevantNavigation is better, but I'm worried that it's not going to be performant when your goal is to cluster, say, all resource timing entries by their navigation -- or to get all performance entries associated with a given navigation. In those cases, you'd have to call getRelevantNavigation on each entry in turn and cache the results.

What about just making navigation a property on each timeline entry, being a reference to the appropriate navigation timeline entry object? That would give you direct access to it if you need it, and you can also use object equality testing to group related entry objects by navigation, when you want to do that.

@tunetheweb
Copy link
Contributor Author

With navigationId, I think that the future-proof way of getting an associated navigation timeline entry would be to get (or accumulate with a performanceObserver, if it might overflow the buffer) navigation entries of all types, and then scan them for an entry with a matching ID.

This is quite annoying to get and link up to the performance events (FCP, LCP...etc.). We need to observe the navigations, store them, then process the performance events to match them, then potentially finalise some of them based on the new navigations we saw earlier. It also feels like we're moving the complexity of this from the browser, to the developer. So the browser memory and buffers are handled better, but the application memory and browser buffers are potentially much worse! I feel like this should be handled by the browser to take away that pain and ensure it's implemented consistently.

getRelevantNavigation is better, but I'm worried that it's not going to be performant when your goal is to cluster, say, all resource timing entries by their navigation -- or to get all performance entries associated with a given navigation. In those cases, you'd have to call getRelevantNavigation on each entry in turn and cache the results.

What about just making navigation a property on each timeline entry, being a reference to the appropriate navigation timeline entry object? That would give you direct access to it if you need it, and you can also use object equality testing to group related entry objects by navigation, when you want to do that.

A navigation property works for me too!

@noamr
Copy link

noamr commented Jan 25, 2023

getRelevantNavigation is better, but I'm worried that it's not going to be performant when your goal is to cluster, say, all resource timing entries by their navigation -- or to get all performance entries associated with a given navigation. In those cases, you'd have to call getRelevantNavigation on each entry in turn and cache the results.

What about just making navigation a property on each timeline entry, being a reference to the appropriate navigation timeline entry object? That would give you direct access to it if you need it, and you can also use object equality testing to group related entry objects by navigation, when you want to do that.

What's the difference between the two? We can choose either API based on ergonomics and choose whether we cache/store the result or not based on performance

@clelland
Copy link
Collaborator

I think the difference is primarily the overhead of a function call any time you want to compare to performance entries, to see if they are part of the same navigation. With either navigationId or navigation, you can just compare the values directly; with getRelevantNavigation you need to do two extra function calls per comparison.

@noamr
Copy link

noamr commented Jan 26, 2023

I think the difference is primarily the overhead of a function call any time you want to compare to performance entries, to see if they are part of the same navigation. With either navigationId or navigation, you can just compare the values directly; with getRelevantNavigation you need to do two extra function calls per comparison.

The function can return a cached value, and OTOH entry.navigation can be a getter function. There is no perf overhead connected with the shape of the API here.

@paulirish
Copy link

-1 on exposing a navigation object. I've definitely come to expect all perfEntries are serializable and standalone. Referencing any parent/child relationship feels weird, and we'd have to 'clean' entries before serializing them to beacon.

I don't think the developer ergonomics here are much of a problem..

Here's my quick take on organizing all entries according to navigationid..

// collect all the things.
const allEntries = [];
const obs = new PerformanceObserver(list => allEntries.push(...list.getEntries()));
for (const type of PerformanceObserver.supportedEntryTypes) {
  obs.observe({type, buffered: true, includeSoftNavigationObservations: true});
}

const navigationEntries = allEntries.filter(e => ['navigation','soft-navigation','back-forward-cache-restoration'].includes(e.entryType));

const navigationIdToEntries = allEntries.reduce((map, curr) => map.set(curr.navigationId, [...map.get(curr.navigationId) || [], curr]), new Map());

And in action:
image

And if you want the pageUrl to associate... something like:

const pageUrlToEntries = [...navigationIdToEntries.entries()].map(([navId, entries]) => ([navigationEntries.find(e => e.navigationId === navId).name, entries]));

image

@noamr
Copy link

noamr commented Feb 18, 2023

@paulirish I think that's a good argument.
We could still have a better solution than a magic number though:

  • A UUID, with perhaps an empty string or "root" or something for the first navigation
  • Saving the navigationTime which would match the startTime of the navigation entry (and perhaps expose getEntriesByStartTime(startTime, types)?
  • Be explicit about what that number mean, even call it an index instead

@mmocny
Copy link

mmocny commented Feb 19, 2023 via email

@tunetheweb
Copy link
Contributor Author

I still feel that, while it's possible, to observe the individual events as per @paulirish 's code above, it still moves a lot of the onus on to the developer.

At the moment that code builds an ever-increasing allEntries array and we're leaving it to the developer to decide if/when to clear that down to prevent what is basically a memory leak. That feels like something we should be discouraging, rather than encouraging. And if it's too difficult for the browser to track this appropriately, then why do we feel developers will do this better?

Contrast that to most current performance observer methods usage IMHO (for example, in web-vitals.js library) where the observation is dealt with and then no longer needs to be referenced anymore so it's discarded. The observation includes everything needed to deal with it appropriately (with some small exceptions like INP, where the store the last 10 interactions - specifically to reduce memory limit).

With some SPAs being very long lived, and potentially interaction-heavy, the memory implications of storing all the old observations is not small.

@noamr
Copy link

noamr commented Mar 1, 2023

I have another suggestion:

  • Use the name property of the soft navigation entry. It would be soft-navigation-${navStartTimestamp}
  • Give the other entries a navigationName attribute referring to it
  • Get the navigation using getEntriesByName()
  • If we really want to, we can still have getRelevantNavigation() helper that does this, but it does not replace doing this with plain objects.
    WDYT?

@yoavweiss
Copy link
Collaborator

@clelland - thoughts on @noamr's suggestion?

@clelland
Copy link
Collaborator

Catching up here --

I expect we'd need to do a similar thing with back-forward-cache restores; with a name of back-forward-cache-restore-${timestamp}, and use it the same way. We'd be creating an implicit requirement that any future 'navigation-ish' entries have a name defined.

The problem, I expect, is that you'd have to special-case the navigationName for entries relating to the initial (hard) navigation. We can't use this scheme for the performancenavigationtiming entry's name, since that's already defined to be the URL of the current page. (And I'm not sure how likely it is that other entries -- resourcetiming, e.g. -- will have the same name). I think we'd need to have to leave it out, or set it to an empty string, or use some other flag that directs you to not use getEntriesByName, but instead getEntriesByType('performanceNavigationTiming')[0]

@clelland
Copy link
Collaborator

I think that the main thing @noamr's suggestion in #12 (comment) allows is the reuse of getEntriesByName, rather than having to expose a new API (previously proposed as getEntriesByStartTime). The trouble is that we can't uniformly use name, since it's already defined for PerformanceNavigationTiming objects.

At the same time, @tunetheweb and @paulirish's point that referencing actual timeline event objects easily leads to memory leaks, and changes the independent nature of the objects, is well-taken.

If we just remove the requirement that the navigation id match the name property of some other entry, then it's about equivalent to the suggestion of making the navigation id into a UUID (or something slightly more semantic, like ${entry-type}-${timestamp}). A getEntriesByNavigationId would be ~ as much new API surface as getEntriesByStartTime.

@yoavweiss
Copy link
Collaborator

getEntriesByNavigationId would also have the added ergonomic advantage of grabbing all the relevant entries for a certain navigation from the timeline.

@tunetheweb
Copy link
Contributor Author

Just to update on this. This is the code I'm suggesting in the article (and is basically what I have implemented in web-vitals):

const softNavEntry =
  performance.getEntriesByType('soft-navigation').filter(
    (entry) => entry.navigationId === navigationId
  )[0];
const hardNavEntry = performance.getEntriesByType('navigation')[0];

const navEntry = softNavEntry || hardNavEntry;

const pageUrl = navEntry?.name;
const startTime = navEntry?.startTime;

This is:

  • reasonably ergonomic
  • is backword compatible with the original integer and the newer UUID/string implementation
  • let's the browser handle the entries (so avoiding the memory leak issue of developers not considering that)

The one downside is that getEntriesByType is limited by the max buffer size, but since we should be processing the entries as they come in, I'm not sure that's a real issue. And the reason they are limited is presumably to avoid memory leak issue anyway so the alternative of observing these and adding to an unlimited entry is not appealing to me!

Also this doesn't use the original hard navigation UUID (though it could, but would need an extra fallback anyway).

I'm not sure we need a getEntriesByNavigationId to be honest - I can't really think of a use for it.

A getNavigationEntryById to wrap the above code might be useful however. It's obviously not a lot of code, but would allow this to be more easily expanded (e.g. if/when back/forward navigations come along) rather than each implementation knowing which types to include. Then again, they likely need to know which "navigation" types they handle so maybe that's less of an issue.

@mmocny
Copy link

mmocny commented Sep 6, 2023

but since we should be processing the entries as they come in, I'm not sure that's a real issue

Because there could be a new Perf observer from any new script injected at any time, buffers are never flushed or reset, so it really doesn't matter when you start observing. If there is a chance of filling, they will fill and then the only way to get the data is to have a PerformanceObserver already registered.

the reason they are limited is presumably to avoid memory leak

Actually buffer limits are there because this is memory which is allocated on each and every page load even in cases where there is never any observer. So, we limit the amount of resource dedicated to this speculative feature. As soon as an Observer is registered, we can save more data, but the global buffer doesn't actually grow when that happens (since its a static array), instead it gets places into FIFO queue for each observer.

I'm not sure we need a getEntriesByNavigationId
A getNavigationEntryById to wrap the above code might be useful however

Interesting. Besides the ergonomics, I guess the buffering of entries does imply you should already be observing them and doing a reverse lookup as you suggest...

@yoavweiss
Copy link
Collaborator

I think we can close this in favor of w3c/performance-timeline#182

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants