new_audit: add TTI Companion Metric to JSON #8975
Conversation
Also, looking at the PR title lint - what's a good title for this? report, metric, or misc?
Awesome thanks so much for this @deepanjanroy!
I believe @hoten came up with the first ever pronounceable metric acronym, if he'd like to share it :)
You should be able to just remove the
Big +1 👍
IMO the only way we're going to be able to do this is something like the old fixed cost curve from eFID. That has the nice property that dragging out the trace doesn't end up affecting things much but tends to underpenalize bad experiences that happen very late, so I'm not sure how to resolve that issue really. On the one hand, pushing out the bad experience means people are indeed much less likely to experience it but if it can wait that long it probably should have just been broken up and run earlier in the first place soooooo... 🤷♂
I think we'll go with
Hm, the downside of a fixed cost curve is that it assumes the same distribution for all sites, and building a more sophisticated model (maybe taking into account other times when key big objects are first painted, as opposed to just the first contentful paint - I think you suggested something like that) is not easy. For now I think taking it until TTI is probably the best tradeoff.
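To make the "take it until TTI" option concrete, here is a rough sketch of the summation being discussed, assuming each main-thread task is clipped to the FCP-TTI window and only time beyond a 50ms threshold counts (the function and variable names are illustrative, not the PR's exact code):

```js
// Rough sketch of the "clip to FCP..TTI, then count time beyond 50ms" idea.
// Tasks are assumed to be {start, end} objects in milliseconds.
const THRESHOLD_MS = 50;

function cumulativeLongQueuingDelay(tasks, fcpMs, ttiMs) {
  let sum = 0;
  for (const task of tasks) {
    const clippedStart = Math.max(task.start, fcpMs);
    const clippedEnd = Math.min(task.end, ttiMs);
    const clippedDuration = Math.max(0, clippedEnd - clippedStart);
    // Only the portion of the clipped task beyond the 50ms threshold contributes.
    sum += Math.max(0, clippedDuration - THRESHOLD_MS);
  }
  return sum;
}
```

Clipping at TTI gives the bounded-window behavior being weighed against the fixed cost curve above: work that happens after TTI simply stops contributing.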
Done - thanks!
this looks great @deepanjanroy awesome job navigating the confusing world of simulated/observed LH metrics! 🎉 👏 👏
Resolved (outdated) review threads on:
lighthouse-core/audits/metrics/cumulative-long-queuing-delay.js (×2)
lighthouse-core/computed/metrics/cumulative-long-queuing-delay.js
lighthouse-core/test/audits/metrics/cumulative-long-queuing-delay-test.js
const settings = {throttlingMethod: 'provided'};
const context = {options, settings, computedCache: new Map()};

return Audit.audit(artifacts, context).then(output => {
nit: async/await and expect for new tests. Though @brendankenny, do you disagree with using expect for all new tests?
> though @brendankenny do you disagree with using expect for all new tests?

no, it sounds good to me. I've (mostly) been matching the assert or expect used in existing test files just because I personally don't like when they're mixed, but totally 👍👍 for expect in new test files
Didn't realize I was mixing node assert and jest expect. Changed them all to jest expect since it has a nicer API.
oh wait sorry, still super nit, but would prefer async/await for new tests too :)
Whoops, forgot to do async/await last time. Done.
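For illustration, the shape being asked for - async/await plus jest's expect - might look like the sketch below (the Audit, options, and artifacts names are assumed to come from the surrounding test file, as in the snippet quoted above; the assertion is a placeholder):

```js
// Sketch only: same setup as the quoted snippet, but with async/await instead
// of a returned promise chain, and jest's expect for the assertion.
it('computes the metric with provided throttling', async () => {
  const settings = {throttlingMethod: 'provided'};
  const context = {options, settings, computedCache: new Map()};

  const output = await Audit.audit(artifacts, context);
  expect(output.numericValue).toBeGreaterThanOrEqual(0); // placeholder assertion
});
```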
Resolved (outdated) review threads on:
lighthouse-core/test/computed/metrics/cumulative-long-queuing-delay-test.js (×2)
it('only looks at tasks within FMP and TTI', () => {
  const events = [
    // TODO(deepanjanroy@): Is there an interval data structure in lighthouse?
good point, not really at the moment
Hm ok. I'm going to punt on introducing that for now.
:) Time In Long Tasks (TILT). It's not comprehensive, but it gets the point across.
So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state.

ℹ️ Googlers: Go here for more info.
I love the acronym, but I worry it's too easy to interpret it as the sum of long task durations, as opposed to the sum of time beyond 50ms. I'm currently calling the "time beyond 50ms" the "long queuing delay region", and so the sum of all these regions is the "cumulative long queuing delay". If we can find a better name for the "time beyond 50ms in tasks", a better name for the whole metric should follow from there.
Updated everything except the proto test (working on that), so please take another look. As a tangent: I'm not sure if I've been spamming everyone here. I've never used the new GitHub review UI before, and I'm still figuring out how to batch comments properly; let me know if there are any conventions you follow.
Thought about it a bit more... landed on an alternative to "long queuing delay region" - what about just "excessive"? Then we could do ETILT :) The excessive qualifies the time measurement, and suggests that it's not just a straight total.
Totally fine.
const UIStrings = {
  title: 'Cumulative Long Queuing Delay',
  description: '[Experimental metric]. Sum of Task Lengths beyond 50ms, between ' +
This description seems ambiguous. Is it the sum of tasks that are beyond 50ms in length? Or the sum of task lengths minus 50ms?
Ideas:
Sum of task lengths that occur between FCP and TTI that delay additional task queuing by more than 50ms.
Or make it less technical?
Sum of tasks that delay additional task queuing. This is an estimate of task delay based on mainthread busyness.
<- don't know if that's really true 😆 just thinking out loud for ideas.
Hm, you're right. Reworded to "[Experimental metric] Total time period between FCP and Time to Interactive during which queuing time for any input event would be higher than 50ms." Is that clear enough?
I think this LGTM from an HTTPArchive data gathering perspective! I'm super curious how the percentiles hold up there.
// was 0ms. These numbers include 404 pages, so rounding up the scoreMedian to 300ms and
// picking 25ms as PODR. See curve at https://www.desmos.com/calculator/x3nzenjyln
scoreMedian: 300,
scorePODR: 25,
These are some seriously fast pages 😮
// sites on mobile, 25-th percentile was 270ms, 10-th percentile was 22ms, and 5th percentile
// was 0ms. These numbers include 404 pages, so rounding up the scoreMedian to 300ms and
// picking 25ms as PODR. See curve at https://www.desmos.com/calculator/x3nzenjyln
scoreMedian: 300,
this does feel a bit weird: a single 350 ms task immediately after FCP, with total main-thread silence after that, already gives you a 50. If we were to start breaking our 75/95 rule, I think I'd want to start it here...
Changed to 600/200. We now have 600ms of jank = 0.5, and 100ms of jank = .999. A 350ms task will have 300ms of jank and get a .885.
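For context on what the 600/200 control points mean, here is an illustrative sketch of log-normal scoring: a complementary log-normal CDF in which the median control point always maps to a score of 0.5. The erf approximation and the shape value below are assumptions for illustration only - Lighthouse derives the actual shape from the PODR, which is not reproduced here.

```js
// Illustrative only: score a value against a complementary log-normal CDF.
// The median control point maps to 0.5 regardless of the shape parameter.
function erf(x) {
  // Abramowitz & Stegun approximation 7.1.26 (max error ~1.5e-7).
  const sign = x < 0 ? -1 : 1;
  const absX = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * absX);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
    0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-absX * absX));
}

function logNormalScore(valueMs, medianMs, shape) {
  const standardized = (Math.log(valueMs) - Math.log(medianMs)) / (Math.SQRT2 * shape);
  return (1 - erf(standardized)) / 2;
}

// With a 600ms median, 600ms of queuing delay scores exactly 0.5.
console.log(logNormalScore(600, 600, 1)); // 0.5
```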
const events = [
  {start: 1000, end: 1110, duration: 110}, // Contributes 10ms.
  {start: 2000, end: 2100, duration: 100}, // Contributes 50ms.
wouldn't this only contribute 50 anyway, sans clipping?
I guess the whole idea of TTI occurring in the middle of a task isn't really possible though anyhow :)
Ah yes it would. Changed a test case a little bit to test clipping better.
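As a quick worked check of the clipping discussed here, assuming the clipping trims a task to the FMP-TTI window before the 50ms threshold is subtracted (as in the sketch earlier in this thread), and assuming an FMP of, say, 1050ms (the test's actual FMP/TTI values aren't shown in this excerpt): the event {start: 1000, end: 1110} would be clipped to the 1050-1110 span, 60ms long, and would contribute 60 - 50 = 10ms, consistent with the "Contributes 10ms" note.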
Resolved review thread on:
lighthouse-core/computed/metrics/lantern-cumulative-long-queuing-delay.js
Commits applying review suggestions, each with: Co-Authored-By: Patrick Hulce <patrick.hulce@gmail.com>
Addressed comments and fixed tests.
I made a bikeshedding doc here for the name of the metric (internal, because we haven't had much public discussion around this metric yet). Feel free to edit directly and discuss there, but I don't want to block landing the metric over naming it since it's all very experimental right now.
Also, how do I make the cla/google check happy?
Please take a look!
Ping on this: Anything else I need to do for merging?
Sorry @deepanjanroy for the silence! My approval still stands; ideally it gets a once-over from another "metricy" person on the team. @brendankenny @paulirish any takers?
some superficial style/nits feedback, but looks great to me.
Resolved (outdated) review thread on:
lighthouse-core/test/audits/metrics/cumulative-long-queuing-delay-test.js
const output = await Audit.audit(artifacts, context);

expect(output.numericValue).toBeCloseTo(48.3, 1);
expect(output.score).toBeCloseTo(1, 2);
maybe add a comment here that it's expected to be // very nearly 1 or something, if it really isn't expected to reach 1
It actually does reach 1. Changed to .toBe(1).
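A quick illustration of the matcher change, using jest matchers and the values from the test quoted above (output is assumed to be the audit result from that test):

```js
// toBeCloseTo allows a numeric tolerance (here, within 0.05 of 48.3);
// toBe requires exact equality, appropriate since the score really does reach 1.
expect(output.numericValue).toBeCloseTo(48.3, 1);
expect(output.score).toBe(1);
```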
Resolved (outdated) review thread on:
lighthouse-core/computed/metrics/cumulative-long-queuing-delay.js
static getEstimateFromSimulation(simulation, extras) {
  // Intentionally use the opposite FCP estimate, a more pessimistic FCP means that more tasks are
  // excluded from the CumulativeLongQueuingDelay computation, so a higher FCP means lower value
  // for the same work.
this gets a bit confusing. What about something like
// Intentionally use the opposite FCP estimate. A pessimistic FCP is higher than an optimistic FCP,
// which means more tasks are excluded from the CumulativeLongQueuingDelay computation. So a more
// pessimistic FCP gives a more optimistic CumulativeLongQueuingDelay for the same work.
> A pessimistic FCP is higher than an optimistic FCP

is that strictly true? I don't know :)
> is that strictly true?

pessimistic FCP is strictly higher than or equal to optimistic FCP :) It includes everything in optimistic plus the script-initiated render-blocking priority requests.
Reworded this comment (and the comment below about TTI).
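For readers less familiar with the lantern metrics, a hedged sketch of the "opposite estimate" pairing described above: the extras shape mirrors the fragment quoted below, but the helper name and surrounding structure are illustrative, not this PR's exact code.

```js
// Sketch only: pair the optimistic simulation with the pessimistic (higher) FCP,
// which excludes more tasks and so yields a lower queuing-delay estimate, and
// with the optimistic (lower) TTI, which also includes fewer tasks.
function pickMetricWindow(extras) {
  const fcpTimeInMs = extras.optimistic
    ? extras.fcpResult.pessimisticEstimate.timeInMs
    : extras.fcpResult.optimisticEstimate.timeInMs;
  const interactiveTimeMs = extras.optimistic
    ? extras.interactiveResult.optimisticEstimate.timeInMs
    : extras.interactiveResult.pessimisticEstimate.timeInMs;
  return {fcpTimeInMs, interactiveTimeMs};
}
```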
  : extras.interactiveResult.pessimisticEstimate.timeInMs;

// Require here to resolve circular dependency.
const CumulativeLongQueuingDelay = require('./cumulative-long-queuing-delay.js');
nit: move down to above the return statement
I'm using the CumulativeLongQueuingDelay.LONG_QUEUING_DELAY_THRESHOLD constant unfortunately, so this is the furthest I can move it down.
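A small sketch of the deferred-require pattern being discussed; the wrapper function name and the reduce body are hypothetical, but the module path and threshold constant are the ones referenced above.

```js
// Sketch only: requiring inside the function defers module resolution to call
// time, which breaks the require cycle between the lantern metric and the
// computed metric while still allowing access to the shared threshold constant.
function estimateQueuingDelay(simulatedTasks) {
  // Require here to resolve circular dependency.
  const CumulativeLongQueuingDelay = require('./cumulative-long-queuing-delay.js');
  const threshold = CumulativeLongQueuingDelay.LONG_QUEUING_DELAY_THRESHOLD;

  return simulatedTasks.reduce(
    (sum, task) => sum + Math.max(0, (task.end - task.start) - threshold),
    0
  );
}
```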
  });
}

return events.sort((a, b) => a.start - b.start);
it makes some sense to always be sorted, but if it really is a perf concern, I don't think anything depends on it being that way?
Ah, you're right - this sorting is not necessary. I remember I was doing the clipping in a different way initially that relied on sorting, but not anymore, and I forgot to change this. Removed the sorting.
Addressed comments. Do I need to do anything about the cla/google error?
No, that's an annoying thing where it doesn't always recognize our changes through the GitHub "suggestion" UI as coming from us, even though the author and committer are you. Still LGTM!
PTAL - this implements the "sum of queuing time beyond 50ms" TTI companion metric. A few questions / things I want to call out: