
Statistical estimation of the Lighthouse score distribution parameters: Covariance matrices? #12014

Closed
koraa opened this issue Jan 28, 2021 · 2 comments


koraa commented Jan 28, 2021

Hi,

in my continuing quest (see #11570 for previous work) to statistically model the Lighthouse performance score, I have found that many of the individual performance scores are correlated. This graph should immediately make clear what I mean. Since a lot of the performance measurements are proxies for CPU performance and the like, this is not very surprising.

The graph above shows the correlation of individual scores on an empty page (Lighthouse score close to one), but this holds even for Lighthouse scores that are relatively close to average (0.2-0.8), as this graph shows.

Currently I estimate the distribution of the mean for each individual score essentially by applying the central limit theorem, derive confidence intervals from that, and combine these under the assumption that the variables are uncorrelated. The results are nice, but they could be improved by a structured treatment of score correlations.
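For concreteness, here is a minimal sketch of that procedure in Python; the data and the weights are purely illustrative stand-ins, not Lighthouse's actual audit weights:

```python
import numpy as np
from scipy import stats

# Illustrative stand-in for repeated Lighthouse runs:
# shape (n_runs, n_metrics), each column one metric score in [0, 1].
rng = np.random.default_rng(0)
scores = rng.beta(8, 2, size=(50, 3))

n = scores.shape[0]
means = scores.mean(axis=0)
# CLT: the sample mean is approximately normal with variance s^2 / n.
sems = scores.std(axis=0, ddof=1) / np.sqrt(n)
z = stats.norm.ppf(0.975)
per_metric_ci = np.stack([means - z * sems, means + z * sems], axis=1)

# Combine into an overall score as a weighted sum (weights are made up).
w = np.array([0.5, 0.3, 0.2])
combined_mean = w @ means
# Under the no-correlation assumption the variances simply add:
#   Var(sum_i w_i X_i) = sum_i w_i^2 Var(X_i)
combined_sem = np.sqrt(np.sum((w * sems) ** 2))
combined_ci = (combined_mean - z * combined_sem,
               combined_mean + z * combined_sem)
```

If the metrics are positively correlated, the cross terms w_i w_j Cov(X_i, X_j) that this drops are positive, so the combined interval comes out too narrow, which is exactly what a covariance matrix would fix.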

So I am wondering: is there any previous work you can refer me to with regard to the correlation between scores? Any insight you could offer as to the magnitude of the correlations?

Ideally there would be a correlation matrix available. I could generate one over the data I have available, but I suspect the correlations will be specific to my test data; a wider population of websites would have to be used…
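For what it's worth, estimating that matrix from repeated runs is a one-liner; the sketch below (same illustrative `scores` array as above) also shows how the covariance matrix would enter the combined-score variance:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.beta(8, 2, size=(50, 3))   # stand-in for repeated runs

# Columns are the variables (metrics), hence rowvar=False.
corr = np.corrcoef(scores, rowvar=False)
cov = np.cov(scores, rowvar=False)

# With the full covariance matrix, the combined-score variance is the
# quadratic form Var(w^T X) = w^T Sigma w -- no independence needed.
w = np.array([0.5, 0.3, 0.2])           # illustrative weights again
combined_var = w @ cov @ w
```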

@patrickhulce
Collaborator

Always super interesting to see what you're up to in this area :)

Depending on your exact goals, the global correlations might not be all that useful to you. The correlation between performance metrics changes drastically depending on choices the page makes. Some examples...

  • When the site does all its work to reach FCP (a traditional HTML/CSS render-blocking page)...
    • The correlation between FCP, LCP, Speed Index, and TTI will be 1
    • The correlation between TBT/CLS and everything else will be essentially 0
  • When the site follows a client-side rendering model...
• The correlation between FCP and TTI will be far weaker (you'll always have the baseline correlation that comes from developers who build sites with poor performance tending to build sites with poor performance across the board, plus the fact that TTI = max(FCP, last CPU work); see the sketch after this list).
• The correlation between TBT and TTI will be fairly positive (old investigations that I don't remember well and can't find now put it somewhere in the ~0.4-0.6 range, I think?).
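To make the max(FCP, last CPU work) point concrete, here is a toy simulation of the two regimes; all the numbers are invented and only illustrate the mechanism, not real Lighthouse data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Shared "page/device slowness" factor, standing in for the baseline
# "slow sites tend to be slow across the board" effect.
slowness = rng.lognormal(0, 0.3, n)

# Regime 1: render-blocking page -- all work happens before first paint,
# so TTI coincides with FCP and their correlation is ~1.
fcp_blocking = slowness * rng.lognormal(0, 0.05, n)
tti_blocking = fcp_blocking
print(np.corrcoef(fcp_blocking, tti_blocking)[0, 1])  # ~1.0

# Regime 2: client-side rendering -- a large, mostly independent chunk
# of CPU work lands after FCP, and TTI = max(FCP, end of last CPU task).
fcp_csr = slowness * rng.lognormal(0, 0.05, n)
cpu_end = slowness * rng.lognormal(1.0, 0.5, n)   # independent noise dominates
tti_csr = np.maximum(fcp_csr, cpu_end)
print(np.corrcoef(fcp_csr, tti_csr)[0, 1])        # well below 1
```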

If you're still interested in global correlations from a broader dataset, I'd suggest looking into querying HTTPArchive as a starting point. If you end up with any big takeaways, I'd love to hear about them!
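A rough sketch of such a query via the BigQuery Python client; the table name and JSON paths are my assumptions based on HTTPArchive's published Lighthouse tables, so check them against the current schema before relying on this:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Table name and JSON paths are assumptions; verify against the
# current HTTPArchive schema before use.
SQL = """
SELECT
  CAST(JSON_EXTRACT_SCALAR(report, '$.audits.first-contentful-paint.numericValue') AS FLOAT64) AS fcp,
  CAST(JSON_EXTRACT_SCALAR(report, '$.audits.interactive.numericValue') AS FLOAT64) AS tti,
  CAST(JSON_EXTRACT_SCALAR(report, '$.audits.total-blocking-time.numericValue') AS FLOAT64) AS tbt
FROM `httparchive.lighthouse.2021_01_01_mobile`
LIMIT 100000
"""

client = bigquery.Client()             # needs a GCP project with billing enabled
df = client.query(SQL).to_dataframe()  # requires pandas to be installed
print(df.dropna().corr())              # empirical correlation matrix
```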

@paulirish
Member

Since this thread, we've had additional research in this area, though it's not immediately linkable. I think this thread is complete, but if there is additional interest I can rustle up some analyses.
