Ability to control precision of scoring values #11570
Hi @koraa, super cool work on harmonicabsorber!! that's so rad. a few things i wanted to add.

web performance measurement is tricky and there are so many sources of variance/jitter that it's nigh impossible to expect repeatable results. (using webpagereplay would definitely be an ingredient in the best repro setup). We have some more docs on this topic at https://github.com/GoogleChrome/lighthouse/blob/master/docs/variability.md

As for clamping.. we clamp just to give a signal of our significant digits. there's enough variance that having higher precision just becomes misleading. that said, we do not clamp the metric values.. only the scores. so for all those audit values, i would recommend looking at the

(While discussing this we also considered that perhaps it'd be useful to expose our scoring calculations.. eg a method to calculate the 0-1 score of a given LCP/whatever value. We currently don't have this quite available, but it's possible to explore..)
Hi @paulirish,
Thanks for pointing that out! Yes, I am aware of that; our goal at the moment is not to reduce variance but instead to develop appropriate statistical models that characterize the distribution and produce an estimate of the score along with error bars… However, I agree that performing estimation on the measurements separately and then computing the final score from these estimations is better than characterizing score distributions. To that end, is it correct to assume that all scores use the same method of generating the score from a log-normal CDF, or are there differences between the scores?
(Not that it matters much, but don't you lose a bit of precision by clamping twice? Clamping subscores and then clamping the weighted average again? Not that I have checked anything here, but my feeling is that this would introduce some uncomfortable nonlinearities in cases where many subscores are close to
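To make the double-clamping point concrete, here is a small numeric illustration. The weights and subscores below are made up, and `clamp2` is only a stand-in for a two-decimal clamp; it is not Lighthouse's code:

```javascript
// Stand-in for a two-decimal clamp (illustrative, not Lighthouse's source).
const clamp2 = v => Math.round(v * 100) / 100;

// Made-up weights and subscores chosen so the two orders of operations differ.
const weights = [0.7, 0.3];
const subscores = [0.914, 0.92];

// Weighted average of subscores.
const avg = ss => ss.reduce((sum, s, i) => sum + s * weights[i], 0);

console.log(clamp2(avg(subscores)));             // clamp once:  0.92
console.log(clamp2(avg(subscores.map(clamp2)))); // clamp twice: 0.91
```

So with these (contrived) inputs, clamping the subscores before averaging shifts the final two-decimal score by a full step.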
Yes, all performance metric scores use the log-normal CDF method, with different control points for each metric/environment defined in each audit's source:

lighthouse/lighthouse-core/audits/metrics/first-contentful-paint.js, lines 39 to 47 at e9d7224
lighthouse/lighthouse-core/audits/audit.js, lines 71 to 83 at e9d7224
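For anyone following along, a scoring curve of that shape can be sketched in a few lines. Everything below is an illustrative reconstruction, not Lighthouse's actual implementation: the `logNormalScore` helper, the erf approximation, and the control points are all assumptions. The idea is that the score is the complementary CDF of a log-normal distribution chosen so the `median` control point maps to a score of 0.5 and the `p10` point to 0.9:

```javascript
// Abramowitz & Stegun 7.1.26 approximation of erf (|error| < ~1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

// Standard normal CDF via erf.
const normalCdf = z => 0.5 * (1 + erf(z / Math.SQRT2));

// Score = 1 - lognormal CDF of the metric value; sigma is chosen so that
// score(median) = 0.5 and score(p10) = 0.9.
function logNormalScore({ p10, median }, value) {
  if (value <= 0) return 1; // a zero-time metric is a perfect score
  const mu = Math.log(median);
  // Phi^-1(0.9) ≈ 1.28155 pins the p10 control point to a score of 0.9.
  const sigma = (mu - Math.log(p10)) / 1.28155;
  return 1 - normalCdf((Math.log(value) - mu) / sigma);
}

// Illustrative control points only (not the real FCP values):
console.log(logNormalScore({ p10: 934, median: 1600 }, 1600).toFixed(2)); // "0.50"
```

For the real curves and constants, the math.js link below is the authoritative place to look.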
We do, but as you noted, it's hard to see and doesn't matter much compared to the other sources of noise in this data :) All statistical analysis attempts we're aware of use the underlying metric values, for the reasons above. Please let us know if there are any particular utilities Lighthouse could expose that would help your projects in this area! Super exciting to see this type of work being done independently and we would love to share notes :)
In case it helps, we do all sorts of math-y things for the scoring calculator here: https://github.com/paulirish/lh-scorecalc/blob/master/script/math.js
The overall scoring and audit code each use a function `clampTo2Decimals` to clamp scored values to two decimals. Having the ability to change the number of digits (or even to deactivate the clamping altogether) would be useful and would give statistically minded users the ability to perform more in-depth analysis of the produced scores.

E.g., I would find this useful because I am currently optimizing a test setup for reduced jitter; I can still measure jitter at the reduced precision, but due to the clamping I need many more samples to properly quantify the amount of jitter.
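Presumably the clamp in question amounts to rounding to two decimals; a sketch of it, plus the configurable-precision variant being proposed here, could look like the following. The `clampToDecimals` name and signature are hypothetical, not existing Lighthouse API:

```javascript
// Assumed shape of the existing two-decimal clamp (illustrative).
const clampTo2Decimals = val => Math.round(val * 100) / 100;

// Hypothetical generalization with a configurable digit count; digits = 2
// would preserve today's behavior, and a large value would effectively
// deactivate the clamping.
const clampToDecimals = (val, digits = 2) => {
  const factor = 10 ** digits;
  return Math.round(val * factor) / factor;
};

console.log(clampTo2Decimals(0.91237));   // 0.91
console.log(clampToDecimals(0.91237, 4)); // 0.9124
```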
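A quick simulation shows the effect: jitter smaller than the 0.01 rounding step mostly disappears from the clamped scores. The sample values here are invented for illustration:

```javascript
// Invented score samples with sub-0.01 jitter.
const samples = [0.9132, 0.9141, 0.9128, 0.9137, 0.9145];
const clamp2 = v => Math.round(v * 100) / 100;

// Population standard deviation.
const stddev = xs => {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return Math.sqrt(xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length);
};

console.log(stddev(samples).toFixed(4));             // "0.0006": jitter visible
console.log(stddev(samples.map(clamp2)).toFixed(4)); // "0.0000": rounded away
```

With the clamped values, all five samples collapse to 0.91, so estimating the jitter magnitude would require far more samples (enough for the rounding boundary to be crossed often).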
I would be very open to creating a PR for this, provided there is interest in merging such a feature…