Ability to control precision of scoring values #11570

Closed
koraa opened this issue Oct 16, 2020 · 4 comments



koraa commented Oct 16, 2020

The overall scoring and audit code each use a function clampTo2Decimals to clamp scored values to two decimals.

Having the ability to change the number of digits (or even just to deactivate the clamping altogether) would be useful and would give statistically minded users the ability to perform more in-depth analysis of the produced scores.

E.g., I would find this useful because I am currently optimizing a test setup for reduced jitter; I can still measure jitter with the reduced precision, but due to the clamping I need many more samples to properly quantify the amount of jitter.
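To illustrate the sampling problem (a toy simulation, not Lighthouse code; the jitter magnitude and the reimplemented clamp below are assumptions for the example): once per-run jitter is smaller than the 0.01 score quantum, rounding distorts any spread estimate computed from a small sample.

```javascript
// Toy simulation: how 2-decimal clamping distorts the measured spread of a
// score that only jitters by a few thousandths.
const clampTo2Decimals = x => Math.round(x * 100) / 100;

const stddev = xs => {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return Math.sqrt(xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length);
};

// 20 hypothetical scores spread evenly across [0.872, 0.876] (~0.004 jitter).
const raw = Array.from({length: 20}, (_, i) => 0.872 + 0.004 * (i / 19));

console.log('raw spread:    ', stddev(raw).toFixed(4));
console.log('clamped spread:', stddev(raw.map(clampTo2Decimals)).toFixed(4));
```

Here every clamped sample collapses to either 0.87 or 0.88, so the apparent spread is an artifact of which side of the rounding boundary each run lands on; recovering the true jitter through that quantization takes many more samples.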

I would be very open to creating a PR for this, provided there is interest in merging such a feature…

@paulirish
Member

Hi @koraa

super cool work on harmonicabsorber!! that's so rad.

a few things i wanted to add.

web performance measurement is tricky and there's so many sources of variance/jitter that it's nigh impossible to expect repeatable results. (using webpagereplay would definitely be an ingredient in the best repro setup). We have some more docs on this topic at https://github.com/GoogleChrome/lighthouse/blob/master/docs/variability.md

As for clamping.. we clamp just to give a signal of our significant digits. there's enough variance that having higher precision just becomes misleading.

that said, we do not clamp the metric values.. only the scores. so for all those audit values, i would recommend looking at the numericValue which will have a bit higher precision. (Also fwiw the lhr.audits.metrics.details.items payload has even more numbers, though personally i'd stick to each metric audit's numericValue ;)
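For example (a sketch against the standard LHR JSON shape, not a Lighthouse API; the stub report below is made-up data):

```javascript
// Sketch: pull each metric audit's full-precision numericValue out of a
// parsed Lighthouse report (LHR) alongside its clamped score.
function extractMetricPrecision(lhr, auditIds) {
  return auditIds.map(id => {
    const {numericValue, numericUnit, score} = lhr.audits[id];
    return {id, numericValue, numericUnit, score};
  });
}

// Stubbed fragment of a report, e.g. saved via `lighthouse <url> --output=json`.
// The numbers here are invented for illustration.
const lhr = {
  audits: {
    'largest-contentful-paint': {
      numericValue: 3862.347, // full precision
      numericUnit: 'millisecond',
      score: 0.52, // clamped to 2 decimals
    },
  },
};

console.log(extractMetricPrecision(lhr, ['largest-contentful-paint']));
```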

(While discussing this we also considered that perhaps it'd be useful to expose our scoring calculations.. eg a method to calculate the 0-1 score of a given LCP/whatever value. We currently don't have this quite available, but it's possible to explore..)

Author

koraa commented Nov 20, 2020

Hi @paulirish,

web performance measurement is tricky and there's so many sources of variance/jitter that it's nigh impossible to expect repeatable results. (using webpagereplay would definitely be an ingredient in the best repro setup). We have some more docs on this topic at https://github.com/GoogleChrome/lighthouse/blob/master/docs/variability.md

Thanks for pointing that out! Yes, I am aware of that; our goal at the moment is not to reduce variance but rather to develop appropriate statistical models that characterize the distribution and produce an estimate of the score along with error bars…

However, I agree that performing estimation on the measurements separately and then computing the final score from those estimates is better than characterizing score distributions. To that end, is it correct to assume that all scores use the same method of generating the score from the score's log-normal CDF, or are there differences between the scores?

As for clamping.. we clamp just to give a signal of our significant digits. there's enough variance that having higher precision just becomes misleading.

(Not that it matters much, but don't you lose a bit of precision by clamping twice, i.e. clamping subscores and then clamping the weighted average again? I haven't checked anything here, but my feeling is that this would introduce some uncomfortable nonlinearities in cases where many subscores are close to (n+0.5)%; it would be hard to see, though, because of the high-dimensional nature of the average. It might be better to calculate the average on the unclamped scores and then clamp once.)
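The effect is easy to reproduce with a contrived pair of subscores (toy code; clampTo2Decimals is reimplemented here as round-to-nearest-hundredth, which may differ from Lighthouse's actual helper):

```javascript
// Contrived demo: clamping subscores before the weighted average can land on
// a different hundredth than clamping the average once at the end.
const clampTo2Decimals = x => Math.round(x * 100) / 100;

const subscores = [0.7951, 0.8851]; // both round up by nearly 0.005
const weights = [0.6, 0.4];
const weightedAvg = xs => xs.reduce((sum, x, i) => sum + x * weights[i], 0);

const clampedOnce = clampTo2Decimals(weightedAvg(subscores));
const clampedTwice = clampTo2Decimals(weightedAvg(subscores.map(clampTo2Decimals)));

console.log(clampedOnce, clampedTwice); // 0.83 0.84
```

Both subscores sit just above a rounding boundary, so the per-subscore clamp pushes the average across the next hundredth.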

@patrickhulce
Collaborator

is it correct to assume that all scores use the same method of generating the score from the score's log-normal CDF, or are there differences between the scores?

Yes, all performance metric scores use the log-normal CDF method with different control points for each metric-environment that are defined in the audit's options object.

mobile: {
  // 25th and 5th percentiles HTTPArchive -> median and PODR, then p10 is derived from them.
  // https://bigquery.cloud.google.com/table/httparchive:lighthouse.2018_04_01_mobile?pli=1
  // see https://www.desmos.com/calculator/oqlvmezbze
  scoring: {
    p10: 2336,
    median: 4000,
  },
},

/**
 * Computes a score between 0 and 1 based on the measured `value`. Score is determined by
 * considering a log-normal distribution governed by two control points (the 10th
 * percentile value and the median value) and represents the percentage of sites that are
 * greater than `value`.
 * @param {{median: number, p10: number}} controlPoints
 * @param {number} value
 * @return {number}
 */
static computeLogNormalScore(controlPoints, value) {
  const percentile = statistics.getLogNormalScore(controlPoints, value);
  return clampTo2Decimals(percentile);
}
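For the statistically minded, that scoring step can be reconstructed end to end from just the two control points. The sketch below is not Lighthouse's statistics.getLogNormalScore itself; it assumes the model is a log-normal whose complementary CDF is 0.5 at the median and 0.9 at p10, and uses a textbook erf approximation:

```javascript
// Sketch only: Lighthouse-style log-normal scoring from {p10, median}.
// score = 1 - CDF of a log-normal whose median is `median` and whose 10th
// percentile is `p10`, so score(median) = 0.5 and score(p10) = 0.9.

// Abramowitz & Stegun 7.1.26 rational approximation of erf (|err| < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
    0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

const PHI_INV_0_9 = 1.2815515655446004; // standard-normal quantile at p = 0.9

function logNormalScore({p10, median}, value) {
  if (value <= 0) return 1; // every modeled site is "worse" than a zero value
  // sigma chosen so the log-normal's 10th percentile lands exactly on p10.
  const sigma = (Math.log(median) - Math.log(p10)) / PHI_INV_0_9;
  const z = (Math.log(value) - Math.log(median)) / sigma;
  return 0.5 * (1 - erf(z / Math.SQRT2)); // 1 - Phi(z)
}

// Using the LCP mobile control points quoted above:
const lcp = {p10: 2336, median: 4000};
console.log(logNormalScore(lcp, 4000).toFixed(2)); // "0.50"
console.log(logNormalScore(lcp, 2336).toFixed(2)); // "0.90"
```

Clamping to two decimals is then a separate, final step, which is what would make it easy to drop or widen for analysis purposes.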

but don't you lose a bit of precision by clamping twice? Clamping subscores and then clamping the weighted average again?

We do, but as you noted, it's hard to see and doesn't matter much compared to the other sources of noise in this data :) All statistical analysis attempts we're aware of use the underlying metric values for the reasons above.

Please let us know if there are any particular utilities Lighthouse could expose that would help your projects in this area! Super exciting to see this type of work being done independently and would love to share notes :)

@koraa koraa closed this as completed Nov 20, 2020
@connorjclark
Collaborator

in case it helps, we do all sorts of math-y things for the scoring calculator here: https://github.com/paulirish/lh-scorecalc/blob/master/script/math.js
