Ability to control precision of scoring values #11570
Hi @koraa, super cool work on harmonicabsorber!! that's so rad. a few things i wanted to add.

web performance measurement is tricky and there are so many sources of variance/jitter that it's nigh impossible to expect repeatable results. (using webpagereplay would definitely be an ingredient in the best repro setup). We have some more docs on this topic at https://github.com/GoogleChrome/lighthouse/blob/master/docs/variability.md

As for clamping.. we clamp just to give a signal of our significant digits. there's enough variance that having higher precision just becomes misleading. that said, we do not clamp the metric values.. only the scores. so for all those audit values, i would recommend looking at the

(While discussing this we also considered that perhaps it'd be useful to expose our scoring calculations.. eg a method to calculate the 0-1 score of a given LCP/whatever value. We currently don't have this quite available, but it's possible to explore..)
Hi @paulirish,
Thanks for pointing that out! Yes, I am aware of that; our goal at the moment is not to reduce variance but instead to develop appropriate statistical models that characterize the distribution and produce an estimate of the score along with error bars… However, I agree that performing estimation on the measurements separately and then computing the final score from these estimations is better than characterizing score distributions. To that end, is it correct to assume that all scores use the same method of generating the score from a log-normal CDF, or are there differences between the scores?
(Not that it matters much, but don't you lose a bit of precision by clamping twice? Clamping subscores and then clamping the weighted average again? Not that I have checked anything here, but my feeling is that this would introduce some uncomfortable nonlinearities in cases where many subscores are close to
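To make the double-clamping point concrete, here is a small numeric illustration. The weights and subscores below are made up, and `clamp2` is only a stand-in for a two-decimal clamp; it is not Lighthouse's code:

```javascript
// Stand-in for a two-decimal clamp (illustrative, not Lighthouse's source).
const clamp2 = v => Math.round(v * 100) / 100;

// Made-up weights and subscores chosen so the two orders of operations differ.
const weights = [0.7, 0.3];
const subscores = [0.914, 0.92];

// Weighted average of subscores.
const avg = ss => ss.reduce((sum, s, i) => sum + s * weights[i], 0);

console.log(clamp2(avg(subscores)));             // clamp once:  0.92
console.log(clamp2(avg(subscores.map(clamp2)))); // clamp twice: 0.91
```

So with these (contrived) inputs, clamping the subscores before averaging shifts the final two-decimal score by a full step.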
Yes, all performance metric scores use the log-normal CDF method, with different control points for each metric/environment defined in each audit's source:

lighthouse/lighthouse-core/audits/metrics/first-contentful-paint.js, lines 39 to 47 at e9d7224
lighthouse/lighthouse-core/audits/audit.js, lines 71 to 83 at e9d7224
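For anyone following along, a scoring curve of that shape can be sketched in a few lines. Everything below is an illustrative reconstruction, not Lighthouse's actual implementation: the `logNormalScore` helper, the erf approximation, and the control points are all assumptions. The idea is that the score is the complementary CDF of a log-normal distribution chosen so the `median` control point maps to a score of 0.5 and the `p10` point to 0.9:

```javascript
// Abramowitz & Stegun 7.1.26 approximation of erf (|error| < ~1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

// Standard normal CDF via erf.
const normalCdf = z => 0.5 * (1 + erf(z / Math.SQRT2));

// Score = 1 - lognormal CDF of the metric value; sigma is chosen so that
// score(median) = 0.5 and score(p10) = 0.9.
function logNormalScore({ p10, median }, value) {
  if (value <= 0) return 1; // a zero-time metric is a perfect score
  const mu = Math.log(median);
  // Phi^-1(0.9) ≈ 1.28155 pins the p10 control point to a score of 0.9.
  const sigma = (mu - Math.log(p10)) / 1.28155;
  return 1 - normalCdf((Math.log(value) - mu) / sigma);
}

// Illustrative control points only (not the real FCP values):
console.log(logNormalScore({ p10: 934, median: 1600 }, 1600).toFixed(2)); // "0.50"
```

For the real curves and constants, the math.js link below is the authoritative place to look.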
We do, but as you noted, it's hard to see and doesn't matter much compared to the other sources of noise in this data :) All statistical analysis attempts we're aware of use the underlying metric values, for the reasons above. Please let us know if there are any particular utilities Lighthouse could expose that would help your projects in this area! Super exciting to see this type of work being done independently and we would love to share notes :)
In case it helps, we do all sorts of math-y things for the scoring calculator here: https://github.com/paulirish/lh-scorecalc/blob/master/script/math.js
The overall scoring and audit code each use a function `clampTo2Decimals` to clamp scored values to two decimals. Having the ability to change the number of digits (or even to deactivate the clamping altogether) would be useful and would give statistically minded users the ability to perform more in-depth analysis of the produced scores.

E.g., I would find this useful because I am currently optimizing a test setup for reduced jitter; I can still measure jitter at the reduced precision, but due to the clamping I need many more samples to properly quantify the amount of jitter.
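Presumably the clamp in question amounts to rounding to two decimals; a sketch of it, plus the configurable-precision variant being proposed here, could look like the following. The `clampToDecimals` name and signature are hypothetical, not existing Lighthouse API:

```javascript
// Assumed shape of the existing two-decimal clamp (illustrative).
const clampTo2Decimals = val => Math.round(val * 100) / 100;

// Hypothetical generalization with a configurable digit count; digits = 2
// would preserve today's behavior, and a large value would effectively
// deactivate the clamping.
const clampToDecimals = (val, digits = 2) => {
  const factor = 10 ** digits;
  return Math.round(val * factor) / factor;
};

console.log(clampTo2Decimals(0.91237));   // 0.91
console.log(clampToDecimals(0.91237, 4)); // 0.9124
```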
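A quick simulation shows the effect: jitter smaller than the 0.01 rounding step mostly disappears from the clamped scores. The sample values here are invented for illustration:

```javascript
// Invented score samples with sub-0.01 jitter.
const samples = [0.9132, 0.9141, 0.9128, 0.9137, 0.9145];
const clamp2 = v => Math.round(v * 100) / 100;

// Population standard deviation.
const stddev = xs => {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return Math.sqrt(xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length);
};

console.log(stddev(samples).toFixed(4));             // "0.0006": jitter visible
console.log(stddev(samples.map(clamp2)).toFixed(4)); // "0.0000": rounded away
```

With the clamped values, all five samples collapse to 0.91, so estimating the jitter magnitude would require far more samples (enough for the rounding boundary to be crossed often).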
I would be very open to creating a PR for this, provided there is interest in merging such a feature…