One common interface for deriving score from raw values & deriving the weighted average. #11881
Comments
This may be immediately useful to you: https://github.com/paulirish/lh-scorecalc/blob/4c7724ad6587595bda807e1c2278a07c9f55e7ec/script/math.js#L78
Thank you!

EDIT: To expand on this a little bit; my biggest issue right now isn't really the math portion, it's the special cases I have to consider. My code looks something like this right now:

```js
const lhScoreFromRaw = curry('lhScoreFromRaw', (raw, audit) => {
  const a = is_a(audit, String) ? lhAuditCls(audit) : audit;
  if (a.meta.scoreDisplayMode === 'binary')
    return raw; // Already either 1 or 0
  if (a instanceof ByteEfficiencyAudit)
    return ByteEfficiencyAudit.scoreForWastedMs(raw);
  const lnp = lhScoreParams(a);
  if (isReal(lnp.p10) && isReal(lnp.median))
    return a.computeLogNormalScore(lnp, raw);
  assert(false, `Don't know how to score ${typename(a)}`);
});
```

It considers (1) binary scoring (which doesn't have raw values), (2) numeric scoring from log-normal control points (using the private `computeLogNormalScore` interface), (3) byte-efficiency audits, and (4) just now I discovered I neglected to consider #11882, so I'll have to add another special case for that…
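For reference, the log-normal mapping itself is fully determined by the two control points: a raw value equal to `median` scores 0.5 and a raw value equal to `p10` scores 0.9. Below is a minimal, self-contained sketch of that mapping; it is not Lighthouse's actual implementation, and the Abramowitz–Stegun erf approximation plus the constant Φ⁻¹(0.9) ≈ 1.2816 are standard numerical choices used only for illustration.

```js
// Abramowitz–Stegun approximation of the error function (max error ~1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

// Sketch of a log-normal score: raw === median -> 0.5, raw === p10 -> 0.9.
// Same idea as Lighthouse's metric scoring, not the actual code.
function logNormalScore({p10, median}, raw) {
  if (raw <= 0) return 1;
  // Shape parameter chosen so the log-normal CDF is 0.1 at p10 (Φ⁻¹(0.9) ≈ 1.2816).
  const sigma = Math.log(median / p10) / 1.281551565545;
  const standardized = (Math.log(raw) - Math.log(median)) / (sigma * Math.SQRT2);
  // Survival function of the log-normal distribution, clamped to [0, 1].
  return Math.min(1, Math.max(0, 0.5 * (1 - erf(standardized))));
}

// Example with LCP-like control points (roughly the mobile defaults):
console.log(logNormalScore({p10: 2500, median: 4000}, 2500).toFixed(2)); // ~0.90
console.log(logNormalScore({p10: 2500, median: 4000}, 4000).toFixed(2)); // ~0.50
console.log(logNormalScore({p10: 2500, median: 4000}, 6000).toFixed(2)); // ~0.13
```

The p10/median pair comes from each audit's scoring options in the config; the 2500/4000 values above are only an example, not a statement of the current defaults.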
Just found another special case: the redirects audit, which scores 1 for zero or one redirects and uses the method outlined in #11883 otherwise. The raw value is set to the wasted ms in either case. This seems a bit odd, since in the zero-or-one-redirects case the raw value and the score are entirely uncorrelated. This is also the first scoring function I've seen that depends on more than one variable. Isn't the latency tested somewhere else too? One of the findings from harmonicobserver is that measurements tend to be correlated (graph: the experiment is artificial because it shows the graph for an empty page; the effect is still present but less impactful for less extreme sites). This increases variance, because the weighted average of multiple correlated functions still has a pretty high variance. If latency is measured somewhere else too, it might be better (just for our mental models) to define the score based solely on the number of redirects (something like p90=.5, median=1.5). Are you interested in more suggestions like these? Otherwise I'll leave it with making-lh-continuous suggestions :)
These would be great improvements. It would allow use cases like yours, as well as making counterfactual tools like the score calculator ("what happens to my scores if I improved this audit by X") easier to implement without duplicating large chunks of internal code that's subject to change. I'm sure it's also not the prettiest process to duplicate :)

Scoring is a mix of some of the oldest code still around (chunks of …). The good news is that this is almost all internal code, so we have a ton of leeway to change things. We also have pretty good input/output test coverage that's relatively agnostic to the implementation, so we can be bold without worrying too much about breaking anything.

But there are some other constraints that we can't really live without that make this a difficult externalized (and especially static) API design space. Here are the ones I can think of; others can chime in with any more, or if they think any in my list aren't actually a problem :)

**Metric scoring**

Metrics are probably the most straightforward case for current functionality. Could pretty easily expose a … The main thing also needed would be the … The open question would be: do we want to commit to always scoring metrics like this?

**Audit scoring**

This is a lot harder, as even for classes of audits that have a comparable …, for many of the less straightforward audits there are multiple levels of thresholds and heuristics to prune out things that aren't worth bringing to the user's attention, and (stepping out of the performance section) there are many audits with no meaningful …

If we limit ourselves to just perf audits, the other question is: does the score matter? All the non-metric audits in perf are scored, but the score isn't visible in the report and they all have weight 0, so they don't contribute to the overall category score. Maybe it makes more sense to be looking at e.g. …

**Report scoring**

This is an area where old code could maybe be cleared up in the split between … If the score has come out of an audit, the score will be … (see lighthouse-core/audits/audit.js, lines 225 to 229 at 0d25b6b).

All those except … (see lighthouse-core/scoring.js, lines 65 to 69 at 0d25b6b, which makes sure the category isn't scored if one of the weighted audits threw an error). So a …

For the general case, we also can't assume the audit weights will be whatever's in the default-config, so we need the weights too.

Maybe I'm missing the reasoning here for avoiding having to pass in an object audit result, but all this information is in the lighthouse JSON output (scores, scoreDisplayMode, audit weights). Could we instead make it easier to synthesize input to scoring.js from an existing LHR with changed audit scores in it? Things that might change from taking a number of lighthouse results and summarizing them with one would have to be dealt with regardless of this method (e.g. discard runs with errors in them? discard runs where audits became N/A?), and after that most of that audit object should be straightforward to reconstruct.
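To make the "synthesize input from an existing LHR" idea concrete, here is a rough sketch of recomputing a category score as a weighted mean over the audit refs already present in the result JSON. The helper name and the `overrides` parameter are hypothetical, and it only mirrors the high-level behaviour described above (weight-0 audits ignored, an errored weighted audit voiding the category), not the actual scoring.js code.

```js
// Hypothetical helper: recompute a category score from an existing LHR after
// overriding individual audit scores, reusing the weights stored in the LHR.
function categoryScoreFromLhr(lhr, categoryId, overrides = {}) {
  const category = lhr.categories[categoryId];
  let weightSum = 0;
  let weightedScoreSum = 0;
  for (const ref of category.auditRefs) {
    if (ref.weight === 0) continue; // weight-0 (manual/informative/unweighted) audits don't affect the mean
    const score = ref.id in overrides ? overrides[ref.id] : lhr.audits[ref.id].score;
    if (score === null) return null; // an errored weighted audit voids the category score
    weightSum += ref.weight;
    weightedScoreSum += score * ref.weight;
  }
  return weightSum > 0 ? weightedScoreSum / weightSum : 0;
}

// "What happens to the performance score if LCP scored 0.95?"
// const newPerf = categoryScoreFromLhr(lhr, 'performance',
//   {'largest-contentful-paint': 0.95});
```

Whether this matches scoring.js in every edge case (e.g. how N/A audits are treated) would need to be checked against the real implementation.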
I just pushed my reimplementation of the scoring for reference: https://github.com/koraa/helix-harmonicabsorber/blob/6562a2757fd72b79fea07cf55895bb35a6fde1e4/src/report.js#L76 The list of special cases should be easy to spot. The only thing left out is the redirects audit.

Thanks for the exhaustive reply :) I'll reread it after my vacation!
FWIW, all of the "calculate audit score" logic is rather straightforward: take a number, return a number. Only csp-xss is different.
The other note I'll throw out there: the score will also depend on the device type (for metrics today) and possibly on Fraggle Rock modes in the future.
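To illustrate the device-type point: any standalone `scoreFromRaw` would need the form factor alongside the raw value, because the control points for the same metric differ between mobile and desktop. A hypothetical sketch follows; the `CONTROL_POINTS` table and its numbers are placeholders, not the actual default-config values.

```js
// Placeholder table: real values live in the Lighthouse config files; these
// numbers are illustrative only.
const CONTROL_POINTS = {
  'largest-contentful-paint': {
    mobile: {p10: 2500, median: 4000},
    desktop: {p10: 1200, median: 2400},
  },
};

// Hypothetical lookup: a standalone scoreFromRaw would need (auditId, formFactor)
// in addition to the raw value, and would feed the result into a log-normal
// scorer such as the logNormalScore sketch earlier in this thread.
function scoreParamsFor(auditId, formFactor) {
  const byFormFactor = CONTROL_POINTS[auditId];
  if (!byFormFactor || !byFormFactor[formFactor]) {
    throw new Error(`No control points known for ${auditId} on ${formFactor}`);
  }
  return byFormFactor[formFactor];
}

// e.g. logNormalScore(scoreParamsFor('largest-contentful-paint', 'mobile'), 3000)
```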
Hi there!

It would be really useful if Audit featured a static method `scoreFromRaw(raw: Number) => Number` that could be used to map the raw value into the score range (and maybe even an inverse `rawFromScore(score)`).

This is currently mostly available with the `p10` and `median` config options, but some classes like UnusedBytes/ByteEfficiencyAudit use custom conversion methods.

This feature would be really useful for anyone doing custom processing of measurements. E.g. I am trying to experiment with new methods to derive average scores. (In my case: generating an estimate of each audit's raw values separately, mapping the estimate into the score space, and then taking the weighted average, instead of calculating the weighted average multiple times and then generating an estimate from that, as previously discussed in #11570.)

In the same vein, it would be really useful if lighthouse-core/scoring.js was exposed and featured a function that could derive the weighted average from `auditName => auditScore` instead of requiring the entire `auditName => auditObject`, where auditObject at least needs to include `scoreDisplayMode`. While I understand that scoreDisplayMode here is used to convey error information, that error information may be better encoded as an undefined or null score value, and the scoring modes MANUAL & INFORMATIVE should be compile-time constants, so their weight should be zero anyway (or am I missing something here?). A sketch of the inverse mapping is included after this post.

Thank you!!
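On the "inverse `rawFromScore(score)`" idea: for log-normal-scored metrics the mapping is invertible in closed form. A hedged sketch follows, assuming the same control-point convention as the forward mapping sketched earlier in the thread and using the Winitzki approximation of erf⁻¹; both function names are hypothetical, not an existing Lighthouse API.

```js
// Winitzki approximation of the inverse error function (accurate to ~2e-3).
function erfinv(x) {
  const a = 0.147;
  const ln1mx2 = Math.log(1 - x * x);
  const t = 2 / (Math.PI * a) + ln1mx2 / 2;
  return Math.sign(x) * Math.sqrt(Math.sqrt(t * t - ln1mx2 / a) - t);
}

// Hypothetical inverse of log-normal scoring: the raw value that would produce
// `score` for the given control points. Scores of exactly 0 or 1 are not handled.
function rawFromScore({p10, median}, score) {
  const sigma = Math.log(median / p10) / 1.281551565545; // Φ⁻¹(0.9)
  // Forward: score = 0.5 * (1 - erf(z))  =>  z = erfinv(1 - 2 * score)
  const z = erfinv(1 - 2 * score);
  return median * Math.exp(sigma * Math.SQRT2 * z);
}

// Sanity checks against the forward direction:
// rawFromScore({p10: 2500, median: 4000}, 0.9) ≈ 2500
// rawFromScore({p10: 2500, median: 4000}, 0.5) === 4000
```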