feedback. extract _normalizeAuditScore
paulirish committed Mar 5, 2018
1 parent f80e742 commit cccf448
Showing 14 changed files with 129 additions and 60 deletions.
38 changes: 19 additions & 19 deletions docs/scoring.md
@@ -1,5 +1,5 @@
# Goal
The goal of this document is to explain how scoring works in Lighthouse and what to do to improve your Lighthouse scores across the four sections of the report.
The goal of this document is to explain how scoring works in Lighthouse and what to do to improve your Lighthouse scores across the four sections of the report.

Note 1: if you want a **nice spreadsheet** version of this doc to understand weighting and scoring, check out the [scoring spreadsheet](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=0)

@@ -9,22 +9,22 @@ Note 1: if you want a **nice spreadsheet** version of this doc to understand wei
Note 2: if you receive a **score of 0** in any Lighthouse category, that usually indicates an error on our part. Please file an [issue](https://github.com/GoogleChrome/lighthouse/issues) so our team can look into it.

# Performance

### What performance metrics does Lighthouse measure?
Lighthouse measures the following performance metrics:
Lighthouse measures the following performance metrics:

- [First meaningful paint](https://developers.google.com/web/tools/lighthouse/audits/first-meaningful-paint): first meaningful paint is defined as when the browser first puts any “meaningful” element/set of “meaningful” elements on the screen. What is meaningful is determined from a series of heuristics.
- [First interactive](https://developers.google.com/web/tools/lighthouse/audits/first-interactive): first interactive is defined as the first point at which the page could respond quickly to input. It doesn't consider any point in time before first meaningful paint. The way this is implemented is primarily based on heuristics.
- [First meaningful paint](https://developers.google.com/web/tools/lighthouse/audits/first-meaningful-paint): first meaningful paint is defined as when the browser first puts any “meaningful” element/set of “meaningful” elements on the screen. What is meaningful is determined from a series of heuristics.
- [First interactive](https://developers.google.com/web/tools/lighthouse/audits/first-interactive): first interactive is defined as the first point at which the page could respond quickly to input. It doesn't consider any point in time before first meaningful paint. The way this is implemented is primarily based on heuristics.
*Note: this metric is currently in beta, which means that the underlying definition of this metric is in progress.*
- [Consistently interactive](https://developers.google.com/web/tools/lighthouse/audits/consistently-interactive): defined as the first point at which everything is loaded such that the page will quickly respond to any user input throughout the page.
- [Consistently interactive](https://developers.google.com/web/tools/lighthouse/audits/consistently-interactive): defined as the first point at which everything is loaded such that the page will quickly respond to any user input throughout the page.
*Note: this metric is currently in beta, which means that the underlying definition of this metric is in progress.*
- [Perceptual Speed Index (pSI)](https://developers.google.com/web/tools/lighthouse/audits/speed-index): pSI measures how many pixels are painted at each given time interval on the viewport. The earlier the pixels are painted, the better you score on this metric, since we want an experience where most of the content is shown on the screen during the first few moments of initiating the page load. Loading more content earlier makes your end user feel like the website is loading quickly, which contributes to a positive user experience. Therefore, the lower the pSI score, the better.
- [Perceptual Speed Index (pSI)](https://developers.google.com/web/tools/lighthouse/audits/speed-index): pSI measures how many pixels are painted at each given time interval on the viewport. The earlier the pixels are painted, the better you score on this metric, since we want an experience where most of the content is shown on the screen during the first few moments of initiating the page load. Loading more content earlier makes your end user feel like the website is loading quickly, which contributes to a positive user experience. Therefore, the lower the pSI score, the better.
- [Estimated Input Latency](https://developers.google.com/web/tools/lighthouse/audits/estimated-input-latency): this audit measures how fast your app is in responding to user input. Our benchmark is that the estimated input latency should be under 50 ms (see documentation [here](https://developers.google.com/web/tools/lighthouse/audits/estimated-input-latency) as to why).

*Some **variability** when running on real-world sites is to be expected as sites load different ads, scripts, and network conditions vary for each visit. Note that Lighthouse can especially experience inconsistent behaviors when it runs in the presence of anti-virus scanners, other extensions or programs that interfere with page load, and inconsistent ad behavior. Please try to run without anti-virus scanners or other extensions/programs to get the cleanest results, or alternatively, run Lighthouse on WebPageTest for the most consistent results [here](https://www.webpagetest.org/easy.php).*

### How are the scores weighted?
Lighthouse returns a performance score from 0-100. A score of 0 usually indicates an error with performance measurement (so file an issue in the Lighthouse repo if further debugging is needed), and 100 is the best possible ideal score (really hard to get). Usually, any score above a 90 gets you in the top ~5% of performant websites.
Lighthouse returns a performance score from 0-100 (technically returned as 0-1, but you can do the math ;). A score of 0 usually indicates an error with performance measurement (so file an issue in the Lighthouse repo if further debugging is needed), and 100 is the best possible ideal score (really hard to get). Usually, any score above a 90 gets you in the top ~5% of performant websites.

The performance score is determined from the **performance metrics only**. The Opportunities/Diagnostics sections do not directly contribute to the performance score.

@@ -36,30 +36,30 @@ The metric results are not weighted equally. Currently the weights are:
* 1X - perceptual speed index
* 1X - estimated input latency

These weights were determined based on heuristics, and the Lighthouse team is working on formalizing this approach through more field data.
These weights were determined based on heuristics, and the Lighthouse team is working on formalizing this approach through more field data.

### How do performance metrics get scored?
Once Lighthouse is done gathering the raw performance metrics for your website (metrics reported in milliseconds), it converts them into a score by mapping the raw performance number to a number between 0-100 by looking where your raw performance metric falls on the Lighthouse scoring distribution. The Lighthouse scoring distribution is a log normal distribution that is derived from the performance metrics of real website performance data (see sample distribution [here](https://www.desmos.com/calculator/zrjq6v1ihi)).
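
The mapping described above can be sketched roughly as follows. This is an illustration only, not Lighthouse's implementation: the helper names `erfApprox` and `logNormalScore`, the shape parameter `sigma` and its default value, and the 4000 ms median used in the loop are all assumptions made for the example. The only idea taken from the text is that a raw metric is scored by where it falls on a log-normal distribution, with lower raw values scoring higher.

```javascript
// Abramowitz-Stegun polynomial approximation of the error function.
// JavaScript has no built-in erf, so this helper is part of the sketch.
function erfApprox(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Hypothetical helper: score in [0, 1] as the complement of a log-normal CDF.
// `median` is the raw value that should score 0.5.
// `sigma` (assumed, not from Lighthouse) controls how quickly the score falls off.
function logNormalScore(rawValue, median, sigma = 0.5) {
  if (rawValue <= 0) return 1;
  const z = (Math.log(rawValue) - Math.log(median)) / (sigma * Math.SQRT2);
  const cdf = 0.5 * (1 + erfApprox(z));
  return 1 - cdf;
}

// Print the score on a 0-100 scale for a few raw timings (in milliseconds).
for (const ms of [1000, 4000, 14000]) {
  console.log(ms, Math.round(logNormalScore(ms, 4000) * 100));
}
```

A raw value equal to the median scores about 0.5 by construction; faster values approach 1 and slower values approach 0.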

Once we finish computing the percentile equivalent of your raw performance score, we take the weighted average of all the performance metrics (per the weighting above, with 5x weight given to first meaningful paint, first interactive, and consistently interactive). Finally, we apply a coloring to the score (green, orange, and red) depending on what "bucket" your score falls in. Roughly, this maps to:
- Red (poor score): 0-44.
- Orange (average): 45-74
- Green (good): 75-100.
Once we finish computing the percentile equivalent of your raw performance score, we take the weighted average of all the performance metrics (per the weighting above, with 5x weight given to first meaningful paint, first interactive, and consistently interactive). Finally, we apply a coloring to the score (green, orange, and red) depending on what "bucket" your score falls in. Roughly, this maps to:
- Red (poor score): 0-44.
- Orange (average): 45-74
- Green (good): 75-100.
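
As a rough sketch of the weighted average and color buckets described above: the weights (5X, 5X, 5X, 1X, 1X) and the bucket boundaries come from this document, while the object keys, the function names `weightedPerformanceScore` and `colorBucket`, and the example metric scores are hypothetical and chosen only for illustration.

```javascript
// Weights as described in the doc: 5X for the first three metrics, 1X for the rest.
// The key names are made up for this example.
const WEIGHTS = {
  firstMeaningfulPaint: 5,
  firstInteractive: 5,
  consistentlyInteractive: 5,
  speedIndex: 1,
  estimatedInputLatency: 1,
};

// Weighted average of per-metric scores (each on a 0-100 scale).
function weightedPerformanceScore(metricScores) {
  let total = 0;
  let weightSum = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    total += metricScores[name] * weight;
    weightSum += weight;
  }
  return total / weightSum;
}

// Rough bucket boundaries from the list above.
function colorBucket(score) {
  if (score >= 75) return 'green';
  if (score >= 45) return 'orange';
  return 'red';
}

// Invented per-metric scores, for illustration only.
const exampleScores = {
  firstMeaningfulPaint: 80,
  firstInteractive: 60,
  consistentlyInteractive: 50,
  speedIndex: 90,
  estimatedInputLatency: 100,
};
const overall = weightedPerformanceScore(exampleScores);
console.log(overall.toFixed(1), colorBucket(overall)); // 67.1 orange
```

Because the three 5X metrics account for 15 of the 17 total weight units, the high speed index and input latency scores in this example barely move the result.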

### What can developers do to improve their performance score?
*Note: we've built [a little calculator](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=283330180) that can help you understand what thresholds you should aim for to achieve a certain Lighthouse performance score.*

Lighthouse has a whole section in the report on improving your performance score under the “Opportunities” section. There are detailed suggestions and documentation that explain the different suggestions and how to implement them. Additionally, the diagnostics section lists additional guidance that developers can explore to further experiment with and tweak their performance.
Lighthouse has a whole section in the report on improving your performance score under the “Opportunities” section. There are detailed suggestions and documentation that explain the different suggestions and how to implement them. Additionally, the diagnostics section lists additional guidance that developers can explore to further experiment with and tweak their performance.


# PWA
### How is the PWA score calculated?
The PWA score is calculated based on the [Baseline PWA checklist](https://developers.google.com/web/progressive-web-apps/checklist#baseline), which lists 14 requirements. Lighthouse tests for 11 out of the 14 requirements automatically, with the other 3 being manual checks. Each of the 11 audits for the PWA section of the report is weighted equally, so implementing any of the audits correctly will increase your overall score by ~9 points.
### How is the PWA score calculated?
The PWA score is calculated based on the [Baseline PWA checklist](https://developers.google.com/web/progressive-web-apps/checklist#baseline), which lists 14 requirements. Lighthouse tests for 11 out of the 14 requirements automatically, with the other 3 being manual checks. Each of the 11 audits for the PWA section of the report is weighted equally, so implementing any of the audits correctly will increase your overall score by ~9 points.

# Accessibility
### How is the accessibility score calculated?
The accessibility score is a weighted average of all the different audits (the weights for each audit can be found in [the scoring spreadsheet](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=0)). Each audit is a pass/fail (meaning there is no room for partial points for getting an audit half-right). For example, that means if half your buttons have screenreader friendly names, and half don't, you don't get "half" of the weighted average-you get a 0 because it needs to be implemented *throughout* the page.
The accessibility score is a weighted average of all the different audits (the weights for each audit can be found in [the scoring spreadsheet](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=0)). Each audit is a pass/fail (meaning there is no room for partial points for getting an audit half-right). For example, that means if half your buttons have screenreader friendly names, and half don't, you don't get "half" of the weighted average-you get a 0 because it needs to be implemented *throughout* the page.

# Best Practices
### How is the Best Practices score calculated?
Each audit in the Best Practices section is equally weighted. Therefore, implementing each audit correctly will increase your overall score by ~6 points.
### How is the Best Practices score calculated?
Each audit in the Best Practices section is equally weighted. Therefore, implementing each audit correctly will increase your overall score by ~6 points.
4 changes: 1 addition & 3 deletions docs/understanding-results.md
@@ -15,7 +15,6 @@ The top-level Lighthouse Results object (LHR) is what the lighthouse node module
| userAgent | The user agent string of the version of Chrome that was used by Lighthouse. |
| initialUrl | The URL that was supplied to Lighthouse and initially navigated to. |
| url | The URL that Lighthouse ended up auditing after redirects were followed. |
| score | The overall score `0-100`, a weighted average of all category scores. *NOTE: Only the PWA category has a weight by default* |
| [audits](#audits) | An object containing the results of the audits. |
| [runtimeConfig](#runtime-config) | An object containing information about the configuration used by Lighthouse. |
| [timing](#timing) | An object containing information about how long Lighthouse spent auditing. |
@@ -56,7 +55,7 @@ An object containing the results of the audits, keyed by their name.
| error | `boolean` | Set to true if there was an exception thrown within the audit. The error message will be in `debugString`.
| rawValue | <code>boolean&#124;number</code> | The unscored value determined by the audit. Typically this will match the score if there's no additional information to impart. For performance audits, this value is typically a number indicating the metric value. |
| displayValue | `string` | The string to display in the report alongside audit results. If empty, nothing additional is shown. This is typically used to explain additional information such as the number and nature of failing items. |
| score | <code>boolean&#124;number</code> | The scored value determined by the audit as either boolean or a number `0-100`. If the audit is a boolean, the implication is `score ? 100 : 0`. |
| score | <code>boolean&#124;number</code> | The scored value determined by the audit as a number `0-1`, representing displayed scores of 0-100. |
| scoreDisplayMode | <code>"binary"&#124;"numeric"</code> | A string identifying how granular the score is meant to be indicating, i.e. is the audit pass/fail (score of 1 or 0), or are there shades of gray (scores between 0-1 exclusive). |
| details | `Object` | Extra information found by the audit necessary for display. The structure of this object varies from audit to audit. The structure of this object is somewhat stable between minor version bumps as this object is used to render the HTML report.
| extendedInfo | `Object` | Extra information found by the audit. The structure of this object varies from audit to audit and is generally for programmatic consumption and debugging, though there is typically overlap with `details`. *WARNING: The structure of this object is not stable and cannot be trusted to follow semver* |
@@ -185,7 +184,6 @@ An array containing the different categories, their scores, and the results of t
| Name | Type | Description |
| -- | -- | -- |
| id | `string` | The string identifier of the category. |
| score | `number` | The numeric score `0-100` of the audit. Audits with a boolean score result are converted with `score ? 100 : 0`. |
| weight | `number` | The weight of the audit's score in the overall category score. |
| result | `Object` | The actual audit result, a copy of the audit object found in [audits](#audits). *NOTE: this property will likely be removed in upcoming releases; use the `id` property to lookup the result in the `audits` property.* |

48 changes: 35 additions & 13 deletions lighthouse-core/audits/audit.js
@@ -94,6 +94,36 @@ class Audit {
};
}

/**
* @param {!Audit} audit
* @param {!AuditResult} result
* @return {{score: number, scoreDisplayMode: string}}
*/
static _normalizeAuditScore(audit, result) {
let score = typeof result.score === 'undefined' ? result.rawValue : result.score;

if (typeof score === 'boolean' || score === null) {
score = score ? 1 : 0;
}

if (!Number.isFinite(score)) {
throw new Error(`Invalid score: ${score}`);
}


if (score > 1) {
throw new Error(`Audit score for ${audit.meta.name} is > 1`);
}

const scoreDisplayMode = result.scoreDisplayMode || audit.meta.scoreDisplayMode ||
Audit.SCORING_MODES.BINARY;

return {
score,
scoreDisplayMode,
};
}

/**
* @param {!Audit} audit
* @param {!AuditResult} result
@@ -104,31 +134,23 @@ class Audit {
throw new Error('generateAuditResult requires a rawValue');
}

let score = typeof result.score === 'undefined' ? result.rawValue : result.score;
let displayValue = result.displayValue;
// TODO: remove this bizarre fallback logic. (see #458)
if (typeof displayValue === 'undefined') {
displayValue = result.rawValue ? result.rawValue : '';
}

const {score, scoreDisplayMode} = Audit._normalizeAuditScore(audit, result);

// The same value or true should be '' it doesn't add value to the report
// TODO: throw in this case. Why do we even do this?
if (displayValue === score) {
displayValue = '';
}

if (typeof score === 'boolean' || score === null) {
score = score ? 1 : 0;
}

if (!Number.isFinite(score)) {
throw new Error(`Invalid score: ${score}`);
}

const scoreDisplayMode = result.scoreDisplayMode || audit.meta.scoreDisplayMode ||
Audit.SCORING_MODES.BINARY;

let auditDescription = audit.meta.description;
if (audit.meta.failureDescription) {
if (!score || (typeof score === 'number' && score < 100)) {
if (score < 1) {
auditDescription = audit.meta.failureDescription;
}
}
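
To make the effect of the extracted `_normalizeAuditScore` easier to follow, here is a standalone restatement of its rules as they appear in the audit.js hunks of this diff. It is a sketch, not the committed code: the function name `normalizeScore`, the plain `auditName` string standing in for `audit.meta.name`, and the sample inputs are assumptions for illustration, and the `scoreDisplayMode` fallback is left out.

```javascript
// Standalone restatement of the score normalization shown in the diff,
// using a plain string and object instead of the real Audit class.
function normalizeScore(auditName, result) {
  // Fall back to rawValue when the audit did not provide an explicit score.
  let score = typeof result.score === 'undefined' ? result.rawValue : result.score;

  // Booleans and null collapse to pass (1) or fail (0).
  if (typeof score === 'boolean' || score === null) {
    score = score ? 1 : 0;
  }

  // Anything that is still not a finite number is rejected.
  if (!Number.isFinite(score)) {
    throw new Error(`Invalid score: ${score}`);
  }

  // Scores are now expected on a 0-1 scale, so a 0-100 style value is an error.
  if (score > 1) {
    throw new Error(`Audit score for ${auditName} is > 1`);
  }

  return score;
}

console.log(normalizeScore('example-audit', {rawValue: true}));
console.log(normalizeScore('example-audit', {rawValue: 4000, score: 0.5}));
try {
  normalizeScore('example-audit', {rawValue: 4000, score: 50});
} catch (err) {
  console.log(err.message);
}
```

The last call shows the new guard: an audit that still reports on the old 0-100 scale fails loudly instead of silently producing an out-of-range score.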
4 changes: 2 additions & 2 deletions lighthouse-core/audits/byte-efficiency/total-byte-weight.js
@@ -61,8 +61,8 @@ class TotalByteWeight extends ByteEfficiencyAudit {
results = results.sort((itemA, itemB) => itemB.totalBytes - itemA.totalBytes).slice(0, 10);

// Use the CDF of a log-normal distribution for scoring.
// <= 1600KB: score≈100
// 4000KB: score=50
// <= 1600KB: score≈1.00
// 4000KB: score=.50
// >= 9000KB: score≈0
const score = ByteEfficiencyAudit.computeLogNormalScore(
totalBytes,
6 changes: 3 additions & 3 deletions lighthouse-core/audits/byte-efficiency/uses-long-cache-ttl.js
@@ -207,9 +207,9 @@ class CacheHeaders extends Audit {

// Use the CDF of a log-normal distribution for scoring.
// <= 4KB: score≈100
// 768KB: score=50
// >= 4600KB: score≈5
// <= 4KB: score≈1.00
// 768KB: score=0.50
// >= 4600KB: score≈0.05
const score = ByteEfficiencyAudit.computeLogNormalScore(
totalWastedBytes / 1024,
SCORING_POINT_OF_DIMINISHING_RETURNS,
2 changes: 1 addition & 1 deletion lighthouse-core/audits/critical-request-chains.js
@@ -117,7 +117,7 @@ class CriticalRequestChains extends Audit {
/**
* Audits the page to give a score for First Meaningful Paint.
* @param {!Artifacts} artifacts The artifacts from the gather phase.
* @return {!AuditResult} The score from the audit, ranging from 0-100.
* @return {!AuditResult}
*/
static audit(artifacts) {
const devtoolsLogs = artifacts.devtoolsLogs[Audit.DEFAULT_PASS];
4 changes: 2 additions & 2 deletions lighthouse-core/audits/dobetterweb/dom-size.js
@@ -67,8 +67,8 @@ class DOMSize extends Audit {

// Use the CDF of a log-normal distribution for scoring.
// <= 1500: score≈100
// 3000: score=50
// <= 1500: score≈1.00
// 3000: score=0.50
// >= 5970: score≈0
const score = Audit.computeLogNormalScore(
stats.totalDOMNodes,
6 changes: 3 additions & 3 deletions lighthouse-core/audits/first-meaningful-paint.js
@@ -35,7 +35,7 @@ class FirstMeaningfulPaint extends Audit {
* @see https://docs.google.com/document/d/1BR94tJdZLsin5poeet0XoTW60M0SjvOJQttKT-JK8HI/view
* @param {!Artifacts} artifacts The artifacts from the gather phase.
* @return {!Promise<!AuditResult>} The score from the audit, ranging from 0-100.
* @return {!Promise<!AuditResult>}
*/
static audit(artifacts) {
const trace = artifacts.traces[this.DEFAULT_PASS];
@@ -95,8 +95,8 @@ class FirstMeaningfulPaint extends Audit {
});

// Use the CDF of a log-normal distribution for scoring.
// < 1100ms: score≈100
// 4000ms: score=50
// < 1100ms: score≈1.00
// 4000ms: score=0.50
// >= 14000ms: score≈0
const firstMeaningfulPaint = traceOfTab.timings.firstMeaningfulPaint;
const score = Audit.computeLogNormalScore(