core(lhr): strictly numeric scores, add scoreDisplayMode #4690

Merged
merged 8 commits into from
Mar 14, 2018
4 changes: 2 additions & 2 deletions docs/architecture.md
@@ -9,9 +9,9 @@ _Some incomplete notes_
* **Driver** - Interfaces with [Chrome Debugging Protocol](https://developer.chrome.com/devtools/docs/debugger-protocol) ([API viewer](https://chromedevtools.github.io/debugger-protocol-viewer/))
* **Gatherers** - Uses Driver to collect information about the page. Minimal post-processing.
* **Artifacts** - output of a gatherer
* **Audit** - Tests for a single feature/optimization/metric. Using the Artifacts as input, an audit evaluates a test and resolves to a score which may be pass/fail/numeric. Formatting note: The meta description may contain markdown links and meta title may contain markdown code.
* **Audit** - Tests for a single feature/optimization/metric. Using the Artifacts as input, an audit evaluates a test and resolves to a numeric score. See [Understanding Results](./understanding-results.md) for details of the results object.
* **Computed Artifacts** - Generated on-demand from artifacts, these add additional meaning, and are often shared amongst multiple audits.

### Audit/Report terminology
* **Category** - Roll-up collection of audits and audit groups into a user-facing section of the report (eg. `Best Practices`). Applies weighting and overall scoring to the section. Examples: PWA, Accessibility, Best Practices.
* **Audit description** - Short user-visible title for the successful audit. eg. “All image elements have `[alt]` attributes.”
6 changes: 3 additions & 3 deletions docs/puppeteer.md
@@ -30,7 +30,7 @@ browser.on('targetchanged', async target => {
style.appendChild(document.createTextNode(content));
document.head.appendChild(style);
}

const css = '* {color: red}';

if (page && page.url() === url) {
@@ -51,7 +51,7 @@ const lhr = await lighthouse(url, {
logLevel: 'info',
});

console.log(`Lighthouse score: ${lhr.score}`);
console.log(`Lighthouse scores: ${lhr.reportCategories.map(c => c.score).join(', ')}`);

await browser.close();
})();
@@ -90,7 +90,7 @@ const browser = await puppeteer.connect({browserWSEndpoint: webSocketDebuggerUrl

// Run Lighthouse.
const lhr = await lighthouse(URL, opts, null);
console.log(`Lighthouse score: ${lhr.score}`);
console.log(`Lighthouse scores: ${lhr.reportCategories.map(c => c.score).join(', ')}`);

await browser.disconnect();
await chrome.kill();
38 changes: 19 additions & 19 deletions docs/scoring.md
@@ -1,5 +1,5 @@
# Goal
The goal of this document is to explain how scoring works in Lighthouse and what to do to improve your Lighthouse scores across the four sections of the report.
The goal of this document is to explain how scoring works in Lighthouse and what to do to improve your Lighthouse scores across the four sections of the report.

Note 1: if you want a **nice spreadsheet** version of this doc to understand weighting and scoring, check out the [scoring spreadsheet](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=0)

@@ -9,22 +9,22 @@ Note 1: if you want a **nice spreadsheet** version of this doc to understand wei
Note 2: if you receive a **score of 0** in any Lighthouse category, that usually indicates an error on our part. Please file an [issue](https://github.com/GoogleChrome/lighthouse/issues) so our team can look into it.

# Performance

### What performance metrics does Lighthouse measure?
Lighthouse measures the following performance metrics:
Lighthouse measures the following performance metrics:

- [First meaningful paint](https://developers.google.com/web/tools/lighthouse/audits/first-meaningful-paint): first meaningful paint is defined as when the browser first puts any “meaningful” element/set of “meaningful” elements on the screen. What is meaningful is determined from a series of heuristics.
- [First interactive](https://developers.google.com/web/tools/lighthouse/audits/first-interactive): first interactive is defined as the first point at which the page could respond quickly to input. It doesn't consider any point in time before first meaningful paint. The way this is implemented is primarily based on heuristics.
- [First meaningful paint](https://developers.google.com/web/tools/lighthouse/audits/first-meaningful-paint): first meaningful paint is defined as when the browser first puts any “meaningful” element/set of “meaningful” elements on the screen. What is meaningful is determined from a series of heuristics.
- [First interactive](https://developers.google.com/web/tools/lighthouse/audits/first-interactive): first interactive is defined as the first point at which the page could respond quickly to input. It doesn't consider any point in time before first meaningful paint. The way this is implemented is primarily based on heuristics.
*Note: this metric is currently in beta, which means that the underlying definition of this metric is in progress.*
- [Consistently interactive](https://developers.google.com/web/tools/lighthouse/audits/consistently-interactive): defined as the first point at which everything is loaded such that the page will quickly respond to any user input throughout the page.
- [Consistently interactive](https://developers.google.com/web/tools/lighthouse/audits/consistently-interactive): defined as the first point at which everything is loaded such that the page will quickly respond to any user input throughout the page.
*Note: this metric is currently in beta, which means that the underlying definition of this metric is in progress.*
- [Perceptual Speed Index (pSI)](https://developers.google.com/web/tools/lighthouse/audits/speed-index): pSI measures how many pixels are painted at each given time interval on the viewport. The earlier the pixels are painted, the better you score on this metric since we want an experience where most of the content is shown on the screen during the first few moments of initiating the page load. Loading more content earlier makes your end user feel like the website is loading quickly, which contributes to a positive user experience. Therefore, the lower the pSI score, the better.
- [Perceptual Speed Index (pSI)](https://developers.google.com/web/tools/lighthouse/audits/speed-index): pSI measures how many pixels are painted at each given time interval on the viewport. The earlier the pixels are painted, the better you score on this metric since we want an experience where most of the content is shown on the screen during the first few moments of initiating the page load. Loading more content earlier makes your end user feel like the website is loading quickly, which contributes to a positive user experience. Therefore, the lower the pSI score, the better.
- [Estimated Input Latency](https://developers.google.com/web/tools/lighthouse/audits/estimated-input-latency): this audit measures how fast your app is in responding to user input. Our benchmark is that the estimated input latency should be under 50 ms (see documentation [here](https://developers.google.com/web/tools/lighthouse/audits/estimated-input-latency) as to why).

*Some **variability** when running on real-world sites is to be expected as sites load different ads, scripts, and network conditions vary for each visit. Note that Lighthouse can especially experience inconsistent behaviors when it runs in the presence of anti-virus scanners, other extensions or programs that interfere with page load, and inconsistent ad behavior. Please try to run without anti-virus scanners or other extensions/programs to get the cleanest results, or alternatively, run Lighthouse on WebPageTest for the most consistent results [here](https://www.webpagetest.org/easy.php).*

### How are the scores weighted?
Lighthouse returns a performance score from 0-100. A score of 0 usually indicates an error with performance measurement (so file an issue in the Lighthouse repo if further debugging is needed), and 100 is the best possible ideal score (really hard to get). Usually, any score above a 90 gets you in the top ~5% of performant websites.
Lighthouse returns a performance score from 0-100 (technically returned as 0-1, but you can do the math ;). A score of 0 usually indicates an error with performance measurement (so file an issue in the Lighthouse repo if further debugging is needed), and 100 is the best possible ideal score (really hard to get). Usually, any score above a 90 gets you in the top ~5% of performant websites.

The performance score is determined from the **performance metrics only**. The Opportunities/Diagnostics sections do not directly contribute to the performance score.

@@ -36,30 +36,30 @@ The metric results are not weighted equally. Currently the weights are:
* 1X - perceptual speed index
* 1X - estimated input latency

These weights were determined based on heuristics, and the Lighthouse team is working on formalizing this approach through more field data.
These weights are heuristics, and the Lighthouse team is working on formalizing the weighting system through more field data.

### How do performance metrics get scored?
Once Lighthouse is done gathering the raw performance metrics for your website (metrics reported in milliseconds), it converts them into a score by mapping the raw performance number to a number between 0-100 by looking where your raw performance metric falls on the Lighthouse scoring distribution. The Lighthouse scoring distribution is a log normal distribution that is derived from the performance metrics of real website performance data (see sample distribution [here](https://www.desmos.com/calculator/zrjq6v1ihi)).
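
Reviewer note: a rough sketch of the mapping described above, assuming the score is the complementary CDF of a log-normal distribution (a faster raw metric yields a higher score). The helper names `erf` and `logNormalScore`, and the `median` and `sigma` parameters, are hypothetical and chosen for illustration only; they are not Lighthouse's actual curve parameters.

```javascript
// Approximation of the error function (Abramowitz & Stegun, formula 7.1.26).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Hypothetical helper: maps a raw metric in milliseconds to a 0-1 score.
// `median` is the raw value that should score 0.5; `sigma` is the spread of
// the underlying normal distribution. Both are made-up inputs.
function logNormalScore(valueMs, median, sigma) {
  if (valueMs <= 0) return 1;
  const z = (Math.log(valueMs) - Math.log(median)) / (sigma * Math.SQRT2);
  // Complementary CDF of the log-normal distribution.
  return (1 - erf(z)) / 2;
}

// Multiply by 100 to get the displayed 0-100 score.
console.log(Math.round(logNormalScore(2000, 4000, 0.5) * 100));
```

A raw value equal to `median` scores roughly 0.5 under this sketch, and values below it score higher.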

Once we finish computing the percentile equivalent of your raw performance score, we take the weighted average of all the performance metrics (per the weighting above, with 5x weight given to first meaningful paint, first interactive, and consistently interactive). Finally, we apply a coloring to the score (green, orange, and red) depending on what "bucket" your score falls in. Roughly, this maps to:
- Red (poor score): 0-44.
- Orange (average): 45-74
- Green (good): 75-100.
Once we finish computing the percentile equivalent of your raw performance score, we take the weighted average of all the performance metrics (per the weighting above, with 5x weight given to first meaningful paint, first interactive, and consistently interactive). Finally, we apply a coloring to the score (green, orange, and red) depending on what "bucket" your score falls in. Roughly, this maps to:
- Red (poor score): 0-44.
- Orange (average): 45-74
- Green (good): 75-100.
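
Reviewer note: the weighting and color buckets described above can be sketched as follows. The weights (5x for the three paint/interactive metrics, 1x for the other two) and the bucket boundaries come from the doc text; the property names, the `performanceScore` and `scoreColor` helpers, and the sample metric scores are hypothetical.

```javascript
// Weights as stated in the doc; the key names are illustrative only.
const WEIGHTS = {
  firstMeaningfulPaint: 5,
  firstInteractive: 5,
  consistentlyInteractive: 5,
  perceptualSpeedIndex: 1,
  estimatedInputLatency: 1,
};

// Weighted average of per-metric scores, each on a 0-100 scale.
function performanceScore(metricScores) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    weightedSum += metricScores[name] * weight;
    totalWeight += weight;
  }
  return weightedSum / totalWeight;
}

// Rough color buckets: 0-44 red, 45-74 orange, 75-100 green.
function scoreColor(score) {
  if (score >= 75) return 'green';
  if (score >= 45) return 'orange';
  return 'red';
}

// Made-up metric scores for illustration.
const sample = {
  firstMeaningfulPaint: 80,
  firstInteractive: 60,
  consistentlyInteractive: 40,
  perceptualSpeedIndex: 90,
  estimatedInputLatency: 100,
};
const overall = performanceScore(sample);
console.log(Math.round(overall), scoreColor(overall));
```

Because the three 5x metrics carry 15 of the 17 total weight units, a poor consistently interactive score pulls the overall number down far more than a perfect estimated input latency lifts it.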

### What can developers do to improve their performance score?
*Note: we've built [a little calculator](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=283330180) that can help you understand what thresholds you should be aiming for achieving a certain Lighthouse performance score. *

Lighthouse has a whole section in the report on improving your performance score under the “Opportunities” section. There are detailed suggestions and documentation that explains the different suggestions and how to implement them. Additionally, the diagnostics section lists additional guidance that developers can explore to further experiment and tweak with their performance.
Lighthouse has a whole section in the report on improving your performance score under the “Opportunities” section. There are detailed suggestions and documentation that explains the different suggestions and how to implement them. Additionally, the diagnostics section lists additional guidance that developers can explore to further experiment and tweak with their performance.


# PWA
### How is the PWA score calculated?
The PWA score is calculated based on the [Baseline PWA checklist](https://developers.google.com/web/progressive-web-apps/checklist#baseline), which lists 14 requirements. Lighthouse tests for 11 out of the 14 requirements automatically, with the other 3 being manual checks. Each of the 11 audits for the PWA section of the report is weighted equally, so implementing any of the audits correctly will increase your overall score by ~9 points.
### How is the PWA score calculated?
The PWA score is calculated based on the [Baseline PWA checklist](https://developers.google.com/web/progressive-web-apps/checklist#baseline), which lists 14 requirements. Lighthouse tests for 11 out of the 14 requirements automatically, with the other 3 being manual checks. Each of the 11 audits for the PWA section of the report is weighted equally, so implementing any of the audits correctly will increase your overall score by ~9 points.

# Accessibility
### How is the accessibility score calculated?
The accessibility score is a weighted average of all the different audits (the weights for each audit can be found in [the scoring spreadsheet](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=0)). Each audit is a pass/fail (meaning there is no room for partial points for getting an audit half-right). For example, that means if half your buttons have screenreader friendly names, and half don't, you don't get "half" of the weighted average-you get a 0 because it needs to be implemented *throughout* the page.
The accessibility score is a weighted average of all the different audits (the weights for each audit can be found in [the scoring spreadsheet](https://docs.google.com/spreadsheets/d/1dXH-bXX3gxqqpD1f7rp6ImSOhobsT1gn_GQ2fGZp8UU/edit?ts=59fb61d2#gid=0)). Each audit is a pass/fail (meaning there is no room for partial points for getting an audit half-right). For example, that means if half your buttons have screenreader friendly names, and half don't, you don't get "half" of the weighted average-you get a 0 because it needs to be implemented *throughout* the page.
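
Reviewer note: a minimal sketch of the pass/fail weighted average described above. The audit ids, the weights, and the `accessibilityScore` helper are hypothetical; the real weights live in the scoring spreadsheet.

```javascript
// Each audit either earns its full weight or nothing; there is no partial credit.
function accessibilityScore(audits) {
  let earned = 0;
  let possible = 0;
  for (const audit of audits) {
    possible += audit.weight;
    if (audit.passed) earned += audit.weight;
  }
  return possible === 0 ? 0 : (earned / possible) * 100;
}

// Hypothetical audits: half the buttons lacking accessible names still
// counts as a plain failure for that audit.
const sampleAudits = [
  {id: 'button-name', weight: 10, passed: false},
  {id: 'image-alt', weight: 8, passed: true},
  {id: 'color-contrast', weight: 6, passed: true},
];
console.log(accessibilityScore(sampleAudits).toFixed(1));
```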

# Best Practices
### How is the Best Practices score calculated?
Each audit in the Best Practices section is equally weighted. Therefore, implementing each audit correctly will increase your overall score by ~6 points.
### How is the Best Practices score calculated?
Each audit in the Best Practices section is equally weighted. Therefore, implementing each audit correctly will increase your overall score by ~6 points.
14 changes: 6 additions & 8 deletions docs/understanding-results.md
@@ -15,7 +15,6 @@ The top-level Lighthouse Results object (LHR) is what the lighthouse node module
| userAgent | The user agent string of the version of Chrome that was used by Lighthouse. |
| initialUrl | The URL that was supplied to Lighthouse and initially navigated to. |
| url | The URL that Lighthouse ended up auditing after redirects were followed. |
| score | The overall score `0-100`, a weighted average of all category scores. *NOTE: Only the PWA category has a weight by default* |
| [audits](#audits) | An object containing the results of the audits. |
| [runtimeConfig](#runtime-config) | An object containing information about the configuration used by Lighthouse. |
| [timing](#timing) | An object containing information about how long Lighthouse spent auditing. |
@@ -50,14 +49,14 @@ An object containing the results of the audits, keyed by their name.
| Name | Type | Description |
| -- | -- | -- |
| name | `string` | The string identifier of the audit in kebab case. |
| description | `string` | The brief description of the audit. The text can change depending on if the audit passed or failed. |
| description | `string` | The brief description of the audit. The text can change depending on if the audit passed or failed. It may contain markdown code. |
| helpText | `string` | A more detailed description that describes why the audit is important and links to Lighthouse documentation on the audit, markdown links supported. |
| debugString | <code>string&#124;undefined</code> | A string indicating some additional information to the user explaining an unusual circumstance or reason for failure. |
| error | `boolean` | Set to true if there was an exception thrown within the audit. The error message will be in `debugString`.
| rawValue | <code>boolean&#124;number</code> | The unscored value determined by the audit. Typically this will match the score if there's no additional information to impart. For performance audits, this value is typically a number indicating the metric value. |
| displayValue | `string` | The string to display in the report alongside audit results. If empty, nothing additional is shown. This is typically used to explain additional information such as the number and nature of failing items. |
| score | <code>boolean&#124;number</code> | The scored value determined by the audit as either boolean or a number `0-100`. If the audit is a boolean, the implication is `score ? 100 : 0`. |
| scoringMode | <code>"binary"&#124;"numeric"</code> | A string identifying how granular the score is meant to be indicating, i.e. is the audit pass/fail or are there shades of gray 0-100. *NOTE: This does not necessarily mean `typeof audit.score === audit.scoringMode`, an audit can have a score of 40 with a scoringMode of `"binary"` meant to indicate display should be failure.* |
| score | <code>number</code> | The scored value determined by the audit as a number `0-1`, representing displayed scores of 0-100. |
| scoreDisplayMode | <code>"binary"&#124;"numeric"</code> | A string identifying how the score should be interpreted i.e. is the audit pass/fail (score of 1 or 0), or are there shades of gray (scores between 0-1 inclusive). |
| details | `Object` | Extra information found by the audit necessary for display. The structure of this object varies from audit to audit. The structure of this object is somewhat stable between minor version bumps as this object is used to render the HTML report.
| extendedInfo | `Object` | Extra information found by the audit. The structure of this object varies from audit to audit and is generally for programmatic consumption and debugging, though there is typically overlap with `details`. *WARNING: The structure of this object is not stable and cannot be trusted to follow semver* |
| manual | `boolean` | Indicator used for display that the audit does not have results and is a placeholder for the user to conduct manual testing. |
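
Reviewer note: a small sketch of how a consumer might turn the new numeric `score` into display text using `scoreDisplayMode`. The `formatAuditScore` helper and its output strings are hypothetical; only the `score` range (0-1) and the `"binary"`/`"numeric"` modes come from the table above.

```javascript
// Hypothetical display helper for a single audit result.
function formatAuditScore(audit) {
  if (audit.scoreDisplayMode === 'binary') {
    // Binary audits are pass/fail: a score of 1 passes, anything else fails.
    return audit.score === 1 ? 'pass' : 'fail';
  }
  // Numeric audits are 0-1 internally and displayed as 0-100.
  return String(Math.round(audit.score * 100));
}

console.log(formatAuditScore({score: 0, scoreDisplayMode: 'binary'}));
console.log(formatAuditScore({score: 0.87, scoreDisplayMode: 'numeric'}));
```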
@@ -74,10 +73,10 @@ An object containing the results of the audits, keyed by their name.
"description": "Uses HTTPS",
"failureDescription": "Does not use HTTPS",
"helpText": "HTTPS is the best. [Learn more](https://learn-more)",
"score": false,
"score": 0,
"rawValue": false,
"displayValue": "2 insecure requests found",
"scoringMode": "binary",
"scoreDisplayMode": "binary",
"details": {
"type": "list",
"header": {
@@ -185,7 +184,6 @@ An array containing the different categories, their scores, and the results of t
| Name | Type | Description |
| -- | -- | -- |
| id | `string` | The string identifier of the category. |
| score | `number` | The numeric score `0-100` of the audit. Audits with a boolean score result are converted with `score ? 100 : 0`. |
| weight | `number` | The weight of the audit's score in the overall category score. |
| result | `Object` | The actual audit result, a copy of the audit object found in [audits](#audits). *NOTE: this property will likely be removed in upcoming releases; use the `id` property to lookup the result in the `audits` property.* |

@@ -205,7 +203,7 @@ An array containing the different categories, their scores, and the results of t
"weight": 1,
"result": {
"name": "is-on-https",
"score": false,
"score": 0,
...
}
}