Production Web Apps Performance Study Q4/16 - Q1/17 #1
@addyosmani nice work. Could you explain how you handled the high variance you saw for certain URLs? Also, I requested access to the page sets document; not sure if it is intentionally private.
Thanks!
High variance was indeed a problem with certain URLs. One of the challenges with studying web performance at scale is that data sets are susceptible to varying amounts of noise. Some of the sets I originally used (prior to filtering) included sites that were only using a framework through a transitive dependency. One example of this was sites that pulled in all of Angular just for an ad, so even if the page would otherwise have had a decent TTI, their third-party includes were pushing their TTIs out very heavily. Some of the other data sets I used had URLs that suffered from the same problem, so I removed them after some manual tracing. This study was mostly looking at pages using a framework for their core content.
The current TTI metric we've implemented in Lighthouse and WebPageTest will very occasionally return Infinity for URLs (especially if they keep the main thread busy for a long time). I locally filtered out -1/Infinity values when computing medians to account for this. My hope is that TTI will eventually be reliable enough that filtering like this is no longer required.
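For illustration, a minimal sketch of that filtering step (not the exact script used in the study; it assumes results exported to CSV with a "TTI" column, which is an assumption about the layout):

```python
import csv
import math
from statistics import median

def median_tti(csv_path, column="TTI"):
    """Median time-to-interactive, ignoring -1/Infinity sentinel values."""
    values = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                tti = float(row[column])
            except (KeyError, ValueError):
                continue  # missing or non-numeric cell
            if tti < 0 or math.isinf(tti):
                continue  # -1/Infinity indicate TTI could not be determined for the run
            values.append(tti)
    return median(values) if values else None
```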
Goals
Sample information
6000+ production sites using one of React, Angular 1.0, Ember, Polymer, jQuery or Vue.js. Site URLs were obtained from a combination of Libscore.io, BigQuery, BuiltWith-style sites and framework wikis. A 10% sample of each set was eyeballed to verify framework usage, and sets that were not reliable were discarded from the final study.
URLs: https://docs.google.com/a/google.com/spreadsheets/d/1_gqtaEwjoJGbekgeEaYLbUyR4kcp5E7uZuMHYgLJjGY/edit?usp=sharing
Trivia: All in all, 85,000 WebPageTest results were generated as part of this study. Yipes.
Tools used in study
WebPageTest.org (with enhancements such as JS cost, TTI and aggregated V8 statistics added thanks to Pat Meenan as the project progressed), Catapult (internal Google tool), Chrome Tracing.
Summary observations
This data may be useful to developers as it shows:
Where are the medians and aggregates?
The primary goals of this study were to highlight trends across the different data sets available to me as a whole. Initially, I focused on summarizing this data at a per-framework level (e.g. React apps in set 1 exhibited characteristic A). After reviewing this with the Chrome team, we decided that presenting per-framework breakdowns was more susceptible to the takeaway being "oh, so I should just use framework X over Y because it is 2% better" instead of the important takeaway that parse/compile is a problem we all face.
To that end, the charts below were generated locally by fetching each of the WebPageTest reports for the data sets, iterating over a particular dimension (e.g. Time to Interactive, JS parse time) and computing the medians for the different sets, which were then plumbed into either Google Sheets or Numbers for charting. If you wish to recreate that setup yourself, you can grab the CSVs from the reports below; a rough sketch of the aggregation step follows the report links.
Raw WebPageTest runs - Round 2 (January, 2017)
Raw WebPageTest runs - Round 1 - Older study (December, 2016)
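A minimal sketch of that aggregation, assuming the standard WebPageTest JSON endpoint; the set names, test IDs and the "TimeToInteractive" metric key are placeholders rather than values from the study:

```python
import csv
import json
from statistics import median
from urllib.request import urlopen

# Test IDs per data set would come from the exported spreadsheets; these are placeholders.
SETS = {
    "set-1": ["170101_XX_abc", "170101_XX_def"],
    "set-2": ["170102_XX_ghi"],
}

def fetch_metric(test_id, metric):
    """Pull one WebPageTest result as JSON and read a first-view metric from the median run."""
    url = f"https://www.webpagetest.org/jsonResult.php?test={test_id}"
    with urlopen(url) as resp:
        data = json.load(resp)
    # The exact key for a given metric (e.g. TTI, script parse time) depends on the WPT build/agent.
    return data["data"]["median"]["firstView"].get(metric)

def summarize(metric, out_path="medians.csv"):
    """Write one median per set to a CSV that can be pasted into Sheets/Numbers for charting."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["set", f"median {metric}"])
        for name, test_ids in SETS.items():
            values = [v for v in (fetch_metric(t, metric) for t in test_ids)
                      if isinstance(v, (int, float)) and v > 0]
            writer.writerow([name, median(values) if values else ""])

if __name__ == "__main__":
    summarize("TimeToInteractive")
```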
I put together this graphic when internally sharing the first version of this study. I decided to redo it because the network throttling setup wasn't consistent across the 2-3 web perf tooling systems used. This meant that while overall weight, time in script (parse/eval), FMP and load time were fine, the TTI numbers could not be concretely confirmed as 100% accurate. I redid the study once we added support for TTI to WebPageTest, and I trust the numbers there (Round 2) a lot more.
Other data sets generated (Dec, 2016)
Note: many of the below data sets were generated before we installed Moto G4s in WebPageTest, so the Moto G1 was used instead. Some of the data sets also use earlier versions of the time-to-interactive metric and should not, in most cases, be directly compared to the latest data from 2017. This is historical data that is still interesting and may be worth re-exploring where particular data sets didn't end up making it into the final study results.