
Benchmark numbers jump around a lot. #6

Closed · aickin opened this issue Jan 14, 2017 · 1 comment

aickin commented Jan 14, 2017

While working on #5, I noticed that the benchmark numbers, especially the server qps numbers, bounce around a lot from one run to the next, in some cases by as much as 50%. Variance like that usually points to a design problem in the benchmark, and it makes the results hard to trust.

I have a few suggestions based on my experience writing benchmarks in the past:

  1. The benchmark shouldn't run a real server. The only difference between the three test pages is the renderToString method they call, so why not just test that? Running the test through a full server setup makes it more likely that external factors will affect the results; for example, TCP stack tuning parameters can artificially limit your connections and make the code look slower than it is. Dropping the server part would measure the actual difference between the implementations, which is just the renderToString methods.
  2. The tests don't ramp up enough for meaningful SSR testing. Because of V8's JIT, code that runs slowly at first can run significantly faster once it is called frequently enough to be compiled to native code. Server-side rendering code in production spends most of its time in a fully JITed state, so SSR tests should warm the code up for a while before taking measurements. However, every test in this suite (both the renderToString and the server tests) runs in less than one second on my machine, which means it likely isn't getting the full benefit of the JIT. You can see this clearly if you start up a server and run the benchmark twice; the qps results from the first run will be dramatically lower than those from the second. To fix this, I'd suggest the excellent benchmark.js library, which loops your test code to warm it up before measuring; see the sketch after this list. As an added benefit, benchmark.js works out how many times it needs to run a test function to achieve statistical significance.
  3. The tests don't report statistical significance. Without error measurements, it's hard to tell whether differences in test results mean anything. This is another great reason to use benchmark.js, which reports a margin of error for every test.
  4. The workload is not taxing enough. In the React test, for example, the rendered app HTML is only 35 HTML elements comprising 2,601 characters. On my laptop, fully JITed, this takes about 0.13ms to 0.16ms depending on the framework. Frankly, that sort of difference is unlikely to matter in almost any production scenario; an operation that takes less than two tenths of a millisecond is not going to be a large portion of your server response time. If that's the kind of page you're rendering, any of these frameworks is fine from a perf perspective. I tried bumping up the size of the page by growing the banner and the list from 5 elements to 500, which made the document 3,004 HTML elements and 245,508 characters, and the render times went up to about 10-16ms. Still probably not a huge difference for many use cases, but it's more representative of a workload where the frameworks actually show differing perf.
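To make suggestions 1-3 concrete, here's a minimal sketch of what benchmarking the renderToString call directly with benchmark.js might look like, with warm-up and error margins for free. The `Page` component and the 500-row size are hypothetical stand-ins for this repo's actual test pages:

```js
const Benchmark = require('benchmark');
const React = require('react');
const ReactDOMServer = require('react-dom/server');

// Hypothetical stand-in for one of the repo's test pages: a list of `rows` items.
function Page({ rows }) {
  return React.createElement(
    'ul', null,
    Array.from({ length: rows }, (_, i) =>
      React.createElement('li', { key: i }, 'Row ' + i)
    )
  );
}

new Benchmark.Suite()
  // benchmark.js calls each test function in a loop until timings stabilize,
  // so the code is fully JITed before any measurement is recorded.
  .add('react renderToString (500 rows)', () => {
    ReactDOMServer.renderToString(React.createElement(Page, { rows: 500 }));
  })
  // String(event.target) prints ops/sec plus the relative margin of error.
  .on('cycle', (event) => console.log(String(event.target)))
  .on('complete', function () {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run();
```

Adding one `.add(...)` per framework would make the suite report ops/sec with a ± error margin for each, which addresses points 2 and 3 directly.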

I hope this is helpful, and thanks for all you're doing to move the web forward!

imsobear (Collaborator) commented:
Fix 3 #7
