
Benchmark numbers jump around a lot. #6

Closed · aickin opened this issue Jan 14, 2017 · 1 comment

aickin commented Jan 14, 2017

While working on #5, I noticed that the benchmark numbers, especially the server qps numbers, bounce around a lot from one run to the next, in some cases by as much as 50%. Variance like that usually points to a design problem in the benchmark, and it makes the results hard to trust.

I have a few suggestions based on my experience writing benchmarks in the past:

  1. The benchmark shouldn't run a real server. The only difference between the three test pages is the renderToString method they call, so why not just test that? Running the test through a full server setup makes it more likely that external factors will affect the results; for example, TCP stack tuning parameters can artificially limit your connections and make the code look slower than it is. Dropping the server part would measure the actual difference between the implementations, which is just the renderToString methods.
  2. The tests don't ramp up enough for meaningful SSR testing. Because of V8's JIT, code that runs slowly at first can run significantly faster once it is called frequently enough to be compiled to native code. Server-side rendering code in production spends most of its time in a fully JITed state, so SSR tests should warm the code up for a while before taking measurements. However, every test in this suite (both the renderToString and the server tests) runs in less than one second on my machine, which means it likely isn't getting the full benefit of the JIT. You can see this clearly if you start up a server and run the benchmark twice; the qps results from the first run will be dramatically lower than those from the second. To fix this, I'd suggest the excellent benchmark.js library, which loops your test code to warm it up before measuring; see the sketch after this list. As an added benefit, benchmark.js works out how many times it needs to run a test function to achieve statistical significance.
  3. The tests don't report statistical significance. Without error measurements, it's hard to tell whether differences in test results mean anything. This is another great reason to use benchmark.js, which reports a margin of error for every test.
  4. The workload is not taxing enough. In the React test, for example, the rendered app HTML is only 35 HTML elements comprising 2,601 characters. On my laptop, fully JITed, this takes about 0.13ms to 0.16ms depending on the framework. Frankly, that sort of difference is unlikely to matter in almost any production scenario; an operation that takes less than two tenths of a millisecond is not going to be a large portion of your server response time. If that's the kind of page you're rendering, any of these frameworks is fine from a perf perspective. I tried bumping up the size of the page by growing the banner and the list from 5 elements to 500, which made the document 3,004 HTML elements and 245,508 characters, and the render times went up to about 10-16ms. Still probably not a huge difference for many use cases, but it's more representative of a workload where the frameworks actually show differing perf.
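To make suggestions 1-3 concrete, here's a minimal sketch of what benchmarking the renderToString call directly with benchmark.js might look like, with warm-up and error margins for free. The `Page` component and the 500-row size are hypothetical stand-ins for this repo's actual test pages:

```js
const Benchmark = require('benchmark');
const React = require('react');
const ReactDOMServer = require('react-dom/server');

// Hypothetical stand-in for one of the repo's test pages: a list of `rows` items.
function Page({ rows }) {
  return React.createElement(
    'ul', null,
    Array.from({ length: rows }, (_, i) =>
      React.createElement('li', { key: i }, 'Row ' + i)
    )
  );
}

new Benchmark.Suite()
  // benchmark.js calls each test function in a loop until timings stabilize,
  // so the code is fully JITed before any measurement is recorded.
  .add('react renderToString (500 rows)', () => {
    ReactDOMServer.renderToString(React.createElement(Page, { rows: 500 }));
  })
  // String(event.target) prints ops/sec plus the relative margin of error.
  .on('cycle', (event) => console.log(String(event.target)))
  .on('complete', function () {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run();
```

Adding one `.add(...)` per framework would make the suite report ops/sec with a ± error margin for each, which addresses points 2 and 3 directly.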

I hope this is helpful, and thanks for all you're doing to move the web forward!

imsobear (Collaborator) commented:
Fix 3 #7
