
benchmark: use autocannon instead of wrk #7180

Closed

Conversation

@bzoz (Contributor) commented Jun 6, 2016

(this continues the work from #6949)

Checklist
  • documentation is changed or added
  • the commit message follows commit guidelines
Affected core subsystem(s)

benchmark

Description of change

Switches the HTTP benchmarks to use the autocannon npm module instead of wrk. This allows the benchmarks to be executed on all supported platforms, including Windows. Running the benchmarks with wrk (gist) and autocannon (gist) on a 2-core virtual machine gives very similar results.

Switches all HTTP benchmarks to use autocannon (npm module) instead of
wrk. This allows for running all HTTP benchmarks under Windows.
@nodejs-github-bot added the benchmark (Issues and PRs related to the benchmark subsystem) label on Jun 6, 2016
@mcollina (Member) commented Jun 6, 2016

Not sure if I get to vote on this, but LGTM.

One thing: we might want to specify the version range of autocannon in the npm install step. If we are going forward with this, I can release autocannon as 1.0.0 to ease maintenance (so we can say "install anything in the 1.x.x line").
Does that make sense?
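
For illustration, pinning could be as simple as installing within the 1.x range (a sketch; the exact range specifier and where the install happens are assumptions, not something settled in this thread):

$ npm install autocannon@1.x

or declaring "autocannon": "^1.0.0" in a package.json devDependencies entry.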

@mscdex added the http (Issues or PRs related to the http subsystem) label on Jun 6, 2016
@mscdex (Contributor) commented Jun 6, 2016

It seems like this could cause reliability issues, since it's node-based and we'd be benchmarking changes to node's http implementation. So if, for example, a performance regression slips in, that could affect benchmark results from version to version.

@mcollina (Member) commented Jun 6, 2016

@mscdex the way this PR is built, it does not leverage the current node build, but rather the "global" / installed version of node.

Benchmarks are always comparative anyway; they need to be taken on the same box, using the same tools. This PR does not change that, as it will use a third node build that stays the same between the two runs.

This also brings some benefits: we can add more benchmarks related to HTTP pipelining, something that is not currently supported by wrk.
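
For example, pipelined load could then be driven straight from the autocannon CLI along these lines (a sketch; the flag values are illustrative and the -p pipelining option is an assumption, not something shown in this PR):

$ autocannon -c 100 -p 10 -d 5 http://localhost:3000/hello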

@mscdex (Contributor) commented Jun 6, 2016

@mcollina Right, but that's exactly what I mean. Typically the global node version will change more often than, say, wrk, and node is more likely to change its http implementation than other tools. *shrug*

@mcollina (Member) commented Jun 6, 2016

@mscdex I do not think results are comparable if they are not taken under the very same conditions. This usually means taking a 30-60 minute break from your laptop, or using another one for that time period. The only way I can get consistent results is to a) shut down the Internet and b) close every other program I am running. From time to time, some crazy stuff starts up anyway, and then I do get different results.
And all of this is within a short time period.

I have been bitten by this enough times to know that any slight change matters. I have seen different numbers (+/- 10%) coming out just by plugging an external monitor (HDMI) into my MBP (try it, maybe it's just me).
I think that comparing two results taken at different points in time, under different conditions, suffers from the exact same problem you describe. I wouldn't compare such results anyway, because I would not remember the exact conditions under which the measurement was taken.

@rvagg (Member) commented Jun 8, 2016

@nodejs/benchmarking your call as far as I'm concerned

@mhdawson (Member) commented Jun 8, 2016

Will try to take a look sometime today

@mhdawson (Member) commented Jun 9, 2016

Overall I like using a tool which is more platform-independent, so I'm generally a +1 on the change.

I think that we probably want to call out the concern from @mscdex in the benchmark/README.md under ## Prerequisites.

What I'd suggest is that we explain that autocannon will use the node currently in the path when the benchmarks are run, and that if you want to compare results with previous runs you should make sure the node in the path does not change between those runs.
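
Wording for such a note might look roughly like this (a sketch only; the text actually added to benchmark/README.md is not quoted in this thread):

  The http benchmarks use the autocannon npm module, which runs on the node
  binary found first in your PATH rather than on the build being benchmarked.
  To compare results across runs, make sure the node in your PATH does not
  change between those runs.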

@mscdex (Contributor) commented Jun 9, 2016

I wasn't so much concerned about the version of node changing between (immediate) runs as I was about being able to reliably compare results across different (global) versions of node (e.g. running a benchmark with a global node v5.x, then upgrading to node v6.x and re-running the same benchmark and trying to compare the two results).

@mhdawson (Member) commented Jun 9, 2016

@mscdex I don't think I'm quite following. You will need to keep the environment consistent if you want to compare runs (whether separated by a short or long time), and I was suggesting that we call out that the global node version is one of the things you have to keep consistent.

If the concern is that the global version changes for other reasons, then I'd suggest that part of the benchmarking setup should be to have an install of node somewhere else and add it to the path so that it's used for the benchmark runs.

@mscdex (Contributor) commented Jun 10, 2016

@mhdawson For environments that are dedicated to running benchmarks, keeping consistent global node versions isn't a problem. On my own personal machine for example, where I run almost all of my benchmarks these days, I typically keep my global copy of node updated to the latest "stable"/"current" version.

Let's say I ran a particular benchmark back in January of this year and had whatever latest node v5.x installed globally at that point. Then when node v6.x was released, I upgraded my global copy to that and re-ran the same benchmark. I now run a greater risk of unreliability (IMHO) when I start comparing the two results, because node's http implementation could have changed dramatically (heck, this could even happen within a single release line of node). If I were to use a global copy of wrk instead of node for the benchmark, then I would be more confident in being able to compare my two benchmark results, because tools like wrk are not likely to change their http implementation as much as node does (at least at this point).

@mcollina (Member) commented Jun 10, 2016

@mscdex let me add some inline comments

Let's say I ran a particular benchmark back in January of this year and had whatever latest node v5.x installed globally at that point. Then when node v6.x was released, I upgraded my global copy to that and re-ran the same benchmark.

In that span of time, you will have installed several OS updates and updated various other software on your machine. The numbers will not be comparable anyway.

I now run a greater risk of unreliability (IMHO) when I start comparing the two results because node's http implementation could have changed dramatically (heck, this could even happen within a single release line of node).

Autocannon does not rely on node's HTTP layer, just on net and a bunch of npm modules. Some optimizations in streams or V8 could affect the numbers, but less so.

The point of a benchmarking tool is to drive the benchmarked server at 100% CPU. As long as it does that, there should be no relevant change in numbers.

These are the results taken on node v4 and node v6:

$ nvm use v6
Now using node v6.2.0 (npm v3.8.9)
$ autocannon -c 100 -d 5 http://localhost:3000/hello
Running 5s test @ http://localhost:3000/hello
100 connections

Stat         Avg     Stdev    Max
Latency (ms) 4.92    3.53     162
Req/Sec      18532.8 728.2    19167
Bytes/Sec    2.08 MB 84.44 kB 2.23 MB

93k requests in 5s, 10.29 MB read
$ nvm use v4
Now using node v4.4.2 (npm v2.15.0)
$ autocannon -c 100 -d 5 http://localhost:3000/hello
Running 5s test @ http://localhost:3000/hello
100 connections

Stat         Avg     Stdev     Max
Latency (ms) 4.92    4.51      217
Req/Sec      18529.6 952.33    19759
Bytes/Sec    2.05 MB 110.83 kB 2.23 MB

93k requests in 5s, 10.28 MB read
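
For reference, runs like the ones above only need a small target server; a minimal stand-in could look like this (illustrative only, this is not the server actually benchmarked in the thread):

// Minimal target server listening on the port used in the runs above.
const http = require('http');
http.createServer((req, res) => {
  res.end('hello world');
}).listen(3000);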

@bzoz (Contributor, Author) commented Jun 15, 2016

I've added a note about the node version to README.md, PTAL.

@mhdawson (Member) commented:

@mscdex in your scenario, instead of using the global version, you could use another copy of node, make sure it is first in your path, and then keep it around for when you want to compare to older results.

@mscdex (Contributor) commented Jun 15, 2016

@mhdawson Maintaining another, separate copy of node just for benchmarking isn't something I'd personally want to do.

@mcollina (Member) commented:

I think comparing benchmarks taken months apart is not a use case for the internal node benchmarking suite. The code of the benchmarks themselves might change between two versions of node anyway.

I think we need a "benchmarking suite" that does not reside in the node.js tree, and even that might be hard to compare between versions over time.

@mhdawson (Member) commented:

What we have at benchmarking.nodejs.org is a mix. Some of the benchmarks (e.g. acme air) are not in the node.js tree, while some run benchmarks which are in the tree (e.g. startup footprint).

I'm thinking we need broader input on this; maybe we should add it for CTC discussion?

@orangemocha (Contributor) commented:

@mscdex and others: How about adding the target Node (the one to be benchmarked) to the path so that you can use a single executable? Or we could make the benchmarks use the target Node by default, and let users override it with an environment variable when they want to use a separate version.

@mscdex (Contributor) commented Jun 16, 2016

@orangemocha As I mentioned, that would still involve me having to maintain a separate copy of node just for the purposes of benchmarking, which is not something I want to do.

@bzoz (Contributor, Author) commented Jun 17, 2016

@mscdex, how about we provide a way to use a user-defined installation of node? The user could set an environment variable with the path to the node to use for autocannon.

I could also add a script that would automatically download and set up this local node installation.
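
A rough sketch of how such an override could be wired up (the AUTOCANNON_NODE variable name and the CLI path are made up for illustration; this is not code from the PR):

const { spawn } = require('child_process');

// Hypothetical override: AUTOCANNON_NODE points at the node used to run the
// autocannon client; fall back to whatever node is first on the PATH.
const clientNode = process.env.AUTOCANNON_NODE || 'node';
spawn(clientNode,
      ['node_modules/.bin/autocannon', '-c', '100', '-d', '5',
       'http://localhost:3000/hello'],
      { stdio: 'inherit' });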

@mscdex (Contributor) commented Jun 17, 2016

@bzoz My problem isn't where node comes from, it's that node would be used as the http client.

@bzoz (Contributor, Author) commented Jun 20, 2016

Why? It would be a different node than the one being benchmarked, and its version could easily be kept constant.

@jbergstroem (Member) commented:

Although not in a position to affect this decision, I must say that I agree with @mscdex's concerns. You generally want a consistent benchmark platform, not something with moving parts or continual performance enhancements.

@sieroslawos commented:

@bzoz fair enough to me, although I get @jbergstroem's and @mscdex's point. Tough one...

@bzoz (Contributor, Author) commented Jun 21, 2016

@jbergstroem, @mscdex: I understand your concerns; it would be best if we could use wrk under Windows, but we can't.

Now, autocannon will use the node that is first on the PATH. You can easily ensure its version stays the same, either by putting a dedicated node version used for benchmarking first on the PATH before starting the http benchmarks, or by using nvm. This way you can have a consistent benchmark platform on all systems that node will ever support.
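
For example (the paths and version below are only illustrative):

$ # option 1: a dedicated client node kept just for driving benchmarks
$ export PATH=/opt/benchmark-node/bin:$PATH
$ # option 2: pin the client node with nvm before each benchmark session
$ nvm use 4.4.2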

@mscdex (Contributor) commented Jun 21, 2016

@bzoz Well, at the very least Windows can execute Linux binaries now, so that may be less of a problem?

I should also add that I am not tied to wrk specifically. If there is an equivalent, stable http benchmarking tool that is cross-platform, then I would be open to switching to it. However, I am still -1 on using anything node-based as the benchmarking http client.

@bzoz (Contributor, Author) commented Jun 21, 2016

@mscdex: It only works on Windows 10.

Besides that, I think using node as the benchmarker will keep things more consistent 😉. Microsoft can change their "Linux implementation" whenever they want, without notice.

@Fishrock123 (Contributor) commented:

Would it be more ideal to benchmark with multiple tools?

@bzoz (Contributor, Author) commented Jul 7, 2016

How about we do what @Fishrock123 suggested?

I can add support for autocannon alongside wrk. On Linux both tools could be used for benchmarking; on Windows we would use only autocannon. By default, when running benchmarks we would try to use both tools, and the benchmark would fail if neither tool could be found. Does this sound good?
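
A rough sketch of that selection logic (the detection approach here is illustrative only, not what was eventually merged in #8140):

// Illustrative only: probe which http benchmarkers are installed and fail
// if neither wrk nor autocannon can be spawned.
const { spawnSync } = require('child_process');

function isAvailable(cmd) {
  // spawnSync sets .error (e.g. ENOENT) when the binary cannot be started.
  return !spawnSync(cmd, ['--help']).error;
}

const benchmarkers = ['wrk', 'autocannon'].filter(isAvailable);
if (benchmarkers.length === 0)
  throw new Error('no http benchmarker found; install wrk or autocannon');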

@bzoz (Contributor, Author) commented Aug 17, 2016

Continued in #8140

@bzoz closed this Aug 17, 2016
Labels
benchmark (Issues and PRs related to the benchmark subsystem), http (Issues or PRs related to the http subsystem)
10 participants