benchmark jenkins job? #8157

AndreasMadsen · 2016-08-18T12:02:37Z

Version: master
Platform: NA
Subsystem: benchmark

A problem with our current benchmark system is that it takes a long time to compare two node versions. While many of the benchmarks likely use too many iterations (and thus time), that is not the case for all of them. See for example #8139, http/simple.js takes 4 hours per node version and it appears that is necessary in this case.

I personally don't own an extra computer to run benchmarks on, thus I think it would be nice if we could create a jenkins job on ci.nodejs.org that would compare two node version on a benchmark subset like http/simple.js.

I don't know how the CI setup works, but it would have to be a machine (virtual or physical) that has a constant performance. Otherwise we would get false positive results like we have seen here: #7425 (comment)

I don't think we need a machine for every platform, just those where we prioritize performance (windows and linux?).

The text was updated successfully, but these errors were encountered:

AndreasMadsen · 2016-08-18T12:03:42Z

/cc @nodejs/benchmarking @nodejs/future-ci

jbergstroem · 2016-08-18T15:22:55Z

@AndreasMadsen we have a dedicated bare-bones machine to run benchmarks. These are published here: https://benchmarking.nodejs.org

The benchmark group could probably look into adding more tests.

gareth-ellis · 2016-08-18T15:38:30Z

I'm looking into a way of collecting regular results from the benchmarks directory - due to the sheer volume of data, we need to find a way of summarizing the scores, whilst still exposing the individual numbers.

The other problem to overcome, is to decide how we would run the benchmarks. We need to have a way of changing only one thing (we're interested in performance of Node across releases, so this is what we should change), all other components (the machine, the tests run etc) should stay the same.

The benchmarks directory is under constant change, so we need a way of ensuring that we either stick at a particular (back-level) version of benchmarks, and reuse those, or to run latest benchmarks and include an older version of node to compare to.

This could cause us issues where there's tests that we're changing based on new functionality that is not backwards compatibility.

Some comments on this would be good, so we can come to a decision and get the benchmarks running.

AndreasMadsen · 2016-08-18T18:47:30Z

@jbergstroem @gareth-ellis I don't want continues observations of performances (though that is important), I want the ability to say "please compare master and this PR using this benchmark".

Changes in benchmarks shouldn't be an issue, we could just consistently use that in the master.

Essentially I want it to run:

git checkout master

# build master
./configure
make
mv ./node ./node-master

# build pr
apply-pr $PR_ID
./configure
make
mv ./node ./node-pr

# run benchmark
./node benchmark/compare.js --old ./node-master --new ./node-pr --filter $JOB_FILTER --runs $JOB_RUNS -- $JOB_CATEGORY | tee output.csv
cat output.csv | Rscript benchmark/compare.R

edit: updated bash job to be syntactically correct

jbergstroem · 2016-08-18T19:45:30Z

@AndreasMadsen I was just saying that the benchmark group is coordinating access/time spent on our only dedicated hardware box which is most suitable for reliable results. I guess I should've been more clear about what I meant with "adding tests", as in, exposing a job or similar that we could run from Jenkins (albeit with limited access).

mhdawson · 2016-08-22T22:39:46Z

@AndreasMadsen the other consideration is that if we run on the regular machines, tying up one of the machines for 4+ hours could affect the progress off the regular builds which are much faster.

As @gareth-ellis mentioned consolidating the data down to something consumable is the challenge for regular runs.

When you say you want to run a particular benchmark for a particular PR. Is there a key subset of the benchmarks that you find yourself wanting to run as opposed to all of them ? It might be easier to tackle being able to run those first.

AndreasMadsen · 2016-08-23T07:22:34Z

@AndreasMadsen the other consideration is that if we run on the regular machines, tying up one of the machines for 4+ hours could affect the progress off the regular builds which are much faster.

I understand, but it is the same reason why we don't want to do it on our own machines.

As @gareth-ellis mentioned consolidating the data down to something consumable is the challenge for regular runs.

In this case we have the Rscript ./node benchmark/compare.R which does that. The job should just output the raw csv and the statistical summary.

When you say you want to run a particular benchmark for a particular PR. Is there a key subset of the benchmarks that you find yourself wanting to run as opposed to all of them ? It might be easier to tackle being able to run those first.

Yes, it is a subset. But the subset depends on the PR, thus I think this should be configurable parameters in the job. Please see the bash job I outlined for how this is typically accomplished.

gareth-ellis · 2016-08-23T10:09:15Z

@AndreasMadsen , do you need to have ./node in the Rscript line?

Rscript ./node benchmark/compare.R

When I run with that, I get
Error: unexpected input in ""
Execution halted

However removing ./node seems to generate some numbers.

I've prototyped something in my fork of benchmarking:

https://github.com/gareth-ellis/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh

I'll open a PR shortly to get some comments.

For a ci run - we'd need a job where we can supply:

mandatory BRANCH
mandatory PULL_ID
mandatory CATEGORY
optional RUNS
optional FILTER

We will need to be careful if we do implement this, that the machine is still able to continue the regular regression runs - though expanding them to include at least a subset of the benchmarks from http://github.com/nodejs/node would be beneficial

AndreasMadsen · 2016-08-23T10:12:09Z

No sorry about that. It is just Rscript benchmark/compare.R

AndreasMadsen · 2016-08-23T10:15:19Z

@gareth-ellis also the default --runs is 30, not 5. I think it will be confusing with a different default. Perhaps you should just make it like you implemented --filter.

Trott · 2017-07-10T03:46:50Z

@nodejs/build @nodejs/benchmarking Is this something that might realistically happen? Is there consensus that it would be desirable? Any clear pre-requisites (like more Ci resources)? Is there a path forward on this?

AndreasMadsen · 2017-07-10T06:35:19Z

@mcollina gave us an update in nodejs/benchmarking#58 a while ago, see nodejs/benchmarking#58 (comment)

We are currently in the process of getting that wired up to the network. It's taking a while, but it's out of our hands (physical things), as we are in the process of changing provider.

mcollina · 2017-07-10T06:49:08Z

So, the server is available and wired up to the net. It is a matter of attaching it to the rest of the infrastructure and starting making a good use of it.
Who can I work with on this? How do the credentials/etc settings work?

cc @jasnell @piccoloaiutante

refack · 2017-07-12T02:12:26Z

@nodejs/build @nodejs/benchmarking Is this something that might realistically happen? Is there consensus that it would be desirable? Any clear pre-requisites (like more Ci resources)? Is there a path forward on this?

I would actually consider one of the linuxONE machines, there are several available, and they are super fast. I'm also assuming they could be configured to have some guarantee of available resources.
Another "optimization" we could have is the ability to use a prebuilt binary as baseline (latest v8.x or nightly)

AndreasMadsen · 2017-07-12T05:49:56Z

I would actually consider one of the linuxONE machines, there are several available, and they are super fast. I'm also assuming they could be configured to have some guarantee of available resources.

Are those dedicated machines?

Another "optimization" we could have is the ability to use a prebuilt binary as baseline (latest v8.x or nightly)

Building node.js takes almost no time compared to the running the benchmarks.

gibfahn · 2017-07-12T10:49:27Z

Are those dedicated machines?

This is the issue, we need a machine that is only used for benchmarks (AIUI). We have 3 linuxone machines (2 test, one release), and they're all in pretty heavy use.

I suspect that currently the release machines are mostly idle, other than an hour or two building nightlies, but I'm not sure it'd be feasible to use them.

If we could run this on a normal CI test machine, then that would be a lot easier to work with. Our faster machines (like the linuxone boxes) are frequently idle, and we could probably survive for an hour or two with just one (similar to when people run the stress-test job).

mcollina · 2017-07-12T11:00:56Z

We have a machine ready to hook up for this cc @jasnell.
How can we get it configured?

gibfahn · 2017-07-12T11:03:08Z

We have a machine ready to hook up for this cc @jasnell.
How can we get it configured?

You mean there's a machine in CI already, or that you have one you want to donate?

Either way you should raise an issue in the build repo, and we'll have to sort out ansible configs for it.

gareth-ellis · 2017-07-12T11:19:33Z

I think this is going to be a longer process. In an ideal world we would have infinite time across all platforms, in reality we don't have that.

We should start with the hardware that is being donated, then we can move forward and evaluate the benefit/possibility of adding additional platforms for benchmarking.

Once the machine is in the CI, we can do some exploratory runs to evaluate the results we get, and move forward from there.

refack · 2017-07-12T13:26:53Z

This is the issue, we need a machine that is only used for benchmarks (AIUI). We have 3 linuxone machines (2 test, one release), and they're all in pretty heavy use.

But if they can run the benchmark suite in short time (I'd assume < 20m) IMHO it's still a good option. We could mitigate resource fluctuations by running the comparison in a checkerboard pattern.

gareth-ellis · 2017-07-12T13:38:12Z

Not convinced that running the entire suite is an option, I see the running of benchmarks being targeted subset for a PR, or perhaps as an overall summary - this would have to be a reduced amount, last time I tried to run a compare on the whole suite, I kicked it off on Friday, then I had to abandon the run on Monday as it was still running.

(this wasnt on a zLinux / LinuxOne machine, however it was a fairly powerful machine. LinuxOne comes with clockspeeds of either 4.3Ghz or 5Ghz, so that is one reason that we see the tests and builds run quicke - even though you can't really compare 5Ghz LinuxOne with 5Ghz on Intel / power etc, they're different architecture so may achieve more or less in one clock cycle).

See nodejs/benchmarking#58 (comment) for my summary.

However, as I said, running the whole suite should not really be how we do things, targeted testing would be key.

AndreasMadsen · 2017-07-12T14:58:36Z

But if they can run the benchmark suite in short time (I'd assume < 20m) IMHO it's still a good option.

The entire suite would take days. Statistically, it is also completely useless to run the entire suite because the risk of false positives is too high.

apapirovski · 2018-04-11T12:55:37Z

This seems to be resolved now that we have a benchmark job in the CI. If any additional functionality or whatever is needed then a new issue should be opened. This one has become really stale.

AndreasMadsen added meta Issues and PRs related to the general management of the project. benchmark Issues and PRs related to the benchmark subsystem. labels Aug 18, 2016

gareth-ellis mentioned this issue Aug 23, 2016

Prototype for running a subset of benchmarks on a pull request nodejs/benchmarking#58

Merged

joyeecheung mentioned this issue Dec 25, 2016

benchmark: allow benchmarks to specify flags #10448

Closed

3 tasks

mcollina mentioned this issue Jul 12, 2017

Dedicated machine for benchmarks nodejs/build#791

Closed

apapirovski closed this as completed Apr 11, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

benchmark jenkins job? #8157

benchmark jenkins job? #8157

AndreasMadsen commented Aug 18, 2016

AndreasMadsen commented Aug 18, 2016

jbergstroem commented Aug 18, 2016

gareth-ellis commented Aug 18, 2016

AndreasMadsen commented Aug 18, 2016 •

edited

Loading

jbergstroem commented Aug 18, 2016

mhdawson commented Aug 22, 2016

AndreasMadsen commented Aug 23, 2016

gareth-ellis commented Aug 23, 2016

AndreasMadsen commented Aug 23, 2016

AndreasMadsen commented Aug 23, 2016 •

edited

Loading

Trott commented Jul 10, 2017

AndreasMadsen commented Jul 10, 2017

mcollina commented Jul 10, 2017

refack commented Jul 12, 2017

AndreasMadsen commented Jul 12, 2017

gibfahn commented Jul 12, 2017 •

edited

Loading

mcollina commented Jul 12, 2017

gibfahn commented Jul 12, 2017

gareth-ellis commented Jul 12, 2017

refack commented Jul 12, 2017

gareth-ellis commented Jul 12, 2017

AndreasMadsen commented Jul 12, 2017

apapirovski commented Apr 11, 2018

benchmark jenkins job? #8157

benchmark jenkins job? #8157

Comments

AndreasMadsen commented Aug 18, 2016

AndreasMadsen commented Aug 18, 2016

jbergstroem commented Aug 18, 2016

gareth-ellis commented Aug 18, 2016

AndreasMadsen commented Aug 18, 2016 • edited Loading

jbergstroem commented Aug 18, 2016

mhdawson commented Aug 22, 2016

AndreasMadsen commented Aug 23, 2016

gareth-ellis commented Aug 23, 2016

AndreasMadsen commented Aug 23, 2016

AndreasMadsen commented Aug 23, 2016 • edited Loading

Trott commented Jul 10, 2017

AndreasMadsen commented Jul 10, 2017

mcollina commented Jul 10, 2017

refack commented Jul 12, 2017

AndreasMadsen commented Jul 12, 2017

gibfahn commented Jul 12, 2017 • edited Loading

mcollina commented Jul 12, 2017

gibfahn commented Jul 12, 2017

gareth-ellis commented Jul 12, 2017

refack commented Jul 12, 2017

gareth-ellis commented Jul 12, 2017

AndreasMadsen commented Jul 12, 2017

apapirovski commented Apr 11, 2018

AndreasMadsen commented Aug 18, 2016 •

edited

Loading

AndreasMadsen commented Aug 23, 2016 •

edited

Loading

gibfahn commented Jul 12, 2017 •

edited

Loading