Install Linux perf on some Linux machines #1274

mmarchini · 2018-05-16T17:20:53Z

Linux perf has been broken on V8 since the Turbofan/Ignition pipeline became the default compiler. Recently, in V8 6.7, we got it working again (through a flag). Since some Node.js tools and some huge Node.js deployments rely on Linux perf (and other external profilers), having tests will help keep these tools more stable.

For those tests to work, we need Linux perf available on at least some Linux machines. We could start with the Ubuntu 16.04 machines and install it on others later if we want. The package can be installed with: apt install linux-tools-generic. Is it feasible?

Ref: nodejs/node#20783


mhdawson · 2018-05-16T22:19:20Z

@rvagg would it make sense to have it installed in the images supporting the containerized builds?

@mmarchini which of these two is it:

1. The tests are part of the core Node.js tests but detect whether perf is available and don't run if it's not.
2. The tests are separate and we'll have a different job to run them.

mmarchini · 2018-05-16T22:27:01Z

The first one (if the PR lands)

rvagg · 2018-05-17T01:52:34Z

Are we talking about the broken --perf-basic-prof stuff here? What would tests look like for this? Is it a simple matter of passing the output through perf and asserting on the results?

I don't think we're going to be able to jam this into the containers unfortunately, even though that's a great place for these kinds of tests. perf is tied to the kernel version and that's detached from the OS once you're using containers since you use the host kernel version. Unless we have a match of host OS and container OS. Right now we do have a match, 16.04 on host and in container, but they are not updated at the same time so they get out of sync a bit (I'm not sure minor kernel versions matter much for perf, however). But I'm not sure we want to be restricted to always running the same host OS and container OS for these "sharedlibs" containers.

A better approach might just be installing perf on one of our Ubuntu LTS hosts, maybe 18.04, and letting the tests detect its presence, run if it's there, and skip if it's not. We can lock this into our Ansible scripts so we always get linux-tools-generic installed on some of those hosts.
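The detect-and-skip idea above can be sketched as follows. This is only an illustration, not the check used in nodejs/node#20783: the helper name `tool_usable` is hypothetical, and it assumes that running the tool with `--version` and looking at the exit status is a good enough probe.

```python
import shutil
import subprocess


def tool_usable(command: str = "perf") -> bool:
    """Return True only if `command` is found and `command --version` exits with 0."""
    path = shutil.which(command)
    if path is None:
        return False  # not installed at all
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Being on PATH is not enough: the command must also run successfully.
    return result.returncode == 0


if __name__ == "__main__":
    if tool_usable("perf"):
        print("perf is usable, running the perf tests")
    else:
        print("perf is missing or unusable, skipping the perf tests")
```

Checking the exit status rather than only the presence of the binary means a host where perf is installed but cannot run is treated the same as a host without perf.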

bnoordhuis · 2018-05-17T05:13:42Z

> perf is tied to the kernel version and that's detached from the OS once you're using containers since you use the host kernel version.

That's the theory but in practice you can just mix them. For example: I ran stock perf 3.13 on Ubuntu 14.04 for a while against the latest, continually upgraded mainline kernel and it worked just fine.

(Yes, I run mainline kernels. I even test -rc's.)

joyeecheung · 2018-05-17T05:53:13Z

I usually just clone the Linux master and cd into the perf source to compile it when I need perf on my servers. Does not seem to cause problems when I upgrade kernels from time to time.

joyeecheung · 2018-05-17T06:12:00Z

Can we use the benchmark CI machine for this? It would be nice to have perf there; I am thinking about using perf stat in a core benchmark to measure the startup time for our lazy-loading effort.

mmarchini · 2018-05-17T14:25:26Z

> Are we talking about the broken --perf-basic-prof stuff here?

Yes. It will be unbroken once V8 6.7 lands and the tests would help us to know in advance if changes on V8 break it again.

> Is it a simple matter of passing the output through perf and asserting on the results?

Yes. The proposed test can be seen here: nodejs/node#20783
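For readers who want a picture of "passing the output through perf and asserting on the results", here is a rough sketch. It is not the test from nodejs/node#20783: the function names, the default file name `perf.data`, and the plain substring check are all assumptions made for illustration.

```python
import subprocess
from typing import List


def record_command(script: str, data_file: str = "perf.data", node: str = "node") -> List[str]:
    """Build a `perf record` invocation that profiles node with --perf-basic-prof."""
    return ["perf", "record", "-g", "-o", data_file, "--", node, "--perf-basic-prof", script]


def script_command(data_file: str = "perf.data") -> List[str]:
    """Build the `perf script` invocation that dumps the recorded samples as text."""
    return ["perf", "script", "-i", data_file]


def missing_symbols(perf_script_output: str, expected: List[str]) -> List[str]:
    """Return the expected function names that never appear in the perf output."""
    return [name for name in expected if name not in perf_script_output]


def check_profile(script: str, expected: List[str]) -> List[str]:
    """Record a profile, dump it, and report which expected names are missing."""
    subprocess.run(record_command(script), check=True)
    dumped = subprocess.run(
        script_command(), check=True, capture_output=True, text=True
    )
    return missing_symbols(dumped.stdout, expected)
```

A test built on this would fail when `check_profile` returns a non-empty list, which is what would happen if a V8 change stopped JavaScript function names from reaching perf.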

> let the tests detect its presence and run if it's there and skip if it's not

The proposed test already does that.

> A better approach might just be installing perf on one of our Ubuntu LTS hosts, maybe 18.04

+1 for that, I don't think we need to run this test on containers (at least not for now)

> Can we use the benchmark CI machine for this? It would be nice to have perf there; I am thinking about using perf stat in a core benchmark to measure the startup time for our lazy-loading effort.

Looks like a good idea, as long as we have perf on at least one machine that runs tests for nodejs/node-v8 as well.

mhdawson · 2018-05-17T21:14:49Z

Installing on the benchmarking machine sounds good to me (although I think it would be the 2 additional benchmark machines as opposed to the ones used for the nightly runs). The only challenge is that if we only have it installed on a single machine, we'd have to run the test nightly rather than as part of every regression run (not that we'd want that, but it's probably all we'd be able to do). If nightly is OK, we can easily set up a job to run once a day pinned to the benchmarking machine, just like the benchmark jobs.

Ideally, we'd really like it to be run on every PR that upgrades V8 as well. If we had the nightly job, it would just be a matter of ensuring it was added to the list of what to run to validate a V8 update.

rvagg · 2018-05-17T22:38:42Z

Let me get linux-tools-generic on some Ubuntu machines we already have in CI. The test that's in the WIP should pick it up and skip otherwise, so I think that'll solve this.

perf on the benchmarking machines would be a good idea regardless, so I'll look at that at the same time. The Intel/NearForm ones mostly go through Ansible, so I think we can just put the changes in there.

mhdawson · 2018-05-17T22:42:05Z

@rvagg getting it on some of the Ubuntu machines is good, but I think we want to be sure the tests run regularly, and I don't think we can be confident we'll get that through chance.

Just to say I don't think we only want to rely on the chance that it runs on those machines during the regular regression runs. Once it is installed on some of the machines we can set up a job that only runs on that subset and make sure it runs at some interval.

If we have it on enough machines, we could make that job (the one that runs on the subset) run as part of the regular regression job as opposed to at some interval.

mmarchini · 2018-05-18T14:08:33Z

> Ideally, we'd really like it to be run on every PR that upgrades V8 as well

I still think it's more important to run on every test run at nodejs/node-v8. Usually we open the PR to update V8 after the version branch-cut, so if the update breaks perf it might be already too late to fix it (remember: perf is not officially supported by the V8 team). Ideally the test should also run on the PR, but the priority should be to make sure it runs on nodejs/node-v8.

joyeecheung · 2018-05-21T18:45:30Z

Looking at nodejs/node#20783, I think it is also possible to write the test as a benchmark, similar to how the http/http2 benchmarks are structured (they also have external dependencies, like wrk), and to write a test for the benchmark that controls the parameters so that it does not take much time to run. Or, if we can get the benchmark job to run on an arbitrary base and PR, we can simply run the benchmark for V8 updates (which is also worth doing regardless of the perf test).

mmarchini · 2018-05-22T14:06:50Z

> we can simply run the benchmark for V8 updates (which is also worth doing regardless of the perf test)

Totally agree, running benchmarks on v8 updates would be a good idea.

> I think it is also possible to write the test as a benchmark

Is that because it's easier to set up the infra for benchmarks, or because there are concerns about the test speed? The test takes only a few seconds to run (it should be below 10s even on slower machines).

BTW, I'm open to help install perf on the required machines.

joyeecheung · 2018-05-22T15:58:25Z

@mmarchini Mostly because it depends on external tools for stats, similar to how the HTTP/HTTP2 benchmarks work. Also the benchmark machines are supposed to only run one job at a time for the stability of the results, so no benchmarks should be run when running tests on them.

Although, come to think of it, maybe we could put the post-mortem and perf tests in a new directory under test that doesn't get run by default (since the results should only change during V8 updates), and use https://ci.nodejs.org/job/node-stress-single-test/build?delay=0sec to run the subset?

mmarchini · 2018-05-22T16:06:05Z

> maybe we could put the post-mortem and perf tests in a new directory under test that doesn't get run by default

I like this idea. I'll add a commit to nodejs/node#20783 moving those tests to a new directory.

rvagg · 2018-05-30T23:27:17Z

I'm wary about this one. I've done some experimenting: Debian is a bit of a mess, but Ubuntu maintains linux-tools-generic pretty nicely. However, it's strict about matching the kernel version to the perf version; they have to be exactly the same, down to tags:

$ perf
WARNING: perf not found for kernel 4.4.0-119

  You may need to install the following packages for this specific kernel:
    linux-tools-4.4.0-119-generic
    linux-cloud-tools-4.4.0-119-generic

  You may also want to install one of the following packages to keep up to date:
    linux-tools-generic
    linux-cloud-tools-generic
$ echo $?
2

^ in this case it's because the server is running 4.4.0-119 but it's had one or two kernel updates without reboot since that point and is up to 4.4.0-127, ready to run once a reboot happens. So linux-tools-generic installs linux-tools-4.4.0-127 and gives that error. Reboot and it's all fine because we have 4.4.0-127 everywhere. Note that it's not actually just a "warning", it's fatal with a non-zero exit code and it doesn't run anything useful.
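To make the mismatch concrete, here is a small sketch of the naming rule the warning above implies. The helper names are hypothetical, and it assumes the running kernel release (what `uname -r` prints) carries the flavour suffix, as in 4.4.0-119-generic.

```python
import platform


def tools_package_for(kernel_release: str) -> str:
    """Name of the kernel-specific tools package, e.g. linux-tools-4.4.0-119-generic."""
    return f"linux-tools-{kernel_release}"


def reboot_pending(running_release: str, newest_installed_release: str) -> bool:
    """True when a newer kernel is installed than the one currently running."""
    return running_release != newest_installed_release


if __name__ == "__main__":
    running = platform.release()  # the same string `uname -r` reports
    print("kernel-specific package:", tools_package_for(running))
```

With the values from the output above, the running release 4.4.0-119-generic needs linux-tools-4.4.0-119-generic, while the meta package pulls in the tools for 4.4.0-127, so `reboot_pending` would be true until the machine restarts.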

We don't currently tie rebooting to software updates. It's very common for a new kernel to be installed but not activated until a reboot and those reboots may not happen for long periods of time. Updates may be done manually or as part of another Ansible run against the machine but we still end up in that awkward state if there's a kernel update but no reboot.

I think we're going to run into the same thing, but maybe worse, in Docker, given the host/container mismatch problem and the longer caching of container layers for building.

Perhaps this error is only for packaged perf and if you build from source it's different? I was always under the impression it was strictly tied no matter how you get it.

joyeecheung · 2018-05-31T06:37:15Z

> Perhaps this error is only for packaged perf and if you build from source it's different?

I think that is the case. From my server:

root@vultr:~# perf --version
perf version 4.16.rc7.g3eb2ce8
root@vultr:~# uname -a
Linux vultr.guest 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:31:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The perf there was built from source.

Ansible role to install Linux perf on Ubuntu by cloning the Linux source code and building tools/perf to avoid Kernel mismatch errors. PR-URL: #1231 Ref: #1274 Reviewed-By: Jon Moss <me@jonathanmoss.me>

maclover7 · 2018-06-09T20:28:45Z

Fixed by #1321

mmarchini mentioned this issue May 16, 2018

test: add test for Linux perf nodejs/node#20783

Closed

7 tasks

mmarchini pushed a commit to mmarchini/build that referenced this issue Jun 6, 2018

ansible: add role to install Linux perf on Ubuntu

a56753f

Ansible role to install Linux perf on Ubuntu by cloning the Linux source code and building tools/perf to avoid Kernel mismatch errors. Ref: nodejs#1274

mmarchini mentioned this issue Jun 6, 2018

ansible: add role to install Linux perf on Ubuntu #1321

Merged

mmarchini pushed a commit to mmarchini/build that referenced this issue Jun 8, 2018

ansible: install perf on jenkins playbook

0a77993

Install Linux perf on Ubuntu 16.04 machines through jenkins/worker/create playbook. Ref: nodejs#1274

mmarchini pushed a commit to mmarchini/build that referenced this issue Jun 8, 2018

ansible: install perf through jenkins playbook

14819eb

Install Linux perf on Ubuntu 16.04 machines through jenkins/worker/create playbook. Ref: nodejs#1274

maclover7 added infra ci-public ansible labels Jun 9, 2018

maclover7 closed this as completed Jun 9, 2018

mmarchini mentioned this issue Jun 14, 2018

run make test-v8-updates as a sub-job of node-test-commit-v8-linux #1342

Closed

mmarchini mentioned this issue Apr 16, 2019

Linux perf tests are not running for node-test-commit-v8-linux #1774

Closed

ghost deleted a comment Oct 21, 2022
