
Install Linux perf on some Linux machines #1274

Closed
mmarchini opened this issue May 16, 2018 · 18 comments

Comments

@mmarchini
Contributor

mmarchini commented May 16, 2018

Linux perf has been broken on V8 since the TurboFan/Ignition pipeline became the default compiler. As of V8 6.7 it works again (behind a flag). Since several Node.js tools and some large Node.js deployments rely on Linux perf (and other external profilers), having tests will help keep these tools stable.

For those tests to work, we need Linux perf available on at least some Linux machines. We could start with the Ubuntu 16.04 machines and install it on other machines later if we want. The package can be installed with: apt install linux-tools-generic. Would that be feasible?

Ref: nodejs/node#20783
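
For reference, getting perf onto one of those machines and sanity-checking it would look roughly like this (a sketch, assuming an Ubuntu 16.04 host with sudo access):

$ sudo apt-get update
$ sudo apt-get install -y linux-tools-generic   # meta-package that pulls in linux-tools-<kernel>-generic
$ perf --version                                # quick check that the wrapper finds a matching perf binary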

@mhdawson
Member

@rvagg would it make sense to have it installed in the images supporting the containerized builds?

@mmarchini which of these two is it:

  • the tests are part of the core Node.js tests but detect whether perf is available and don't run if it's not
  • the tests are separate and we'll have a different job to run them?

@mmarchini
Contributor Author

The first one (if the PR lands)

@rvagg
Member

rvagg commented May 17, 2018

Are we talking about the broken --perf-basic-prof stuff here? What would tests look like for this? Is it a simple matter of passing the output through perf and asserting on the results?

I don't think we're going to be able to jam this into the containers unfortunately, even though that's a great place for these kinds of tests. perf is tied to the kernel version and that's detached from the OS once you're using containers since you use the host kernel version. Unless we have a match of host OS and container OS. Right now we do have a match, 16.04 on host and in container, but they are not updated at the same time so they get out of sync a bit (I'm not sure minor kernel versions matter much for perf, however). But I'm not sure we want to be restricted to always running the same host OS and container OS for these "sharedlibs" containers.

A better approach might just be installing perf on one of our Ubuntu LTS hosts, maybe 18.04, and letting the tests detect its presence: run if it's there and skip if it's not. We can lock this into our Ansible scripts so we always get linux-tools-generic installed on some of those hosts.
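
The presence check itself can be as simple as probing that a working perf binary is on the PATH before the test runs, e.g. (just a sketch of the idea, not the actual test logic):

$ if command -v perf >/dev/null && perf --version >/dev/null 2>&1; then
>   echo "perf available, run the test"
> else
>   echo "perf missing or broken, skip"
> fi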

@bnoordhuis
Member

perf is tied to the kernel version and that's detached from the OS once you're using containers since you use the host kernel version.

That's the theory but in practice you can just mix them. For example: I ran stock perf 3.13 on Ubuntu 14.04 for a while against the latest, continually upgraded mainline kernel and it worked just fine.

(Yes, I run mainline kernels. I even test -rc's.)

@joyeecheung
Member

I usually just clone the Linux master and cd into the perf source to compile it when I need perf on my servers. Does not seem to cause problems when I upgrade kernels from time to time.
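
For reference, that from-source build is roughly (a sketch; the clone depth and install location are just illustrative):

$ git clone --depth=1 https://github.com/torvalds/linux.git
$ cd linux/tools/perf
$ make                          # the build reports any missing -dev packages (libelf, etc.)
$ sudo cp perf /usr/local/bin/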

@joyeecheung
Member

Can we use the benchmark CI machine for this? It would be nice to have perf there; I am thinking about using perf stat in a core benchmark to measure the start-up time for our lazy-loading effort.
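
Something along these lines, presumably (a sketch; the repeat count and the empty script are only illustrative):

$ perf stat -r 10 -- node -e ''   # run the command 10 times and report mean/stddev of the counters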

@mmarchini
Contributor Author

Are we talking about the broken --perf-basic-prof stuff here?

Yes. It will be unbroken once V8 6.7 lands, and the tests would help us know in advance if changes in V8 break it again.

Is it a simple matter of passing the output through perf and asserting on the results?

Yes. The proposed test can be seen here: nodejs/node#20783
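
Run by hand, the check boils down to something like this (a rough sketch of the idea, not the actual test; app.js and hotFunction are made-up names):

$ perf record -o perf.data -- node --perf-basic-prof app.js
$ perf script -i perf.data | grep hotFunction   # assert that the JIT-compiled JS frame shows up symbolized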

let the tests detect its presence and run if its there and skip if it's not

The proposed test already does that.

A better approach might just be installing perf on one of our Ubuntu LTS hosts, maybe 18.04

+1 for that, I don't think we need to run this test on containers (at least not for now)

Can we use the benchmark CI machine for this? It would be nice to have perf there, I am thinking about using perf stat in a core benchmark to measure the start up time for our lazy-loading effort.

Looks like a good idea, as long as we have perf on at least one machine that runs tests for nodejs/node-v8 as well.

@mhdawson
Member

Installing on the benchmarking machine sounds good to me (although I think it would be the 2 additional benchmark machines as opposed to the ones used for the nightly runs). The only challenge is that if we only have it installed on a single machine, then we'd want to run the test nightly as opposed to as part of every regression test (not that we wouldn't want that, but it's probably all we'd be able to do). If nightly is ok, then we can easily set up a job to run once a day pinned to the benchmarking machine, just like the benchmark jobs.

Ideally, we'd really like it to be run on every PR that upgrades V8 as well. If we had the nightly job, it would just be a matter of ensuring it was added to the list of what to run when validating a V8 update.

@rvagg
Member

rvagg commented May 17, 2018

Let me get linux-tools-generic on some Ubuntu machines we already have in CI. The test that's in the WIP should pick it up and skip otherwise, so I think that'll solve this.

perf on the benchmarking machines would be a good idea regardless; I'll look at that at the same time. The Intel/NearForm ones mostly go through Ansible, so I think we can just put the changes in there.

@mhdawson
Member

mhdawson commented May 17, 2018

@rvagg getting it on some of the Ubuntu machines is good, but I think we want to be sure the test runs regularly, and I don't think we can be confident of that if we leave it to chance.

Just to say I don't think we only want to rely on the chance that it runs on those machines during the regular regression runs. Once it is installed on some of the machines we can set up a job that only runs on that subset and make sure it runs at some interval.

If we have it on enough machines, we could make that job (the one that runs on the subset) run as part of the regular regression job as opposed to at some interval.

@mmarchini
Contributor Author

Ideally, we'd really like it to be run on every PR that upgrades V8 as well

I still think it's more important to run on every test run at nodejs/node-v8. Usually we open the PR to update V8 after the version branch-cut, so if the update breaks perf it might already be too late to fix it (remember: perf is not officially supported by the V8 team). Ideally the test should also run on the PR, but the priority should be to make sure it runs on nodejs/node-v8.

@joyeecheung
Member

joyeecheung commented May 21, 2018

Looking at nodejs/node#20783, I think it is also possible to write the test as a benchmark, similar to how the http/http2 benchmarks are structured (they also have external dependencies, like wrk), and write a test for the benchmark that controls the parameters so that it does not take much time to run... or, if we can get the benchmark job to run on an arbitrary base and PR, we can simply run the benchmark for V8 updates (which is also worth doing regardless of the perf test).

@mmarchini
Contributor Author

we can simply run the benchmark for v8 updates (which is also worth doing regardless of the perf test)

Totally agree, running benchmarks on v8 updates would be a good idea.

I think it is also possible to write the test as a benchmark

Is it because it's easier to set up the infra for benchmarks, or are there concerns about test speed? The test takes only a few seconds to run (it should be under 10s even on slower machines).

BTW, I'm open to help install perf on the required machines.

@joyeecheung
Member

joyeecheung commented May 22, 2018

@mmarchini Mostly because it depends on external tools for stats, similar to how the HTTP/HTTP2 benchmarks work. Also the benchmark machines are supposed to only run one job at a time for the stability of the results, so no benchmarks should be run when running tests on them.

Although, come to think of it, maybe we could put the post-mortem and perf tests in a new directory under test that doesn't get run by default (since the results should only change during V8 updates), and use https://ci.nodejs.org/job/node-stress-single-test/build?delay=0sec to run the subset?

@mmarchini
Contributor Author

maybe we could put the post-mortem and perf tests in a new directory under test that don't get run by default

I like this idea. I'll add a commit to nodejs/node#20783 moving those tests to a new directory.
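
If the new directory ends up as, say, test/perf-integration (name purely illustrative) and is wired up like the existing suites, running only that subset would presumably be something like:

$ python tools/test.py perf-integration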

@rvagg
Member

rvagg commented May 30, 2018

I'm wary about this one. I've done some experimenting: Debian is a bit of a mess, but Ubuntu maintains linux-tools-generic pretty nicely. However, it's strict about matching kernel version to perf version; they have to be exactly the same down to tags:

$ perf
WARNING: perf not found for kernel 4.4.0-119

  You may need to install the following packages for this specific kernel:
    linux-tools-4.4.0-119-generic
    linux-cloud-tools-4.4.0-119-generic

  You may also want to install one of the following packages to keep up to date:
    linux-tools-generic
    linux-cloud-tools-generic
$ echo $?
2

^ in this case it's because the server is running 4.4.0-119 but it's had one or two kernel updates without reboot since that point and is up to 4.4.0-127, ready to run once a reboot happens. So linux-tools-generic installs linux-tools-4.4.0-127 and gives that error. Reboot and it's all fine because we have 4.4.0-127 everywhere. Note that it's not actually just a "warning", it's fatal with a non-zero exit code and it doesn't run anything useful.

We don't currently tie rebooting to software updates. It's very common for a new kernel to be installed but not activated until a reboot and those reboots may not happen for long periods of time. Updates may be done manually or as part of another Ansible run against the machine but we still end up in that awkward state if there's a kernel update but no reboot.
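
One way to spot (and, without a reboot, work around) that state is roughly (a sketch; exact package names depend on the kernel flavour):

$ uname -r                                         # kernel actually running, e.g. 4.4.0-119-generic
$ dpkg -l 'linux-tools-*' | grep '^ii'             # linux-tools packages actually installed
$ sudo apt-get install -y linux-tools-$(uname -r)  # install the tools build matching the running kernel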

I think we're going to run into the same thing, but maybe worse, in Docker, given the host/container mismatch problem and the longer caching of container layers for building.

Perhaps this error is only for packaged perf and if you build from source it's different? I was always under the impression it was strictly tied no matter how you get it.

@joyeecheung
Member

Perhaps this error is only for packaged perf and if you build from source it's different?

I think that is the case. From my server:

root@vultr:~# perf --version
perf version 4.16.rc7.g3eb2ce8
root@vultr:~# uname -a
Linux vultr.guest 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:31:12 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The perf there was built from source

mmarchini pushed a commit to mmarchini/build that referenced this issue Jun 6, 2018
Ansible role to install Linux perf on Ubuntu by cloning the Linux source
code and building tools/perf to avoid Kernel mismatch errors.

Ref: nodejs#1274
mmarchini pushed a commit to mmarchini/build that referenced this issue Jun 8, 2018
Install Linux perf on Ubuntu 16.04 machines through
jenkins/worker/create playbook.

Ref: nodejs#1274
mmarchini pushed a commit to mmarchini/build that referenced this issue Jun 8, 2018
Install Linux perf on Ubuntu 16.04 machines through
jenkins/worker/create playbook.

Ref: nodejs#1274
maclover7 pushed a commit that referenced this issue Jun 9, 2018
Ansible role to install Linux perf on Ubuntu by cloning the Linux source
code and building tools/perf to avoid Kernel mismatch errors.

PR-URL: #1231
Ref: #1274
Reviewed-By: Jon Moss <me@jonathanmoss.me>
@maclover7
Contributor

Fixed by #1321
