deps: don't busy loop in v8 cpu profiler thread #25268

tunniclm · 2015-05-08T13:25:31Z

Backport 6964a9e0685fa186d9d9b7907be17505e839db1a from upstream v8.

Original commit message:

Make CPU profiler do not hog 100% of CPU.

Tick event processor should not stay in a tight loop
when there's nothing to do. It can go sleep until next sample event.

LOG=N
BUG=v8:3967
Committed: https://crrev.com/6964a9e0685fa186d9d9b7907be17505e839db1a
Cr-Commit-Position: refs/heads/master@{#28211}

Fixes #25137
Related: #9439, #8789

Backport 6964a9e0685fa186d9d9b7907be17505e839db1a from upstream v8. Original commit message: Make CPU profiler do not hog 100% of CPU. Tick event processor should not stay in a tight loop when there's nothing to do. It can go sleep until next sample event. LOG=N BUG=v8:3967 Committed: https://crrev.com/6964a9e0685fa186d9d9b7907be17505e839db1a Cr-Commit-Position: refs/heads/master@{#28211} Fixes nodejs#25137 Related: nodejs#9439, nodejs#8789

misterdjules · 2015-05-08T23:36:35Z

LGTM, thank you!

misterdjules · 2015-05-09T00:10:06Z

@tunniclm Testing this PR, I cannot even reproduce the 100% CPU issue in the first place. I've tried to reproduce it by running:

$ node --prof -e "setInterval(function(){}, 10)"

without success. How do you reproduce this problem?

tunniclm · 2015-05-11T08:15:34Z

@misterdjules The problem is with the CpuProfiler API, which is a native V8 API. The problem can be reproduced via a Node add on that starts the profiler in this way. The --prof option is not affected as the code path is different with that option (--prof writes events/samples to file, whereas CpuProfiler does runtime processing of the events between samples to keep an up-to-date mapping of code addresses.)

tunniclm · 2015-05-11T09:34:05Z

An example addon to reproduce the problem (for Node 0.12 / v8 3.28), start_profiler.cc:

#include "node.h"
#include "v8.h"
#include "v8-profiler.h"

using namespace v8;

void Start(const FunctionCallbackInfo<Value>& args) {
    Isolate* isolate = Isolate::GetCurrent();
    HandleScope scope(isolate);
    isolate->GetCpuProfiler()->StartProfiling(String::NewFromUtf8(isolate, "OurProfile"));
}

void Initialize(Handle<Object> exports) {
    NODE_SET_METHOD(exports, "start", Start);
}

NODE_MODULE(start_profiler, Initialize)

With binding.gyp:

{
    "targets": [
        {
            "target_name": "start_profiler",
            "sources": [ "start_profiler.cc" ]
        }
    ]
}

And test.js:

require('./build/Release/start_profiler').start();
setInterval(function(){}, 10);

mhdawson · 2015-05-11T20:05:41Z

lgtm

misterdjules · 2015-05-11T22:04:41Z

@tunniclm I just noticed that there is no regression test in the changes from V8 upstream. I think it would be helpful if we could come up with a regression test, as it would prevent us from breaking that again.

We would probably need to write it in a way similar to the tests in test/addons. If we can come up with something that works well, then we might also consider adding the addons tests suite to the tests suite that are run by most of the CI jobs (like for instance node-accept-pull-request).

I'm OK to land this without a regression test for now as such a regression would not be a severe issue, but I think it would help us to have a regression test sooner than later.

misterdjules · 2015-05-11T22:05:23Z

@tunniclm Also, thank you very much for the more detailed information. I was able to reproduce the problem and verify the fix in this PR, so LGTM.

misterdjules · 2015-05-11T22:09:20Z

Triggered node-accept-pull-request to land this PR, so it should be merged in v0.12 soon. Thank you @tunniclm!

Backport 6964a9e0685fa186d9d9b7907be17505e839db1a from upstream v8. Original commit message: Make CPU profiler do not hog 100% of CPU. Tick event processor should not stay in a tight loop when there's nothing to do. It can go sleep until next sample event. LOG=N BUG=v8:3967 Committed: https://crrev.com/6964a9e0685fa186d9d9b7907be17505e839db1a Cr-Commit-Position: refs/heads/master@{#28211} Fixes nodejs#25137 Related: nodejs#9439, nodejs#8789 PR: nodejs#25268 PR-URL: nodejs#25268 Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: Julien Gilli <jgilli@fastmail.fm>

misterdjules · 2015-05-12T01:17:47Z

@joyent/node-coreteam The node-accept-pull-request Jenkins job keeps failing because test/simple/test-tls-server-verify.js takes more than 60 seconds to run on our Windows 2012 server Jenkins agent. It doesn't fail on that machine, it takes around 80 seconds and thus times out.

I made the decision to move forward with this and land this PR anyway since it was the only test failing, and I was able to make sure the test actually passes on two different Windows systems (2012 server and Windows 7).

As a result, landed in 80cdae8 and created #25279.

misterdjules · 2015-05-12T01:19:47Z

FWIW, also added a reference to this floating patch in the documentation on our V8 upgrade process in the Wiki.

tunniclm · 2015-05-18T08:44:06Z

@misterdjules I have a test that could work, but it currently has a dependency on an npm module for measuring CPU usage. Do the tests in test/addons need to be self-contained, without external dependencies?

misterdjules · 2015-06-05T23:18:14Z

@tunniclm Currently the only tests with external dependencies that I'm aware of are the ones in test/timers and test/external. The former has a dependency on the libfaketime library, which is in tools/faketime, the latter has dependencies on several npm modules.

These tests are not run automatically by the CI platform when landing pull requests. I think the reason for that are:

To keep the execution time of the tests suite that is run on every commit reasonable. The current timeout for a single test is 60 seconds, npm install or building libfaketime could take longer.
To prevent errors in the dependencies installation process from making the tests suite fail. For instance, if for some reason a GitHub repository or the npm registry is not available (which is rare), then we don't want to block the CI platform.

What is the npm module that you'd need to use in this case? If we can't manage to write this test without a dependency on a npm module, we could explore different options:

Commit the node_modules directory for this test and if it depends on a binary add-on, run npm rebuild before it runs. We would probably want to do that anyway instead of running npm install every time.
Run that test only when doing a release. That way it doesn't run on every commit, but at least we can catch issues before we publish a new release.

We might need to reorganize the tests directory a bit.

@tunniclm Thoughts?

tunniclm · 2015-06-08T10:28:50Z

@misterdjules At present, the test relies on the usage module to gather the process CPU usage. I just ran the test on Windows and found usage doesn't support Windows. Having said that, the high CPU issue is not solved on Windows anyway.

misterdjules · 2015-06-08T23:48:10Z

@tunniclm Given that the high CPU usage issue is not solved on Windows, I would think that the best solution currently would be to:

Add a regression test for this issue to test/external/v8-profiler-cpu-usage and include the npm dependencies in the repository (the whole node_modules directory).
When running python tools/test.py external, run each test in test/external/ subdirectories and run npm rebuild as a pre-run step.
Add external tests to the tests run by the CI platform during the node-test-commit job (on UNIX and on Windows.
Move test/external/ssl-options to test/ssl/options-matrix. test/external/ssl-options currently takes several minutes to complete, and the tests run for every commit should take only a few seconds at most, with a default timeout of 60 seconds, so this test shouldn't run automatically.

It seems that adding Windows support to https://github.com/arunoda/node-usage could be reasonably simple, so that approach would work on Windows too when node-usage supports Windows CPU usage gathering.

Before moving on though, I would like to get some feedback from @joyent/node-tsc and especially @orangemocha and @mhdawson because that's a slight departure from what has been done in tests run automatically by the CI platform in the past.

orangemocha · 2015-06-09T06:19:09Z

@misterdjules arguably we need a better way of defining test suites than just using subdirectories. Our test runner already supports some basic predicates to specify the expected test outcomes based on the run environment. We currently use this for defining flaky tests (see https://github.com/joyent/node/blob/v0.12/test/internet/internet.status#L5). One of the possible outcomes is SKIP, which means that the test is not run. By adding an environment var 'runtype' we might be able to use these status files to configure what gets run in CI versus other test run types.

But I think it would be best to defer these changes till after we have reconciled CI with io.js. So the directory restructuring you suggest sounds reasonable as an interim solution.

Regarding the npm rebuild phase, can that not be encapsulated in the test itself?

misterdjules · 2015-06-09T16:36:38Z

@orangemocha The npm rebuild phase could be encapsulated in the test itself, but it seems that even if abstracting something like common.npmRebuild(callback) in test/common.js, there would be some sort of duplication in every test that needs this mechanism. Having that in the tests driver would allow for more code reuse.

I don't have a very strong opinion though, if doing the npm rebuild phase in each test works and is implemented cleanly (with reusable code in test/common.js), it's already much better than having no test at all.

Thank you!

orangemocha · 2015-06-10T13:10:03Z

@misterdjules if it can be implemented in a reusable way in test/common.js, I think that would be the best option. Otherwise we'd be moving test logic into the test runner, which seems less robust.

mhdawson · 2015-06-10T16:55:18Z

@misterdjules in respect to node-usage I'd want to check its behaviour on pLinuz, zLinux and AIX

mhdawson · 2015-06-10T17:01:45Z

@ misterdjules is there some existing doc which describes the different test "buckets" and when we believe each set should be run ? A table might be most useful that lists the bucket (directory name right now) when it run (part of make test, at PR accept time, or as an extended test etc.)

That would help me to respond to your question

As I understand it now the main difference is that we'd add running test/external to what the CI runs by default ?

misterdjules · 2015-06-10T17:15:59Z

@mhdawson There is no documentation about that, it would be great to have it and link it from the contributing guide, both on nodejs.org and in CONTRIBUTING.md in the repository.

Yes, my suggestion is to add tests in test/external to what the CI runs by default. The main goal though is to add tests that require external dependencies but that run quickly to the set of tests run by the node-test-commit jobs on Jenkins. There is more than one way to do that, one of them being to add test/external to the list of tests that are run by this job.

Currently, "quickly" means "in less than 60 seconds". I'm not sure if npm rebuild can run under 60 seconds reliably on all platforms when installing a small number of native dependencies, but I think it's worth experimenting instead of leaving these tests out of the default CI tests suite from the start.

Nodejs-Jenkins added pr v0.12 labels May 8, 2015

tjfontaine removed pr v0.12 labels May 8, 2015

Nodejs-Jenkins added pr v0.12 labels May 8, 2015

misterdjules added this to the 0.12.3 milestone May 8, 2015

misterdjules added the P-2 label May 8, 2015

misterdjules closed this May 12, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deps: don't busy loop in v8 cpu profiler thread #25268

deps: don't busy loop in v8 cpu profiler thread #25268

tunniclm commented May 8, 2015

misterdjules commented May 8, 2015

misterdjules commented May 9, 2015

tunniclm commented May 11, 2015

tunniclm commented May 11, 2015

mhdawson commented May 11, 2015

misterdjules commented May 11, 2015

misterdjules commented May 11, 2015

misterdjules commented May 11, 2015

misterdjules commented May 12, 2015

misterdjules commented May 12, 2015

tunniclm commented May 18, 2015

misterdjules commented Jun 5, 2015

tunniclm commented Jun 8, 2015

misterdjules commented Jun 8, 2015

orangemocha commented Jun 9, 2015

misterdjules commented Jun 9, 2015

orangemocha commented Jun 10, 2015

mhdawson commented Jun 10, 2015

mhdawson commented Jun 10, 2015

misterdjules commented Jun 10, 2015

deps: don't busy loop in v8 cpu profiler thread #25268

deps: don't busy loop in v8 cpu profiler thread #25268

Conversation

tunniclm commented May 8, 2015

misterdjules commented May 8, 2015

misterdjules commented May 9, 2015

tunniclm commented May 11, 2015

tunniclm commented May 11, 2015

mhdawson commented May 11, 2015

misterdjules commented May 11, 2015

misterdjules commented May 11, 2015

misterdjules commented May 11, 2015

misterdjules commented May 12, 2015

misterdjules commented May 12, 2015

tunniclm commented May 18, 2015

misterdjules commented Jun 5, 2015

tunniclm commented Jun 8, 2015

misterdjules commented Jun 8, 2015

orangemocha commented Jun 9, 2015

misterdjules commented Jun 9, 2015

orangemocha commented Jun 10, 2015

mhdawson commented Jun 10, 2015

mhdawson commented Jun 10, 2015

misterdjules commented Jun 10, 2015