strong-agent API not working when run using node command #3
The way strong-agent checks CPU usage in v0.4.14 is rather inefficient on some platforms, but that's fixed in v1.0.0. About the segfault, can you try capturing a backtrace in gdb? Here is how you do that:
Can you include the output of
Also, @jondubois, can you verify which version of strong-agent you are using in both cases? Use
uname -a:
Linux ip-10-182-204-34 3.2.21-1.32.6.amzn1.x86_64 #1 SMP Sat Jun 23 02:32:15 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
node -p process.config
node -p process.versions
GDB backtrace (with commands used): I ran this on a basic Node.js HTTP server script (file: s2.js).
Is it possible that you originally used strong-agent with an older version of v0.11? Assuming you installed it with
Also:
I created a new Red Hat instance on EC2 (instead of the Amazon distro I was using earlier) and the problem persists. It did work, though, when I installed the latest version of strong-agent using: env V=1 npm_config_debug=1 npm install strong-agent
That said, when I run the top command while my node process is being profiled, the %CPU of that node process shows up as 99.9% and stays around that level even though the server is idle (the same thing happened when I was using strong-agent v0.4.14).
node -p process.versions is: { http_parser: '2.3',
Just to confirm: it's using 100% CPU when strong-agent is loaded? What's CPU consumption like without it? Can you try the following? You may have to
Can you post the output of
Interesting. The perf report appears to be normal. I had CPU monitoring off for the first 15 seconds and turned it on for the remaining 15 seconds, and it didn't go up to 100% as the top command suggests. The numbers coming up on the StrongOps dashboard appear to be accurate. I guess the top command doesn't play nicely with strong-agent? Here's the perf output:
That looks pretty normal, and yes,
I'm still getting the performance issue; I don't know why it's showing up again. Usage according to StrongOps is 80%, which doesn't sound right.
To display the perf.data header info, please use --header/--header-only options.
Samples: 3K of event 'cpu-clock'
Overhead  Command  Shared Object  Symbol
(For a higher level overview, try: perf report --sort comm,dso)
@jondubois The call graphs suggest that the CPU profiler is running. The cost center here is not so much the CPU profiler itself as the fact that it queries the kernel for the current time, which is an expensive operation on most virtualized systems. If your node version is <= v0.10.26, you should consider upgrading. I added a workaround in nodejs/node-v0.x-archive@f9ced08 but it requires that the kernel is compiled with
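The claim that reading the clock is expensive on virtualized hosts can be checked empirically. A rough sketch, assuming a Linux host and Python 3.7+: it times repeated `clock_gettime(CLOCK_MONOTONIC)` calls and reads the kernel's active clocksource from the standard sysfs path. The helper names are hypothetical, and the per-call figure includes interpreter overhead, so it is only useful for comparing hosts (for example, an EC2 instance against bare metal), not as an absolute cost.

```python
import time
from pathlib import Path
from typing import Optional

# Standard Linux sysfs location of the active clocksource.
CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0/current_clocksource")


def clock_read_cost_ns(iterations: int = 200_000) -> float:
    """Average wall time per CLOCK_MONOTONIC read, in nanoseconds.

    Includes Python call overhead, so it overstates the raw kernel/vDSO cost;
    it is only meaningful for comparing hosts or clocksources.
    """
    read = time.clock_gettime_ns
    clock = time.CLOCK_MONOTONIC
    start = time.perf_counter_ns()
    for _ in range(iterations):
        read(clock)
    return (time.perf_counter_ns() - start) / iterations


def current_clocksource() -> Optional[str]:
    """Return the active clocksource name (e.g. 'tsc' or 'xen'), or None if unavailable."""
    try:
        return CLOCKSOURCE.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print(f"clocksource: {current_clocksource()}")
    print(f"~{clock_read_cost_ns():.0f} ns per clock_gettime(CLOCK_MONOTONIC) call from Python")
```

Running it on the affected instance and on a non-virtualized machine gives a rough sense of how much more each time query costs under virtualization.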
I'm using Node.js v0.11.14, running on an EC2 instance. I tried both the Amazon distro and Red Hat and I get the same issue.
/boot/config-2.6.35.14-97.44.amzn1.x86_64:# CONFIG_HZ_100 is not set
The v0.11 series doesn't contain the fix; I haven't forward-ported it yet.
So, in summary, @jondubois, is
@sam-github, the segfault happened when I installed strong-agent using: npm install strong-agent
Installing it using env V=1 npm_config_debug=1 npm install strong-agent took care of the segfault. So now the only remaining issue is that strong-agent appears to use a lot of CPU: it reports an idle process as using around 75% of the total CPU, when it should really be approaching 0% since the process is idle (running it without strong-agent profiling confirms this). Could the command I used to install strong-agent (with npm_config_debug=1) be the cause of this issue? I'll wait for the fix to be forward-ported.
@jondubois The debug build is compiled at -O0, so yes, it will be slower. The difference is not normally noticeable unless you use the heapdiff component, the page on the dashboard that graphs the makeup of the JS heap over time. If you still get the segfault on node.js v0.10 with release builds of the latest strong-agent, please post the stack trace and I'll take a look. The easiest way to get one is to turn on core dumps (
(I mention node.js v0.10 explicitly because the last couple of v0.11 releases have known bugs.)
Got an email from Jon: On Mon, Nov 24, 2014 at 11:42 AM, Jonathan Gros-Dubois <[elided]> wrote:
Jon, what you are seeing is not unexpected. The CPU profiler wakes up 1,000 times per second to record a sample (including when the process itself is sleeping), and in aggregate that is reasonably expensive. It's not designed to be kept running indefinitely; you normally turn it on for 5 to 30 seconds to get some quick insights into what your application is doing. I'm not sure what our documentation says about it, but perhaps we need to be more explicit about that caveat (/cc @crandmck). The next strong-agent release will have a watchdog mode where the profiler is automatically activated and deactivated when a script is taking too long to execute (a paid feature, however). I'm probably going to extend it so that it suspends the profiler when the process is sleeping. That would let you run the profiler for longer periods of time without undue overhead.
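At a fixed 1,000 samples per second, the number of profiler wakeups grows linearly with how long the profiler stays on, which is why a short session is cheap in aggregate and an always-on profiler is not. A small sketch of that arithmetic; `sample_count` is a hypothetical helper and assumes the rate stays constant, including while the process sleeps, as described above.

```python
def sample_count(hz: int, seconds: float) -> int:
    """Profiler wakeups in a session, assuming a fixed sampling rate."""
    return round(hz * seconds)


SESSIONS = [("5 s session", 5), ("30 s session", 30), ("24 h, always on", 24 * 60 * 60)]

if __name__ == "__main__":
    for label, seconds in SESSIONS:
        print(f"{label:>16}: {sample_count(1000, seconds):>12,} wakeups")
```

At 1,000 Hz a 30-second session is 30,000 wakeups, while leaving the profiler on for a day is 86,400,000.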
I see. I was hoping I could use it as a long-running CPU monitoring and reporting tool, but it appears to be intended for short-term debugging. I guess StrongLoop doesn't offer such a tool (for tracking CPU use over time)? In any case, thanks for your time and patience :)
Not yet / not quite. We can track aggregate CPU usage (what I've been experimenting with adjusting the sample frequency to reduce overhead while still retaining enough granularity to extract meaningful data, e.g. sampling at 333 Hz instead of 1000 Hz. It definitely lowers the overhead but it's not clear yet where the sweet spot is; maybe there isn't one and the frequency needs to be configurable or dynamic. Longer term, we may start instrumenting V8's generated machine code with RDTSC (Read Timestamp Counter) instructions at function entry/exit and loop edges, although there are some caveats:
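The trade-off between sampling frequency, overhead and granularity can be put in numbers. A minimal sketch, assuming each sample has a fixed cost so that overhead scales linearly with the rate; both helper names are hypothetical.

```python
def sample_interval_us(hz: float) -> float:
    """Time between samples, in microseconds."""
    return 1_000_000 / hz


def relative_overhead(hz: float, baseline_hz: float = 1000) -> float:
    """Overhead relative to baseline_hz, assuming a fixed cost per sample."""
    return hz / baseline_hz


if __name__ == "__main__":
    for hz in (1000, 333):
        print(f"{hz:>4} Hz: a sample every {sample_interval_us(hz):7.1f} us, "
              f"~{relative_overhead(hz):.0%} of the 1000 Hz overhead")
```

Dropping from 1000 Hz to 333 Hz stretches the interval from 1 ms to about 3 ms and, under the fixed-cost assumption, cuts overhead to roughly a third; the price is that code running for less than about 3 ms is less likely to show up in any sample.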
* Block SIGPROF when in the epoll_pwait() system call so the event loop doesn't keep waking up on signal delivery. The clock_gettime() system call that libuv makes after EINTR is very expensive on virtualized systems.
* Replace sched_yield() with nanosleep() in V8's tick event processor thread. The former only yields the CPU when there is another process scheduled on the same CPU.
* Fix a bug in the epoll_pwait() system call wrapper in libuv, see libuv/libuv#4.
Refs strongloop/strong-agent#3 and strongloop-internal/scrum-cs#37.
@jondubois If you're willing to be a guinea pig, please give bnoordhuis/node@24852b5 a try. On my EC2 box, it reduces the overhead from the CPU profiler by about 50%.
I added a note to the docs stating that CPU profiling is not meant to run indefinitely.
Reduce the overhead of the CPU profiler by replacing sched_yield() with nanosleep() in V8's tick event processor thread. The former only yields the CPU when there is another process scheduled on the same CPU. Before this commit, the thread would effectively busy loop and consume 100% CPU time. By forcing a one nanosecond sleep period rounded up to the task scheduler's granularity (about 50 us on Linux), CPU usage for the processor thread now hovers around 10-20% for a busy application. Refs strongloop/strong-agent#3 and strongloop-internal/scrum-cs#37.
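The difference between the two idle strategies can be illustrated with a consumer thread that polls a queue. This is a hypothetical Python analogue of a tick-processor loop, not V8's code: `yield_idle` calls `os.sched_yield()`, which returns immediately when nothing else is runnable on that CPU, so the loop effectively spins; `sleep_idle` requests a 1 ns sleep, which the kernel rounds up to its timer slack (CPython 3.11+ implements `time.sleep()` with `clock_nanosleep()` or `nanosleep()` where available). `time.thread_time()` reports the CPU time the consumer thread burned. Unix only; all function names are illustrative.

```python
import os
import queue
import threading
import time
from typing import Callable, List, Tuple


def yield_idle() -> None:
    # Returns at once if nothing else is runnable on this CPU: effectively a busy loop.
    os.sched_yield()


def sleep_idle() -> None:
    # Ask for 1 ns; the kernel rounds it up to its timer slack (~50 us by default on Linux).
    time.sleep(1e-9)


def consume(q: queue.Queue, done: threading.Event, idle: Callable[[], None], out: list) -> float:
    """Drain q until done is set and q is empty; return CPU seconds used by this thread."""
    cpu_start = time.thread_time()
    while True:
        finished = done.is_set()  # read before polling so an item put just before done.set() is not missed
        try:
            out.append(q.get_nowait())
            continue
        except queue.Empty:
            if finished:
                break
            idle()
    return time.thread_time() - cpu_start


def run(idle: Callable[[], None], n_items: int = 50, gap_s: float = 0.002) -> Tuple[List[int], float]:
    """Produce n_items slowly on this thread and consume them on a worker thread."""
    q: queue.Queue = queue.Queue()
    done = threading.Event()
    out: List[int] = []
    cpu: List[float] = []
    worker = threading.Thread(target=lambda: cpu.append(consume(q, done, idle, out)))
    worker.start()
    for i in range(n_items):
        q.put(i)
        time.sleep(gap_s)  # the consumer is mostly idle between items
    done.set()
    worker.join()
    return out, cpu[0]


if __name__ == "__main__":
    for idle in (yield_idle, sleep_idle):
        items, cpu_s = run(idle)
        print(f"{idle.__name__:>10}: {len(items)} items, {cpu_s * 1000:.1f} ms CPU")
```

Both variants process the same items; on an otherwise idle machine the `yield_idle` consumer would be expected to burn CPU for most of the run, while the `sleep_idle` consumer should use a small fraction of it, mirroring the effect the commit message describes.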
Reduce the overhead of the CPU profiler by suppressing SIGPROF signals when sleeping / polling for events. Avoids unnecessary wakeups when the CPU profiler is active. Depends on libuv/libuv#15. Refs strongloop/strong-agent#3 and strongloop-internal/scrum-cs#37.
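The effect of suppressing SIGPROF while waiting for events can be illustrated outside libuv. The sketch below is a hypothetical Python analogue, not libuv's code: a background thread stands in for a signal-based sampler and sends SIGPROF to the main thread about 1,000 times per second while the main thread waits on an idle socket. With SIGPROF unblocked, every signal interrupts the wait and runs the handler; with it blocked, the signals stay pending (coalesced into one) until the mask is restored. Unlike `epoll_pwait()`, which installs its signal mask atomically for the duration of the call, this version changes the mask around `select()`, so there is a small window on either side. Unix only, and it must run on the main thread; `sampler`, `wait` and `demo` are illustrative names.

```python
import selectors
import signal
import socket
import threading

wakeups = 0  # SIGPROF handler invocations seen by the main thread


def on_sigprof(signum, frame):
    global wakeups
    wakeups += 1


def sampler(target_tid: int, stop: threading.Event, hz: int = 1000) -> None:
    # Stand-in for a profiler thread that signals the main thread at a fixed rate.
    while not stop.wait(1 / hz):
        signal.pthread_kill(target_tid, signal.SIGPROF)


def wait(sel: selectors.BaseSelector, timeout: float, block_sigprof: bool) -> int:
    """Wait on sel for timeout seconds; return SIGPROF handler runs during the wait."""
    if not block_sigprof:
        before = wakeups
        sel.select(timeout)  # retried after each EINTR (PEP 475); the handler runs each time
        return wakeups - before
    old = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGPROF})
    try:
        before = wakeups
        sel.select(timeout)  # SIGPROF stays pending instead of interrupting the wait
        return wakeups - before
    finally:
        # Restoring the mask releases the single, coalesced pending SIGPROF.
        signal.pthread_sigmask(signal.SIG_SETMASK, old)


def demo(block_sigprof: bool, timeout: float = 0.1) -> int:
    """Must be called from the main thread, where Python signal handlers run."""
    signal.signal(signal.SIGPROF, on_sigprof)
    a, b = socket.socketpair()  # an idle fd: nothing is ever written to it
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ)
    stop = threading.Event()
    t = threading.Thread(target=sampler, args=(threading.get_ident(), stop))
    t.start()
    try:
        return wait(sel, timeout, block_sigprof)
    finally:
        stop.set()
        t.join()
        sel.close()
        a.close()
        b.close()


if __name__ == "__main__":
    print("unblocked:", demo(block_sigprof=False), "handler runs during the wait")
    print("blocked:  ", demo(block_sigprof=True), "handler runs during the wait")
```

The unblocked wait should report dozens of handler runs in 100 ms, each one a wakeup of the waiting thread, while the blocked wait should report none.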
I will try running it using your patch sometime this week when I can find the time. Thanks.
Reduce the overhead of the CPU profiler by suppressing SIGPROF signals when sleeping / polling for events. Avoids unnecessary wakeups when the CPU profiler is active. Depends on https://github.com/libuv/libuv#15. Ref: strongloop/strong-agent#3 PR-URL: #8791 Reviewed-by: Trevor Norris <trev.norris@gmail.com>
Reduce the overhead of the CPU profiler by suppressing SIGPROF signals when sleeping / polling for events. Avoids unnecessary wakeups when the CPU profiler is active. Depends on https://github.com/libuv/libuv#15. Ref: strongloop/strong-agent#3 PR-URL: nodejs#8791 Reviewed-by: Trevor Norris <trev.norris@gmail.com>
Reduce the overhead of the CPU profiler by replacing sched_yield() with nanosleep() in V8's tick event processor thread. The former only yields the CPU when there is another process scheduled on the same CPU. Before this commit, the thread would effectively busy loop and consume 100% CPU time. By forcing a one nanosecond sleep period rounded up to the task scheduler's granularity (about 50 us on Linux), CPU usage for the processor thread now hovers around 10-20% for a busy application. PR-URL: #8789 Ref: strongloop/strong-agent#3 Reviewed-by: Trevor Norris <trev.norris@gmail.com>
Reduce the overhead of the CPU profiler by replacing sched_yield() with nanosleep() in V8's tick event processor thread. The former only yields the CPU when there is another process scheduled on the same CPU. Before this commit, the thread would effectively busy loop and consume 100% CPU time. By forcing a one nanosecond sleep period rounded up to the task scheduler's granularity (about 50 us on Linux), CPU usage for the processor thread now hovers around 10-20% for a busy application. PR-URL: nodejs/node-v0.x-archive#8789 Ref: strongloop/strong-agent#3 Reviewed-by: Trevor Norris <trev.norris@gmail.com> Signed-off-by: Jeroen Ooms <jeroenooms@gmail.com>
I'm running Node.js v0.11.14.
I have a custom multi-process Node.js deployment and I want to track each process PID under a common app name. I couldn't get it working with strong-agent version >= 1.0.0: it launches properly (the correct app name and PID show up on StrongOps), but I get a segmentation fault as soon as I turn on the CPU profiler from StrongOps.
This error happens regardless of which process I'm trying to track: I created a very basic Node.js server and the segmentation fault still happens (so it's not related to my specific project).
Note that everything works fine if I launch using:
Unfortunately, this is not possible in my case since I have a custom multi-process architecture involving sticky load-balancer, worker and data-store processes.
So I would like to be able to run my app using:
and then use:
to track my metrics.
I did manage to get it working (kind of) using strong-agent version 0.4.14 (it doesn't crash with a segfault), but when I check my CPU load using the top command while my app is running with strong-agent, it shows that the monitored process is always using approximately 100% CPU, even when it should in fact be idle. I tested with and without strong-agent and can confirm that CPU usage approaches 0% when strong-agent is not running. It looks like strong-agent adds a lot of CPU overhead when it's running (maybe this is a bug in version 0.4.14).
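To cross-check what top reports for a single process, the same figure can be computed from the kernel's per-process counters: top's per-process %CPU (in its default mode) is the change in user plus system time over the sampling interval, as a percentage of one CPU. A minimal Linux-only sketch; the helper names are hypothetical and the field positions follow proc(5).

```python
import os
import time


def cpu_ticks(pid: int) -> int:
    """utime + stime for pid, in clock ticks, from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split on the last ')' and index from field 3 (state) onward.
    fields = stat.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(pid: int, interval: float = 1.0) -> float:
    """CPU use of pid over interval seconds, as a percentage of one CPU (like top)."""
    hz = os.sysconf("SC_CLK_TCK")
    t0 = cpu_ticks(pid)
    time.sleep(interval)
    return 100.0 * (cpu_ticks(pid) - t0) / hz / interval


if __name__ == "__main__":
    print(f"this process: {cpu_percent(os.getpid(), 0.5):.1f}% CPU")
```

Pointing `cpu_percent` at the node process's PID with and without strong-agent loaded reproduces the comparison described above without relying on top's display.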
Using v0.4.14, I can see graphs come up in StrongOps, but they report very high CPU usage with a baseline of around 80%, which is not accurate.