Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace event support for Node.js #9304

Closed
wants to merge 22 commits into from
Closed

Trace event support for Node.js #9304

wants to merge 22 commits into from

Conversation

matthewloring
Copy link

@matthewloring matthewloring commented Oct 26, 2016

Checklist
  • make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

src

Description of change

Background: nodejs/diagnostics#30, nodejs/diagnostics#53

This PR adds support for trace-event tracing to Node.js. It provides a mechanism to centralize tracing information generated by V8, Node core, and userspace code. The PR contains a few components:

  • A trace writer responsible for serializing traces and cycling the output files so that no individual file becomes to large.
  • A buffer for aggregating traces to allow for batched flushes.
  • An agent which initializes the tracing controller and ensures that trace serialization is done on a separate thread.
  • A set of macros for generating trace events.

Generating Trace Events

This PR introduces macros for recording trace events. Synchronous operations can be traced by wrapping them with the appropriate macros:

TRACE_EVENT_BEGIN0("MY_CATEGORY", "SomethingCostly")
doSomethingCostly()
TRACE_EVENT_END0("MY_CATEGORY", "SomethingCostly")

Asynchronous operations can be traced using:

void StartAsyncOperation(int operation_id, ...) {
  TRACE_EVENT_NESTABLE_ASYNC_BEGIN0("category", "operation", operation_id);
  ...
}
void EndAsyncOperation(int operation_id, ...) {
  TRACE_EVENT_NESTABLE_ASYNC_END0("category", "operation", operation_id);
  ...
}

An example of async traces can be seen below. The example was generated by integrating the above macros into Async-Wrap.

Additional examples as well as a complete list of trace macros can be found in src/tracing/trace_event.h. This PR does not introduce any JS interface for generating events or tracing of Node core with the hope that the community can determine how the instrumentation of core should be approached and how this functionality can be exposed to end users.

Viewing Trace Events

The current trace serialization outputs traces in a format that can be visualized by navigating to chrome://tracing in Chrome and loading a trace output file.

w85xe3osv1u

We can zoom in on this visualization to view VM events alongside the opening and closing of TCP connections:

dsksrjkirjkirjkk69xc63ny39g3a3zhbdtmw4aaaaaelftksuqmcc

Usage

Tracing can be enabled by running:

node --trace-events-enabled server.js

By default, this will record all events in the "v8" and "node" categories. To trace other categories or ignore either of these categories, use the --trace-event-categories flag:

node --trace-events-enabled --trace-event-categories v8,custom-category server.js

This PR is the result of a lot of hard work by @misterpoe, @kjin, @lpy, @fmeawad, and @ofrobots.

Thanks for the review!

/cc @nodejs/diagnostics

@matthewloring matthewloring added the c++ Issues and PRs that require attention from people who are familiar with C++. label Oct 26, 2016
@nodejs-github-bot nodejs-github-bot added build Issues and PRs related to build files or the CI. c++ Issues and PRs that require attention from people who are familiar with C++. labels Oct 26, 2016
@mscdex mscdex removed the build Issues and PRs related to build files or the CI. label Oct 27, 2016
@AndreasMadsen
Copy link
Member

Awesome

An example of async traces can be seen below. The example was generated by integrating the above macros into Async-Wrap.

Should we consider doing this by default?

@matthewloring
Copy link
Author

@AndreasMadsen I think that could be a good idea and it did generate useful information but I want to keep this PR focused on the core tracing system. I am hoping that instrumentation to generate trace events can be left to a follow on.

@cjihrig
Copy link
Contributor

cjihrig commented Oct 27, 2016

Is it possible to add docs and tests?

@matthewloring
Copy link
Author

@cjihrig Working on tests now. Are the debugger docs the best place to put doc changes or should they go somewhere else?

@cjihrig
Copy link
Contributor

cjihrig commented Oct 27, 2016

Good question. Do you think there will be enough content to warrant a separate Tracing documentation page? If not, then the Debugger docs or even guides might be an appropriate place.

Others can weigh in here too.

@sam-github
Copy link
Contributor

In the example above, trace config doesn't look complex enough to warrant a JSON config file. It can be difficult to arrange for files on disk to appear when wanting to configure node behaviour.

I suggest the example JSON config file would be better expressed as

node --trace-include=v8,custom_category --trace-exclude=node ...

@matthewloring
Copy link
Author

@cjihrig I've added an end to end test and put some documentation in the debugger docs. Happy to move it if there is a better place.

@sam-github I've updated the description to list the full set of supported configuration options. Though I would anticipate included and excluded categories to be the options relevant to node, V8 does provide additional configuration.

@@ -0,0 +1,29 @@
'use strict';

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include ../common

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

const cp = require('child_process');

const CODE = 'for (var i = 0; i < 100000; i++) { \"test\" + i }';
const FILE_NAME = 'node_trace.1.log';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file should probably be written to common.tmpDir.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

const CODE = 'for (var i = 0; i < 100000; i++) { \"test\" + i }';
const FILE_NAME = 'node_trace.1.log';

fs.access(FILE_NAME, (err) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use common.fileExists() to avoid a level of indentation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

const proc = cp.spawn(process.execPath,
[ '--enable-tracing', '-e', CODE ]);

proc.once('exit', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add common.mustCall() around the callback.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sam-github
Copy link
Contributor

@matthewloring I still think the configuration should be via CLI options. JSON should be avoided unless necessary, where necessary means "complex data structures", which this doesn't have, its a bunch of boolean or string flags. Its not even nice for dev use... typing CLI options is easy, seeing CLI docs in the help output is easy, having to write your options into a file for a one off trace run is not easy. Sure, do it for permanent config like .eslintrc, but ephemeral config like tracing? Doesn't make sense to me.

I have tooling that allows node monitoring and tracing to be enabled in production, and I've been looking forward to the v8 trace lib getting integrated, pretty excited about this.

Restarting node with new CLI options is easy, having to write my config into filesystem (somewhere, where?) is not convenient. There isn't even a way to guarantee its cleaned up afterwards. I can do it (well, assuming I'm not running in docker with a read-only FS I can do it), but its not a good way.

proc.once('exit', common.mustCall(() => {
assert(common.fileExists(FILE_NAME));
fs.readFile(FILE_NAME, (err, data) => {
fs.unlinkSync(FILE_NAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiniest of nits that can totally be ignored: No need to clean up after the test in the temp directory. Each test that needs to use the temp directory runs common.refreshTmpDir() which will clean up anything there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

@matthewloring
Copy link
Author

@sam-github That makes a lot of sense. I'm happy to introduce command line flags for included/excluded categories and get rid of the json configuration. I'm going to give it a little time before changing this in case anyone has other ideas on configuration approaches.

@Fishrock123
Copy link
Contributor

Does this have any performance impact when off?

@fmeawad
Copy link

fmeawad commented Oct 28, 2016

@Fishrock123 The performance impact is minimal specially when off

For every call site, we check if the category is enabled or disabled the first time the call site was visited, and this value is cached so that we do not have to do the expensive category check again.

If tracing was off, then it was enabled: all cached references are updated. And we do that to avoid doing the category check every time we visit the call site.

But that limits the ability to change the enabled categories when tracing is already on. Instead, we need to stop change, then start again.

That being said, trace-events should not be used to track high frequency event*, sampling and runtime stats should be used instead (both are getting standardized in tracing).
*events in tight loops.

@trevnorris
Copy link
Contributor

@fmeawad Ignoring the tight loop scenario, a single http request can make 12-18 async calls. A high speed proxy can process upwards of 30k reqs/sec. Which results in 360k-540k async calls/sec. This ignores sync calls that would also be traced (e.g. writes to a socket that are immediately flushed to the kernel). At that volume, what performance impact would we expect to see with this both enabled and disabled?

@trevnorris
Copy link
Contributor

On the side, this PR doesn't replace async hooks in any way. The two are complimentary.

`--trace-config` flag:

```txt
node --enable-trace --trace-config=trace-config.json server.js
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--enable-tracing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

node --enable-trace --trace-config=trace-config.json server.js
```

where `trace-config.json` may override any of the default configuration values:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More clarification would be helpful here:

  • Are the values below actually the defaults, or just examples?
  • What are the valid values for record_mode?
  • What do each of those boolean flags mean?
  • Are there any other known categories for the node/v8 engine? (I realize user code can add arbitrary tracing categories.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration has been simplified to just specify enabled categories through a command line flag. Other configuration options are not useful at this time (only one record_mode supported by the buffer, systrace not being used by node, etc). We are working on getting a list of categories compiled here: https://github.com/v8/v8/wiki/Tracing%20V8. Currently node does not introduce any categories of its own. The node category introduced by this PR is just a placeholder and can be updated when the community decides how/where trace points should be added to core.

"enable_systrace": true,
"enable_argument_filter": true,
"included_categories": [ "v8", "node" ],
"excluded_categories": [ ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does excluding categories actually do anything? As far as I can see, TraceConfig::IsCategoryGroupEnabled() ignores that excluded category list, and the list is not checked anywhere else either.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used by Chrome since they specify enabled categories using regular expressions which can be refined by excluded categories. The node/v8 category parser doesn't support regexes so this isn't needed at this time.

std::string str((std::istreambuf_iterator<char>(fin)),
std::istreambuf_iterator<char>());
TraceConfigParser::FillTraceConfig(env->isolate(), trace_config,
str.c_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the above code will crash if the trace config file is not found or has invalid contents. Is that acceptable? Or should there be a more helpful error message?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@fmeawad
Copy link

fmeawad commented Oct 31, 2016

@trevnorris According to my math, If you are using a single thread for that, an async call can take as low as 1.5us, a trace event can take up to 4us, therefore I do not recommend using trace events for that. The off overhead is going to be high as well, it will be ~10% here.

But it seems this is an easy experiment to conduct, and that would be my recommendation here.

@matthewloring
Copy link
Author

matthewloring commented Oct 31, 2016

@trevnorris Just for a data point, adding the trace events used to generate the graphs above to async-wrap (but with tracing disabled) caused no change in request latency and ~2% reduction in requests per second for a server with no logic:

var server = http.createServer(function(req, res) {
  res.end('Hello world');
});

The % performance impact would be even lower if the server did anything before responding. We will definitely need to be careful about how many trace points are added and where they go.

@joshgav joshgav mentioned this pull request Oct 31, 2016
2 tasks
bool TraceConfigParser::GetBoolean(Isolate* isolate, Local<Context> context,
Local<Object> object, const char* property) {
Local<Value> value = GetValue(isolate, context, object, property);
if (value->IsNumber()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be:

if (value->IsNumber() || value->IsBoolean())

I'm not certain IsNumber() is false for a boolean literal, but I'm assuming that's the case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Mutex::ScopedLock stream_scoped_lock(stream_mutex_);
if (total_traces_ >= kTracesPerFile) {
total_traces_ = 0;
// Destroying the member JSONTraceWriter object appends "]"\n"" to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"]}" to be more precise. (And no \n.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

@matthewloring
Copy link
Author

@sam-github I've updated the category configuration to use the --enabled-categories command line flag.

@jasongin Thank you for the review. This round of comments should all be addressed by the last commit.

Node.js application.

The set of categories for which traces are recorded can be specified using the
`--enabled-categories` flag followed by a list of comma separated category names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the categories command-line option should have the word 'tracing' in it somehow, so that it's more obviously associated with tracing. Maybe --tracing-categories?

And is there any way to help avoid confusion with the other command-line options starting with --trace-, which are actually about stack traces?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good point. Maybe --enable-trace-events and --trace-event-categories?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have them all start with --trace-event so its clear they go together. For example:

--trace-events-enabled
--trace-event-categories

Not sure how this fits with the way we use enabled/disabled for existing options but having all options related to trace-events start with the same prefix might make it easier to know they are related.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the flags.

@mscdex mscdex added the trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. label Jan 29, 2017
@evanlucas
Copy link
Contributor

@matthewloring This isn't landing cleanly on v7.x. Mind submitting a backport PR?

@richardlau
Copy link
Member

If this is backported please also include #10959.

@matthewloring
Copy link
Author

@evanlucas I opened a backport PR here: #11106.

@Trott
Copy link
Member

Trott commented Mar 3, 2017

Adding possibly-gratuitous but hopefully-not-harmful do-not-land labels for 4.x and 6.x. /cc @MylesBorins (3 of 3)

@jasnell
Copy link
Member

jasnell commented Mar 3, 2017

ping @nodejs/lts ... do we want this in 4 and 6?
/cc @nodejs/ctc ... thoughts?

@matthewloring
Copy link
Author

The versions of V8 in 4 and 6 do not have the APIs that this change depends on.

@jasnell jasnell mentioned this pull request Apr 4, 2017
@watilde watilde mentioned this pull request Apr 30, 2017
2 tasks
watilde added a commit to watilde/node that referenced this pull request May 1, 2017
`node_trace.*.log` files are generated by `NodeTraceWriter`.

Refs: nodejs#9304
watilde added a commit that referenced this pull request May 2, 2017
`node_trace.*.log` files are generated by `NodeTraceWriter`.

Refs: #9304
PR-URL: #12754
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
anchnk pushed a commit to anchnk/node that referenced this pull request May 6, 2017
`node_trace.*.log` files are generated by `NodeTraceWriter`.

Refs: nodejs#9304
PR-URL: nodejs#12754
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
gibfahn pushed a commit that referenced this pull request Jun 18, 2017
`node_trace.*.log` files are generated by `NodeTraceWriter`.

Refs: #9304
PR-URL: #12754
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
gibfahn pushed a commit that referenced this pull request Jun 20, 2017
`node_trace.*.log` files are generated by `NodeTraceWriter`.

Refs: #9304
PR-URL: #12754
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jul 11, 2017
`node_trace.*.log` files are generated by `NodeTraceWriter`.

Refs: #9304
PR-URL: #12754
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. semver-minor PRs that contain new features and should be released in the next minor version. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events.
Projects
None yet
Development

Successfully merging this pull request may close these issues.