Title | AsyncHook API |
---|---|
Author | @trevnorris |
Status | ACCEPTED |
Date | 2017-01-27 |
Since its initial introduction along side the AsyncListener
API, the internal
class AsyncWrap
has slowly evolved to ensure a generalized API that would
serve as a solid base for module authors who wished to add listeners to the
event loop's life cycle. Some of the use cases AsyncWrap
has covered are long
stack traces, continuation local storage, profiling of asynchronous requests
and resource tracking. The public API is now exposed as 'async_hooks'
.
It must be clarified that the AsyncHook
API is not meant to abstract away
Node's implementation details. Observing how operations are performed, not
simply how they appear to perform from the public API, is a key point in its
usability. For a real world example, a user was having difficulty figuring out
why their http.get()
calls were taking so long. Because AsyncHook
exposed
the dns.lookup()
request to resolve the host being made by the net
module
it was trivial to write a performance measurement that exposed the hidden
culprit. Which was that DNS resolution was taking much longer than expected.
A small amount of abstraction is necessary to accommodate the conceptual nature
of the API. For example, the HTTPParser
is not destructed but instead placed
into an unused pool to be used later. Even so, at the time it is placed into
the pool the destroy()
callback will run and then the id assigned to that
instance will be removed. If that resource is again requested then it will be
assigned a new id and will run init()
again.
The intent for the initial release is to provide the most minimal set of API hooks that don't inhibit module authors from writing tools addressing anything in this problem space. In order to remain minimal, all potential features for initial release will first be judged on whether they can be achieved by the existing public API. If so then such a feature won't be included.
If the feature cannot be done with the existing public API then discussion will be had on whether requested functionality should be included in the initial API. If so then the best course of action to include the capabilities of the request will be discussed and performed. Meaning, the resulting API may not be the same as initially requested, but the user will be able to achieve the same end result.
Performance impact of AsyncHook
should be zero if not being used, and near
zero while being used. The performance overhead of AsyncHook
callbacks
supplied by the user should account for essentially all the performance
overhead.
Because of the potential ambiguity for those not familiar with the terms "handle" and "request" they should be well defined.
Handles are a reference to a system resource. Some resources are a simple
identifier. For example, file system handles are represented by a file
descriptor. Other handles are represented by libuv as a platform abstracted
struct, e.g. uv_tcp_t
. Each handle can be continually reused to access and
operate on the referenced resource.
Requests are short lived data structures created to accomplish one task. The callback for a request should always and only ever fire one time. Which is when the assigned task has either completed or encountered an error. Requests are used by handles to perform these tasks. Such as accepting a new connection or writing data to disk.
When both "handle" and "request" are being addressed it is simply called a "resource".
Here is a quick overview of the entire API. All of this API is explained in more detail further down.
// Standard way of requiring a module. Snake case follows core module
// convention.
const async_hooks = require('async_hooks');
// Return the id of the current execution context. Useful for tracking state
// and retrieving the resource of the current trigger without needing to use an
// AsyncHook.
const cid = async_hooks.currentId();
// Return the id of the resource responsible for triggering the callback of the
// current execution scope to fire.
const tid = async_hooks.triggerId();
// Propagating the correct trigger id to newly created asynchronous resource is
// important. To make that easier triggerIdScope() will make sure all resources
// created during callback() have that trigger. This also tracks nested calls
// and will unwind properly.
async_hooks.triggerIdScope(triggerId, callback);
// Create a new instance of AsyncHook. All of these callbacks are optional.
const asyncHook = async_hooks.createHook({ init, before, after, destroy });
// Allow callbacks of this AsyncHook instance to fire. This is not an implicit
// action after running the constructor, and must be explicitly run to begin
// executing callbacks.
asyncHook.enable();
// Disable listening for all new asynchronous events.
asyncHook.disable();
// The following are the callbacks that can be passed to createHook().
// init() is called during object construction. The resource may not have
// completed construction when this callback runs. So all fields of the
// resource referenced by "id" may not have been populated.
function init(id, type, triggerId, resource) { }
// before() is called just before the resource's callback is called. It can be
// called 0-N times for handles (e.g. TCPWrap), and should be called exactly 1
// time for requests (e.g. FSReqWrap).
function before(id) { }
// after() is called just after the resource's callback has finished, and will
// always fire. In the case of an uncaught exception after() will fire after
// the 'uncaughtException' handler, or domain, has handled the exception.
function after(id) { }
// destroy() is called when an AsyncWrap instance is destroyed. In cases like
// HTTPParser where the resource is reused, or timers where the resource is
// only a JS object, destroy() will be triggered manually to fire
// asynchronously after the after() hook has completed.
function destroy(id) { }
// The following is the recommended embedder API.
// AsyncEvent() is meant to be extended. Instantiating a new AsyncEvent() also
// triggers init(). If triggerId is omitted then currentId() is used.
const asyncEvent = new AsyncEvent(type[, triggerId]);
// Call before() hooks.
asyncEvent.emitBefore();
// Call after() hooks. It is important that before/after calls are unwound
// in the same order they are called. Otherwise an unrecoverable exception
// will be made.
asyncEvent.emitAfter();
// Call destroy() hooks.
asyncEvent.emitDestroy();
// Return the unique id assigned to the AsyncEvent instance.
asyncEvent.asyncId();
// Return the trigger id for the AsyncEvent instance.
asyncEvent.triggerId();
// The following calls are specific to the embedder API. This API is considered
// more low-level than the AsyncEvent API, and the AsyncEvent API is
// recommended over using the following.
// Return new unique id for a constructing resource. This value must be
// manually tracked by the user for triggering async events.
const id = async_hooks.newId();
// Retrieve the triggerId for the constructed resource. This value must be
// manually tracked by the user for triggering async events.
async_hooks.initTriggerId(id);
// Set the trigger id for the next asynchronous resource that is created. The
// value is reset after it's been retrieved. This API is similar to
// triggerIdScope() except it's more brittle because if a constructor fails the
// then the internal field may not be reset.
async_hooks.setInitTriggerId(id);
// Call the init() callbacks. It is recommended that resource be passed. If it
// is not then "null" will be passed to the init() hook.
async_hooks.emitInit(id, type[, triggerId[, resource]]);
// Call the before() callbacks. The reason for requiring both arguments is
// explained in further detail below.
async_hooks.emitBefore(id, triggerId);
// Call the after() callbacks.
async_hooks.emitAfter(id);
// Call the destroy() callbacks.
async_hooks.emitDestroy(id);
The object returned from require('async_hooks')
.
- Returns {Number}
Return the id of the current execution context. Useful to track when something fires. For example:
console.log(async_hooks.currentId()); // 1 - bootstrap
fs.open(path, (err, fd) => {
console.log(async_hooks.currentId()); // 2 - open()
}):
It is important to note that the id returned has to do with execution timing.
Not causality (which is covered by triggerId()
). For example:
const server = net.createServer(function onconnection(conn) {
// Returns the id of the server, not of the new connection. Because the
// on connection callback runs in the execution scope of the server's
// MakeCallback().
async_hooks.currentId();
}).listen(port, function onlistening() {
// Returns the id of a TickObject (i.e. process.nextTick()) because all
// callbacks passed to .listen() are wrapped in a nextTick().
async_hooks.currentId();
});
- Returns {Number}
Return the id of the resource responsible for calling the callback that is currently being executed. For example:
const server = net.createServer(conn => {
// Though the resource that caused (or triggered) this callback to
// be called was that of the new connection. Thus the return value
// of triggerId() is the id of "conn".
async_hooks.triggerId();
}).listen(port, () => {
// Even though all callbacks passed to .listen() are wrapped in a nextTick()
// the callback itself exists because the call to the server's .listen()
// was made. So the return value would be the id of the server.
async_hooks.triggerId();
});
triggerId
{Number}callback
{Function}- Returns {Undefined}
All resources created during the execution of callback
will be given
triggerId
. Unless it was otherwise 1) passed in as an argument to
AsyncEvent
2) set via setInitTriggerId()
or 3) a nested call to
triggerIdScope()
is made.
Meant to be used in conjunction with the AsyncEvent
API, and preferred over
setInitTriggerId()
because it is more error proof.
Example using this to make sure the correct triggerId
propagates to newly
created asynchronous resources:
class MyThing extends AsyncEvent {
constructor(foo, cb) {
this.foo = foo;
async_hooks.triggerIdScope(this.asyncId(), () => {
process.nextTick(cb);
});
}
}
callbacks
{Object}- Returns {AsyncHook}
createHook()
returns an AsyncHook
instance that contains information about
the callbacks that will fire during specific asynchronous events in the
lifetime of the event loop. The focal point of these calls centers around the
AsyncWrap
C++ class. These callbacks will also be called to emulate the
lifetime of resources that do not fit this model. For example, HTTPParser
instances are recycled to improve performance. So the destroy()
callback will
be called manually after a request is complete. Just before it's placed into
the unused resource pool.
All callbacks are optional. So, for example, if only resource cleanup needs to
be tracked then only the destroy()
callback needs to be passed. The
specifics of all functions that can be passed to callbacks
is in the section
Hook Callbacks
.
Error Handling: If any AsyncHook
callbacks throw the application will
print the stack trace and exit. The exit path does follow that of any uncaught
exception, except for the fact that it is not catchable by an uncaught
exception handler, so any 'exit'
callbacks will fire. Unless the application
is run with --abort-on-uncaught-exception
. In which case a stack trace will
be printed and the application will exit, leaving a core file.
The reason for this error handling behavior is that these callbacks are running at potentially volatile points in an object's lifetime. For example during class construction and destruction. Because of this, it is deemed necessary to bring down the process quickly as to prevent an unintentional abort in the future. This is subject to change in the future if a comprehensive analysis is performed to ensure an exception can follow the normal control flow without unintentional side effects.
- Returns {AsyncHook} A reference to
asyncHook
.
Enable the callbacks for a given AsyncHook
instance. When a hook is enabled
it is added to a global pool of hooks to execute. These hooks do not propagate
with a single asynchronous chain but will be completely disabled when
disable()
is called.
The reason that enable()
/disable()
only work on a global scale, instead of
allowing every asynchronous branch to track their own as was done in the
previous implementation, is because it is prohibitively expensive to track
all hook instances for every asynchronous execution chain.
Callbacks are not implicitly enabled after an instance is created. One goal of
the API is to require explicit action from the user. Though to help simplify
using AsyncHook
, enable()
returns the asyncHook
instance so the call can
be chained. For example:
const async_hooks = require('async_hooks');
const hook = async_hooks.createHook(callbacks).enable();
- Returns {AsyncHook} A reference to
asyncHook
.
Disable the callbacks for a given AsyncHook
instance from the global pool of
hooks to be executed. Once a hook has been disabled it will not fire again
until enabled.
For API consistency disable()
also returns the AsyncHook
instance.
Key events in the lifetime of asynchronous events have been categorized into four areas. On instantiation, before/after the callback is called and when the instance is destructed. For cases where resources are reused, instantiation and destructor calls are emulated.
id
{Number}type
{String}triggerId
{Number}resource
{Object}
Called when a class is constructed that has the possibility to trigger an
asynchronous event. This does not mean the instance must trigger
before()
/after()
before destroy()
is called. Only that the possibility
exists.
This behavior can be observed by doing something like opening a resource then closing it before the resource can be used. The following snippet demonstrates this.
require('net').createServer().listen(function() { this.close() });
// OR
clearTimeout(setTimeout(() => {}, 10));
Every new resource is assigned a unique id. Since JS only supports IEEE 754
floating floating point numbers, the maximum assignable id is 2^53 - 1
(also
defined in Number.MAX_SAFE_INTEGER
). At this size node can assign a new id
every 100 nanoseconds and not run out for over 28 years. So while another
method could be used to identify new resources that would truly never run out,
doing so is not deemed necessary at this time since. If an alternative is found
that has at least the same performance as using a double
then the
consideration will be made.
The type
is a String that represents the type of resource that caused
init()
to fire. Generally it will be the name of the resource's constructor.
Some examples include TCP
, GetAddrInfo
and HTTPParser
. Users will be able
to define their own type
when using the public embedder API.
Note: It's possible to have type
name collisions. So embedders are
recommended to use a prefix of sorts to prevent this.
triggerId
is the id
of the resource that caused (or "triggered") the
new resource to initialize and that caused init()
to fire.
The following is a simple demonstration of this:
const async_hooks = require('async_hooks');
asyns_hooks.createHook({
init: (id, type, triggerId) => {
const cId = async_hooks.currentId();
process._rawDebug(`${type}(${id}): trigger: ${triggerId} scope: ${cId}`);
}
}).enable();
require('net').createServer(c => {}).listen(8080);
Output hitting the server with nc localhost 8080
:
TCPWRAP(2): trigger: 1 scope: 1
TCPWRAP(4): trigger: 2 scope: 0
The second TCPWRAP
is the new connection from the client. When a new
connection is made the TCPWrap
instance is immediately constructed. This
happens outside of any JavaScript stack (side note: a currentId()
of 0
means it's being executed in "the void", with no JavaScript stack above it).
With only that information it would be impossible to link resources together in
terms of what caused them to be created. So triggerId
is given the task of
propagating what resource is responsible for the new resource's existence.
Below is another example with additional information about the calls to
init()
between the before()
and after()
calls. Specifically what the
callback to listen()
will look like. The output formatting is slightly more
elaborate to make calling context easier to see.
'use strict';
const async_hooks = require('async_hooks');
let ws = 0;
async_hooks.createHook({
init: (id, type, triggerId) => {
const cId = async_hooks.currentId();
process._rawDebug(' '.repeat(ws) +
`${type}(${id}): trigger: ${triggerId} scope: ${cId}`);
},
before: (id) => {
process._rawDebug(' '.repeat(ws) + 'before: ', id);
ws += 2;
},
after: (id) => {
ws -= 2;
process._rawDebug(' '.repeat(ws) + 'after: ', id);
},
destroy: (id) => {
process._rawDebug(' '.repeat(ws) + 'destroy:', id);
},
}).enable();
require('net').createServer(() => {}).listen(8080, () => {
// Let's wait 10ms before logging the server started.
setTimeout(() => {
console.log('>>>', async_hooks.currentId());
}, 10);
});
Output from only starting the server:
TCPWRAP(2): trigger: 1 scope: 1
TickObject(3): trigger: 2 scope: 1
before: 3
Timeout(4): trigger: 3 scope: 3
TIMERWRAP(5): trigger: 3 scope: 3
after: 3
destroy: 3
before: 5
before: 4
TTYWRAP(6): trigger: 4 scope: 4
SIGNALWRAP(7): trigger: 4 scope: 4
TTYWRAP(8): trigger: 4 scope: 4
>>> 4
TickObject(9): trigger: 4 scope: 4
after: 4
destroy: 4
after: 5
before: 9
after: 9
destroy: 9
destroy: 5
First notice that scope
and the value returned by currentId()
are always
the same. That's because currentId()
simply returns the value of the
current execution context; which is defined by before()
and after()
calls.
Now if we only use scope
to graph resource allocation we get the following:
TTYWRAP(6) -> Timeout(4) -> TIMERWRAP(5) -> TickObject(3) -> root(1)
No where here do we see the TCPWRAP
created; which was the reason for
console.log()
being called. This is because binding to a port without a
hostname is actually synchronous, but to maintain a completely asynchronous API
the user's callback is placed in a process.nextTick()
.
The graph only shows when a resource was created. Not why. So to track
the why use triggerId
.
id
{Number}
When an asynchronous operation is triggered (such as a TCP server receiving a
new connection) or completes (such as writing data to disk) a callback is
called to notify node. The before()
callback is called just before said
callback is executed. id
is the unique identifier assigned to the
resource about to execute the callback.
The before()
callback will be called 0-N times if the resource is a handle,
and exactly 1 time if the resource is a request.
id
{Number}
Called immediately after the callback specified in before()
is completed. If
an uncaught exception occurs during execution of the callback then after()
will run after 'uncaughtException'
or a domain
's handler runs.
id
{Number}
Called either when the class destructor is run or if the resource is manually
marked as free. For core C++ classes that have a destructor the callback will
fire during deconstruction. It is also called synchronously from the embedder
API emitDestroy()
.
Some resources, such as HTTPParser
, are not actually destructed but instead
placed in an unused resource pool to be used later. For these destroy()
will
be called just before the resource is placed on the unused resource pool.
Note: Some resources depend on GC for cleanup. So if a reference is made to
the resource
object passed to init()
it's possible that destroy()
is
never called. Causing a memory leak in the application. Of course if you know
the resource doesn't depend on GC then this isn't an issue.
Library developers that handle their own I/O will need to hook into the
AsyncWrap
API so that all the appropriate callbacks are called. To
accommodate this both a C++ and JS API is provided.
- Returns {AsyncEvent}
The class AsyncEvent
was designed to be extended from for embedder's async
resources. Using this users can easily trigger the lifetime events of their
own resources.
The init()
hook will trigger when AsyncEvent
is instantiated.
Example usage:
class DBQuery extends AsyncEvent {
construtor(db) {
this.db = db;
}
getInfo(query, callback) {
this.db.get(query, (err, data) => {
this.emitBefore();
callback(err, data)
this.emitAfter();
});
}
close() {
this.db = null;
this.emitDestroy();
}
}
- Returns {Undefined}
Call all before()
hooks and let them know a new asynchronous execution
context is being entered. If nested calls to emitBefore()
are made the stack
of id
s will be tracked and properly unwound.
- Returns {Undefined}
Call all after()
hooks. If nested calls to emitBefore()
were made then make
sure the stack is unwound properly. Otherwise an error will be thrown.
If the user's callback thrown an exception then emitAfter()
will
automatically be called for all id
's on the stack if the error is handled by
a domain or 'uncaughtException'
handler. So there is no need to guard against
this.
- Returns {Undefined}
Call all destroy()
hooks. This should only ever be called once. An error will
be thrown if it is. This must be manually called. If the resource is left
to be collected by the GC then the destroy()
hooks will never be called.
- Returns {Number}
Return the unique identifier assigned to the resource. Useful when used with
triggerIdScope()
.
- Returns {Number}
Return the same triggerId
that is passed to init()
hooks.
The following API can be used as an alternative to using AsyncEvent()
, but it
is left to the embedder to manually track values needed for all emitted events
(e.g. id
and triggerId
). It is very recommended that embedders instead
use AsyncEvent
.
- Returns {Number}
Return a new unique id meant for a newly created asynchronous resource. The value returned will never be assigned to another resource.
Generally this should be used during object construction. e.g.:
class MyClass {
constructor() {
this._id = async_hooks.newId();
this._triggerId = async_hooks.initTriggerId();
}
}
- Returns {Number}
There are several ways to set the triggerId
for an instantiated resource.
This API is how that value is retrieved. It returns the id
of the resource
responsible for the newly created resource being instantiated. For example:
id
{Number} Generated by callingnewId()
type
{String}triggerId
{Number} Default:currentId()
resource
{Object} Default:null
- Returns {Undefined}
Emit that a resource is being initialized. id
should be a value returned by
async_hooks.newId()
. Usage will probably be as follows:
class Foo {
constructor() {
this._id = async_hooks.newId();
this._triggerId = async_hooks.initTriggerId();
async_hooks.emitInit(this._id, 'Foo', this._triggerId, this);
}
}
In the circumstance that the embedder needs to define a different trigger id
than currentId()
, they can pass in that id manually.
It is suggested to have emitInit()
be the last call in the object's
constructor.
id
{Number} Generated bynewId()
triggerId
{Number}- Returns {Undefined}
Notify before()
hooks the resource is about to enter its execution call
stack. If the triggerId
of the resource is different from id
then pass
it in.
Example usage:
MyThing.prototype.done = function done() {
// First call the before() hooks. So currentId() shows the id of the
// resource wrapping the id that's been passed.
async_hooks.emitBefore(this._id);
// Run the callback.
this.callback();
// Call after() callbacks now that the old id has been restored.
async_hooks.emitAfter(this._id);
};
id
{Number} Generated bynewId()
- Returns {Undefined}
Notify after()
hooks the resource is exiting its execution call stack.
Even though the state of id
is tracked internally, passing it in is required
as a way to validate that the stack is unwinding properly.
For example, no two async stack should cross when emitAfter()
is called.
init # Foo
init # Bar
...
before # Foo
before # Bar
after # Foo <- Should be called after Bar
after # Bar
Note: It is not necessary to wrap the callback in a try
/finally
and
force emitAfter()
if the callback throws. That is automatically handled by
the fatal exception handler.
id
{Number} Generated bynewId()
- Returns {Undefined}
Notify hooks that a resource is being destroyed (or being moved to the free'd resource pool).
Note: The native API is not yet implemented in the initial PR associated with this EP, but will come soon enough afterward.
// Helper class users can inherit from, but is not necessary. If
// `AsyncHook::MakeCallback()` is used then all four callbacks will be
// called automatically.
class AsyncHook {
public:
AsyncHook(v8::Local<v8::Object> resource, const char* name);
~AsyncHook();
v8::MaybeLocal<v8::Value> MakeCallback(
const v8::Local<v8::Function> callback,
int argc,
v8::Local<v8::Value>* argv);
v8::Local<v8::Object> get_resource();
double get_uid();
private:
AsyncHook();
v8::Persistent<v8::Object> resource_;
const char* name_;
double uid_;
}
// Returns the id of the current execution context. If the return value is
// zero then no execution has been set. This will happen if the user handles
// I/O from native code.
double node::GetCurrentId();
// Return same value as async_hooks.triggerId();
double node::GetTriggerId();
// If the native API doesn't inherit from the helper class then the callbacks
// must be triggered manually. This triggers the init() callback. The return
// value is the uid assigned to the resource.
// TODO(trevnorris): This always needs to be called so that the resource can be
// placed on the Map for future query.
double node::EmitAsyncInit(v8::Local<v8::Object> resource);
// Emit the destroy() callback.
void node::EmitAsyncDestroy(double id);
// An API specific to emit before/after callbacks is unnecessary because
// MakeCallback will automatically call them for you.
// TODO(trevnorris): There should be a final optional parameter of
// "double triggerId" that's needed so async_hooks.triggerId() returns the
// correct value.
v8::MaybeLocal<v8::Value> node::MakeCallback(v8::Isolate* isolate,
v8::Local<v8::Object> recv,
v8::Local<v8::Function> callback,
int argc,
v8::Local<v8::Value>* argv,
double id);
Resources like HTTPParser
are reused throughout the lifetime of the
application. This means node will have to synthesize the init()
and
destroy()
calls. Also the id on the class instance will need to be changed
every time the resource is acquired for use.
Though for a shared resource like TimerWrap
we're not concerned with
reassigning the id because it isn't directly used by the user. Instead it is
used internally for JS created resources that are all assigned their own unique
id, and each of these JS resources are linked to the TimerWrap
instance.
Existing native modules that call node::MakeCallback
today will always have
their trigger id as 1
. Which is the root id. There will be two native APIs
that users can use. One will be a static API and another will be a class that
users can inherit from.
Recently V8 has implemented an API so that async hooks can properly interface with Promises. Unfortunately this API cannot be backported, so if this API is backported then there will be an API gap in which native Promises will break the async stack.
When data is written through StreamWrap
node first attempts to write as much
to the kernel as possible. If all the data can be flushed to the kernel then
the function exits without creating a WriteWrap
and calls the user's
callback in nextTick()
(which will only be called if the user passes a
callback). Meaning detection of the write won't be as straightforward as
watching for a WriteWrap
.