Skip to content

Commit

Permalink
fix: new "run context" management to fix numerous span hierarchy issu…
Browse files Browse the repository at this point in the history
…es (#2181)

This makes a number of changes to the `Instrumentation` API that is used by
the instrumentations under "lib/instrumentation/modules/". Tracking the
"current transaction" and "current span" is now entirely encapsulated in
tracking the current "run context" (see `RunContext.js`) via a "run context
manager" singleton (`agent._instrumentation._runCtxMgr`). This is the thing
that uses `async_hooks` (`AsyncHooksRunContextManager`) or not
(`BasicRunContextManager`) to track the run context for the currently running
JS. They use an interface (and some implementation) very similar to OTel's
ContextManager.

The primary reason our context management can't be closer to OTel's is
because our `apm.startTransaction(...)`, `apm.startSpan(...)`, `span.end()`,
et al APIs **change the current run context**.

# Instrumentation API used for run context tracking

- **Part** of run context tracking is handled by an async-hook tracking new
  async tasks. The rest of run context tracking is in explicit calls to
  "bind" a function call to a particular run context.

        ins.bindEmitter(ee)   // (unchanged) bind added event handlers to the curr run context
        ins.bindFunction(fn)  // (unchanged) bind fn to the curr run context
        ins.bindFunctionToRunContext(rc, fn)  // bind fn to a specific run context
        ins.bindFunctionToEmptyRunContext(fn) // an odd ball used to explicitly break run context
        ins.withRunContext(rc, fn, thisArg, ...args)
            // Equivalent to binding `fn` then calling it, but with less overhead.

- Creating and ending transactions and spans:

        ins.startTransaction(...) -> trans  // (unchanged)
        ins.startSpan(...) -> span          // (unchanged)
        ins.createSpan(...) -> span         // Create span, but don't change the current run context.
        ins.addEndedTransaction(trans)      // (unchanged)
        ins.addEndedSpan(trans)             // (unchanged)

- Getting and working with run contexts:

        ins.currRunContext() -> rc          // Get the current run context.
        // The following are mostly used internally by above methods.
        ins.supersedeWithTransRunContext(trans)
        ins.supersedeWithSpanRunContext(span)
        ins.supersedeWithEmptyRunContext()

# Behavior changes

This makes *no* changes to the public API.  There are, however, the following
changes in behavior.

1. If user code creates span A, then creates span B *in the same async task*:
   B will be a child of A.

                                    // BEFORE               AFTER
        apm.startTransaction('t0')  // transaction 't0'     transaction 't0'
        apm.startSpan('s1')         // |- span 's1'         `- span 's1'
        apm.startSpan('s2')         // `- span 's2'           `- span 's2'

2. Before this PR, an ended transaction would linger as
  `apm.currentTransaction`. Not any more.

Fixes: #1889
Fixes: #1239
  • Loading branch information
trentm authored Oct 27, 2021
1 parent a36ad83 commit cf0708c
Show file tree
Hide file tree
Showing 96 changed files with 1,942 additions and 869 deletions.
4 changes: 3 additions & 1 deletion .tav.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@ generic-pool:
commands: node test/instrumentation/modules/generic-pool.test.js
mimic-response:
versions: ^1.0.0
commands: node test/instrumentation/modules/mimic-response.test.js
commands:
- node test/instrumentation/modules/mimic-response.test.js
- node test/instrumentation/modules/http/github-179.test.js
got-very-old:
name: got
versions: '>=4.0.0 <9.0.0'
Expand Down
27 changes: 27 additions & 0 deletions CHANGELOG.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,33 @@ Notes:
[[release-notes-3.x]]
=== Node.js Agent version 3.x
==== Unreleased
[float]
===== Breaking changes
[float]
===== Features
[float]
===== Bug fixes
* A significant change was made to internal run context tracking (a.k.a. async
context tracking). There are no configuration changes or API changes for
custom instrumentation. ({pull}2181[#2181])
+
One behavior change is that multiple spans created synchronously (in the same
async task) will form parent/child relationships; before this change they would
all be siblings. This fixes HTTP child spans of Elasticsearch and aws-sdk
automatic spans to properly be children. ({issues}1889[#1889])
+
Another behavior change is that a span B started after having ended span A in
the same async task will *no longer* be a child of span A. ({pull}1964[#1964])
+
This fixes an issue with context binding of EventEmitters, where
`removeListener` would fail to actually remove if the same handler function was
added to multiple events.
[[release-notes-3.23.0]]
==== 3.23.0 2021/10/25
Expand Down
30 changes: 15 additions & 15 deletions DEVELOPMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,24 +38,24 @@ environment variables:

## debug logging of `async_hooks` usage

The following patch to the agent's async-hooks.js can be helpful to learn
how its async hook tracks relationships between async operations:
When using the `AsyncHooksRunContextManager` the following debug printf in
the `init` async hook can be helpful to learn how its async hook tracks
relationships between async operations:

```diff
diff --git a/lib/instrumentation/async-hooks.js b/lib/instrumentation/async-hooks.js
index 1dd168f..f35877d 100644
--- a/lib/instrumentation/async-hooks.js
+++ b/lib/instrumentation/async-hooks.js
@@ -71,6 +71,9 @@ module.exports = function (ins) {
// type, which will init for each scheduled timer.
if (type === 'TIMERWRAP') return

+ const indent = ' '.repeat(triggerAsyncId % 80)
+ process._rawDebug(`${indent}${type}(${asyncId}): triggerAsyncId=${triggerAsyncId} executionAsyncId=${asyncHooks.executionAsyncId()}`);
diff --git a/lib/instrumentation/run-context/AsyncHooksRunContextManager.js b/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
index 94376188..571539aa 100644
--- a/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
+++ b/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
@@ -60,6 +60,8 @@ class AsyncHooksRunContextManager extends BasicRunContextManager {
return
}

+ process._rawDebug(`${' '.repeat(triggerAsyncId % 80)}${type}(${asyncId}): triggerAsyncId=${triggerAsyncId} executionAsyncId=${asyncHooks.executionAsyncId()}`);
+
const transaction = ins.currentTransaction
if (!transaction) return

const context = this._stack[this._stack.length - 1]
if (context !== undefined) {
this._runContextFromAsyncId.set(asyncId, context)
```


Expand Down
14 changes: 14 additions & 0 deletions NOTICE.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
apm-agent-nodejs
Copyright 2011-2021 Elasticsearch B.V.

# Notice

This project contains several dependencies which have been vendored in
Expand Down Expand Up @@ -95,3 +98,14 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

## opentelemetry-js

- **path:** [lib/instrumentation/run-context/](lib/instrumentation/run-context/)
- **author:** OpenTelemetry Authors
- **project url:** https://github.com/open-telemetry/opentelemetry-js
- **original file:** https://github.com/open-telemetry/opentelemetry-js/tree/main/packages/opentelemetry-context-async-hooks/src
- **license:** Apache License 2.0, https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/opentelemetry-context-async-hooks/LICENSE

Parts of "lib/instrumentation/run-context" have been adapted from or influenced
by TypeScript code in `@opentelemetry/context-async-hooks`.
23 changes: 13 additions & 10 deletions lib/agent.js
Original file line number Diff line number Diff line change
Expand Up @@ -50,26 +50,26 @@ function Agent () {

Object.defineProperty(Agent.prototype, 'currentTransaction', {
get () {
return this._instrumentation.currentTransaction
return this._instrumentation.currTransaction()
}
})

Object.defineProperty(Agent.prototype, 'currentSpan', {
get () {
return this._instrumentation.currentSpan
return this._instrumentation.currSpan()
}
})

Object.defineProperty(Agent.prototype, 'currentTraceparent', {
get () {
const current = this.currentSpan || this.currentTransaction
const current = this._instrumentation.currSpan() || this._instrumentation.currTransaction()
return current ? current.traceparent : null
}
})

Object.defineProperty(Agent.prototype, 'currentTraceIds', {
get () {
return this._instrumentation.ids
return this._instrumentation.ids()
}
})

Expand All @@ -86,6 +86,9 @@ Object.defineProperty(Agent.prototype, 'currentTraceIds', {
// - There may be in-flight tasks (in ins.addEndedSpan() and
// agent.captureError() for example) that will complete after this destroy
// completes. They should have no impact other than CPU/resource use.
// - The patching of core node functions when `asyncHooks=false` is *not*
// undone. This means run context tracking for `asyncHooks=false` is broken
// with in-process multiple-Agent use.
Agent.prototype.destroy = function () {
if (this._transport && this._transport.destroy) {
this._transport.destroy()
Expand Down Expand Up @@ -275,27 +278,27 @@ Agent.prototype.setFramework = function ({ name, version, overwrite = true }) {
}

Agent.prototype.setUserContext = function (context) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
trans.setUserContext(context)
return true
}

Agent.prototype.setCustomContext = function (context) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
trans.setCustomContext(context)
return true
}

Agent.prototype.setLabel = function (key, value, stringify) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
return trans.setLabel(key, value, stringify)
}

Agent.prototype.addLabels = function (labels, stringify) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
return trans.addLabels(labels, stringify)
}
Expand Down Expand Up @@ -419,11 +422,11 @@ Agent.prototype.captureError = function (err, opts, cb) {
const handled = opts.handled !== false // default true
const shouldCaptureAttributes = opts.captureAttributes !== false // default true
const skipOutcome = Boolean(opts.skipOutcome)
const span = this.currentSpan
const span = this._instrumentation.currSpan()
const timestampUs = (opts.timestamp
? Math.floor(opts.timestamp * 1000)
: Date.now() * 1000)
const trans = this.currentTransaction
const trans = this._instrumentation.currTransaction()
const traceContext = (span || trans || {})._context

// As an added feature, for *some* cases, we capture a stacktrace at the point
Expand Down
113 changes: 0 additions & 113 deletions lib/instrumentation/async-hooks.js

This file was deleted.

14 changes: 4 additions & 10 deletions lib/instrumentation/http-shared.js
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,8 @@ exports.instrumentRequest = function (agent, moduleName) {

if (isRequestBlacklisted(agent, req)) {
agent.logger.debug('ignoring blacklisted request to %s', req.url)
// don't leak previous transaction
agent._instrumentation.currentTransaction = null
// Don't leak previous transaction.
agent._instrumentation.supersedeWithEmptyRunContext()
} else {
var traceparent = req.headers.traceparent || req.headers['elastic-apm-traceparent']
var tracestate = req.headers.tracestate
Expand Down Expand Up @@ -152,7 +152,7 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
// however a traceparent header must still be propagated
// to indicate requested services should not be sampled.
// Use the transaction context as the parent, in this case.
var parent = span || agent.currentTransaction
var parent = span || ins.currTransaction()
if (parent && parent._context) {
const headerValue = parent._context.toTraceParentString()
const traceStateValue = parent._context.toTraceStateString()
Expand Down Expand Up @@ -181,7 +181,7 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
// Or if it's somehow preferable to listen for when a `response` listener
// is added instead of when `response` is emitted.
const emit = req.emit
req.emit = function (type, res) {
req.emit = function wrappedEmit (type, res) {
if (type === 'response') onresponse(res)
if (type === 'abort') onAbort(type)
return emit.apply(req, arguments)
Expand Down Expand Up @@ -228,17 +228,11 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
}

function onresponse (res) {
// Work around async_hooks bug in Node.js 12.0 - 12.2 (https://github.com/nodejs/node/pull/27477)
ins._recoverTransaction(span.transaction)

agent.logger.debug('intercepted http.ClientRequest response event %o', { id: id })
ins.bindEmitter(res)

statusCode = res.statusCode

res.prependListener('end', function () {
agent.logger.debug('intercepted http.IncomingMessage end event %o', { id: id })

onEnd()
})
}
Expand Down
Loading

0 comments on commit cf0708c

Please sign in to comment.