Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new "run context" management to fix numerous span hierarchy issues #2181

Merged
merged 97 commits into from
Oct 27, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
97 commits
Select commit Hold shift + click to select a range
e2586e6
feat: new "run context" management to fix numerous span hierarchy issues
trentm Aug 6, 2021
b513be3
one half of a possible solution to avoid tracing if APM intake requests
trentm Aug 10, 2021
4df2746
missing part of the previous change to avoid instrumentation APM inta…
trentm Aug 10, 2021
e777572
fix so that an ended span is never used as a parent
trentm Aug 10, 2021
334d7ef
changes to S3 instrumentation so that its parent/child relationship w…
trentm Aug 10, 2021
f2367c0
Revert "missing part of the previous change to avoid instrumentation …
trentm Aug 10, 2021
f46fe0b
Revert "one half of a possible solution to avoid tracing if APM intak…
trentm Aug 10, 2021
17c0744
basic test/run-context structure and first test
trentm Aug 10, 2021
5e65b37
more run-context.test.js cases; starting to clear out obsoleted APIs
trentm Aug 17, 2021
c120b44
add test fixtures forgotten in previous commit
trentm Aug 17, 2021
7bf19d4
Merge branch 'master' into trentm/run-ctx-mgr
trentm Aug 17, 2021
2507f00
current state of ctxmgr work
trentm Aug 20, 2021
bceeacd
handling subtleties with how span.end() impacts the run context
trentm Aug 21, 2021
3a6dc8d
working through tests, XXXs
trentm Sep 2, 2021
4d2394a
Merge branch 'master' into trentm/run-ctx-mgr
trentm Sep 2, 2021
ea21754
breakdown.test.js passing
trentm Sep 3, 2021
a2795b7
setImmediates here were a race that failed reliably on node v10
trentm Sep 4, 2021
65fbd05
disable this attempt at help in using _mock_http_client, which just t…
trentm Sep 4, 2021
acc0af3
fix s3 instr: bind -> bindFunction recently
trentm Sep 4, 2021
0ed0e36
fix 'make check'
trentm Sep 7, 2021
5955326
tests: refactoring away lib/instrumentation/-specific mocks, from htt…
trentm Sep 7, 2021
67e6d85
Fix pg instrumentation for new run context manager.
trentm Sep 8, 2021
3da6e32
Merge branch 'master' into trentm/run-ctx-mgr
trentm Sep 8, 2021
5dadd73
fix hang in tedious.test.js due to tedious instr relying on _recoverT…
trentm Sep 8, 2021
ce9ada1
bless use of Instrumentation#testReset() alternative to 'currentTrans…
trentm Sep 8, 2021
423c62c
fix hang in mysql.test.js due to mysql instr relying on _recoverTrans…
trentm Sep 9, 2021
c3880f2
fix memcached tracing: it is now Instrumentation#currSpan(), also fix…
trentm Sep 9, 2021
1a295df
fix http2.test.js by updating the http2 pushStream instrumentation to…
trentm Sep 9, 2021
24a2fda
fix sqs.test.js, necessary guard because of the Metrics init/guard tw…
trentm Sep 9, 2021
60d65e4
bow down before standard
trentm Sep 9, 2021
73888de
workaround a bug in node v12.0.0-v12.2.0 (inclusive) that could resul…
trentm Sep 9, 2021
aab5dbb
all hail standard
trentm Sep 10, 2021
8a51c1d
refactor: remove unnecessary namespacing of some of these tests
trentm Sep 10, 2021
9a3aa00
tests: a shot in the dark that this flush is causing the CI breakdown…
trentm Sep 10, 2021
8525fdb
fix Instrumentation#stop to actually disable the RunContextManager
trentm Sep 10, 2021
b868e1a
tests: add logging to test a theory with the breaking breakdown.test.…
trentm Sep 10, 2021
b5a1b62
necessary guards now that Instrumentation#stop (called by agent.destr…
trentm Sep 10, 2021
aeffed1
drop the 'test.only' to see if I can get the CI breakdown.test.js run…
trentm Sep 10, 2021
5d277b4
tests: breakdown.test.js wait improvements to be less susceptible to …
trentm Sep 10, 2021
f6c60aa
tests: fix transaction.test.js by clearing the CapturingTransport at …
trentm Sep 10, 2021
8f9d334
fix breakdown.test.js for asyncHooks=false
trentm Sep 10, 2021
f6c3e47
clarify this comment
trentm Sep 14, 2021
023f074
drop Instrumentation#_recoverTransaction
trentm Sep 14, 2021
c1e028d
Merge branch 'master' into trentm/run-ctx-mgr
trentm Sep 14, 2021
e42f5ee
reduce unnecessary diff size
trentm Sep 14, 2021
71616cd
fix breakage in mimic-response@1.0.0 instrumentation
trentm Sep 14, 2021
d1993fd
thanks much, standard
trentm Sep 14, 2021
35b5584
change Instrumentation#currTx() to Instrumentation#currTransaction() …
trentm Sep 14, 2021
dec0707
tests: change (most) test usage of old ins.currentTransaction to ins.…
trentm Sep 14, 2021
80b5fee
drop some XXX plans
trentm Sep 15, 2021
81fd4ff
drop unneeded change to this file to reduce diff size
trentm Sep 15, 2021
40d9095
restore agent._transport guard in captureError() b/c it is needed if …
trentm Sep 15, 2021
169a53d
no point in these log.debugs; frankly it is fine if the 'currentTrans…
trentm Sep 15, 2021
b8f7714
clean out debug prints and aspiration statements
trentm Sep 15, 2021
8e53ba4
nicer comment for this change
trentm Sep 15, 2021
a241b98
drop unneeded dev comments
trentm Sep 20, 2021
c1c832f
Merge branch 'master' into trentm/run-ctx-mgr
trentm Sep 20, 2021
406dddf
improve and clean up run context binding for s3 instrumentation
trentm Sep 21, 2021
309b859
attempting to deal with run context due to startSpan in s3 and outgoi…
trentm Sep 21, 2021
509ddb1
more clear run context binding for the s3 client method callback + re…
trentm Sep 22, 2021
4388612
start additions to Instrumentation API to not have to reach into ins.…
trentm Sep 23, 2021
60d7799
refactor ins.ids property to ins.ids(); no need for property getters …
trentm Sep 23, 2021
b822cf8
refactor: api name changes to avoid re-using the same fn name on diff…
trentm Sep 23, 2021
22dcb72
refactor: restore setSpanOutcome impl on Instrumentation, can be chan…
trentm Sep 23, 2021
c627fdd
move examples/ to test cases where appropriate; a couple fixes
trentm Sep 23, 2021
f888963
async-hooks is no longer used, drop it
trentm Sep 27, 2021
37c72da
drop ins.currentTransaction and ins.currentSpan usage; reduce interna…
trentm Sep 27, 2021
c86813b
increase timeout for CI
trentm Sep 27, 2021
0c9bdf2
extracting notes for follow-up work separate from this PR
trentm Sep 27, 2021
4a7739f
drop obsolete activeSpan and bindingSpan
trentm Sep 27, 2021
1036a17
clear out review notes and later todos for pg.js
trentm Sep 27, 2021
4ce8bc3
more clearing out review notes
trentm Sep 27, 2021
4540ed3
working through older XXX notes-to-self
trentm Sep 27, 2021
de85763
skip this test for node v8 and patch-async, isn't worth fretting
trentm Sep 27, 2021
91efede
Merge branch 'master' into trentm/run-ctx-mgr
trentm Sep 27, 2021
2d1592e
try replaceActive impl as exitContext;enterContext
trentm Sep 28, 2021
0bfbcf3
refactor: s/replaceActive/replaceRunContext/
trentm Sep 28, 2021
f9b8b84
Theoretically better toString for run ctx managers
trentm Sep 28, 2021
60db653
some refactoring, XXX-count=25
trentm Sep 28, 2021
9e02f21
Merge branch 'master' into trentm/run-ctx-mgr
trentm Sep 28, 2021
fd13252
doc RunContext; some refactoring
trentm Sep 28, 2021
52fa57d
this new test dir was not actually being run :facepalm:
trentm Sep 28, 2021
3ef9e54
fixing this to work: node 8 had no assert.strict, span.sync with prom…
trentm Sep 28, 2021
f0e5ff4
refactoring method names to try to be more self-explanatory; some met…
trentm Sep 28, 2021
0893d28
refactor: AbstractRunContextManager
trentm Sep 28, 2021
dbdaf72
refactor: run-context under instrumentation, which is meant to comple…
trentm Sep 28, 2021
1ef673f
refactor: break up BasicRunContextManager.js
trentm Sep 28, 2021
8f5490b
adding notices
trentm Sep 28, 2021
107d189
documenting some added functions
trentm Sep 28, 2021
14e8b9c
update dev note; re-instate x bit for these examples
trentm Sep 29, 2021
5cafbc8
drop obsolete trans.sync tracking
trentm Sep 29, 2021
6b05f53
tweaks to doc comments
trentm Sep 29, 2021
28ea097
improve some comments/test code for review
trentm Sep 29, 2021
b3cd5c2
doc/style tweaks
trentm Sep 30, 2021
d82e35f
Merge branch 'master' into trentm/run-ctx-mgr
trentm Oct 12, 2021
2406627
Merge branch 'master' into trentm/run-ctx-mgr
trentm Oct 26, 2021
7972658
changelog addition
trentm Oct 27, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .tav.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@ generic-pool:
commands: node test/instrumentation/modules/generic-pool.test.js
mimic-response:
versions: ^1.0.0
commands: node test/instrumentation/modules/mimic-response.test.js
commands:
- node test/instrumentation/modules/mimic-response.test.js
- node test/instrumentation/modules/http/github-179.test.js
got-very-old:
name: got
versions: '>=4.0.0 <9.0.0'
Expand Down
27 changes: 27 additions & 0 deletions CHANGELOG.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,33 @@ Notes:
[[release-notes-3.x]]
=== Node.js Agent version 3.x

==== Unreleased

[float]
===== Breaking changes

[float]
===== Features

[float]
===== Bug fixes

* A significant change was made to internal run context tracking (a.k.a. async
context tracking). There are no configuration changes or API changes for
custom instrumentation. ({pull}2181[#2181])
+
One behavior change is that multiple spans created synchronously (in the same
async task) will form parent/child relationships; before this change they would
all be siblings. This fixes HTTP child spans of Elasticsearch and aws-sdk
automatic spans to properly be children. ({issues}1889[#1889])
+
Another behavior change is that a span B started after having ended span A in
the same async task will *no longer* be a child of span A. ({pull}1964[#1964])
+
This fixes an issue with context binding of EventEmitters, where
`removeListener` would fail to actually remove if the same handler function was
added to multiple events.


[[release-notes-3.23.0]]
==== 3.23.0 2021/10/25
Expand Down
30 changes: 15 additions & 15 deletions DEVELOPMENT.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,24 +38,24 @@ environment variables:

## debug logging of `async_hooks` usage

The following patch to the agent's async-hooks.js can be helpful to learn
how its async hook tracks relationships between async operations:
When using the `AsyncHooksRunContextManager` the following debug printf in
the `init` async hook can be helpful to learn how its async hook tracks
relationships between async operations:

```diff
diff --git a/lib/instrumentation/async-hooks.js b/lib/instrumentation/async-hooks.js
index 1dd168f..f35877d 100644
--- a/lib/instrumentation/async-hooks.js
+++ b/lib/instrumentation/async-hooks.js
@@ -71,6 +71,9 @@ module.exports = function (ins) {
// type, which will init for each scheduled timer.
if (type === 'TIMERWRAP') return

+ const indent = ' '.repeat(triggerAsyncId % 80)
+ process._rawDebug(`${indent}${type}(${asyncId}): triggerAsyncId=${triggerAsyncId} executionAsyncId=${asyncHooks.executionAsyncId()}`);
diff --git a/lib/instrumentation/run-context/AsyncHooksRunContextManager.js b/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
index 94376188..571539aa 100644
--- a/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
+++ b/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
@@ -60,6 +60,8 @@ class AsyncHooksRunContextManager extends BasicRunContextManager {
return
}

+ process._rawDebug(`${' '.repeat(triggerAsyncId % 80)}${type}(${asyncId}): triggerAsyncId=${triggerAsyncId} executionAsyncId=${asyncHooks.executionAsyncId()}`);
+
const transaction = ins.currentTransaction
if (!transaction) return

const context = this._stack[this._stack.length - 1]
if (context !== undefined) {
this._runContextFromAsyncId.set(asyncId, context)
```


Expand Down
14 changes: 14 additions & 0 deletions NOTICE.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
apm-agent-nodejs
Copyright 2011-2021 Elasticsearch B.V.

# Notice

This project contains several dependencies which have been vendored in
Expand Down Expand Up @@ -95,3 +98,14 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

## opentelemetry-js

- **path:** [lib/instrumentation/run-context/](lib/instrumentation/run-context/)
- **author:** OpenTelemetry Authors
- **project url:** https://github.com/open-telemetry/opentelemetry-js
- **original file:** https://github.com/open-telemetry/opentelemetry-js/tree/main/packages/opentelemetry-context-async-hooks/src
- **license:** Apache License 2.0, https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/opentelemetry-context-async-hooks/LICENSE

Parts of "lib/instrumentation/run-context" have been adapted from or influenced
by TypeScript code in `@opentelemetry/context-async-hooks`.
23 changes: 13 additions & 10 deletions lib/agent.js
Original file line number Diff line number Diff line change
Expand Up @@ -50,26 +50,26 @@ function Agent () {

Object.defineProperty(Agent.prototype, 'currentTransaction', {
get () {
return this._instrumentation.currentTransaction
return this._instrumentation.currTransaction()
}
})

Object.defineProperty(Agent.prototype, 'currentSpan', {
get () {
return this._instrumentation.currentSpan
return this._instrumentation.currSpan()
}
})

Object.defineProperty(Agent.prototype, 'currentTraceparent', {
get () {
const current = this.currentSpan || this.currentTransaction
const current = this._instrumentation.currSpan() || this._instrumentation.currTransaction()
return current ? current.traceparent : null
}
})

Object.defineProperty(Agent.prototype, 'currentTraceIds', {
get () {
return this._instrumentation.ids
return this._instrumentation.ids()
}
})

Expand All @@ -86,6 +86,9 @@ Object.defineProperty(Agent.prototype, 'currentTraceIds', {
// - There may be in-flight tasks (in ins.addEndedSpan() and
// agent.captureError() for example) that will complete after this destroy
// completes. They should have no impact other than CPU/resource use.
// - The patching of core node functions when `asyncHooks=false` is *not*
// undone. This means run context tracking for `asyncHooks=false` is broken
// with in-process multiple-Agent use.
Agent.prototype.destroy = function () {
if (this._transport && this._transport.destroy) {
this._transport.destroy()
Expand Down Expand Up @@ -275,27 +278,27 @@ Agent.prototype.setFramework = function ({ name, version, overwrite = true }) {
}

Agent.prototype.setUserContext = function (context) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
trans.setUserContext(context)
return true
}

Agent.prototype.setCustomContext = function (context) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
trans.setCustomContext(context)
return true
}

Agent.prototype.setLabel = function (key, value, stringify) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
return trans.setLabel(key, value, stringify)
}

Agent.prototype.addLabels = function (labels, stringify) {
var trans = this.currentTransaction
var trans = this._instrumentation.currTransaction()
if (!trans) return false
return trans.addLabels(labels, stringify)
}
Expand Down Expand Up @@ -419,11 +422,11 @@ Agent.prototype.captureError = function (err, opts, cb) {
const handled = opts.handled !== false // default true
const shouldCaptureAttributes = opts.captureAttributes !== false // default true
const skipOutcome = Boolean(opts.skipOutcome)
const span = this.currentSpan
const span = this._instrumentation.currSpan()
const timestampUs = (opts.timestamp
? Math.floor(opts.timestamp * 1000)
: Date.now() * 1000)
const trans = this.currentTransaction
const trans = this._instrumentation.currTransaction()
const traceContext = (span || trans || {})._context

// As an added feature, for *some* cases, we capture a stacktrace at the point
Expand Down
113 changes: 0 additions & 113 deletions lib/instrumentation/async-hooks.js

This file was deleted.

14 changes: 4 additions & 10 deletions lib/instrumentation/http-shared.js
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,8 @@ exports.instrumentRequest = function (agent, moduleName) {

if (isRequestBlacklisted(agent, req)) {
agent.logger.debug('ignoring blacklisted request to %s', req.url)
// don't leak previous transaction
agent._instrumentation.currentTransaction = null
// Don't leak previous transaction.
agent._instrumentation.supersedeWithEmptyRunContext()
} else {
var traceparent = req.headers.traceparent || req.headers['elastic-apm-traceparent']
var tracestate = req.headers.tracestate
Expand Down Expand Up @@ -152,7 +152,7 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
// however a traceparent header must still be propagated
// to indicate requested services should not be sampled.
// Use the transaction context as the parent, in this case.
var parent = span || agent.currentTransaction
var parent = span || ins.currTransaction()
if (parent && parent._context) {
const headerValue = parent._context.toTraceParentString()
const traceStateValue = parent._context.toTraceStateString()
Expand Down Expand Up @@ -181,7 +181,7 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
// Or if it's somehow preferable to listen for when a `response` listener
// is added instead of when `response` is emitted.
const emit = req.emit
req.emit = function (type, res) {
req.emit = function wrappedEmit (type, res) {
if (type === 'response') onresponse(res)
if (type === 'abort') onAbort(type)
return emit.apply(req, arguments)
Expand Down Expand Up @@ -228,17 +228,11 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
}

function onresponse (res) {
// Work around async_hooks bug in Node.js 12.0 - 12.2 (https://github.com/nodejs/node/pull/27477)
ins._recoverTransaction(span.transaction)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REVIEW NOTE: See separate, long review note about _recoverTransaction being dropped.


agent.logger.debug('intercepted http.ClientRequest response event %o', { id: id })
ins.bindEmitter(res)

statusCode = res.statusCode

res.prependListener('end', function () {
agent.logger.debug('intercepted http.IncomingMessage end event %o', { id: id })

onEnd()
})
}
Expand Down
Loading