
Use Context to stop tracing #530

Open
pauldraper opened this issue Mar 28, 2020 · 34 comments
Labels
area:api (Cross language API specification issue) · spec:context (Related to the specification/context directory) · spec:trace (Related to the specification/trace directory) · triage:accepted:needs-sponsor

Comments

@pauldraper commented Mar 28, 2020

There are circumstances for instrumented actions where:

  1. The action is frequent and of low interest: a healthcheck, polling a message queue, etc.
  2. An OpenTracing exporter uses libraries that themselves may be instrumented (risking infinite tracing).
  3. If the current layer (e.g. RPC) happens to offer sufficiently detailed tracing, lower HTTP/DNS/TCP/UDP layers do not need to be traced when invoked with this library. This need is heightened by the use of server/client spans, which AFAIK requires removing spans in between them (SpanKind with layers #526).

(1) has come up in #173

(2) has been a problem for opentelemetry-js HTTP-based exporters (open-telemetry/opentelemetry-js#332), since the Node.js stdlib is instrumented globally. The solution there was to add a special HTTP header, x-opentelemetry-outgoing-request, which the http instrumentation ignores. In essence it uses HTTP headers as a poor man's context API. (This may not scale to other APIs.)


It would be nice to cut off all automatic tracing "below this current scope" by setting a context key. The default tracer implementation would create no-op spans if the context disables spans.

EDIT: Case (1) may call for sampling, rather than full disabling.
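A minimal sketch of the context-key idea, using the context API from @opentelemetry/api; the key name and the suppressTracing/isTracingSuppressed helpers here are illustrative, not an existing API:

```typescript
import { context, createContextKey, Context } from '@opentelemetry/api';

// Hypothetical key; an SDK honoring the proposal would check it in its tracer.
const SUPPRESS_TRACING_KEY = createContextKey('suppress_tracing');

function suppressTracing(ctx: Context): Context {
  return ctx.setValue(SUPPRESS_TRACING_KEY, true);
}

function isTracingSuppressed(ctx: Context): boolean {
  return ctx.getValue(SUPPRESS_TRACING_KEY) === true;
}

function pollMessageQueue(): void {
  // stand-in for instrumented work we do not want traced (case 1 above)
}

// Everything invoked inside the callback sees the suppressed context, so a
// cooperating SDK would hand out non-recording / no-op spans "below this scope".
context.with(suppressTracing(context.active()), () => {
  pollMessageQueue();
});
```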

@dyladan (Member) commented Mar 28, 2020

OpenTelemetry Ruby does this and I really like the idea

@Oberon00 (Member)

See the Python solution, which I think is rather elegant: open-telemetry/opentelemetry-python#181 (updated in open-telemetry/opentelemetry-python#395 for OTEP 66).

You introduce a conventional context variable "suppress_instrumentation" that all instrumentations check and that the span processors set to true while exporting.
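Roughly, in TypeScript terms (a sketch using the conventional key name; the wrapper functions are illustrative, not the actual Python or JS implementation):

```typescript
import { context, createContextKey } from '@opentelemetry/api';

const SUPPRESS_KEY = createContextKey('suppress_instrumentation');

// Span processor / exporter side: set the flag for the duration of the export
// so the HTTP or gRPC client used by the exporter is not instrumented again.
async function exportBatch(batch: unknown[], send: (batch: unknown[]) => Promise<void>): Promise<void> {
  await context.with(context.active().setValue(SUPPRESS_KEY, true), () => send(batch));
}

// Instrumentation side (the explicit, "no-magic" variant): shortcut early.
function instrumentedCall<T>(realCall: () => T, traced: (call: () => T) => T): T {
  if (context.active().getValue(SUPPRESS_KEY) === true) {
    return realCall(); // skip span creation and attribute collection entirely
  }
  return traced(realCall); // normal path: start span, set attributes, end span
}
```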

@dyladan (Member) commented Mar 30, 2020

Why do the instrumentations check? Would you not just have the tracer check the context and return a non-reporting span?

@Oberon00 (Member) commented Mar 30, 2020

That would also work, but personally I prefer the more explicit no-magic approach here. Also, the instrumentation overhead can be further reduced if the instrumentation checks manually and shortcuts, instead of doing all the usual instrumentation with a no-op Span. Checking Span.IsRecording would probably work in most cases though.

@dyladan (Member) commented Mar 30, 2020

I think that is the primary use for the isRecording flag. I just prefer to only have to implement such things in a single place so that there is less risk of forgetting. I'm playing around with such an implementation in JS right now and I set the context in the span processor, then return non-recording spans from the tracer if that context is set. It works well and required very few changes.
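A sketch of that "single place" check, assuming today's @opentelemetry/api surface (trace.getSpanContext, trace.wrapSpanContext); the context key and wrapper function are illustrative:

```typescript
import {
  context, createContextKey, trace,
  Context, Span, SpanOptions, Tracer, INVALID_SPAN_CONTEXT,
} from '@opentelemetry/api';

const SUPPRESS_KEY = createContextKey('suppress_instrumentation');

// Return a non-recording span whenever the suppression key is set, otherwise
// delegate to the real tracer. An SDK could do this inside its own startSpan.
function startSpanRespectingSuppression(
  tracer: Tracer,
  name: string,
  options?: SpanOptions,
  ctx: Context = context.active()
): Span {
  if (ctx.getValue(SUPPRESS_KEY) === true) {
    // Keep the parent's span context, if any, so propagation is not broken.
    const parent = trace.getSpanContext(ctx) ?? INVALID_SPAN_CONTEXT;
    return trace.wrapSpanContext(parent);
  }
  return tracer.startSpan(name, options, ctx);
}
```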

@Oberon00 (Member)

@toumorokoshi @codeboten FYI

@pauldraper (Author) commented Mar 30, 2020

I agree with @dyladan .

(A) The tracer is already managing context (activeSpan, withActiveSpan).

(B) Fewer places is better.

@Oberon00 (Member)

@pauldraper The tracer will stop "managing" context but it will still need to access the context when creating a new span, see #527

@toumorokoshi (Member)

Hi, another opentelemetry-python contributor chiming in, thanks @Oberon00!

Why do the instrumentations check? Would you not just have the tracer check the context and return a non-reporting span?

I think having the tracer check the flag is a good idea. I don't think it's too much magic, and it comes with the benefit of not requiring instrumentation authors to implement this pattern. We can control this behavior at the SDK level, which enables stronger guarantees on behavior consistency.

I think that is the primary use for the isRecording flag.

It doesn't look like that's the intention of IsRecording, at least from what I see in the spec:

Returns true if this Span is recording information like events with the AddEvent operation, attributes using SetAttributes, status with SetStatus, etc. There should be no parameter.

This flag SHOULD be used to avoid expensive computations of a Span attributes or events in case when a Span is definitely not recorded. Note that any child span's recording is determined independently from the value of this flag (typically based on the sampled flag of a TraceFlag on SpanContext).

I haven't seen examples of these expensive computations yet, but it seems unrelated to whether the span itself should be exported. I would argue for a separate context variable (suppressInstrumentation to borrow our current python convention).

The additional pro/con of a suppressInstrumentation flag separate from the Span itself is that both metrics and spans can share that value. But maybe it would be better to have those as two separate flags, as most likely you will only want to suppress one or the other, most likely in the exporter.

So maybe:

suppressTraceInstrumentation and suppressMetricInstrumentation?
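A tiny sketch of what per-signal keys could look like (names taken from the suggestion above; none of this is an existing API):

```typescript
import { context, createContextKey, Context } from '@opentelemetry/api';

// Hypothetical per-signal keys, following the naming suggested above.
const SUPPRESS_TRACE_KEY = createContextKey('suppressTraceInstrumentation');
const SUPPRESS_METRIC_KEY = createContextKey('suppressMetricInstrumentation');

function suppressTraceInstrumentation(ctx: Context): Context {
  return ctx.setValue(SUPPRESS_TRACE_KEY, true);
}

function suppressMetricInstrumentation(ctx: Context): Context {
  return ctx.setValue(SUPPRESS_METRIC_KEY, true);
}

// A trace exporter would suppress only tracing, leaving metrics untouched:
context.with(suppressTraceInstrumentation(context.active()), () => {
  // exporter's network call here
});
```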

@Oberon00 (Member) commented Apr 1, 2020

I haven't seen examples of these expensive computations yet

You have to look at everything the instrumentation does for each span. E.g. the sum of all the things that the Python WSGI instrumentation does per request is relatively expensive.

@dyladan (Member) commented Apr 1, 2020

@toumorokoshi I was saying the primary use of isRecording is to avoid expensive operations to gather span attributes. @Oberon00 had recommended checking the context for the instrumentation-disabled flag within instrumentations because it would also avoid said expensive computations.

In my opinion the "correct" implementation on the instrumentation side would be:

  1. instrumentation calls startSpan
  2. tracer sees instrumentation disabled, returns a non-recording span
  3. instrumentation sees span is non-recording and avoids expensive operations

This avoids the need for every instrumentation to check this flag. Because there will be many instrumentations, we cannot guarantee none of them will forget.

The flag is set on the context by span processors before they call exporters, or optionally in the exporter. I have no strong opinion on this difference.

If an instrumentation wants to disable the instrumentation of the lower-level protocols, it may also set the flag. e.g. some database driver which uses http to talk to the database may want to disable the http instrumentation in favor of instrumenting the protocol they've built on top of it.

If starting the span itself requires one of these expensive operations, then nothing is preventing the instrumentations from also checking the context flag.
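A sketch of those three steps from an instrumentation's point of view, using @opentelemetry/api; the instrumentation name, attribute, and context key here are illustrative:

```typescript
import { context, createContextKey, trace, Span } from '@opentelemetry/api';

const SUPPRESS_KEY = createContextKey('suppress_instrumentation');
const tracer = trace.getTracer('example-http-instrumentation');

function handleRequest(requestUrl: string, realHandler: () => void): void {
  // Optional early exit, only needed if starting the span is itself expensive.
  if (context.active().getValue(SUPPRESS_KEY) === true) {
    return realHandler();
  }

  // Step 1: instrumentation calls startSpan.
  // Step 2: a tracer that honors the flag would return a non-recording span here.
  const span: Span = tracer.startSpan('HTTP GET');
  try {
    // Step 3: only do expensive attribute collection for recording spans.
    if (span.isRecording()) {
      span.setAttribute('http.url', requestUrl);
    }
    realHandler();
  } finally {
    span.end();
  }
}
```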

@toumorokoshi (Member) commented Apr 1, 2020

I haven't seen examples of these expensive computations yet

You have to look at everything the instrumentation does for each span. E.g. the sum of all the things that the Python WSGI instrumentation does per request is relatively expensive.

I think I understand. You're referring to adding a check, per integration, to see if the IsRecording flag is set. If it's false, then we avoid adding attributes altogether, which would skip a non-trivial amount of code to collect and add attributes:

https://github.com/open-telemetry/opentelemetry-python/blob/master/ext/opentelemetry-ext-wsgi/src/opentelemetry/ext/wsgi/__init__.py#L121

I think if the spec called out explicitly that the resulting span would actually be missing attributes that would have been added in an IsRecording scenario, that would have been a little clearer to me. Thank you for clarifying.

In my opinion the "correct" implementation on the instrumentation side would be:

Makes sense. As a clarification, is it just a non-recording span? It's a non-recording, no-op span that ensures that the span will never reach the exporter in the first place, correct?

Although it makes sense to me intuitively, I don't see a call-out in the specification for a no-op or default span object that would not emit spans to span processors. So maybe that's something that should be added as well.

@dyladan (Member) commented Apr 2, 2020

I think I understand. You're referring to adding a check, per integration, to see if the IsRecording flag is set. If it's false, then we avoid adding attributes altogether, which would skip a non-trivial amount of code to collect and add attributes

yes

I think if the spec called out explicitly that the resulting span would actually be missing attributes that would have been added in an IsRecording scenario, that would have been a little clearer to me

A non-recording span doesn't record any attributes at all

As a clarification, is it just a non-recording span? It's a non-recording, no-op span that ensures that the span will never reach the exporter in the first place, correct?

A no-op span is one step further than a non-recording span. A non-recording span still propagates context; it just doesn't record anything to the backend, which is useful in cases where you do not want to break traces. A no-op span can't be propagated, because it has no trace context or trace state. In this instance, a no-op span would be equally effective and possibly an even better choice.
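The distinction can be illustrated with @opentelemetry/api's wrapSpanContext (a sketch; the ids are the W3C trace-context example values, not real data):

```typescript
import { trace, SpanContext, TraceFlags, INVALID_SPAN_CONTEXT } from '@opentelemetry/api';

// Non-recording span: carries a real trace/span id, so downstream propagation
// (traceparent headers, child spans) still lines up; it never reaches an exporter.
const parentContext: SpanContext = {
  traceId: '0af7651916cd43dd8448eb211c80319c',
  spanId: 'b7ad6b7169203331',
  traceFlags: TraceFlags.SAMPLED,
};
const nonRecording = trace.wrapSpanContext(parentContext);
console.log(nonRecording.isRecording());          // false
console.log(nonRecording.spanContext().traceId);  // same trace id as the parent

// "No-op" span in the sense used above: nothing valid to propagate at all.
const noop = trace.wrapSpanContext(INVALID_SPAN_CONTEXT);
console.log(trace.isSpanContextValid(noop.spanContext())); // false
```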

@Oberon00 (Member) commented Apr 2, 2020

@toumorokoshi Your link is actually perfect to illustrate why we could additionally use the "suppress_instrumentation" context variable: in https://github.com/open-telemetry/opentelemetry-python/blob/v0.6.0/ext/opentelemetry-ext-wsgi/src/opentelemetry/ext/wsgi/__init__.py#L121, attributes are collected to be passed to start_span (so the sampler has more info and knows e.g. the URL), so the is_recording property can only be queried afterwards.

@toumorokoshi (Member)

@toumorokoshi Your link is actually perfect to illustrate why we could additionally use the "suppress_instrumentation" context variable: in https://github.com/open-telemetry/opentelemetry-python/blob/v0.6.0/ext/opentelemetry-ext-wsgi/src/opentelemetry/ext/wsgi/__init__.py#L121, attributes are collected to be passed to start_span (so the sampler has more info and knows e.g. the URL), so the is_recording property can only be queried afterwards.

Ah! That's a very good point. I guess between the two, I would probably opt to just use the context variable, which can then be wrapped in a utility method on the trace module itself, like trace.instrumentationEnabled(). And I do agree that, in that case, that may make isRecording redundant.

@toumorokoshi (Member)

@pauldraper will a PR be filed on this? I like this approach and would love to see it in the spec so I can implement it.

@Oberon00 (Member)

@trask I wonder, how does auto-instr-java deal with that? It has a gRPC instrumentation, would that not cause a feedback loop with the OTLP exporter currently?

@trask (Member) commented May 25, 2020

In auto-instr-java, we load the exporter (and its dependencies) inside a separate class loader. This is primarily done to avoid version conflicts, but it has the nice side effect that we can skip instrumentation of those classes (by looking at what classloader they're in) and avoid the feedback loop.

@trask (Member) commented May 25, 2020

One interesting caveat to the above: open-telemetry/opentelemetry-java-instrumentation#375 (comment)

@anuraaga (Contributor)

For a reference point, I've run into 3) myself. I find spans from, e.g., the netty instrumentation noisy when they're wrapped by a client library, for example the AWS SDK. Others may not, though, meaning some configurability would be nice. But it creates the problem that:

  1. With auto instrumentation, it wouldn't really be possible to disable netty instrumentation only when called from an RPC SDK; you'd have to disable it globally.

  2. Even when disabled, it causes issues where semantics depend on whether there is an underlying instrumentation or not, like in Change aws-sdk span kind to INTERNAL opentelemetry-java-instrumentation#323.

Being able to pause new spans for a stack frame would help with this.

@codeboten (Contributor)

@dyladan was there a decision around the right way to move forward on suppressing instrumentations? I see the proposed solution of making a key available through the API was closed.

@sirzooro

This new instrumentationDisabled flag should also be propagated from process to process. Servers/Consumers could use it to automatically disable instrumentation for their spans.

My use case: multiple services use a common Redis library with added instrumentation. Service A periodically sends KeepAlive messages to other services, and I want to disable tracing for them on both the Publisher and Subscriber sides. On the Publisher side this is quite easy: I can tell the Redis library to skip instrumentation. On the Subscriber side it is not so easy: the Redis library would have to examine messages to check whether they are KeepAlives. One potential solution is to not send TraceParent/TraceState from the Publisher for KeepAlives, and create a new span in the Subscriber only when they are received. However, this has the downside that spans are not created for messages sent from services which are not instrumented yet. Because of this it is better to explicitly send an instrumentationDisabled flag.

@dyladan (Member) commented May 27, 2022

@dyladan was there a decision around the right way to move forward on suppressing instrumentations? I see the proposed solution of making a key available through the API was closed.

Nope. The PR was closed because there wasn't agreement and as far as I know nobody ever followed up. I stopped working on it because I have to admit I was feeling a little defeated.

This new instrumentationDisabled flag should also be propagated from process to process. Servers/Consumers could use it to automatically disable instrumentation for their spans.

This introduces serious trust concerns. Without somehow signing the request to ensure that it is trusted, this would be ripe for abuse. I would argue that paying the cost of inspecting the message to see that it is a keepAlive on the server side is a much simpler solution, less prone to errors and abuse.

@Oberon00 (Member)

@sirzooro If you need propagation of the flag, you are probably looking for the existing sampled flag, not this feature.

@sirzooro

@sirzooro If you need propagation of the flag, you are probably looking for the existing sampled flag, not this feature.

Thanks, this is exactly what I was looking for.

@dyladan (Member) commented Dec 13, 2022

The need for a mechanism like this was brought up again by @tsloughter in the spec call today. At this point most if not all SIGs have some mechanism like the context solution that my PR proposed. At least JS, Ruby, Python, and .NET have all implemented something similar. Not sure which of those implemented it in the API vs the SDK, but there are definitely instrumentations in the wild using it in multiple languages.

In JS it was implemented as a context key which the SDK recognizes and provides a helper function to set and unset. IMO this is a poor situation because it tightly couples the instrumentation with the SDK. I would prefer to implement it in the API but did not want to do that without specification.

The proposal was a simple key which suppressed tracing by indicating a non-recording span should be returned. This prevents exporter loops and allows higher-level module instrumentations like axios to suppress lower-level ones like http. Other signals did not need to be suppressed because they don't have these problems (incrementing a metric on export does not cause another export like in tracing, and axios/http would have separate metric streams).

I would like to propose that we move forward with that original proposal. It solves the problems which need to be solved, is already implemented in many places, and does not preclude a more advanced per-signal, or any other, mechanism from being specified in the future. I think this is one place where we have let the desire for perfection get in the way of pragmatism.
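For reference, a sketch of how this looks in JS today, where the helper lives in the SDK's @opentelemetry/core package rather than in @opentelemetry/api (the coupling problem mentioned above); usage is illustrative and assumes the suppressTracing / isTracingSuppressed helpers exported by @opentelemetry/core:

```typescript
import { context } from '@opentelemetry/api';
import { suppressTracing, isTracingSuppressed } from '@opentelemetry/core';

// An exporter, or a higher-level instrumentation such as one for axios, wraps
// its outgoing work so the lower-level http instrumentation stays quiet:
function sendWithoutTracing(send: () => Promise<void>): Promise<void> {
  return context.with(suppressTracing(context.active()), send);
}

// The http instrumentation (or the SDK tracer) checks the same key:
function maybeInstrument(realCall: () => void): void {
  if (isTracingSuppressed(context.active())) {
    return realCall(); // suppressed: no span at all
  }
  // ... normal instrumentation path: start span, call realCall(), end span ...
  realCall();
}
```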

@tsloughter (Member)

@dyladan by "original proposal" do you mean an existing PR? I was going to make one today and planned it to be a function like suppress_telemetry() which takes optional arguments to only suppress certain signals.

@dyladan (Member) commented Dec 14, 2022

@tsloughter I meant this #1653

@alanwest (Member)

Other signals did not need to be suppressed because they don't have these problems (incrementing a metric on export does not cause another export like in tracing, and axios/http would have separate metric streams).

Catching up a bit on the old conversation from #1653, it looks like the original proposal was to suppress all signals with this feature. Personally, I think that's a good thing. It is what .NET's implementation currently does.

It can be necessary to suppress other signals as well. Other signals can also generate undesirable loops in a sense. An http client metric incremented on export will be exported on the next cycle - which would occur infinitely. Same kind of thing with logs. Tangentially, it can complicate things for us if/when we want to generate telemetry about the SDK itself, but I think this is a different problem to solve.

@samarth-math commented Dec 7, 2023

Is there a clean way to have a feature flag which stops exporting the trace to the backend?

(I'm using gcp and opentelemetry with python, and want to disable tracing while developing locally, but enable in production)

@mackinra commented Jul 5, 2024

OpenTelemetry Ruby does this and I really like the idea

@dyladan I have a need to suppress some noisy traces (that effectively poll a message queue) in our app. Can you elaborate on how this can be done in Ruby?

@marcalff (Member)

An OpenTracing exporter uses libraries that themselves may be instrumented (risking infinite tracing).

This is relevant for opentelemetry-cpp.

The OTLP GRPC exporter uses grpc, and the grpc C++ library can have instrumentation that uses opentelemetry-cpp.

cc @open-telemetry/cpp-maintainers

Projects
Status: Spec - Priority Backlog