Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampling decision cache isn't invalidated on span.set_parent #150

Closed
njsmith opened this issue Jul 2, 2024 · 5 comments · Fixed by #153
Closed

Sampling decision cache isn't invalidated on span.set_parent #150

njsmith opened this issue Jul 2, 2024 · 5 comments · Fixed by #153

Comments

@njsmith
Copy link

njsmith commented Jul 2, 2024

Bug Report

Version

│   │   ├── tracing-core v0.1.32
│   │   └── tracing-error v0.2.0
│   │       ├── tracing v0.1.40
│   │       │   ├── tracing-attributes v0.1.27 (proc-macro)
│   │       │   └── tracing-core v0.1.32 (*)
│   │       └── tracing-subscriber v0.3.18
│   │           ├── tracing v0.1.40 (*)
│   │           ├── tracing-core v0.1.32 (*)
│   │           └── tracing-log v0.2.0
│   │               └── tracing-core v0.1.32 (*)
│   └── tracing-error v0.2.0 (*)
│   │   │   │   └── tracing v0.1.40 (*)
│   │   │   └── tracing v0.1.40 (*)
│   │   ├── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
│   │   │   └── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
│       └── tracing v0.1.40 (*)
│   │   │   └── tracing v0.1.40 (*)
│   │   │   └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
│   │       └── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
│   └── tracing-subscriber v0.3.18 (*)
├── tracing v0.1.40 (*)
├── tracing-error v0.2.0 (*)
├── tracing-opentelemetry v0.22.0
│   ├── tracing v0.1.40 (*)
│   ├── tracing-core v0.1.32 (*)
│   ├── tracing-log v0.2.0 (*)
│   └── tracing-subscriber v0.3.18 (*)
├── tracing-subscriber v0.3.18 (*)
│   │   └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
├── tracing v0.1.40 (*)
├── tracing-error v0.2.0 (*)
├── tracing-opentelemetry v0.22.0 (*)
├── tracing-subscriber v0.3.18 (*)

Platform

Linux njs-0 6.1.58+ #1 SMP PREEMPT_DYNAMIC Mon Jan 29 15:19:25 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Description

My code performs the following sequence of operations:

  • Set up tracing-opentelemetry, with an otel config that uses a ParentBased(Box::new(AlwaysOff)) sampler
  • Create a span request_span to represent an entire request
  • Create a child span header_wait_span to represent time waiting for the http headers to arrive
  • Once the headers arrive, use a w3c text map propagator to extract a Context, and do
    • request_span.set_parent(remote_context)
    • header_wait_span.set_parent(request_span.context()) (<- so the child span picks up the new root id etc)

What I expect:

  • The sampling decision for request_span to be inherited from remote_context

What I get:

  • request_span's sampling decision ignores whether remote_context is sampled

But:

  • If I remove the child span, then it works like I expect.

It seems like what's happening is that the first time PreSampledTracer::sampled_context runs, it calls provider.config().sampler.should_sample, and caches the result. This can happen implicitly from multiple places, e.g. apparently it happens implicitly when creating a child span. Of course at this point, the span has no parent, so it uses the fallback sampler.

Then when I call request_span.set_context, it gains a parent, thus changing the decision about whether to sample -- but since the sampling decision is already cached, it doesn't matter.

OTOH if I carefully avoid touching the request_span in any way that might accidentally cause its context to be computed or a sampling decision to be made -- such as creating a child span -- then it doesn't get cached, and works as I expect.

Now that I finally figured this out, I can work around it -- but it's scary and feels fragile, because I don't know what operations might implicitly trigger the caching behavior. And it was very surprising to see that my cross-process trace stitching was working fine, but cross-process sampling choices weren't, even though they're sent in the same header!

I think a fix would be for set_parent to clear the cached builder.sampling_result, so the next call to sampled_context will re-calculate it.

@djc
Copy link
Collaborator

djc commented Jul 2, 2024

I think a fix would be for set_parent to clear the cached builder.sampling_result, so the next call to sampled_context will re-calculate it.

Sounds reasonable to me, would you be able to submit a PR?

@edwinsmith
Copy link

@djc interested if this is all it takes or are there any other gotchas - I'm new to this codebase.
@njsmith did you have a not-quite-reproducer that would make a good starting point for a test?

@njsmith
Copy link
Author

njsmith commented Jul 2, 2024

@edwinsmith This is how far I got before getting frustrated/confused by all the different moving parts in the setup

use opentelemetry::propagation::TextMapPropagator;
use opentelemetry::trace::{TraceContextExt, TracerProvider as _};
use opentelemetry::{global, Context};
use opentelemetry_sdk::trace::{BatchSpanProcessor, Config, Sampler, TracerProvider};
use std::collections::HashMap;
use tracing::Span;
use tracing_opentelemetry::OpenTelemetrySpanExt;
use tracing_subscriber::prelude::*;

fn main() {
    let sampler = Sampler::ParentBased(Box::new(Sampler::AlwaysOff));

    let exporter = opentelemetry_stdout::SpanExporter::default();
    let tracer = TracerProvider::builder()
        .with_config(Config::default().with_sampler(sampler))
        .with_simple_exporter(exporter)
        .build();

    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .init();

    let propagator = opentelemetry_sdk::propagation::TraceContextPropagator::new();

    // Create outer_span
    let outer_span = tracing::info_span!("outer_span");
    let _guard = outer_span.enter();

    dbg!(outer_span.context().span().span_context().is_sampled());

    // Create inner_span as a child of outer_span
    {
        let inner_span = tracing::info_span!("inner_span");
        let _guard = inner_span.enter();
    }

    // Use the propagator to get a context
    let mut headers = HashMap::new();
    headers.insert(
        "traceparent".to_string(),
        "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01".to_string(),
    );
    let context = propagator.extract(&headers);

    dbg!(context.span().span_context().is_sampled());

    outer_span.set_parent(context);

    // Confirm that outer_span did not inherit the parent's sample-bit
    dbg!(outer_span.context().span().span_context().is_sampled());
}

@njsmith
Copy link
Author

njsmith commented Jul 2, 2024

(instead of using a child span you can call outer_span.context(); for a test that's probably a better way to trigger the caching path, vs my code that was trying to demonstrate the issue for a bug report)

@rohitmittapalli
Copy link
Contributor

@djc took over from @.edwinsmith, added a test here; new to the codebase as well

@djc djc closed this as completed in 7f4409a Jul 4, 2024
@djc djc closed this as completed in #153 Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants