Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define where to handle invalid span contexts #233

Closed
toumorokoshi opened this issue Oct 22, 2019 · 7 comments · Fixed by #235
Closed

Define where to handle invalid span contexts #233

toumorokoshi opened this issue Oct 22, 2019 · 7 comments · Fixed by #235
Labels
discussion Issue or PR that needs/is extended discussion.
Milestone

Comments

@toumorokoshi
Copy link
Member

Joining a conversation that's currently happening across two PRs:

It's important for us to standardize on what httptextformatters will return back when they are unable to parse the span context from their respective headers. If the choice is to return an invalid span context, then we should figure out what is responsible for handling that condition, and what the correct behavior is.

Proposal

As a start, I'm proposing:

  1. if a formatter is unable to retrieve a valid spancontext, then an invalid spancontext should be returned.
  2. this should be handled in the propagator shim: https://github.com/open-telemetry/opentelemetry-python/blob/master/opentelemetry-api/src/opentelemetry/propagators/__init__.py#L26 and return back tracer.CURRENT_SPAN.

Here's alternatives and why I'm arguing against them:

Handling invalid spans in the integrations

Handling this in the integrations would require the same boilerplate code to handle and convert the invalid span into the CURRENT_SPAN constant. I worry about someone missing that implementation detail resulting in incongruent behavior.

Handling invalid spans in the SDK

I'm not 100% clear on what this would look like, but it would probably require making propagators and API interface and moving implementation code into the SDK. Not including this behavior would mean that the API alone is not enough to implement correct w3c tracecontext propagation (as invalid span contexts would not be converted to new ones).

I'm having trouble finding where it was stated that the API should propagate tracecontext by defaults. Maybe @c24t or @reyang can point me in the right direction?

@toumorokoshi toumorokoshi added the discussion Issue or PR that needs/is extended discussion. label Oct 22, 2019
@toumorokoshi toumorokoshi added this to the Alpha v0.2 milestone Oct 22, 2019
@Oberon00
Copy link
Member

I think the API will not be required to propagate context see open-telemetry/opentelemetry-specification#208 (comment)

@c24t
Copy link
Member

c24t commented Oct 23, 2019

I'm having trouble finding where it was stated that the API should propagate tracecontext by defaults. Maybe @c24t or @reyang can point me in the right direction?

open-telemetry/opentelemetry-specification#208 is probably the best source, this isn't really documented in the spec.

I think the API will not be required to propagate context

I think it's too soon to call this issue. The discussion in the specs issue is better, but as I understand it there's no way for us to do all of:

  • enable context propagation in applications via extensions alone (e.g. ext-grpc for gRPC, ext-wsgi/ext-requests for HTTP)
  • not require libraries to depend on the SDK
  • implement context propagation in the SDK only

From #228 (review):

I think we should rather fix the handling of INVALID_SPANCONTEXT in the SDK. Moving generate_spancontext to the API would break open-telemetry/oteps#58

@Oberon00 can you spell this out? Why does moving ID generation into the API make that OTEP impossible to implement?

Returning INVALID_SPAN from the formatter and converting to CURRENT_SPAN in the propagator sounds like a great solution to me, but it sounds like I'm missing some of the implications here.

Does it matter that integrations wouldn't be able to distinguish between invalid and unspecified spans?

@Oberon00
Copy link
Member

Because a vendor-SDK might encode custom information in the span context, maybe even in the Trace or Span ID. Thinking about it, moving the generate_span_id might not be as bad as I first thought because any vendor SDK would still have to deal with incoming span contexts that were not created by it anyway.

Anyway, my suggestion would be: Handle INVALID_SPANCONTEXT just like None when it is passed as parent. Maybe even disallow None as parent and only allow INVALID_SPANCONTEXT.

@toumorokoshi
Copy link
Member Author

That's a good idea, it makes sure that things are handled in a standard way without adding additional logic to integrations. Although it does come at the cost of not being able to disambiguate no parent vs invalid parent, but I feel like that isn't a big deal.

@c24t @mauriciovasquezbernal any concerns with handling this in Tracer.start_span (or those collection of methods?)

@mauriciovasquezbernal
Copy link
Member

I don't have concerns at this point. Just a question, propagators must then return INVALID_SPANCONTEXT when something fails. Is an SpanContext with trace_id and span_id 0 valid?

Just asking because the b3 propagator returns the later, a check like context is INVALID_SPAN_CONTEXT is False then.

toumorokoshi added a commit to toumorokoshi/opentelemetry-python that referenced this issue Oct 24, 2019
Fixes open-telemetry#233. The SDK tracer will now create spans with invalid parents
as brand new spans, similar to not having a parent at all.

Adding this behavior to the Tracer ensures that integrations do not have
to handle invalid span contexts in their own code, and ensures that behavior
is consistent with w3c tracecontext (which specifies invalid results should
be handled by creating new spans).
@toumorokoshi
Copy link
Member Author

@mauriciovasquezbernal regarding B3 and how they handle context propagation... I'm following this github repo and I don't see anything explicitly calling out how to handle invalid contexts: https://github.com/openzipkin/b3-propagation

Should we be handling this? or is there a more authoritative source?

looking closely at the spec again, I believe our current b3 propagation is invalid because it does not propagate the ParentSpanContext header. I'll file a ticket around that, but I don't think B3 propagation is something that we should fix in this PR.

@c24t
Copy link
Member

c24t commented Oct 24, 2019

Maybe even disallow None as parent and only allow INVALID_SPANCONTEXT

Even if we handle both cases the same way, I'd still prefer to allow null here because it makes for a more intuitive API. It's clear that null means "no parent context" in this case.

any concerns with handling this in Tracer.start_span (or those collection of methods?)

I think it's possible we'll discover some reason later that we have to distinguish between these in the SDK, or handle invalid contexts farther up the stack, but I don't have any concerns now.

toumorokoshi added a commit to toumorokoshi/opentelemetry-python that referenced this issue Oct 24, 2019
Fixes open-telemetry#233. The SDK tracer will now create spans with invalid parents
as brand new spans, similar to not having a parent at all.

Adding this behavior to the Tracer ensures that integrations do not have
to handle invalid span contexts in their own code, and ensures that behavior
is consistent with w3c tracecontext (which specifies invalid results should
be handled by creating new spans).
toumorokoshi added a commit that referenced this issue Oct 24, 2019
The SDK tracer will now create spans with invalid parents
as brand new spans, similar to not having a parent at all.

Adding this behavior to the Tracer ensures that integrations do not have
to handle invalid span contexts in their own code, and ensures that behavior
is consistent with w3c tracecontext (which specifies invalid results should
be handled by creating new spans).

Setting the parent to none on spans if the parent context is invalid, reducing logic
to handle that situation in downstream processing like exporters.
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this issue Nov 1, 2020
* fix: ts-mocha allow recursively loading files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Issue or PR that needs/is extended discussion.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants