Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spans's parent must be passed as Context instead of Span(Context)s. #875

Merged

Conversation

Oberon00
Copy link
Member

@Oberon00 Oberon00 commented Aug 25, 2020

Changes

Before this PR, a span's parent was logically a SpanContext. After this PR, it is a full Context:

  • A full Context is the only way to specify a parent of a span. SpanContext or even Span are not possible anymore.
  • Readable spans MUST provide a way to retrieve the full parent Context.

Prototype

There is an implementation of this in Java: open-telemetry/opentelemetry-java#1611 Note: The prototype does not implement the Context additional parameter to OnStart, but that would be trivial to add in Java.

Rationale

This is required to actually transport arbitrary custom context attributes from inject through to extract. Previously the only way to do that was to encode them in TraceContext (which has length limitations and only supports string-encoded data).

Additionally this makes the spec simpler because using a Context as a parent was already required for using Propagators.Extract, and getting a Context containing a certain Span in some way was already required for using Propagators.Inject. This PR removes the need to deal with parent SpanContexts and Spans in addition.

See motivating example of how this can be used to simplify propagator code that currently uses W3C TraceState at #875 (comment) The example no longer works when removing storage of Context. A custom SpanProcessor.OnStart could store the custom Context properties as Span attributes, but these would not be propagated to child spans.

This would actually have been required by OTEP 66 (https://github.com/open-telemetry/oteps/blob/master/text/0066-separate-context-propagation.md#observability-api):

Note that OpenTelemetry APIs calls should always be given access to the entire context object, and never just a subset of the context, such as the value in a single key.

We did intentionally not implement OTEP 66 fully though because it would have been too much to change. Still this PR does help reap more value from our partial OTEP 66 implementation.

Possible follow-ups

  • Should the parent Context also be stored on the Span (this PR originally required that)? Or: Put the parent Context inside the SpanContext instead of on the Span. With that we would be able to keep the APIs accepting SpanContext and Span. Spans's parent must be passed as Context instead of Span(Context)s. #875 (comment)
    • If we do either of that: Should W3C TraceState be removed from SpanContext? It could be hidden in the Context (maybe with a publicly accessible API exposed in the W3C propagator)?
  • Should APIs that create and operate on Context instead of Span be added to OpenTelemetry? This could be done gradually.
  • Suggested by @anuraaga in Spans's parent must be passed as Context instead of Span(Context)s. #875 (comment) Add a Map<Key<T>, T> or similar to the SpanContext for arbitrary non-string extra data -- this would shield users a bit more from having to deal with Context (although they still have to do that when using propagators) while still allowing tracing propagators to transport arbitrary information. It would, however not help with connecting metrics or baggage to spans.
    • @anuraaga notes that this can be solved by copying the baggage into that map.

@Oberon00 Oberon00 requested a review from a team as a code owner August 25, 2020 18:06
@Oberon00 Oberon00 requested a review from a team August 25, 2020 18:06
@Oberon00 Oberon00 requested a review from a team as a code owner August 25, 2020 18:06
@Oberon00 Oberon00 added area:api Cross language API specification issue release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:trace Related to the specification/trace directory priority:p2 Medium priority level labels Aug 25, 2020
@Oberon00
Copy link
Member Author

Oberon00 commented Aug 25, 2020

CC @tedsuo I think you said in last week's SIG meeting that this context transport from extract to inject should work.

@Oberon00 Oberon00 added spec:context Related to the specification/context directory spec:baggage Related to the specification/baggage directory labels Aug 25, 2020
@@ -219,8 +219,7 @@ the entire operation and, optionally, one or more sub-spans for its sub-operatio
- The span name
- An immutable [`SpanContext`](#spancontext) that uniquely identifies the
`Span`
- A parent span in the form of a [`Span`](#span), [`SpanContext`](#spancontext),
or null
- A parent span in the form of a (possibly empty) [`Context`](../context/context.md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also allow just Span in case people are not using Context

Copy link
Member Author

@Oberon00 Oberon00 Aug 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should people not use context? It is possible to support this but it would introduce a complication. We would have to specify: "If a Span is passed as Context, the parent Context of the child span is the parent Context of the parent span (i.e., the grandparent Context) with the Span in it replaced with the parent Span".
EDIT: As noted in the sdk part of this PR, a span can be in multiple contexts, therefore it is impossible to map a Span back to a Context. Thus passing a Span instead of a Context implies data-loss.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In JS I know it is a problem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in some cases you start an operation and directly interact with the Span.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context is already the only way to inject/propagate a Span. So both of the problems you mention are necessarily already solved by all SIGs that implement the OTEP66 propagation spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Oberon00 believe me or not, I saw lots of cases where just Span could be provided (no longer work at Google to show more examples). I do believe this is a mistake to not support Span as well. Don't know how to convince besides waiting for a language to implement this and make all changes to all the instrumentations to see there will be cases where this is not possible.

Copy link
Member Author

@Oberon00 Oberon00 Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these cases, you can also not inject. Or rather, you can by using something like TracingContextUtil.currentContextWithSpan. This at least makes it more explicit that you are dropping information (and more cumbersome, which I think is OK here, as it nudges you to preserve the full Context).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this and think it should be only a Span/SpanContext. Making the parent a Context means there is an association between the Span and the Baggage. I may be wrong but my understanding was that they are separate concerns.

If the issue is returning to the parent but not losing the baggage (or anything else in the context) then the fix should be to not reset the entire context.

In the Erlang implementation the Context contains a list of SpanContexts. When the current span is ended it is popped from the head of the list, the rest of the Context remains unchanged.

You could call the list of SpanContexts the TraceContext, which is just one part of the total Context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding was that they are separate concerns.

My understanding was that while they are separate concerns, we do want to be able to cross-reference and connect different concerns, and the Context should be the place where everything comes together. That was how I understood OTEP66, and also our current propagator spec. But I guess this needs a decision. If we don't want this connection (#867, #381 seem to imply we do), then I agree that this PR is not the way we should go and @anuraaga's idea of extending SpanContext is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, I will self apply the rule of "minimal api" :). We know for sure we need the "context" parent so it is great, for the "Span" parent we may or may not have good use-cases, so maybe add it later if that is the case.

So happy for me to start with only accepting Context for this because we can add later the "Span".

@Oberon00

This comment has been minimized.

@carlosalberto

This comment has been minimized.

@Oberon00

This comment has been minimized.

@carlosalberto

This comment has been minimized.

@Oberon00
Copy link
Member Author

Oberon00 commented Sep 1, 2020

I created a prototype for Java at open-telemetry/opentelemetry-java#1611

@Oberon00
Copy link
Member Author

Oberon00 commented Sep 1, 2020

@carlosalberto I don't fully understand the issues with baggage in the OpenTracing shim, but would putting the baggage in the context become easier with this PR? #768

@anuraaga
Copy link
Contributor

anuraaga commented Sep 2, 2020

When reviewing open-telemetry/opentelemetry-java#1611, came up with these high level comments. I think the code demonstrates that a lot of complexity is added (deleting methods is reducing API surface, but the complexity of the code went up a lot which isn't captured in the lines diff) and couldn't see us getting much from it.

I feel this is conflating two unrelated ideas, the parent / child relationship of spans (a logical connection of IDs) and a mechanism for propagating data within a process, often across threads (Context). The end result is contexts have spans, and spans also have contexts - it's not quite circular but it's not a logical relationship I think.

If someone needs to propagate data along with the span, there's no need to do it with as part of parent/child relationship I think, the data can be added to the Context next to the span, e.g., after creating the span, Context.withValue(valuetopropagate, Context.current()). If we really need a simpler API for adding arbitrary information to go along with the span (that is how I'm reading the rationale in #875), we could have a first-class field for extra data like Brave. But I find it confusing if we try to use the span parent for this sort of data.

@Oberon00
Copy link
Member Author

Oberon00 commented Sep 2, 2020

the code demonstrates that a lot of complexity is added

I think the complexity is mostly added by changing existing unit tests. Real code will mostly deal in (current) contexts anyway. I do not see a fundamental reason why this would increase complexity.

@Oberon00
Copy link
Member Author

Oberon00 commented Sep 2, 2020

data can be added to the Context next to the span, e.g., after creating the span, Context.withValue(valuetopropagate, Context.current())

That's exactly what this PR enables. Before this, such data would not be passed to samplers or exporters. Also, users would easily forget to propagate the context cross-thread (/cross-async task) since the API currently makes it the default to propagate only a parent span instead of the full parent context.

@Oberon00
Copy link
Member Author

Oberon00 commented Sep 2, 2020

This is actually required by OTEP 66 (https://github.com/open-telemetry/oteps/blob/master/text/0066-separate-context-propagation.md#observability-api):

Note that OpenTelemetry APIs calls should always be given access to the entire context object, and never just a subset of the context, such as the value in a single key.

By giving only the span instead of the entire context, we work against the spirit of OTEP 66.

@carlosalberto
Copy link
Contributor

Waiting for more prototypes to validate this, but otherwise LGTM.

@Oberon00
Copy link
Member Author

Oberon00 commented Sep 21, 2020

@bogdandrutu

But the context is no longer available to the exporter now (unlike the w3c tracestate)

Actually, thinking about it again, this should still be doable with a little extra work if we have a readable span in the parent Context. E.g. in Java this would work with a (checked) cast from Span to ReadableSpan. From there you could copy certain attributes to child spans, and thus use span attributes to store custom values instead of the tracestate. The initial extract would set them directly on the Context. So in OnStart(Span started, Context parent) you would do something like

long myvalue1 = 0;
String myvalue2 = null;

// Read from parent
final Span parentSpan = TracingContextUtils.getSpan(parent);
if (parentSpan instanceof ReadableSpan) { // Check span first, context values will be copied over & may be older.
  final SpanData sd = parentSpan.toSpanData();
  myvalue1 = sd.getAttributes().getLongAttribute("_myvendorcustom.myvalue1");
  myvalue2 = sd.getAttributes().getStringAttribute("_myvendorcustom.myvalue2");
} else {
  final MyCustomValues myCustomValues = MY_CUSTOM_VALUE_KEY.get(parent);
  if (myCustomValues != null) {
    myvalue1 = myCustomValues.myvalue1;
    myvalue2 = myCustomValues.myvalue2;
  }
}

// Propagate to child
started.setAttribute("_myvendorcustom.myvalue1", myvalue1);
started.setAttribute("_myvendorcustom.myvalue2", myvalue2);

which is rather ugly now that I write it out, but we might improve on that in the future.

@tsloughter
Copy link
Member

"no longer available to the exporter"

I think that is related to one of my questions for the meeting today. If Context is available in OnStart shouldn't it be available in OnEnd. I've argued against ending a Span with a method on the Span like Span.End() before and I think this shows why that is important, unless we can get away without needing Context in the processor.

@Oberon00
Copy link
Member Author

Which context would you want to pass in end? The context in which it was ended? Then you can just use Context.current(). The parent Context? Then you would need to store it inside the Span, which you (and others) said we should not do.

@carlosalberto
Copy link
Contributor

I suggest we don't talk about potential improvements and additions in order to not lose focus, unless it's something truly, absolutely needed by the issue/PR at hand.

It looks like @tsloughter @codeboten are working on prototypes of this, so please comment/approve whenever you are able to provide your feedback :)

@Oberon00
Copy link
Member Author

Oberon00 commented Sep 22, 2020

@carlosalberto If you want to wait for more prototypes, I suggest you mark this as "request changes".

@carlosalberto carlosalberto merged commit 8969a24 into open-telemetry:master Sep 23, 2020
@arminru arminru deleted the contextparent branch September 23, 2020 14:54
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue needs discussion Need more information before all suitable labels can be applied priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:baggage Related to the specification/baggage directory spec:context Related to the specification/context directory spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider using Context as the unique way to specify parenthood
10 participants