Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unseal SpanContext #58

Closed
117 changes: 117 additions & 0 deletions text/0058-unseal-spancontext.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# Opening Span's Context

Remove the proposition that "`SpanContext` MUST be a final (sealed) class.".

## Motivation

Currently, SpanContext is the only class in the specification that has the limitation of being final placed upon it.
I also haven't heard a good argument for having a final `SpanContext` so this seems an arbitrary restriction.

On the other hand, a non-final SpanContext does have advantages for vendors' flexibility. For example, they could:

* Add custom routing information to the SpanContext more efficiently than by using the string-based mapping in TraceState.
* Generate Span IDs lazily (e.g., if a Span only has internal parents and children, a mechanism that is more efficient than a combined 128 + 64 bit identifier could be used).
* Count the number of times a SpanContext was injected to better estimate on the back end how long to wait for (additional) child Spans.

## Explanation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal


SpanContext remains as a non-abstract class implemented in the API, but they are non-final. Note that the proposition

> SpanContexts are immutable.

only applies to the part of the SpanContext that is publicly exposed on the API level.
Actual vendor implementations could have additional mutable properties or
actually have the first access to the `TraceId` property generate the TraceId lazily. Vendors must take care not to break any assumptions by doing that. For example, if a language allows copying a SpanContext, vendors should better put all mutable data behind an immutable reference so that all copies remain in a consistent state.

## Internal details

The changes in the initial summary of this OTEP shall be applied to the specification at
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-tracing.md#spancontext.

Language libraries that used a "final" or "sealed" keyword or something similar in their API definition of SpanContext must remove it.

The reference SDK implementation will continue to use the API's SpanContext implementation without deriving from it.
Additionally, all public APIs of the SpanContext must be overridable
(they must not be public data fields in languages where these are not overridable,
languages that require an explicit marker like `virtual` to make an API overridable must add it, etc.).

Existing propagators
(which should be implemented in the SDK package or a separate package)
can continue to directly construct a SpanContext.
API implementations that want to use a custom SpanContext implementation must either
handle having a plain SpanContext as parent
and having additional properties stripped upon extracting
or they provide their own propagator implementations too.

## Trade-offs and mitigations

The decision to have SpanContext as a non-abstract class in the API results from the goal to keep compatibility.
The cleaner alternative would be to have SpanContext be an interface like all other API classes.

Since the implementation of propagators is on the SDK level, all public APIs could be removed from the SpanContext
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, there is an argument to have default propagation in the no-op implementation of the API.

Copy link
Member Author

@Oberon00 Oberon00 Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, the API could use it's own private implementation on top of the "interface"-level implementation. But I do not really think we should pursue this approach as it would be too huge of a change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a requirement to have propagation of w3c tracecontext be the default behavior of the API? That would imply that some level of propagation works with the API exclusively.

so that it becomes an object that is completely opaque (to everyone but the propagator implementations).
This would enable even more flexibility or at least spare vendors from adding boilerplate code that, e.g., calculates TraceIds
if there is no such concept in their own system.
Copy link
Member

@yurishkuro yurishkuro Oct 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't push for this argument, because that was the approach in OpenTracing and there was a lot of grief from people who simply wanted to log the damned trace id.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just want something to log, then then you could have string properties for Trace and Span ID without forcing them to have a particular format or be the canonical representation. Even a ToString override may be sufficient for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to have guarantees of the length and format in some case, even for logging. I think this was discussed as a requirement for OpenTelemetry from the beginning and cannot be changed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These guarantees (such as ASCII-only, no spaces) could be documented for API-implementations to follow instead of enforced by fixing certain types for them.

But I fully agree that this cannot really be changed at this point (the cost-benefit ratio would be very bad here).


In a (currently hypothetical) C++ implementation of OpenTelemetry (and possibly some other languages),
having the SpanContext be a "polymorphic", potentially derived class is a big difference to
the old SpanContext that might be implemented as a simple data structure with a few data fields.
These languages could opt to make SpanContext effectively a "smart pointer".
That is, the SpanContext could be implemented as a final class containing nothing but a reference to the actual SpanContextData (which is not final) and non-virtual functions that delegate to the corresponding virtual functions of the SpancontextData class.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge performance hit for languages like C++, because you require refcounting for the SpanContext now. I think this is a big mistake, and some companies may refuse to use this. We need to propose a solution that does not force heap allocation + refcounting for C++ (maybe golang as well where you need this to be an interface now).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That huge performance hit is already incurred for Spans though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, which is a problem, but you are doubling that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think making SpanContext into an interface == doubling the allocations. The Span type can easily implement SpanContext interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I misread your first comment a bit. You were complaining about the reference counting! This is not necessarily required, if you don't need the smart pointer to be copyable, you can use a unique_ptr-like zero-cost abstraction. Also, I wrote "could opt to". But thinking about it now, it seems in my head we already had a C++-implemenation were we need to care about minimizing breaking changes. In a new implementation you would probably just return a unique_ptr and have users use -> instead of . I guess I should just remove that paragraph.

If Span implements SpanContext, that would be a bit of a problem because then who is responsible for deleting the span is different between SpanContexts extracted from remote requests and SpanContexts that refer to inproc Spans. That could be solved with a custom smart pointer (nearly zero-cost, I'm thinking about a bool shouldDelete member) though.

Anyway, I feel this conversation is drifting too much into C++-specifics. I guess I should really remove that paragraph.


(Hypothetical) implementations of the OpenTelemetry specification that value performance highly over flexibility might choose to ignore this OTEP entirely, but these probably will not even be split into an API and SDK part.

## Prior art

The topic was briefly discussed in
https://github.com/open-telemetry/opentelemetry-specification/pull/244
("Fix" contradictory wording in SpanContext spec). Before this PR, the specification contained the following sentence that directly contradicted the requirement that a SpanContext should be final (that requirement was there too):

> `SpanContext` is represented as an interface, in order to be serializable into a wider variety of trace context wire formats.

In OpenTracing, the SpanContext was just an interface with very few public methods
(in particular, it did not assume the existence of a thing such as Trace or Span ID):

> The `SpanContext` is more of a "concept" than a useful piece of functionality at the generic OpenTracing layer. That said, it is of critical importance to OpenTracing *implementations* and does present a thin API of its own. Most OpenTracing users only interact with `SpanContext` via [**references**](https://github.com/opentracing/specification/blob/master/specification.md#references-between-spans) when starting new `Span`s, or when injecting/extracting a trace to/from some transport protocol.

(quoted from https://github.com/opentracing/specification/blob/master/specification.md#spancontext)

In principle, the same still applies to OpenTelemetry:
SpanContext has no functionality that is useful outside of propagation or implementations of Spans and Tracers.
However, we chose to expose the "internal details" of the W3C context in the API.

## Alternatives

An alternative (or addition) to making SpanContext non-final is to reduce the places where it is used.
In particular, `inject` could take a full Span as parameter.
This would also make it harder to accidentally inject the parent SpanContext
(that was initially extracted).
If `inject(SpanContext)` is still needed, it could be provided as an additional overload
or renamed to `forward(SpanContext)`.
In the most radical implementation of this approach, SpanContext could entirely disappear,
if we also change `extract(source) -> SpanContext` + `startSpan(parent: SpanContext) -> Span` to
a single `startSpanWithRemoteParent(source, spanName, ...) -> Span`.

Yet another alternative
(with different caveats)
would be to have SpanContext have a reference to the corresponding Span if any.
Then API implementations could store any additional information in the Span.
Upon extraction they would either have to fall back to a `null` Span reference on the SpanContext
and storing any data in the TraceState as strings
or they could set a special pseudo-Span to store information
(although that interferes with the reasonable assumption that
`SpanContext.getSpan() == null` iff `SpanContext.isRemote()`).

## Open questions

None known yet.

## Future possibilities

It is not expected that the OpenTelemetry reference implementation of the SDK or API benefits much from this change.
The reference SDK is blessed here
because if a new functionality is required in SpanContext (e.g., [IsRemote][])
it can "just" be added to the API implementation.
But for example, vendors who want to store attributes that are only useful for their special implementation in the SpanContext can now do that.

[IsRemote]: https://github.com/open-telemetry/opentelemetry-specification/pull/216