-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unseal SpanContext #58
Changes from 12 commits
3e8ca3d
6e01d0d
6aa8644
8efc035
85f03f1
a45f3a1
b3201a2
e47df04
55b6cde
4d09ced
36b78d7
f20d87d
182eaa9
d35a062
6da693c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
# Opening Span's Context | ||
|
||
Remove the proposition that "`SpanContext` MUST be a final (sealed) class.". | ||
|
||
## Motivation | ||
|
||
Currently, SpanContext is the only class in the specification that has the limitation of being final placed upon it. | ||
I also haven't heard a good argument for having a final `SpanContext` so this seems an arbitrary restriction. | ||
|
||
On the other hand, a non-final SpanContext does have advantages for vendors' flexibility. For example, they could: | ||
|
||
* Add custom routing information to the SpanContext more efficiently than by using the string-based mapping in TraceState. | ||
* Generate Span IDs lazily (e.g., if a Span only has internal parents and children, a mechanism that is more efficient than a combined 128 + 64 bit identifier could be used). | ||
* Count the number of times a SpanContext was injected to better estimate on the back end how long to wait for (additional) child Spans. | ||
|
||
## Explanation | ||
|
||
SpanContext remains as a non-abstract class implemented in the API, but they are non-final. Note that the proposition | ||
|
||
> SpanContexts are immutable. | ||
|
||
only applies to the part of the SpanContext that is publicly exposed on the API level. | ||
Actual vendor implementations could have additional mutable properties or | ||
actually have the first access to the `TraceId` property generate the TraceId lazily. Vendors must take care not to break any assumptions by doing that. For example, if a language allows copying a SpanContext, vendors should better put all mutable data behind an immutable reference so that all copies remain in a consistent state. | ||
|
||
## Internal details | ||
|
||
The changes in the initial summary of this OTEP shall be applied to the specification at | ||
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-tracing.md#spancontext. | ||
|
||
Language libraries that used a "final" or "sealed" keyword or something similar in their API definition of SpanContext must remove it. | ||
|
||
The reference SDK implementation will continue to use the API's SpanContext implementation without deriving from it. | ||
Additionally, all public APIs of the SpanContext must be overridable | ||
(they must not be public data fields in languages where these are not overridable, | ||
languages that require an explicit marker like `virtual` to make an API overridable must add it, etc.). | ||
|
||
Existing propagators | ||
(which should be implemented in the SDK package or a separate package) | ||
can continue to directly construct a SpanContext. | ||
API implementations that want to use a custom SpanContext implementation must either | ||
handle having a plain SpanContext as parent | ||
and having additional properties stripped upon extracting | ||
or they provide their own propagator implementations too. | ||
|
||
## Trade-offs and mitigations | ||
|
||
The decision to have SpanContext as a non-abstract class in the API results from the goal to keep compatibility. | ||
The cleaner alternative would be to have SpanContext be an interface like all other API classes. | ||
|
||
Since the implementation of propagators is on the SDK level, all public APIs could be removed from the SpanContext | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not necessary, there is an argument to have default propagation in the no-op implementation of the API. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In that case, the API could use it's own private implementation on top of the "interface"-level implementation. But I do not really think we should pursue this approach as it would be too huge of a change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't there a requirement to have propagation of w3c tracecontext be the default behavior of the API? That would imply that some level of propagation works with the API exclusively. |
||
so that it becomes an object that is completely opaque (to everyone but the propagator implementations). | ||
This would enable even more flexibility or at least spare vendors from adding boilerplate code that, e.g., calculates TraceIds | ||
if there is no such concept in their own system. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wouldn't push for this argument, because that was the approach in OpenTracing and there was a lot of grief from people who simply wanted to log the damned trace id. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you just want something to log, then then you could have string properties for Trace and Span ID without forcing them to have a particular format or be the canonical representation. Even a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You need to have guarantees of the length and format in some case, even for logging. I think this was discussed as a requirement for OpenTelemetry from the beginning and cannot be changed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These guarantees (such as ASCII-only, no spaces) could be documented for API-implementations to follow instead of enforced by fixing certain types for them. But I fully agree that this cannot really be changed at this point (the cost-benefit ratio would be very bad here). |
||
|
||
In a (currently hypothetical) C++ implementation of OpenTelemetry (and possibly some other languages), | ||
having the SpanContext be a "polymorphic", potentially derived class is a big difference to | ||
the old SpanContext that might be implemented as a simple data structure with a few data fields. | ||
These languages could opt to make SpanContext effectively a "smart pointer". | ||
That is, the SpanContext could be implemented as a final class containing nothing but a reference to the actual SpanContextData (which is not final) and non-virtual functions that delegate to the corresponding virtual functions of the SpancontextData class. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a huge performance hit for languages like C++, because you require refcounting for the SpanContext now. I think this is a big mistake, and some companies may refuse to use this. We need to propose a solution that does not force heap allocation + refcounting for C++ (maybe golang as well where you need this to be an interface now). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That huge performance hit is already incurred for Spans though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, which is a problem, but you are doubling that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think making SpanContext into an interface == doubling the allocations. The Span type can easily implement SpanContext interface. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I misread your first comment a bit. You were complaining about the reference counting! This is not necessarily required, if you don't need the smart pointer to be copyable, you can use a unique_ptr-like zero-cost abstraction. Also, I wrote "could opt to". But thinking about it now, it seems in my head we already had a C++-implemenation were we need to care about minimizing breaking changes. In a new implementation you would probably just return a unique_ptr and have users use If Span implements SpanContext, that would be a bit of a problem because then who is responsible for deleting the span is different between SpanContexts extracted from remote requests and SpanContexts that refer to inproc Spans. That could be solved with a custom smart pointer (nearly zero-cost, I'm thinking about a Anyway, I feel this conversation is drifting too much into C++-specifics. I guess I should really remove that paragraph. |
||
|
||
(Hypothetical) implementations of the OpenTelemetry specification that value performance highly over flexibility might choose to ignore this OTEP entirely, but these probably will not even be split into an API and SDK part. | ||
|
||
## Prior art | ||
|
||
The topic was briefly discussed in | ||
https://github.com/open-telemetry/opentelemetry-specification/pull/244 | ||
("Fix" contradictory wording in SpanContext spec). Before this PR, the specification contained the following sentence that directly contradicted the requirement that a SpanContext should be final (that requirement was there too): | ||
|
||
> `SpanContext` is represented as an interface, in order to be serializable into a wider variety of trace context wire formats. | ||
|
||
In OpenTracing, the SpanContext was just an interface with very few public methods | ||
(in particular, it did not assume the existence of a thing such as Trace or Span ID): | ||
|
||
> The `SpanContext` is more of a "concept" than a useful piece of functionality at the generic OpenTracing layer. That said, it is of critical importance to OpenTracing *implementations* and does present a thin API of its own. Most OpenTracing users only interact with `SpanContext` via [**references**](https://github.com/opentracing/specification/blob/master/specification.md#references-between-spans) when starting new `Span`s, or when injecting/extracting a trace to/from some transport protocol. | ||
|
||
(quoted from https://github.com/opentracing/specification/blob/master/specification.md#spancontext) | ||
|
||
In principle, the same still applies to OpenTelemetry: | ||
SpanContext has no functionality that is useful outside of propagation or implementations of Spans and Tracers. | ||
However, we chose to expose the "internal details" of the W3C context in the API. | ||
|
||
## Alternatives | ||
|
||
An alternative (or addition) to making SpanContext non-final is to reduce the places where it is used. | ||
In particular, `inject` could take a full Span as parameter. | ||
This would also make it harder to accidentally inject the parent SpanContext | ||
(that was initially extracted). | ||
If `inject(SpanContext)` is still needed, it could be provided as an additional overload | ||
or renamed to `forward(SpanContext)`. | ||
In the most radical implementation of this approach, SpanContext could entirely disappear, | ||
if we also change `extract(source) -> SpanContext` + `startSpan(parent: SpanContext) -> Span` to | ||
a single `startSpanWithRemoteParent(source, spanName, ...) -> Span`. | ||
|
||
Yet another alternative | ||
(with different caveats) | ||
would be to have SpanContext have a reference to the corresponding Span if any. | ||
Then API implementations could store any additional information in the Span. | ||
Upon extraction they would either have to fall back to a `null` Span reference on the SpanContext | ||
and storing any data in the TraceState as strings | ||
or they could set a special pseudo-Span to store information | ||
(although that interferes with the reasonable assumption that | ||
`SpanContext.getSpan() == null` iff `SpanContext.isRemote()`). | ||
|
||
## Open questions | ||
|
||
None known yet. | ||
|
||
## Future possibilities | ||
|
||
It is not expected that the OpenTelemetry reference implementation of the SDK or API benefits much from this change. | ||
The reference SDK is blessed here | ||
because if a new functionality is required in SpanContext (e.g., [IsRemote][]) | ||
it can "just" be added to the API implementation. | ||
But for example, vendors who want to store attributes that are only useful for their special implementation in the SpanContext can now do that. | ||
|
||
[IsRemote]: https://github.com/open-telemetry/opentelemetry-specification/pull/216 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal