Should there be a SpanContext.DebugCorrelationID or similar?
#24
Comments
(cc @lookfwd) |
One important detail on this is that I think beyond any IDs that represent the trace, an "implementation identifier" or some other form of namespace should also be provided. This way one avoids interpreting IDs from tracer of type |
I can see an implementation of that in Go. That's nice. I see the point. If you inject all the IDs on a log, you will be able to do a quick "grep" (or whatever) and be sure you've got all the info out. Otherwise implementations will have to hit an OT API to get all related IDs for a trace and then do a "grep" on any of the ids. |
One argument I can see is this: If a tracer wants to support injecting all the trace-ids always (thus print them all and avoid the OT-API to get related IDs), it must limit itself to only support multiple parents at pseudocode:
Otherwise, mainstream tracers that allow late joins to extra trace-ids can only support the "print a/any single trace-id/span-id" model and then there will be an API with "get relevant IDs" and a bulk search. pseudocode:
I can't see any solid hybrid-model between those two. |
here's some related commentary which might be of interest openzipkin/openzipkin.github.io#48 (comment) |
Good to keep in mind |
Trace ID/context propagation generally happens regardless of the sampling. At least in our setup trace context is always propagated in-band, in part because it also carries baggage which some functionality relies on always being there.
Only when it comes to profiling and reporting spans out of process. But as a distributed context propagation it is not optional. If it were optional, then any other form of "unique ID" propagation would be in the same boat, making the whole point of correlating logs moot. |
What about traces not being sampled and potentially not propagating
downwards?
It is definitely the case in B3 that most do not propagate unsampled trace IDs.
This doesn't mean there are valid reasons to do so, just that if we aren't
attempting to rewrite history, the concern is valid.
|
If this helps, you can have people return e.g. |
@lookfwd fwiw in the next version of brave (zipkin java tracer), I'll be propagating the context regardless of sampling decision. will also try and see if it catches with other languages. old instrumentation won't do this, though. on what to print, incidentally without noticing this I was already doing traceid/spanid in my context toString |
nice, makes sense!
Ok. Maybe, if we all see the point/usefulness of this one, but we are aware of the practicalities of existing implementations, we could say that we:
|
We recommend that there be a print()-related API that prints trace/span-id or
whatever related span identification mechanism an implementation uses. This
is useful for correlating traces with other logging mechanisms. This should
preferably print a string prefixed with an identifier of the tracing
implementation e.g. openzipkin:8a43f9e23. Trace/SpanIDs might not apply
on specific spans. In this case an implementation should print e.g.
zipkin:N/A. If this is because the trace wasn't sampled and you don't
propagate trace IDs while not sampling, it should print e.g. zipkin:N/S.
If more than one span/trace-id is related to the current Span, the
decision on what exactly to print is implementation-specific. In any case,
the meaning of the printed text is implementation specific and not expected
to be interoperable. The implementation-name prefix helps you become aware
and identify cases where you switched tracer.
I think the representation is more tied to the propagation mechanics,
since the trace context is primarily about propagation in and out of
process. For example, wingtips and zipkin (and even uber/jaeger) can share
the same b3 propagation approach, but this doesn't mean they are the same
system. The essential difference here is that in-band compatibility doesn't
require out-of-band, even though it is sometimes the case. For example, I
would be highly surprised if a common propagation header wasn't easier to
get people to share than an out-of-band format.
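
To make the quoted recommendation concrete, here is a minimal Go sketch of the prefixed-string idea. The helper name `CorrelationString` and its parameters are assumptions made for illustration; nothing like it exists in the OT API. It only reproduces the three cases from the quoted text: an ID is available, no ID applies (N/A), and the trace was not sampled and therefore not propagated (N/S).

```go
package main

import "fmt"

// CorrelationString is a hypothetical helper: it prefixes an identifier with
// the tracer implementation name so that IDs from different tracers are not
// confused with one another.
func CorrelationString(impl, id string, sampled bool) string {
	switch {
	case id != "":
		return impl + ":" + id
	case !sampled:
		// No ID because the trace was not sampled and IDs are not propagated.
		return impl + ":N/S"
	default:
		// IDs do not apply to this span.
		return impl + ":N/A"
	}
}

func main() {
	fmt.Println(CorrelationString("openzipkin", "8a43f9e23", true))
	fmt.Println(CorrelationString("zipkin", "", true))
	fmt.Println(CorrelationString("zipkin", "", false))
}
```

The printed text stays implementation-specific, as the recommendation says; the prefix only tells the reader which tracer produced it.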
|
@lookfwd @adriancole this brings up an interesting question: what if |
@lookfwd <https://github.com/lookfwd> @adriancole
<https://github.com/adriancole> this brings up an interesting question:
what if SpanContext had required (and trivial) methods called something
like SpanContext.formatName() and SpanContext.formatVersion()
I've been using idString() for the former (since toString isn't a
well-defined semantic), though in some cases you can encode the entire
context as a string (including flags), and there you might not care to
expose multiple methods for pieces of the same thing.
and SpanContext.formatVersion() which would be defined to return
strings which together uniquely identify the, well, propagation format.
format version seems to hint representation..
And, further, if there was a SpanContext.debugId() that, in conjunction
with the format name+version, uniquely identified the *SpanContext*. This
is different than necessarily uniquely identifying the trace, as I still
think that's pretty ill-defined in a DAG model.
no comment on this one, yet, need to think about it.. debugId seems
overloaded term.
Thinking through all of this.. should we not just treat this as a form of
injection? Ex if the goal is to copy the context to logging context, that
is the same mechanics as injection, just a different (and library specific)
carrier
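
A rough Go sketch of the "logging context as a carrier" idea follows. Every type here is a local stand-in invented for illustration (`Carrier`, `LogContext`, `spanContext`, `InjectIDs`); none of them are OpenTracing types, and the key names are assumptions. The point is only that writing IDs into a logging context has the same shape as writing them into headers.

```go
package main

import (
	"fmt"
	"sort"
)

// Carrier is a local stand-in for a text-map style sink.
type Carrier interface {
	Set(key, val string)
}

// LogContext is a hypothetical per-request logging context.
type LogContext map[string]string

// Set makes LogContext usable as a Carrier.
func (c LogContext) Set(key, val string) { c[key] = val }

// Line renders a log line with the context keys in sorted order.
func (c LogContext) Line(msg string) string {
	keys := make([]string, 0, len(c))
	for k := range c {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := msg
	for _, k := range keys {
		out += fmt.Sprintf(" %s=%s", k, c[k])
	}
	return out
}

// spanContext is a toy context; the field names are illustrative only.
type spanContext struct {
	traceID, spanID string
}

// InjectIDs pushes identifiers into any carrier, whether it is a set of
// headers or a logging context the tracer does not control.
func InjectIDs(sc spanContext, c Carrier) {
	c.Set("trace-id", sc.traceID)
	c.Set("span-id", sc.spanID)
}

func main() {
	lc := LogContext{}
	InjectIDs(spanContext{traceID: "8a43f9e23", spanID: "01"}, lc)
	fmt.Println(lc.Line("request failed"))
}
```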
|
That's what I've sometimes said in the past... people usually respond with something about baggage :) |
That's what I've sometimes said in the past... people usually respond with
something about baggage :)
well, I suppose what I mean here is that I think of injector as a utility
class. Regardless of where you get the tracecontext from (opentracing's
hooks or some other propagation system), there is utility in extracting
identifiers from it. In the ideal case, this is a loose contract which doesn't
require OT to step in: for example, here's the schema for this key (e.g. the
key "uber-trace-context"). Attaching those to a logging context doesn't need
to be defined in OT, but if it were, it would have a similar mechanic to
injection because you are pushing keys into something you don't control.
/me rambles...
|
that makes sense, @adriancole... |
If I get it right, by saying it's injection one should use the |
I apologize for coming late to the party with concerns, but this seems a bit over-engineered to me. What's needed is an implementation specific way to represent a span as a string. It's hard for me to imagine anything that can't be represented as a string. The API doesn't need to know how the implementer chooses to represent it. Further the injection idea seems to sidestep the problem. If I can represent a text map carrier as a string, and the proposition is to inject the ids into a text map carrier, then I can represent the ids as a string. I fear this is probably a case of me not understanding the problem, or focusing too much on my specific use case. More info would be appreciated. |
I agree with @SaintDubious this seems a bit over-engineered, or possibly missing what the value could be. In thinking about two main categories of OpenTracing users I see new projects with no existing logging, and legacy systems with established logging infrastructure. For the latter being able to expose a single string which can be correlated back to the trace is all we need. Whatever the name of the function (though I think Which is to say the actual string output of the function doesn't matter, as long as it can link data found in logs back to a trace, and vice versa. How that's achieved is an implementation detail of the Tracer, I don't think it matters to the spec. |
These are the requirements I'm working with for this issue:
... as such, I can get behind I am less sympathetic about users who want a |
@bhs I want to briefly argue against If the output Therefore the specificity of the output is entirely up to the tracer, and if they need span level correlation, they can output it from |
Thanks @ojkelly. I would probably argue against
My two cents, anyway. |
agreed. The intention of using the word "correlation" was that all spans in a (simple) trace have the same |
Agreed. We have the |
As a minor comment - I don't like the |
@lookfwd Got your point, reasonable concern. |
VS
|
In the same spirit, I would add to @bhs's definition that:
|
@lookfwd I think I'm convinced that "CorrelationID" is different enough from "TraceID" that people won't get confused (i.e., that we can drop the "debug" prefix). |
I think we should also think of a default value for the Noop implementation |
@lookfwd It is a little difficult to choose a default without knowing what the tracer implementations will do. Intuitively, I think maybe |
Yes... something along those lines... The only drive I could see is it being compatible with (any implementation's) trace-id expected character-set... so I think I would slightly favor an empty string (!!) in contrast to something that has special characters like |
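
As a sketch of the empty-string default discussed above, assuming a hypothetical one-method interface named `Correlated` (the thread has not settled on a name or signature), a Noop context could look like this in Go. Callers can then skip the field entirely instead of logging a placeholder.

```go
package main

import "fmt"

// Correlated is a hypothetical single-method view of a span context.
type Correlated interface {
	CorrelationID() string
}

// noopContext models the Noop tracer's context.
type noopContext struct{}

// CorrelationID returns "" so the value never contains characters outside
// whatever character set a real tracer uses for its IDs.
func (noopContext) CorrelationID() string { return "" }

// fixedContext models a real tracer's context with a known ID.
type fixedContext struct{ id string }

func (c fixedContext) CorrelationID() string { return c.id }

// LogLine appends the correlation ID only when one is available.
func LogLine(c Correlated, msg string) string {
	if id := c.CorrelationID(); id != "" {
		return fmt.Sprintf("%s correlation_id=%s", msg, id)
	}
	return msg
}

func main() {
	fmt.Println(LogLine(noopContext{}, "cache miss"))
	fmt.Println(LogLine(fixedContext{id: "8a43f9e23"}, "cache miss"))
}
```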
I'm not clear why a different ID is needed. It would be better to expose TraceID to the application from the sdks. TraceID is already correlating... |
@andyday what is meant by that is that a trace ID is not a required part of an implementation. The specification does not prescribe that a Tracer even have a trace ID. Also it is a Tracer concern what the correlation ID might be or look like. |
@devinsba I would think most implementations would just make correlationId an alias for traceId, but I see your point. TraceId is required but in a vague way... I'm fine with |
@bhs regarding arguing for debugSpanId - I won't, but I'd rather argue against including this:
There is value in being able to correlate log messages with a particular span. A few cases come to mind:
On the other hand, in cases where the above don't happen, logging a shorter debugCorrelationId is beneficial: less storage taken, less space on the screen, easier to pass around. I could see a tracer implementation having a configuration option which controls this behavior. One more case (my case): in our API, error responses return a "logref" or "X-Request-Id" header which uniquely identifies the request. In practice this is the spanId (we use random 128-bit span ids). So there is some need for "debugSpanId", but I'm not sure if it has to exist in the OT API. And we could as well return traceId+spanId as the X-Request-Id, or probably even just the traceId (for the client using the API it will always be different, which is good enough) |
By the way, despite my radio silence I have been thinking a lot about this. In order to facilitate the integration of OpenTracing into things like linkerd, Envoy, Istio, and so forth, we really may need to take a stance on a "default" / generic id scheme for OpenTracing. These thoughts have in turn made me reluctant to pursue the I plan to bring this up at the next OTSC meeting (scheduled for Friday of next week). |
I am wondering if CorrelationID is needed. The lack of a SpanID complicates the Observer API, and if something like CorrelationID is workable at all, does that not point to exposing SpanID() and TraceID() as being more viable than previously thought? Basically, can we call this TraceID rather than introduce a new CorrelationID concept, and have TraceID be defined simply as a string or bytes of arbitrary length. Is that effectively what this is? |
There are I think four primary use cases, and much of the discussion here has been about the logging one;
While OpenTracing may be hyper-generic, I think these use cases are real, and right now we're having to use hacks like the Observer API from opentracing-contrib/java-api-extensions#3 which lose important state - just to get at /any/ id. I don't have a deep view on whether span/trace id is appropriate to use as a correlation Id, but I think the minimal API we need is to be able to query a span:
And that's the entirety of it. |
@tedsuo I tend to agree with that, and given the way TraceContext-spec is moving there doesn't seem to be anyone arguing against the notion of a single trace ID. I know @bhs had arguments in the past that there may be situations with multiple parents etc., but is there a tracing system in existence that actually does assign multiple trace IDs to a span and would have a problem with this API? This feature has been asked for so many times, I'd rather be practical than purist.
I say string - tracer should be able to represent ID as string for HTTP headers anyway, so I don't see a lot of value in returning @rbtcollins I would put case (2) aside, since we can't even agree to expose the trace ID, not to mention on forcing the tracer to accept an external one. The recommended way of tagging a trace with an external ID is by storing external ID as a tag. E.g. when you use |
I can live with a I still slightly prefer to call it a (I also think there's some sort of vague slippery-slope argument about introducing semantically-meaningful getters into the OT APIs, but I also grudgingly acknowledge how often this has come up) |
@bhs I agree that I know some Baggage methods have leaked into the Span API, but if you consider Baggage to be just on the SpanContext, then you could essentially have something like this:

    // SpanContext represents the propagation context for the current trace.
    // In the case where they are not supported, TraceID and SpanID return empty string.
    // Additional trace context can be accessed via Baggage.
    type SpanContext interface {
        TraceID() string
        SpanID() string
        Baggage(string) string
    }

This feels very natural to me, and fits in well with Is there any tracer involved with OT right now that would have trouble with the above interface? |
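
For readers who want to see the proposed interface in use, here is a small self-contained Go sketch. The interface is copied from the comment above; `basicContext` and `LogPrefix` are hypothetical names added for illustration and are not part of any tracer.

```go
package main

import "fmt"

// SpanContext mirrors the interface proposed in the comment above.
type SpanContext interface {
	TraceID() string
	SpanID() string
	Baggage(string) string
}

// basicContext is a hypothetical tracer-side implementation.
type basicContext struct {
	traceID, spanID string
	baggage         map[string]string
}

func (c basicContext) TraceID() string { return c.traceID }
func (c basicContext) SpanID() string  { return c.spanID }

// Baggage returns "" for unknown keys (reading a nil map is safe in Go).
func (c basicContext) Baggage(key string) string { return c.baggage[key] }

// Compile-time check that basicContext satisfies the interface.
var _ SpanContext = basicContext{}

// LogPrefix builds a "[trace/span] " prefix, or "" when no trace ID is reported.
func LogPrefix(sc SpanContext) string {
	if sc.TraceID() == "" {
		return ""
	}
	return fmt.Sprintf("[%s/%s] ", sc.TraceID(), sc.SpanID())
}

func main() {
	sc := basicContext{
		traceID: "8a43f9e23",
		spanID:  "01",
		baggage: map[string]string{"tenant": "acme"},
	}
	fmt.Println(LogPrefix(sc) + "tenant=" + sc.Baggage("tenant"))
	fmt.Println(LogPrefix(basicContext{}) == "")
}
```

A tracer without trace or span IDs would return empty strings, and callers such as `LogPrefix` would need to handle that case explicitly.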
    type SpanContext interface {
        TraceID() string
        SpanID() string
        Baggage(string) string
    }

@tedsuo For a Producer/Consumer module in Skywalking, we have more than one traceId, if the consumer processes messages in batch mode. So So I agree with @bhs about using |
There are plenty of good (great?) arguments about moving all baggage methods (esp copy-on-write setters) to SpanContext. @bogdandrutu has suggested as much IRL, and if I could do it all again I'd have designed things that way in the first place. (Of course this can be done in a backwards-compatible way with deprecation, etc) cc @tedsuo |
I agree it's conceptually cleaner to have the baggage setters on SpanContext. The issue is immutability: once you set a new baggage item, presumably you get back a new SpanContext. But that would decouple the span context from the span. I wonder if it's better to let the tracer control how it manages that relationship, and leave the setters on the span. I definitely think SpanContext should have key lookups for baggage, not just an iterator. That would be very useful and does not come with the design cost of adding the setters. |
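
The immutability trade-off described above can be sketched in Go as follows. All names (`spanContext`, `WithBaggageItem`, `span`, `SetBaggageItem`) are hypothetical and chosen for illustration: the context is copy-on-write, while the setter stays on the span so that the tracer decides how the span tracks its current context.

```go
package main

import "fmt"

// spanContext is an immutable toy context.
type spanContext struct {
	traceID string
	baggage map[string]string
}

// Baggage looks up a single key; missing keys yield "".
func (c spanContext) Baggage(key string) string { return c.baggage[key] }

// WithBaggageItem returns a new context with key set and leaves the
// receiver's map untouched.
func (c spanContext) WithBaggageItem(key, val string) spanContext {
	next := make(map[string]string, len(c.baggage)+1)
	for k, v := range c.baggage {
		next[k] = v
	}
	next[key] = val
	return spanContext{traceID: c.traceID, baggage: next}
}

// span owns the relationship to its context.
type span struct{ ctx spanContext }

// SetBaggageItem swaps in a new context, keeping span and context coupled.
func (s *span) SetBaggageItem(key, val string) {
	s.ctx = s.ctx.WithBaggageItem(key, val)
}

func main() {
	s := &span{ctx: spanContext{traceID: "8a43f9e23"}}
	old := s.ctx
	s.SetBaggageItem("user", "alice")
	fmt.Println(old.Baggage("user") == "", s.ctx.Baggage("user"))
}
```

Anyone still holding `old` keeps seeing the context without the new item, which is the decoupling concern raised in the comment; the key lookup itself carries no such cost.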
It would be great to finally expose TraceID or CorrelationID in the SpanContext interface. |
I don't understand how this discussion can become official spec. Is there any progress on this? |
I believe this has been resolved by https://github.com/opentracing/specification/blob/master/rfc/trace_identifiers.md |
Per opentracing/opentracing-cpp#3 as well as numerous misc discussions on Gitter.
Basically, most OT implementations have some sort of trace_id under the hood. There is practical value in exposing that, but there are also immediate problems:
One option would be a
SpanContext.DebugCorrelationID
that would return a string and be a sort of "best-effort trace_id" for lack of a better word. I have mixed feelings about this but wanted to raise it for discussion and tracking.