Remove "hexadecimal" restriction #16
Comments
Couple of things:
|
well, if we're talking about HTTP header spec, isn't that implied?
https://www.google.com/search?q=define+hexadecimal says "relating to or using a system of numerical notation that has 16 rather than 10 as its base." The point is, the current spec excludes tracing systems that might choose a different encoding or length for the trace ID. Adrian had an argument about storage implications, which is valid in the sense that yes, there are implications, but they are not impossible to resolve. In the best case the two systems use "compatible" trace-ids and can use them directly. In the worst case the receiving instrumentation will record the incoming trace-id as a correlation id. |
Hi, Yuri.
What format does jaeger require that is not hexadecimal? What would happen
if it were not? How would you handle it in jaeger?
It is important we don't add technical debt in the process of doing this
work. Every constraint removed comes at a cost. It would be nice to know
you thought this through, especially as it can help others think this
through.
|
+1 about removing the base16 and length things. Skywalking uses three longs for the ID (trace segment ID and trace ID), so the length changes for every generated ID. If we follow the spec, we must spend more CPU and memory on the ID. This is not good. Right now, skywalking can only consider adding a new header for this spec if some people want this interop feature: two headers, one for skywalking and one for the TraceContext spec. It can work, but it is not the best solution. :) Still, I hope the spec can make supporting skywalking easier. |
@adriancole Right now, skywalking is considering putting the IDs (traceId and spanId) from the outer tracer into tags of the EntrySpan. |
+1 about removing the base16 and length things. Skywalking uses three
longs for the ID (trace segment ID and trace ID), so the length changes for
every generated ID. If we follow the spec, we must spend more CPU and memory
on the ID. This is not good.
Hi, Sheng. Do you need to remove both constraints or just one (length)? On
the technical side, I think there's been some benchmarking around encoding,
which only affects you as you exit the process (this spec has no concern
about in-memory structure). For example, hexadecimal is quite fast to
encode, well under a microsecond for the entire context, maybe faster than a
formatted Date header :)
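
To make the encoding point concrete, here is a minimal sketch of hex-encoding a context on the way out of a process. The 16-byte trace ID, the 8-byte span ID, the hyphen separator and the function names are assumptions for illustration only, not the layout defined by the spec.

```python
from typing import Tuple


def encode_context(trace_id: bytes, span_id: bytes) -> str:
    """Render raw ID bytes as lowercase hex, joined by a hyphen (assumed layout)."""
    return f"{trace_id.hex()}-{span_id.hex()}"


def decode_context(value: str) -> Tuple[bytes, bytes]:
    """Reverse of encode_context; raises ValueError on non-hex input."""
    trace_hex, span_hex = value.split("-")
    return bytes.fromhex(trace_hex), bytes.fromhex(span_hex)
```

The in-memory representation stays whatever the tracer already uses; only the bytes written to the header are hex.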
|
the other thing about encoding, if for memory or perf concerns, I suspect
we will have a binary format defined at some point. For example, there's
already one under experiment in census
<https://github.com/census-instrumentation/opencensus-specs/blob/master/encodings/BinaryEncoding.md>,
which is laid on http/2 transport in base64 (platform-specific grpc
encoding, but worth mentioning)
|
Hi Adrian,
I am not saying that general strings are required or even supported by Jaeger. Jaeger uses a 128-bit byte array as a trace ID, and cannot support arbitrary string IDs without substantial changes to the backend and client libs. The way I would handle an incoming string ID is by recording it as a correlation ID tag. But it's not about Jaeger, it's about the inclusiveness of the spec. Maybe we should start with the goals of this spec, what exactly it is trying to achieve. There are different possible interop modes. |
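
A rough sketch of the handling described in this comment, assuming a receiver whose native trace ID is 32 lowercase hex characters. The function name `accept_incoming` and the tag key `correlation.id` are hypothetical and are not Jaeger APIs.

```python
import os
import re
from typing import Dict, Tuple

# Assumption: the receiver's native trace ID is 128 bits, rendered as 32 hex chars.
NATIVE_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def accept_incoming(trace_id: str) -> Tuple[str, Dict[str, str]]:
    """Return the trace ID to continue with, plus any tags to record."""
    if NATIVE_TRACE_ID.fullmatch(trace_id):
        # Best case: the IDs are compatible and can be used directly.
        return trace_id, {}
    # Worst case: start a new trace and keep the foreign ID as a correlation tag.
    return os.urandom(16).hex(), {"correlation.id": trace_id}
```

Note that the fallback branch supports offline correlation only; the downstream hop sees the new trace ID, not the original one.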
correlation by saving a string to a database allows offline correlation,
but breaks propagation interop, which is the primary goal of the project. I
think there certainly is space to define what we are defining by "context"
when we say this, but we have defined it! So, my suggestion is that when we
invalidate things, we provide for how to retain the charter while doing
this. If the charter is passing trace context, and we accept incompatible
data, we can't punt the responsibility to keep interop working in some
fashion. IOTW hope is not a specification! For example, a lot of this
tension is about a single header. We can in fact define means by which you
can pass incompatible format while still continuing the trace. Otherwise,
the spec grows to a correlation algorithm, right?
A trace context header is used to pass trace context information across
systems for a HTTP request. Our goal is to share this with the community so
that various tracing and diagnostics products can operate together, and so
that services can pass context through them, even if they're not being
traced (useful for load balancers, etc.)
|
can we define what the charter is? it's not stated anywhere in the repository, hence the confusion. |
The goal is mentioned here, we can top-level it, good point
https://github.com/TraceContext/tracecontext-spec/blob/master/HTTP_HEADER_FORMAT.md#trace-context-http-header-format
|
Let's discuss the charter in #17 |
copied missing data in
#18
|
Skywalking id is long1.long2.long3. And as you know, skywalking has a way to resolve this: support the spec by using a new header for interop and leave the sw3-head unchanged. That way, the cost of encoding and the size of the HTTP packet will change, right? Maybe not by much, but for a very high-throughput app, like some 10K TPS applications in Chinese e-commerce and telecom systems (large population -> large system...), everything changes, and I have to treat all perf-related risks very carefully. Btw, these users have such high throughput but can't accept sampling... So you can imagine the situation skywalking is facing... |
Skywalking id is long1.long2.long3. The dot is not part of base16, and a long is not literally a fixed length (after encoding, sure, it can be). So if both constraints are removed, skywalking will not need to add a new header; it just needs to split the current sw3-head into two parts. This is my best choice.
OK, what would you do if you were given the following ID also in the
trace-context header
trace-context: Para bailar la bamba Para bailar la bamba se necesita
una poca de gracia Una poca de gracia pa' mi pa' ti y arriba y arriba
Ah y arriba y arriba por ti seré, por ti seré, por ti seré-Bamba
bamba-1
What would you propagate in order to continue that trace?
(intentionally being funny, but seriously we need to answer things
like this.. usually we think of our own IDs in isolation, but that's
not the case. Remember henry from china telecom who had 5 trace
systems to propagate to?)
https://twitter.com/henrychenyong/status/901738274632425473
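
As an illustration of the "after encoding, sure, it can be" remark quoted above, here is a minimal sketch that turns a dotted three-long ID into a fixed-length hex string and back. It assumes each part is a signed 64-bit integer; the function names are hypothetical and this is not Skywalking's actual encoder.

```python
import struct


def longs_to_hex(dotted: str) -> str:
    """Pack 'long1.long2.long3' into 24 big-endian bytes, rendered as 48 hex chars."""
    first, second, third = (int(part) for part in dotted.split("."))
    return struct.pack(">qqq", first, second, third).hex()


def hex_to_longs(encoded: str) -> str:
    """Recover the dotted form from the 48-character hex string."""
    return ".".join(str(n) for n in struct.unpack(">qqq", bytes.fromhex(encoded)))
```

The result is always 48 hex characters, which addresses the encoding constraint but is still wider than a 16-byte ID, so the length question remains open.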
|
Besides the context spec that you guys use, I can introduce two use cases showing what many tracing systems do to propagate the context. First, the reason our context includes more fields than yours is not about continuing the trace; the extra fields are for analysis, e.g. the application relationship and the service relationship. If we ran the analysis after all trace segments (spans) finished, there would clearly be no need for these extra fields, but the analysis latency would be very long, and it would cost more resources in reads/writes or memory at the same time. Most tracing systems use a UUID or a timestamp-trusted-traceid (skywalking is a kind of implementation of this). So this is use case <1>: we propagate the parent application code (id), parent service (operation) name, etc. Many tracing system implementations also use the same span ID style, which I call the EagleEye span-id style. EagleEye is an Alibaba OSS system; they shared their implementation, so many people use it. The IDs look like these: That is use case <2>. @adriancole So you can see that, in order to fit this spec, they would have to do a lot of work, most of it about encoding and length... |
Sheng, I think you are explaining that different tracing systems have
different existing encodings and lengths. I don't think that is under
debate. Clearly there are many many existing encodings and many of which
cannot be substituted for others the same way you cannot place the contents
of the http header Date into the header Cookie and expect it to work.
Maybe if I ask another way. How would you process the ID of another system
you mentioned even if you are only doing offline correlation? What would
you do with trace context: acbdef that was given to you? What would you
propagate downstream? Would you replace the header with yours, would you
restart a trace? or would you be ok halting all tracing due to this
unexpected value?
Even in discussions of B3, until you are an intermediary it is easy to not
think things through. For example, if you propagated through a tracing
system Y and they disagreed with your format then sent a request downstream
to a skywalking system, this would break your trace in the same way as it
would break without this spec, wouldn't it?
It seems that if in absence of this spec folks can figure out ways to
coexist with B3 we might be able to do at least the same here..
|
IMO, for a correlation ID (trace-id and span-id included, of course), skywalking saves these two keys (we only care about max length, do not use a separate key, and use ASCII encoding) and generates a new trace. In fact, even for sw3-head we generate a new one too. The difference between these two is,
Users can query a trace by correlation trace-id through the skywalking UI, and they can see the correlation values if the trace has correlation IDs. Moreover, based on this spec, I hope all participating tracing systems can provide a standard query condition, e.g. This can really help users benefit from tracing system interop. I think that matters. |
Sheng, so what we have the ability to influence here is headers, and
we've mentioned trace-context as the scope of what this header is.
let's step back a sec.
In concrete terms, a propagation system needs to push the context from
one side to another. If only doing one-time log correlation, you could
just accept the incoming trace ID, write it to your external system, and
be done. If wanting to allocate these things to an operation, you'd
also need a span ID. The trick is that the latter changes, it will
change possibly many times per hop.
So when we define a propagation spec, we do need to indicate what
happens when the span ID changes, similar to how the Date header is
defined as the time a message originated, and placing 2029 as a date
is unexpected, the span part should ideally represent something. In
the way we've discussed this so far, that span part is the previous
remote operation.
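
A small sketch of that per-hop behavior: the trace ID is carried through unchanged, while the span part is replaced at each hop. The `Context` type, the field names and the 8-byte span ID are assumptions for illustration.

```python
import os
from typing import NamedTuple


class Context(NamedTuple):
    trace_id: str  # constant for the whole trace
    span_id: str   # identifies the previous remote operation


def outgoing_context(incoming: Context) -> Context:
    """Build the context to send downstream from the one that was received."""
    # Assumption: each hop allocates a fresh random 8-byte span ID, hex-encoded.
    current_span_id = os.urandom(8).hex()
    return Context(trace_id=incoming.trace_id, span_id=current_span_id)
```

With a fixed format, every hop can do this without knowing which tracing system produced the incoming context.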
So, if we remove the formatting constraint, here's what happens to propagation.
1. we can still pass the context from one side of the process to the other
2. we can tokenize IDs and log and/or add tags for out-of-band processing
However, we no longer have a fixed contract of any kind with trace
implementations. Does this matter?
1. when allocating a remote parent to a new child, we could imply all
tracing systems support linking (to establish a relationship with a
too wide ID). This has a cost to it
2. when sending a context to a remote party, we could send too little
info, requiring again linking or filling or something else.
3. the trace id itself may not fit in either side, eliminating the
ability to easily correlate. Some may prefer to overwrite the trace ID
into something they can understand.
Basically, I question the value of these lookups. Do you feel otherwise?
Sometimes I wish we would just make test cases as it takes hours to
discuss in abstract things that are obvious in implementation.
|
I want to clarify what this issue was about. One aspect of it was about variable-length IDs, which is currently discussed in @bhs's PR #19. The other aspect was about the encoding. For example, suppose some tracing system generates trace-ids in the form
Does that mean that the only acceptable un-encoded representation of a trace-id is an opaque byte array? I.e. no system receiving the Trace-Context header should attempt to interpret the parts of that byte array, since it has no guarantee that the value is compatible with its internal semantics (like two longs in Skywalking). |
@yurishkuro I think you are discussing completely different problems in the same issue. I am not saying that they do not need to be discussed, but the initial issue was about removing the "hexadecimal" restriction. |
Nobody seems to object to the existing fixed-length hex encoding, so closing this. |
This has been discussed in the original PR (starting from #1 (comment)) but got lost/ignored. In the current form the spec excludes tracing systems that may be using differently formatted strings for trace and/or span IDs.