Remove "hexadecimal" restriction #16

yurishkuro · 2017-09-22T17:15:45Z

This has been discussed in the original PR (starting from #1 (comment)) but got lost/ignored. In the current form the spec excludes tracing systems that may be using differently formatted strings for trace and/or span IDs.

bogdandrutu · 2017-09-22T17:24:59Z

Couple of things:

I would say base16, not "hexadecimal"; "hexadecimal" means you encode a number.
You need to respect HTTP restrictions, so you have to encode the "string" or "byte array" or whatever you call it. So we have to define what that encoding is, because otherwise you put a restriction on the traceId to be HTTP compatible.

yurishkuro · 2017-09-22T18:26:29Z

> because otherwise you put a restriction on the traceId to be HTTP compatible.

well, if we're talking about HTTP header spec, isn't that implied?

> I would say base16 not "hexadecimal". "hexadecimal" means you encode a number.

https://www.google.com/search?q=define+hexadecimal says "relating to or using a system of numerical notation that has 16 rather than 10 as its base."

The point is, the current spec excludes tracing systems that might choose a different encoding or length of the trace ID. Adrian had an argument about storage implications, which is valid in the sense that yes, there are implications, but they are not impossible to resolve. In the best case the two systems use "compatible" trace-ids and can use them directly. In the worst case the receiving instrumentation will record the incoming trace-id as a correlation id.

codefromthecrypt · 2017-09-23T01:01:42Z

Hi, Yuri. What format does jaeger require that is not hexadecimal? What would happen if it were not? How would you handle it in jaeger? It is important we don't add technical debt in the process of doing this work. Every constraint removed comes at a cost. It would be nice to know you thought this through, especially as it can help others think this through.

wu-sheng · 2017-09-23T01:01:42Z

+1 about removing the base16 and length things. Skywalking is using three longs for ID (trace segment id and trace id). So length changes for every generated id. If we follow the spec, we must spend more CPU and memory on the ID. This is not good.

Right now, skywalking can only consider adding a new header for this spec if some people want this interop feature. That means two headers, one for skywalking and one for the TraceContext spec. It can work, but it is not the best solution. :)

But still, I hope the spec can make supporting skywalking easier.

wu-sheng · 2017-09-23T01:02:54Z

@adriancole Right now, skywalking is considering putting the ids (traceId and spanId) from the outer tracer into tags of the EntrySpan.

codefromthecrypt · 2017-09-23T01:22:22Z

> +1 about removing the base16 and length things. Skywalking is using three longs for ID (trace segment id and trace id). So length changes for every generated id. If we follow the spec, we must spend more CPU and memory on the ID. This is not good.

Hi, sheng. Do you need to remove both constraints or just one (length)? On the technical side, I think there's been some benchmarking around encoding, which only affects you as you exit the process (this spec has no concern about in-memory structure). For example, hexadecimal is quite fast to encode, well under a microsecond for the entire context, maybe faster than a formatted Date header :)
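To make the encoding step concrete, here is a minimal sketch of rendering a context as base16 only at the point where it leaves the process. The dash-separated `trace-id`-`span-id`-`options` layout, the 16-byte and 8-byte ID sizes, and the function names are assumptions for illustration, not text taken from the spec.

```python
import os
from typing import Tuple


def encode_trace_context(trace_id: bytes, span_id: bytes, options: int = 1) -> str:
    """Render binary IDs as lowercase base16 joined by '-' (layout is assumed)."""
    if len(trace_id) != 16 or len(span_id) != 8:
        raise ValueError("expected a 16-byte trace ID and an 8-byte span ID")
    return f"{trace_id.hex()}-{span_id.hex()}-{options:02x}"


def decode_trace_context(value: str) -> Tuple[bytes, bytes, int]:
    """Inverse of encode_trace_context; raises ValueError on malformed input."""
    trace_hex, span_hex, options_hex = value.split("-")
    if len(trace_hex) != 32 or len(span_hex) != 16:
        raise ValueError("unexpected ID length")
    return bytes.fromhex(trace_hex), bytes.fromhex(span_hex), int(options_hex, 16)


# In-memory IDs stay as raw bytes; hex only appears in the outgoing header value.
header_value = encode_trace_context(os.urandom(16), os.urandom(8))
```

The in-memory representation is untouched here, which matches the point that the encoding cost is only paid on the way out of the process.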

codefromthecrypt · 2017-09-23T01:25:19Z

the other thing about encoding, if for memory or perf concerns, I suspect we will have a binary format defined at some point. For example, there's already one under experiment in census <https://github.com/census-instrumentation/opencensus-specs/blob/master/encodings/BinaryEncoding.md>, which is laid on http/2 transport in base64 (platform-specific grpc encoding, but worth mentioning)

yurishkuro · 2017-09-23T01:26:30Z

Hi Adrian,

> What format does jaeger require that is not hexadecimal? What would happen if it were not? How would you handle it in jaeger?

I am not saying that general strings are required or even supported by Jaeger. Jaeger is using 128bit byte array as a trace ID, and cannot support arbitrary string IDs without substantial changes to the backend and client libs. The way I would handle incoming string ID is by recording it as a correlation ID tag.

But it's not about Jaeger, it's about inclusiveness of the spec. Maybe we should start with the goals of this spec, what exactly it is trying to achieve. There are different possible interop modes.
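One such interop mode, the correlation-tag fallback described above, could be sketched roughly as follows. This is not Jaeger's code; `SpanContext`, `accept_incoming_trace_id`, and the `correlation.id` tag name are hypothetical names chosen for the example.

```python
import os
import re
from dataclasses import dataclass, field
from typing import Dict

# A 128-bit ID rendered as 32 lowercase base16 characters.
_HEX_128 = re.compile(r"[0-9a-f]{32}")


@dataclass
class SpanContext:
    trace_id: bytes
    tags: Dict[str, str] = field(default_factory=dict)


def accept_incoming_trace_id(raw: str) -> SpanContext:
    """Adopt a compatible ID directly; otherwise keep it only as a tag."""
    if _HEX_128.fullmatch(raw):
        # Best case: the upstream ID fits the native 128-bit representation.
        return SpanContext(trace_id=bytes.fromhex(raw))
    # Worst case: start a fresh trace and remember the foreign ID for lookup.
    return SpanContext(trace_id=os.urandom(16), tags={"correlation.id": raw})
```

Note that the second branch starts a new trace, so the upstream trace is only linked by the tag and is not continued downstream.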

codefromthecrypt · 2017-09-23T01:31:56Z

Correlation by saving a string to a database allows offline correlation, but breaks propagation interop, which is the primary goal of the project. I think there certainly is space to define what we are defining by "context" when we say this, but we have defined it! So, my suggestion is that when we invalidate things, we provide for how to retain the charter while doing this. If the charter is passing trace context, and we accept incompatible data, we can't punt the responsibility to keep interop working in some fashion. IOTW hope is not a specification!

For example, a lot of this tension is about a single header. We can in fact define means by which you can pass an incompatible format while still continuing the trace. Otherwise, the spec grows to a correlation algorithm, right?

A trace context header is used to pass trace context information across systems for a HTTP request. Our goal is to share this with the community so that various tracing and diagnostics products can operate together, and so that services can pass context through them, even if they're not being traced (useful for load balancers, etc.)

yurishkuro · 2017-09-23T01:34:38Z

can we define what the charter is? it's not stated anywhere in the repository, hence the confusion.

codefromthecrypt · 2017-09-23T01:38:40Z

The goal is mentioned here, we can top-level it, good point https://github.com/TraceContext/tracecontext-spec/blob/master/HTTP_HEADER_FORMAT.md#trace-context-http-header-format

yurishkuro · 2017-09-23T01:47:58Z

Let's discuss the charter in #17

codefromthecrypt · 2017-09-23T01:50:41Z

copied missing data in #18

wu-sheng · 2017-09-23T02:09:40Z

> Hi, sheng. do you need to remove both constraints or just one (length)?

Skywalking id is long1.long2.long3. The dot is not part of base16, and a long does not have a fixed length when written out literally (after encoding, sure, it can). So if both constraints are removed, skywalking will not need to add a new header; it just needs to separate the current sw3-head into two parts. This is my preferred choice.

And as you know, skywalking has a way to resolve this: support the spec by using a new header for interop and do not change the sw3-head. In that way, the cost of encoding and the size of the HTTP package will change, right? Maybe not by much, but for a very high throughput app, like some 10K tps applications in Chinese e-commerce and telecom systems (large population -> large system...), everything changes, and I have to treat all perf related risks very carefully. Btw, these users have such high throughput, but can't accept sampling... So you can imagine the situation skywalking is facing......
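As an illustration of the "after encoding it can be fixed length" remark, the sketch below packs a dotted three-long ID into fixed-width base16. It assumes each part fits a signed 64-bit integer; the function names are hypothetical and this is not Skywalking's actual encoding.

```python
import struct


def segment_id_to_hex(dotted: str) -> str:
    """Pack 'long1.long2.long3' into 24 big-endian bytes, rendered as base16."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 3:
        raise ValueError("expected exactly three dot-separated integers")
    return struct.pack(">qqq", *parts).hex()


def hex_to_segment_id(encoded: str) -> str:
    """Inverse of segment_id_to_hex."""
    parts = struct.unpack(">qqq", bytes.fromhex(encoded))
    return ".".join(str(p) for p in parts)
```

The result is always 48 characters, which removes the variable length and the dot, but it is still longer than a 128-bit ID rendered in base16 (32 characters), so the length question remains.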

codefromthecrypt · 2017-09-23T02:19:14Z

> Skywalking id is long1.long2.long3. The dot is not part of base16, and a long does not have a fixed length when written out literally (after encoding, sure, it can). So if both constraints are removed, skywalking will not need to add a new header; it just needs to separate the current sw3-head into two parts. This is my preferred choice.

OK, what would you do if you were given the following ID, also in the trace-context header?

trace-context: Para bailar la bamba Para bailar la bamba se necesita una poca de gracia Una poca de gracia pa' mi pa' ti y arriba y arriba Ah y arriba y arriba por ti seré, por ti seré, por ti seré-Bamba bamba-1

What would you propagate in order to continue that trace? (Intentionally being funny, but seriously, we need to answer things like this. Usually we think of our own IDs in isolation, but that's not the case. Remember henry from china telecom who had 5 trace systems to propagate to?) https://twitter.com/henrychenyong/status/901738274632425473

wu-sheng · 2017-09-23T04:04:13Z

> What would you propagate in order to continue that trace?

Besides the context spec, which you guys use, I can introduce two use cases showing what many tracing systems do to propagate the context.

First things first: the reason their context includes more fields than yours is not about continuing the trace; it is about analysis, e.g. the application relationship and the service relationship. You can see that if we run the analysis after all trace segments (spans) have finished, there is clearly no need for these extra fields, but then the analysis latency is very long, and it costs more resources in read/write or memory at the same time.

And most tracing systems use UUID or timestamp-trusted-traceid (skywalking is a kind of implementation).

So this is use case <1>: we propagate the parent application code (id), parent service (operation) name, etc.

And many tracing system implementations use the same span.id style; I call it the EagleEye span-id style. EagleEye is an alibaba OSS system, they shared their implementations, so many people use it. The IDs look like this: 1.0, 1.1, 1.1.0. As you can easily see, the 1.1.0 span is the first child of the 1.1 span.

That is use case <2>.
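The hierarchical span-id style in use case <2> can be sketched in a few lines. The helper names are hypothetical, and the rule that a child appends its zero-based index to the parent's ID is inferred from the 1.1 and 1.1.0 example above rather than from EagleEye's code.

```python
from typing import Optional


def child_span_id(parent_id: str, child_index: int) -> str:
    """The n-th child of a span appends '.n' to the parent's ID."""
    return f"{parent_id}.{child_index}"


def parent_span_id(span_id: str) -> Optional[str]:
    """Drop the last component to recover the parent; None for a root-level ID."""
    head, sep, _ = span_id.rpartition(".")
    return head if sep else None
```

Because the parent is recoverable from the ID itself, these IDs grow with call depth and contain dots, which is why they fit neither a fixed length nor base16 without an extra encoding step.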

@adriancole So, you can see, in order to fit this spec, they will have to do a lot of work. Most of it is about encoding and length...

codefromthecrypt · 2017-09-23T05:18:56Z

Sheng, I think you are explaining that different tracing systems have different existing encodings and lengths. I don't think that is under debate. Clearly there are many, many existing encodings, many of which cannot be substituted for others, the same way you cannot place the contents of the http header Date into the header Cookie and expect it to work.

Maybe if I ask another way. How would you process the ID of another system you mentioned, even if you are only doing offline correlation? What would you do with trace context: acbdef that was given to you? What would you propagate downstream? Would you replace the header with yours, would you restart a trace? Or would you be ok halting all tracing due to this unexpected value?

Even in discussions of B3, until you are an intermediary it is easy to not think things through. For example, if you propagated through a tracing system Y and they disagreed with your format, then sent a request downstream to a skywalking system, this would break your trace in the same way as it would break without this spec, wouldn't it? It seems that if, in the absence of this spec, folks can figure out ways to coexist with B3, we might be able to do at least the same here..

wu-sheng · 2017-09-23T06:41:20Z

IMO, for a correlation id, including trace-id and span-id of course, skywalking saves these two keys (only caring about max length, not using a separate key, and with ascii encoding) and generates a new trace. In fact, even for sw3-head, we generate a new one too. The difference between these two is:

for sw3, generate a trace-ref;
for other systems, generate correlation (tags) values.

Users can query a trace by correlation trace-id through skywalkingUI. And they can see the correlation values if this trace has correlation ids.

Furthermore, based on this spec, I hope all participating tracing systems can provide a standard query condition, e.g. http:/xxx:xx/zipkin/trace?tid=ttttt&spanId=sssss. So I submitted an issue about vendor id. With that, the tracing system can configure the mapping from vendor to URL (e.g. http:/xxx:xx/zipkin/trace for zipkin); ref #14.

This can really help users benefit from tracing system interop. I think that matters.

codefromthecrypt · 2017-09-23T07:17:13Z

Sheng, so what we have the ability to influence here is headers, and we've mentioned trace-context as the scope of what this header is. Let's step back a sec.

In concrete terms, a propagation system needs to push the context from one side to another. If only doing one-time log correlation, you could just accept the incoming trace ID, write to your external system, and be done. If wanting to allocate these things to an operation, you'd also need a span ID. The trick is that the latter changes, it will change possibly many times per hop. So when we define a propagation spec, we do need to indicate what happens when the span ID changes, similar to how the Date header is defined as the time a message originated, and placing 2029 as a date is unexpected, the span part should ideally represent something. In the way we've discussed this so far, that span part is the previous remote operation.

So, if we remove the formatting constraint, here's what happens to propagation.

1. we can still pass the context from one side of the process to the other
2. we can tokenize IDs and log and/or add tags for out-of-band processing

However, we no longer have a fixed contract of any kind with trace implementations. Does this matter?

1. when allocating a remote parent to a new child, we could imply all tracing systems support linking (to establish a relationship with a too wide ID). This has a cost to it
2. when sending a context to a remote party, we could send too little info, requiring again linking or filling or something else.
3. the trace id itself may not fit in either side, eliminating the ability to easily correlate. Some may prefer to overwrite the trace ID into something they can understand.

Basically, I question the value of these lookups. Do you feel otherwise? Sometimes I wish we would just make test cases, as it takes hours to discuss in the abstract things that are obvious in implementation.

yurishkuro · 2017-09-26T02:45:15Z

I want to clarify what this issue was about. One aspect of it was about variable-length IDs, which is currently discussed in @bhs's PR #19. The other aspect was about the encoding. For example, suppose some tracing system generates trace-ids in the form {serviceName}-{randomBytes} (@wu-sheng mentioned something like that, only with numbers). We could support it if we allowed arbitrary strings, but there are two problems:

we'd still need to talk about some encoding since we might want to reserve - as a trace/span id separator
such trace-id scheme would hardly work if the system that uses it was on the receiving side, where it might get an opaque hexadecimal string

Does that mean that the only acceptable un-encoded representation of a trace-id is opaque byte array? I.e. no system receiving Trace-Context header should attempt to interpret the parts of that byte array since it has no guarantee it's a value compatible with its internal semantics (like two longs in Skywalking).
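The separator problem and the "opaque byte array" reading can be illustrated with a small sketch. The three-field dash-separated layout and the name `parse_opaque` are assumptions for the example; the point is only that the receiver validates base16 and otherwise never looks inside the bytes.

```python
import re
from typing import Optional, Tuple

_HEX = re.compile(r"[0-9a-f]+")


def parse_opaque(value: str) -> Optional[Tuple[bytes, bytes, str]]:
    """Split on the reserved '-' and decode the IDs as opaque bytes, or give up."""
    fields = value.split("-")
    if len(fields) != 3:
        # An ID that itself contains '-' makes the field boundaries ambiguous.
        return None
    trace_hex, span_hex, options = fields
    for part in (trace_hex, span_hex):
        if not _HEX.fullmatch(part) or len(part) % 2:
            return None
    # The bytes are carried as-is; no internal structure is assumed.
    return bytes.fromhex(trace_hex), bytes.fromhex(span_hex), options
```

With this parser, a value such as `checkout-service-9f3a-00f067aa0ba902b7-01` is rejected because the embedded dash yields five fields, while a plain base16 trace-id and span-id pair decodes without the receiver interpreting either one.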

bogdandrutu · 2017-09-26T15:43:32Z

@yurishkuro I think you are talking in the same issue about completely different problems. I am not saying that they do not need to be discussed, but the initial issue was about removing the "hexadecimal" restriction.

yurishkuro · 2018-03-27T23:03:07Z

Nobody seems to object to the existing fixed-length hex encoding, so closing this.

yurishkuro mentioned this issue Sep 22, 2017

Suggest Prefix encoding instead of a version number #15

Closed

SergeyKanzhelev mentioned this issue Sep 25, 2017

Variable-length IDs #19

Closed

mtwo added this to the Release 0.1.0 milestone Oct 21, 2017

yurishkuro closed this as completed Mar 27, 2018
