Do we need sampling to be more than one bit in trace-parent to define priority #8

codefromthecrypt · 2017-09-08T18:25:26Z

The first draft of the http header spec had 140 comments in various areas. Not all came to closure. One that remained was if we do anything about vendor-specific sampling priority data, or relegate that as a stakeholder in the key/value aka baggage functionality.

Here's background from the original pull request (#1)

One line of comment started from @bhs
This discussion was about flags and why we had initially 4bytes allocated, more room implied problems with interpretation such as endianness. Bogdan had reserved extra space as some proposed they needed room for sampling. This was reduced to a single byte and a question raised about if we should extend if for sampling priority. This particular line of comment went quiet.

Another started afterwards from @SergeyKanzhelev
Sergey suggested that since sampling implementation would be vendor-specific, why not use a separate header? For example, not all vendors do consistent sampling per trace (ex some do sampling again at span level). Yuri mentioned benefits of a consistent trace-level algo. Sergey agreed with this and also that a sample bit as a force-trace operation is useful. However, he believed this should be a "debug" flag, not a sampling one. He suggests the sampling info should be vendor-specific.

After some chatter from Adrian, @SergeyKanzhelev maintained "I still think it would make sense to detach sampling/debug flag from the identity information."

The last comment on this sampling was from @bhs, when mentioned that one can stash the sampling rate (or reciprocal sampling rate) if arbitrary key/value propagated tags were supported somehow, and that presence of a sampled bit could be ignored/overlooked when they don't match a model.

tylerbenson · 2017-09-08T19:07:32Z

Overall I like the direction of this project. I think there would be significant value in keeping the core context propagation simple, but allow it to be extended for additional details that aren't central to maintaining a consistent trace flow. Examples that have been brought up are:

a debug flag
sampling flag (currently in the main header)
sampling priority
other vendor specific things (account scoping might be an example)

I think being able to handle these kind of things should be in the scope of this spec, but I am open to discussion on whether that is handled in one header or a separate standardized header (or set of headers).

bogdandrutu · 2017-09-15T20:39:56Z

I would like to keep this header as minimal as possible, with common things that are shared between multiple companies/implementations. If we achieve consensus to send trace_id/span_id/options in a common header each vendor can use a different vendor for their specific things and use this for these values. This will guarantee them that if on the other end they are talking for example with a cloud provider like Google or Azure tracing can be used to tie client requests with internal cloud requests for the moment.

tylerbenson · 2017-09-16T15:57:46Z

The problem I have with saying the spec will only help maintain context of the common shared info is it creates a huge barrier for vendors who need to pass along something not in the common area. They basically have to recreate all the functionality for maintaining the context in each transport type themselves. This is why I proposed including a header for both the efficient standardized common info, and the flexible vendor specific context data.

yurishkuro · 2017-09-16T16:08:03Z

@tylerbenson could you please elaborate? I am not sure I follow why it creates a "huge barrier" for vendors. The way I see it, a trace context is a combination of some metadata that can be standardized on, like the trace/span IDs in this spec, some baggage (which I think the spec also needs to standardize on), and vendor-specific metadata. Nothing prevents vendors from sending additional headers with their custom metadata, but if the request crosses the boundary between services instrumented with different tracing vendors, the common standard metadata can still be understood due to this spec, and vendor-specific metadata will be lost, which is unavoidable since you're entering a different vendor space anyway.

If your concern is with a case of A->B->C service calls where A and C are using tracing X and B uses tracing Y, and you want vendor-specific data to reach C, then I think this can be done by placing that metadata into baggage (which is why it's important to standardize on).

bogdandrutu · 2017-09-16T16:27:12Z

@yurishkuro +1

I started a new issue with open-questions about baggages/tags #13

tylerbenson · 2017-09-18T15:51:02Z

@yurishkuro my question... will vendors have to come up with their own serialization and propagation for vendor specific metadata (do I need to get the request object, pull my info from the header, etc), or will the spec define how and where vendors should store data and provide ways for interaction with that standardized format via an API? Perhaps vendors could just use the existing "baggage" for this purpose?

yurishkuro · 2017-09-18T18:25:49Z

@tylerbenson this spec is only concerned with describing how the context looks on the wire, it has no opinion on how implementations get it into that format. A specific example of the format is here: #13 (comment). The point of baggage is that implementations propagate it "no questions asked", so if say your implementation needs to propagate some additional (stable per trace) metadata like sampling rate, you can add it as one of the baggage items and be reasonably sure it will come back to you. That's assuming you have a case where services A1..An and C1..Cn are instrumented with tracing system X and service B1...Bn are instrumented with tracing system Y and the flow is Ai->Bj->Ck. For interactions Ai->Aj you don't even need this Spec as you can pass your proprietary metadata via any custom header, but it will get lost when you go Ai->Bj.

One danger with baggage is key clashes, we should also discuss some name spacing technique (I'll comment on #13).

codefromthecrypt · 2017-11-02T07:50:40Z

During the tracing workshop in Sydney, we discussed making sampling status a mandatory part of the trace options. Sampling recommendation as a tri-state value of deferred, true or false seems reasonable to all parties (allocated of 2 bits in the bitset). This included everyone present at the workshop including Google, Amazon, New Relic, DataDog, Dynatrace, Bas, Mick, Adrian and folks at Tyro.

AloisReitbauer · 2018-03-26T13:21:08Z

Does this one still matter after our latest decisions on trace-parent and trace-state? This should be a vendor-specific problem now.

SergeyKanzhelev · 2018-03-26T16:10:18Z

@AloisReitbauer this question was more like "is a single sampling flag in traceparent is too little or too much"? On one hand we do not need this flag as every vendor free to decide by herself. On other hand - if we putting it there should be put more generic number that allows more values?

I think we can keep it as it for now. I'm ok to close this issue so it can be re-opened later if needed

yurishkuro · 2018-03-26T16:27:00Z

The way I read this issue is it's primarily about extending the sampled bit with extra data like sampling priority (not the sampling probability!), e.g. indicating not just the upstream's desire to sample but the degree of sampling such as light-weight vs. heavy profiling. I'd be interested to hear if any existing tracers actually do this, as quite a few tracers I'm familiar with don't support this.

codefromthecrypt · 2018-04-20T03:12:35Z

Here with @tylerbenson and we need to clarify at least if the options field should be treated as a bitset or not. For example, if so, we can reserve the least significant bit for the purpose it has now, but also allow other options to combine with it. For example '3' would also be sampled.

SergeyKanzhelev · 2018-04-20T16:20:56Z

@adriancole trace options is already described as bitset. Does your question read as "can tracing system decide to use one of the flags for itself"? I.e. make it free for grabs and interpretation? Or you want to specify another bit purpose in specification for everybody to follow?

See #8 (comment)

codefromthecrypt · 2018-04-21T01:22:11Z

I added this to clarify that trace-options is a bit field. It is mostly correct just doesn't introduce it as a bit field. Experience in X-B3-Flags suggest programmers routinely get bit fields wrong so added some advice as well #116

codefromthecrypt · 2018-04-21T01:23:35Z

There are two parts:

can bit fields can be added without a change to the version?
are all bit fields reserved for the specification?

If the latter is true, then folks like datadog need to encode their extra bit into a custom format. Otherwise they could fit into the generic one.

yurishkuro · 2018-04-21T03:10:02Z

folks like datadog need to encode their extra bit

which extra bit do they have?

codefromthecrypt · 2018-04-21T04:12:01Z

one bit extra is very similar to zipkin debug flag. basically it emphasizes not only to sample, but especially try to keep it when dropping others

…

On Sat, 21 Apr 2018, 11:10 Yuri Shkuro, ***@***.***> wrote: folks like datadog need to encode their extra bit which extra bit do they have? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#8 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAD614oBd2a8nB9y0h3ybpUh--kAuK5uks5tqqMKgaJpZM4PRiFF> .

SergeyKanzhelev · 2018-04-21T04:32:18Z

The difference I understand between sampling and debug flags is the need to gain up. When sampling is set and you collected the span you need to gain up values if you calculate any aggregation function based on this span. Like average duration. If you collected because of a debug flag - you count it as a single request. Is it the same for datadog? Or it's simply force instead of recommendation?

SergeyKanzhelev · 2018-04-21T04:53:50Z

It is very beneficial to share the sampling approach among components. Even with different tracers. So respecting bit flag and having it in the main spec is a goodness.

For systems where you are paying for telemetry per component a single flag approach will not work unless those components will share extra hint on sampling priority. A good solution is to propagate "sampling score" with the trace (more than a bit).

It will be useful to standardize on sampling score format so it can be shared between different vendors. So it calls for tracestate entry that is a property of trace, not a vendor-specific position in a graph.

Do you know of anybody propagating sampling configuration alongside the trace. So gain ups may be calculated in the downstream components correctly?

codefromthecrypt · 2018-04-21T05:01:03Z

Tyler could speak more on DD but yesterday he was quite clear on all of this being recommendation because the agent or otherwise may still choose to override the "priority" decision and drop data.

* Clarifies that trace options is a bit field See #8 (comment) * simplify sentence

CodingFabian · 2018-05-02T08:20:03Z

From Instana point of view, we support non-binary sampling / trace details internally, but we never really needed that.
if this would be part of traceparent, the semantics would need to be "clear" across vendors, which seems ambitious.
So our stance right now is not to have a non-binary sampling/priority flag.

beberlei · 2018-05-02T08:23:32Z

is this interesting? A binary flag could say “we want this more than other traces” like in Zipkin mapping to the debug flag. Question: Do we need more bits to encode more information? From API side in OT: Often questions about how to interpret this information by users. Adding this would require all implementers to at least pass it on. Vendors would probably just pass on and ignore and transport this information in their TraceState. Security reasons were added, because it could be used for DoS. Two understanding, not resolved for now.

tylerbenson · 2018-05-29T06:22:45Z

Sorry for the delay, but to be clear, I wasn't suggesting we add the additional (vendor specific) flag into the bitfield. We should be able to include that added information in tracestate.

SergeyKanzhelev · 2018-08-07T18:58:46Z

Two flags are defined now that may control sampling. First indicate whether recording was requested and another indicates whether parent recorded this span. Closing this issue. Let's re-open issues for specific concerns. I believe original concerns were addressed.

codefromthecrypt mentioned this issue Sep 21, 2017

Make TraceOptions optional. #12

Merged

codefromthecrypt pushed a commit that referenced this issue Apr 21, 2018

Clarifies that trace options is a bit field

9f8534d

See #8 (comment)

codefromthecrypt mentioned this issue Apr 21, 2018

Clarifies that trace options is a bit field #116

Merged

codefromthecrypt mentioned this issue Apr 22, 2018

b3 single header format openzipkin/b3-propagation#21

Closed

SergeyKanzhelev mentioned this issue Apr 23, 2018

Allow to mutate mutliple keys of tracestate #117

Closed

SergeyKanzhelev pushed a commit that referenced this issue Apr 24, 2018

Clarifies that trace options is a bit field (#116)

59c0089

* Clarifies that trace options is a bit field See #8 (comment) * simplify sentence

AloisReitbauer added the trace-context label May 2, 2018

AloisReitbauer changed the title ~~If or how we address sampling priority~~ Do we need sampling to be more than one bit in trace-parent to define priority May 2, 2018

SergeyKanzhelev added this to the 3. FPWD milestone Aug 6, 2018

SergeyKanzhelev closed this as completed Aug 7, 2018

codefromthecrypt mentioned this issue Feb 23, 2024

Add a property to disable HTTP Observations for actuator endpoints spring-projects/spring-boot#34801

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do we need sampling to be more than one bit in trace-parent to define priority #8

Do we need sampling to be more than one bit in trace-parent to define priority #8

codefromthecrypt commented Sep 8, 2017 •

edited

Loading

tylerbenson commented Sep 8, 2017 •

edited

Loading

bogdandrutu commented Sep 15, 2017

tylerbenson commented Sep 16, 2017

yurishkuro commented Sep 16, 2017 •

edited

Loading

bogdandrutu commented Sep 16, 2017

tylerbenson commented Sep 18, 2017

yurishkuro commented Sep 18, 2017

codefromthecrypt commented Nov 2, 2017

AloisReitbauer commented Mar 26, 2018

SergeyKanzhelev commented Mar 26, 2018

yurishkuro commented Mar 26, 2018

codefromthecrypt commented Apr 20, 2018

SergeyKanzhelev commented Apr 20, 2018

codefromthecrypt commented Apr 21, 2018

codefromthecrypt commented Apr 21, 2018

yurishkuro commented Apr 21, 2018

codefromthecrypt commented Apr 21, 2018 via email

SergeyKanzhelev commented Apr 21, 2018

SergeyKanzhelev commented Apr 21, 2018

codefromthecrypt commented Apr 21, 2018 via email

CodingFabian commented May 2, 2018

beberlei commented May 2, 2018

tylerbenson commented May 29, 2018

SergeyKanzhelev commented Aug 7, 2018

Do we need sampling to be more than one bit in trace-parent to define priority #8

Do we need sampling to be more than one bit in trace-parent to define priority #8

Comments

codefromthecrypt commented Sep 8, 2017 • edited Loading

tylerbenson commented Sep 8, 2017 • edited Loading

bogdandrutu commented Sep 15, 2017

tylerbenson commented Sep 16, 2017

yurishkuro commented Sep 16, 2017 • edited Loading

bogdandrutu commented Sep 16, 2017

tylerbenson commented Sep 18, 2017

yurishkuro commented Sep 18, 2017

codefromthecrypt commented Nov 2, 2017

AloisReitbauer commented Mar 26, 2018

SergeyKanzhelev commented Mar 26, 2018

yurishkuro commented Mar 26, 2018

codefromthecrypt commented Apr 20, 2018

SergeyKanzhelev commented Apr 20, 2018

codefromthecrypt commented Apr 21, 2018

codefromthecrypt commented Apr 21, 2018

yurishkuro commented Apr 21, 2018

codefromthecrypt commented Apr 21, 2018 via email

SergeyKanzhelev commented Apr 21, 2018

SergeyKanzhelev commented Apr 21, 2018

codefromthecrypt commented Apr 21, 2018 via email

CodingFabian commented May 2, 2018

beberlei commented May 2, 2018

tylerbenson commented May 29, 2018

SergeyKanzhelev commented Aug 7, 2018

codefromthecrypt commented Sep 8, 2017 •

edited

Loading

tylerbenson commented Sep 8, 2017 •

edited

Loading

yurishkuro commented Sep 16, 2017 •

edited

Loading