-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do we need sampling to be more than one bit in trace-parent to define priority #8
Comments
Overall I like the direction of this project. I think there would be significant value in keeping the core context propagation simple, but allow it to be extended for additional details that aren't central to maintaining a consistent trace flow. Examples that have been brought up are:
I think being able to handle these kind of things should be in the scope of this spec, but I am open to discussion on whether that is handled in one header or a separate standardized header (or set of headers). |
I would like to keep this header as minimal as possible, with common things that are shared between multiple companies/implementations. If we achieve consensus to send trace_id/span_id/options in a common header each vendor can use a different vendor for their specific things and use this for these values. This will guarantee them that if on the other end they are talking for example with a cloud provider like Google or Azure tracing can be used to tie client requests with internal cloud requests for the moment. |
The problem I have with saying the spec will only help maintain context of the common shared info is it creates a huge barrier for vendors who need to pass along something not in the common area. They basically have to recreate all the functionality for maintaining the context in each transport type themselves. This is why I proposed including a header for both the efficient standardized common info, and the flexible vendor specific context data. |
@tylerbenson could you please elaborate? I am not sure I follow why it creates a "huge barrier" for vendors. The way I see it, a trace context is a combination of some metadata that can be standardized on, like the trace/span IDs in this spec, some baggage (which I think the spec also needs to standardize on), and vendor-specific metadata. Nothing prevents vendors from sending additional headers with their custom metadata, but if the request crosses the boundary between services instrumented with different tracing vendors, the common standard metadata can still be understood due to this spec, and vendor-specific metadata will be lost, which is unavoidable since you're entering a different vendor space anyway. If your concern is with a case of A->B->C service calls where A and C are using tracing X and B uses tracing Y, and you want vendor-specific data to reach C, then I think this can be done by placing that metadata into baggage (which is why it's important to standardize on). |
@yurishkuro +1 I started a new issue with open-questions about baggages/tags #13 |
@yurishkuro my question... will vendors have to come up with their own serialization and propagation for vendor specific metadata (do I need to get the request object, pull my info from the header, etc), or will the spec define how and where vendors should store data and provide ways for interaction with that standardized format via an API? Perhaps vendors could just use the existing "baggage" for this purpose? |
@tylerbenson this spec is only concerned with describing how the context looks on the wire, it has no opinion on how implementations get it into that format. A specific example of the format is here: #13 (comment). The point of baggage is that implementations propagate it "no questions asked", so if say your implementation needs to propagate some additional (stable per trace) metadata like sampling rate, you can add it as one of the baggage items and be reasonably sure it will come back to you. That's assuming you have a case where services A1..An and C1..Cn are instrumented with tracing system X and service B1...Bn are instrumented with tracing system Y and the flow is Ai->Bj->Ck. For interactions Ai->Aj you don't even need this Spec as you can pass your proprietary metadata via any custom header, but it will get lost when you go Ai->Bj. One danger with baggage is key clashes, we should also discuss some name spacing technique (I'll comment on #13). |
During the tracing workshop in Sydney, we discussed making sampling status a mandatory part of the trace options. Sampling recommendation as a tri-state value of deferred, true or false seems reasonable to all parties (allocated of 2 bits in the bitset). This included everyone present at the workshop including Google, Amazon, New Relic, DataDog, Dynatrace, Bas, Mick, Adrian and folks at Tyro. |
Does this one still matter after our latest decisions on trace-parent and trace-state? This should be a vendor-specific problem now. |
@AloisReitbauer this question was more like "is a single sampling flag in I think we can keep it as it for now. I'm ok to close this issue so it can be re-opened later if needed |
The way I read this issue is it's primarily about extending the sampled bit with extra data like sampling priority (not the sampling probability!), e.g. indicating not just the upstream's desire to sample but the degree of sampling such as light-weight vs. heavy profiling. I'd be interested to hear if any existing tracers actually do this, as quite a few tracers I'm familiar with don't support this. |
Here with @tylerbenson and we need to clarify at least if the options field should be treated as a bitset or not. For example, if so, we can reserve the least significant bit for the purpose it has now, but also allow other options to combine with it. For example '3' would also be sampled. |
@adriancole trace options is already described as bitset. Does your question read as "can tracing system decide to use one of the flags for itself"? I.e. make it free for grabs and interpretation? Or you want to specify another bit purpose in specification for everybody to follow? |
I added this to clarify that trace-options is a bit field. It is mostly correct just doesn't introduce it as a bit field. Experience in X-B3-Flags suggest programmers routinely get bit fields wrong so added some advice as well #116 |
There are two parts:
If the latter is true, then folks like datadog need to encode their extra bit into a custom format. Otherwise they could fit into the generic one. |
which extra bit do they have? |
one bit extra is very similar to zipkin debug flag. basically it emphasizes
not only to sample, but especially try to keep it when dropping others
…On Sat, 21 Apr 2018, 11:10 Yuri Shkuro, ***@***.***> wrote:
folks like datadog need to encode their extra bit
which extra bit do they have?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAD614oBd2a8nB9y0h3ybpUh--kAuK5uks5tqqMKgaJpZM4PRiFF>
.
|
The difference I understand between |
It is very beneficial to share the sampling approach among components. Even with different tracers. So respecting bit flag and having it in the main spec is a goodness. For systems where you are paying for telemetry per component a single flag approach will not work unless those components will share extra hint on sampling priority. A good solution is to propagate "sampling score" with the trace (more than a bit). It will be useful to standardize on sampling score format so it can be shared between different vendors. So it calls for Do you know of anybody propagating sampling configuration alongside the trace. So gain ups may be calculated in the downstream components correctly? |
Tyler could speak more on DD but yesterday he was quite clear on all of
this being recommendation because the agent or otherwise may still choose
to override the "priority" decision and drop data.
|
* Clarifies that trace options is a bit field See #8 (comment) * simplify sentence
From Instana point of view, we support non-binary sampling / trace details internally, but we never really needed that. |
|
Sorry for the delay, but to be clear, I wasn't suggesting we add the additional (vendor specific) flag into the bitfield. We should be able to include that added information in tracestate. |
Two flags are defined now that may control sampling. First indicate whether recording was requested and another indicates whether parent recorded this span. Closing this issue. Let's re-open issues for specific concerns. I believe original concerns were addressed. |
The first draft of the http header spec had 140 comments in various areas. Not all came to closure. One that remained was if we do anything about vendor-specific sampling priority data, or relegate that as a stakeholder in the key/value aka baggage functionality.
Here's background from the original pull request (#1)
One line of comment started from @bhs
This discussion was about flags and why we had initially 4bytes allocated, more room implied problems with interpretation such as endianness. Bogdan had reserved extra space as some proposed they needed room for sampling. This was reduced to a single byte and a question raised about if we should extend if for sampling priority. This particular line of comment went quiet.
Another started afterwards from @SergeyKanzhelev
Sergey suggested that since sampling implementation would be vendor-specific, why not use a separate header? For example, not all vendors do consistent sampling per trace (ex some do sampling again at span level). Yuri mentioned benefits of a consistent trace-level algo. Sergey agreed with this and also that a sample bit as a force-trace operation is useful. However, he believed this should be a "debug" flag, not a sampling one. He suggests the sampling info should be vendor-specific.
After some chatter from Adrian, @SergeyKanzhelev maintained "I still think it would make sense to detach sampling/debug flag from the identity information."
The last comment on this sampling was from @bhs, when mentioned that one can stash the sampling rate (or reciprocal sampling rate) if arbitrary key/value propagated tags were supported somehow, and that presence of a sampled bit could be ignored/overlooked when they don't match a model.
The text was updated successfully, but these errors were encountered: