b3 single header format #21
Comments
cc @openzipkin/core @openzipkin/instrumentation-owners for feedback |
rewrote the description with more examples |
Notes on implementation once we settle on things, assuming we decide to introduce a "b3" header in addition to the same value in tracestate format.
At least looking at brave, this would be easy because there is one codebase parsing headers. A quick check for a "b3" header would be easy to do and would be contained in B3 code, which is independent of any other propagation code anyway. Libraries that hard-code B3 handling might see more impact. For this reason we would need a tracking issue and, as with other changes such as 128-bit trace IDs, we should expect it to take a long time before all libraries support this. |
posted an email to alert those not watching this repo https://groups.google.com/forum/#!topic/zipkin-dev/EdZjqHXuXsg |
+1 |
+1 definitely I am positive and agree with putting the These will make much easier for me to start a SkyWalking new feature, which could generate a new ref in EntrySpan, maybe named |
+1 on the separate sampled, debug vs flags I'm indifferent. I see pros and cons on each |
This is great. Would this new header eventually be considered the preferred propagation format over the existing headers? Would it be reasonable and possible to just use this header in an ecosystem where all applications are using updated tracing libraries? |
glad you like it. this serves two purposes:
1. answering the single-field problem that's plagued us for years, most
notably in JMS where hyphens don't work. A single "b3" or "B3" header is far
simpler to get working than several headers in several case formats.
2. having a state definition for the eventual w3c format, when/if it gets to
that.
implementation wise:
I would code this up in brave for example, to show how it works.. basically
look for "b3" first, then fallback to legacy headers. Have a choice of
double-encode on the way out or choose one or the other. In a green field,
yeah "b3" would make more sense once we have tracked it through a quorum of
libraries.
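The lookup order described above ("b3" first, then the legacy headers) could be sketched roughly as follows. This is not Brave's code: `extract_b3`, `inject_b3`, the returned dict shape and the `mode` values are hypothetical names chosen for illustration.

```python
from typing import Dict, Mapping, Optional

# Legacy multi-header names, lowercased for case-insensitive lookup.
LEGACY_NAMES = (
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
)


def extract_b3(headers: Mapping[str, str]) -> Optional[Dict[str, str]]:
    """Prefer the single "b3" header, then fall back to legacy X-B3-* headers."""
    h = {k.lower(): v for k, v in headers.items()}
    single = h.get("b3")
    if single is not None:
        return {"format": "single", "b3": single}
    legacy = {name: h[name] for name in LEGACY_NAMES if name in h}
    if legacy:
        return {"format": "multi", **legacy}
    return None  # no B3 context present


def inject_b3(carrier: Dict[str, str], single_value: str,
              legacy: Mapping[str, str], mode: str = "both") -> None:
    """Write one format or double-encode, depending on mode (hypothetical values)."""
    if mode in ("single", "both"):
        carrier["b3"] = single_value
    if mode in ("multi", "both"):
        carrier.update(legacy)
```

Double-encoding ("both") is the conservative choice while downstream libraries are being upgraded; a green-field deployment could inject with "single" only.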
|
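+1 I'm for this, but I have two concerns:
1. I'd like to make sure that we have clarifying examples of when some fields can be missing and when they can't. e.g. "Optional fields MUST appear in the correct position. e.g. If x-b3-parentspanid is present, then x-b3-sampled must also be present"
2. By supporting a new header, we'll eventually be supporting three header formats instead of two: X-B3-*, b3, and trace-state/trace-parent. This imposes a slightly larger burden for brown fields. I don't think that it is much, but it's more than just frameworks. Proxies, load balancers, logging frameworks, etc. may need to handle them as well.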
|
valid points and good feedback. thanks!
…On Mon, 30 Apr 2018, 20:31 Ben Plotnick, ***@***.***> wrote:
+1
I'm for this, but I have two concerns:
1.
I'd like to make sure that we have clarifying examples of when some
fields can be missing and when they can't. e.g. "Optional fields MUST
appear in the correct position. e.g. If x-b3-parentspanid is present,
then x-b3-sampled must also be present"
2.
By supporting a new header, we'll eventually be supporting three
header formats instead of two: X-B3-*, b3, and trace-state/trace-parent.
This imposes a slightly larger burden for brown fields. i don't think that
it is much, but it's more than just frameworks. Proxies, load balancers,
logging frameworks, etc. may need to handle them as well.
|
This looks good, although I'm unsure about how to handle some corner case scenarios. For example, what would I do if I want to specify "no decision" on sampling, but also want to specify parent ID? Or your Maybe that's a potential solution for the other issue as well...to specify "no decision" sampling but include a parent ID: |
I guess my concern is very close to @bplotnick's concern I'm also happy if someone thinks of another solution that allows these use cases. |
the parent but no decision is something that is an antipattern (as it can
cause very inconsistent traces down the line). the use case for no decision
is typically just provisioning an id for the root span. I understand this
doesn't mean people don't do this! Do you do it on purpose? if so, why? regardless
we can choose to handle it at some mapping cost.
one nice thing about parent after sampled is that it is easy to tell that
the parent is not the span id and also the most used fields are first.
one way out is to be like Amazon and accept ? as not yet sampled decision
for example sampled field is 1 0 or ?
this way it is always present. wdyt?
thanks for commenting
|
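Ah ok, I guess if it's an antipattern that lessens the concern for that use case (it's not something that we do, just looking for corner cases). The force trace on a root span example still nags at me though - I don't like the idea of needing to have branching logic on the fourth position to figure out if it's parent ID or debug flag, and it's ultimately the same issue of wanting to omit an earlier optional field while specifying a later one.
I think I like the ? idea best of all, and it could work for any of the optional fields. Having a ? for any of the optional fields could be equivalent to omitting that optional field. That would then allow you to specify the positional nature of all the fields in a consistent way, no exceptions (third position is always sampled, fourth position is always parent ID, fifth position is always flags). If the positions aren't consistent then you'll have to add caveat wording anytime you mention parent ID or flags, otherwise IMO people will miss the part of the header definition that calls it out and we'll get inconsistent handling.
I don't think sampled should necessarily be required. I do like putting parent ID after sampled to help reduce confusion with span ID, and ordering fields based on importance. Makes sense. 👍 |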
Thanks again for taking time away from things to give feedback. Really helps
@llinder would any of your "interesting channels" have a problem
if there was a '?' in the header value? The only drawback to '?' I could see
is around this part: not headers, but encoding in some other place.
…On Thu, May 3, 2018 at 6:59 AM, Nic Munroe ***@***.***> wrote:
Ah ok, I guess if it's an antipattern that lessens the concern for that
use case (it's not something that we do, just looking for corner cases).
The force trace on a root span example still nags at me though - I don't
like the idea of needing to have branching logic on the fourth position to
figure out if it's parent ID or debug flag, and it's ultimately the same
issue of wanting to omit an earlier optional field while specifying a later
one.
I think I like the ? idea best of all, and it could work for any of the
optional fields. Having a ? for any of the optional fields could be
equivalent to omitting that optional field. That would then allow you to
specify the positional nature of *all* the fields in a consistent way, no
exceptions (third position is *always* sampled, fourth position is
*always* parent ID, fifth position is *always* flags). If the positions
aren't consistent then you'll have to add caveat wording anytime you
mention parent ID or flags, otherwise IMO people will miss the part of the
header definition that calls it out and we'll get inconsistent handling.
I don't think sampled should necessarily be required.
I do like putting parent ID after sampled to help reduce confusion with
span ID, and ordering fields based on importance. Makes sense. 👍
|
I will implement this in Brave as an optional feature. However, it would be used by default in JMS. |
In existing practice, it is ok to go without a trace ID, if sampling or debug is set. This was highlighted by @narayaruna at netflix as they have concerns with overhead of ID generation to just propagate a "don't sample" decision. This is also quite important for some messaging use cases. To make this fully portable, we'll need to accept the following special cases:
(again it is invalid to say debug and not sampled, so we won't address 0-1) |
This will be used for JMS and other propagation formats such as w3c tracestate entry. It is notably more efficient than multi-header. Example header: `b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1` To aid in migration, this teaches the normal B3 to attempt to extract single header variants as well. See openzipkin/b3-propagation#21
ok first impl here openzipkin/brave#763 |
One clarification on the parentId being the strange field: we'd expect messaging propagation to not send parentId at all. RPC spans share span IDs, but messaging spans always fork a new ID (for consumption of the message). The parentId of the caller is never read for messaging spans, so it is pure overhead, and messaging is the most sensitive to overhead. Long story short, keeping parentId in the last position keeps things more efficient for messaging, which never cares about the parent. |
note: w3c recently changed their format, including an incompatible definition of flags. Notably, they try to tease out trace requested from recorded in ways that don't quite match either sampled status or debug. https://github.com/w3c/distributed-tracing/blob/master/trace_context/HTTP_HEADER_FORMAT.md Very few people were involved in this decision. For example, what we called sampled ends up as flags '03', unsampled or undecided as '00', and debug is not expressible. For this reason, the "prefix matches w3c" part is no longer valid, and this solidifies the need for b3 to remain its own thing as w3c drifts |
Since we don't match w3c traceparent anyway, and it will drift further, I think it is a better optimization to do what we do well.
I was thinking of a way to simplify and reduce mistakes by taking advantage of the fact that the debug flag is the only flag and it is a sampling modifier. "X-B3-Flags" is only valid when sampled and boosts the sampling decision to the collector tier. In other words it is a 4th sampling state (undecided, unsampled, sampled, debug).
Instead of having a dangling "-1" for this (ex sampled+debug = "1-1"), we can keep our "hyphens plus hex" and simplify by using only a single character 'd' to indicate debug (knowing debug is implicitly sampled). So this changes from: to:
It makes parsing a lot easier when we limit to the existing choices of absent, 0, 1 or d. I think the intuitiveness is worth it. Also, folks who don't care about the parent ID can ignore the last field completely and not miss the debug flag.
Let's take all the examples and translate them. I've highlighted the ones I believe are clearer (and easier to parse)
Propagate a root (or non-shared) span with no decision yet:
Propagate a root (or non-shared) span with a sampled decision:
Propagate a root (or non-shared) span with an unsampled decision:
Propagate just unsampled decision:
Propagate a root (or non-shared) span with a debug decision:
Propagate a RPC child span with a debug decision:
As you'll notice... we can look at 3 fields (especially in messaging which never shares span ID) and get all info we need. Since this is not a bit field, there's no risk in this drifting.. wdyt?
cc @openzipkin/core and everyone else here of course |
As discussed, this makes debug use the letter 'd' in the same slot as unsampled '0' or sampled '1'. The result is much simpler code and ideally a more intuitive header. See openzipkin/b3-propagation#21 (comment)
Added a bunch of issues to track support of at least parsing this. |
finally got around to the pull request #28 |
Design moved here https://cwiki.apache.org/confluence/display/ZIPKIN/b3+single+header+format
In designing the Trace Context format, we made a section called tracestate which holds the authoritative propagation data.
This issue defines a value that could be used as a separate "b3" header, and would be the same value used in the w3c tracestate field. Specific to the w3c format, this holds data not in the "traceparent" format, such as parent ID and the debug flag. It would be a completely non-lossy way to allocate our current headers into one value.
In simplest terms it is a mapping:
b3={x-b3-traceid}-{x-b3-spanid}-{if x-b3-flags 'd' else x-b3-sampled}-{x-b3-parentspanid}, where the last two fields are optional.
For example, the following headers:
Become one header or state field. For example, if a header:
Or if using w3c trace context format
Here are some more examples:
A sampled root span would look like:
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1
A not yet sampled root span would look like:
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7
A debug RPC child span would look like:
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-d-5b4185666d50f68b
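The mapping from the legacy headers to these values could be sketched as below. This is only an illustration under stated assumptions: `to_b3_single` is a hypothetical helper, X-B3-Sampled is assumed to already be "0" or "1", and a parent ID arriving without any sampling decision is dropped, since the positional format has no slot for it.

```python
from typing import Mapping, Optional


def to_b3_single(headers: Mapping[str, str]) -> Optional[str]:
    """Collapse legacy X-B3-* headers into one b3 value (hypothetical helper)."""
    # Header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}

    # Debug ('d') implies sampled, so it takes precedence over X-B3-Sampled.
    if h.get("x-b3-flags") == "1":
        sampled = "d"
    else:
        sampled = h.get("x-b3-sampled")  # None means "no decision yet"

    trace_id = h.get("x-b3-traceid")
    span_id = h.get("x-b3-spanid")
    if trace_id is None or span_id is None:
        # Without identifiers, only a sampling decision (or nothing) remains.
        return sampled

    parts = [trace_id, span_id]
    if sampled is not None:
        parts.append(sampled)
        parent_id = h.get("x-b3-parentspanid")
        if parent_id is not None:
            # Assumption: the parent ID is only written after a sampled field.
            parts.append(parent_id)
    return "-".join(parts)
```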
Like normal B3, it is valid to omit trace identifiers in order to only propagate a sampling decision. For example, the following are valid downstream hints:
b3: 0
b3: 1
b3: d
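A minimal parsing sketch covering all of the shapes above, including the sampling-only hints. The `B3` tuple and `parse_b3_single` are hypothetical names, and the sketch assumes lower-hex IDs with a 16- or 32-character trace ID (64- or 128-bit), which this issue does not itself spell out.

```python
import re
from typing import NamedTuple, Optional

# Either "traceid-spanid[-sampled[-parentid]]" or a lone sampling state.
_B3_PATTERN = re.compile(
    r"(?:(?P<trace_id>[0-9a-f]{32}|[0-9a-f]{16})-(?P<span_id>[0-9a-f]{16})"
    r"(?:-(?P<sampled>[01d])(?:-(?P<parent_id>[0-9a-f]{16}))?)?"
    r"|(?P<only>[01d]))"
)


class B3(NamedTuple):
    trace_id: Optional[str]
    span_id: Optional[str]
    sampled: Optional[bool]  # None means "no decision yet"
    debug: bool
    parent_id: Optional[str]


def parse_b3_single(value: str) -> Optional[B3]:
    """Parse a b3 single-header value; return None if it is malformed."""
    m = _B3_PATTERN.fullmatch(value.strip())
    if m is None:
        return None
    state = m.group("sampled") or m.group("only")
    debug = state == "d"
    # Debug is a boosted sampled signal, so 'd' also means sampled.
    sampled = None if state is None else state in ("1", "d")
    return B3(m.group("trace_id"), m.group("span_id"), sampled, debug,
              m.group("parent_id"))
```

Because debug shares the slot with sampled, a value such as "0-1" simply fails to match, which fits the earlier note that debug without sampled is invalid.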
NOTE: this does not match the prefix of traceparent, so we must define ours independently and consider that the w3c may change in different ways. This is ok as the "tracestate" entries in w3c format are required to be treated as opaque. In other words, we can be different on purpose or by accident of drift in their spec.
On positional encoding vs nested key/values
Positional encoding is more space efficient and less complicated to parse vs key/value encoding. For example, the AWS format code in brave is complex due to splitting, dealing with white space etc. Positional is simple to parse and straight-forward to map. Rationale is same as w3c traceparent for the most part.
Unlike newer specs, we expect no additional fields. B3 is a very stable spec and we are not defining anything new except how to encode it. For this reason, positional encoding should be fine.
On putting mandatory fields up front
The trace ID and span ID fields are the only mandatory fields. This would allow fixed-length parsing for those just correlating on these values. Usually parentid is not used for correlation, rather scraping. Moreover, this is easier for existing proxies who only create trace identifiers.
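As a rough sketch of what fixed-length parsing buys a proxy or log scraper that only correlates on IDs: the two mandatory fields can be sliced out by offset without looking at the optional ones. `correlation_ids` is a hypothetical helper; it assumes a 16- or 32-character trace ID and a 16-character span ID, and it deliberately does not validate the hex characters.

```python
from typing import Optional, Tuple

SPAN_ID_LEN = 16  # 64-bit span ID in lower-hex (assumed)


def correlation_ids(b3: str) -> Optional[Tuple[str, str]]:
    """Return (trace_id, span_id) using offsets only, or None if the shape is wrong."""
    # The first hyphen sits at index 16 (64-bit trace ID) or 32 (128-bit trace ID).
    sep = b3.find("-")
    if sep not in (16, 32):
        return None
    end = sep + 1 + SPAN_ID_LEN
    if len(b3) < end:
        return None
    # Anything after the span ID must start with a hyphen (sampled, parent ID).
    if len(b3) > end and b3[end] != "-":
        return None
    return b3[:sep], b3[sep + 1:end]
```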
Ex: you can control trace identifiers without making a sampling decision like so:
On sampled before parent span ID
When name-values aren't used, it could be confusing which of the equal-length fields is the parent. By placing the sampled flag in between, we make this clearer. Also, it matches the prefix of the current traceparent encoding.
Encoding "not yet sampled"
Leaving out the single-character sampled field is how we encode the "no decision" state. This matches the way we used to address this (by leaving out X-B3-Sampled).
Encoding debug
We encode the debug flag (previously X-B3-Flags: 1) as the letter 'd' in the same place as sampled ('1'). This is because debug is a boosted sampled signal. Most implementations record it out-of-band as Span.debug=true to ensure it reaches the collector tier.
One alternative considered was adding another field just to hold debug (ex a trailing -1). Not only was this less intuitive, it made parsing harder, especially as parentId is also optional. This was reverted in openzipkin/brave#773
W3C drift alert
While we should watch out for changes in the TraceContext spec, for example, if they add a "priority flag", we should keep our impl independent. B3 fields haven't changed in years and we can lock in something far safer knowing that.
Why also define as a separate header
We have had continual problems with b3 with technology like JMS. In addition to declaring this format for w3c Trace Context, we could use it right away as the header "b3". This would solve all the problems we have like JMS hating hyphens in names, and allow those who opt into it a consistent format for when they transition to w3c.
openzipkin/brave#584
In other words, in messaging propagation and even normal http, some libraries could choose to read the "b3" header for the exact same format instead of "X-B3-*"
Should we use flags instead of two fields for sampled and debug?
We could encode the three sampled states and debug as a single 8-bit field encoded as hex. If we used flags, an example sampled span would be:
Notice this is 3, not 1. That's because if using bit field we need to tell the difference between unsampled and no sampling decision. This could be confusing to people.
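To make the "3, not 1" point concrete, here is a small sketch of such a bit field. The bit assignments and the single-hex-character output are assumptions for illustration only; they are not taken from Brave or Finagle.

```python
from typing import Optional

# Hypothetical bit layout, chosen only to illustrate the idea.
FLAG_SAMPLED = 0x1      # the decision itself: set when sampled
FLAG_SAMPLED_SET = 0x2  # set when any sampling decision was made
FLAG_DEBUG = 0x4        # debug, which implies sampled


def encode_flags(sampled: Optional[bool], debug: bool = False) -> str:
    """Encode the sampling state as a lower-hex bit field."""
    flags = 0
    if debug:
        flags |= FLAG_DEBUG | FLAG_SAMPLED | FLAG_SAMPLED_SET
    elif sampled is not None:
        flags |= FLAG_SAMPLED_SET
        if sampled:
            flags |= FLAG_SAMPLED
    return format(flags, "x")
```

Under this layout an unsampled decision and "no decision" differ only in the second bit, which is exactly the distinction the positional format gets by leaving the field out.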
On flags: in Java, we already encode sampled and debug state internally as flags like this
This might be better off than having two fields, although it is less simple as people often make mistakes coding bit fields, and X-B3-Flags caused confusion many times here including #20.
What about finagle's flags?
If we used flags, we could also do it the same way as finagle does, except I think it would be confusing as the length they allocated (64 bits or 16 characters in hex) was never used in practice.
In practice we could use a single hex character to encode all the flags in our format (one hex character carries 4 bits, so that supports 4 flags; 8 flags would need two characters). Also, using Finagle's flag encoding could add to the confusion about the "X-B3-Flags" header, which in http encoding never has a value besides "1" (#20). At any rate, we can consider using the first 8 bits of their format as prior art regardless of whether we use it.