-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Map Error Status to error tag in Zipkin & Jaeger #1257
Map Error Status to error tag in Zipkin & Jaeger #1257
Conversation
Error
Status
to error
tag in Zipkin & Jaeger
I think we use all-lowercase in all other enums in semantic conventions. So I'd rather keep it as-is or use all-lowercase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR needs a better motivation. It is a breaking change after all. Also, it seems that the Zipkin change to error
needs polishing.
In the latter case it MUST NOT be reported. | ||
|
||
The following table defines the OpenTelemetry `Status` to Zipkin `tags` mapping. | ||
|
||
| Status|Tag Key| Tag Value | | ||
|--|--|--| | ||
|Code | `otel.status_code` | Name of the code, either `OK` or `ERROR`. MUST NOT be set if the code is `UNSET`. | | ||
|Code | `otel.status_code` | Name of the code, either `Ok` or `Error`. MUST NOT be set if the code is `Unset`. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How valuable is sending this? We can use it to distinguish OK and UNSET, but ERROR is already marked with the error
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Definitely redundant to have otel.status_code=ERROR
with the error
flag also present on either Jaeger or Zipkin. But having otel.status_code
does keep things consistent and allows for easier translation to/from OTLP? Might also be better to leave with both in there for now so this is an additive change and not a breaking one?
For motivation, check out open-telemetry/opentelemetry-dotnet#1579. There are 2 screenshots on there that show what Zipkin & Jaeger do with the error flag. Basically, the UI highlights the spans as errors when the flag is present. There's also some stuff that shows on the search results, but I don't have a screenshot handy for that. |
Sorry, my complaint was worded ambiguous: I am in favor of adding I meant a motivation to change the casing of |
I put them back to uppercase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small clarification comment but generally looks good
### Error flag | ||
|
||
When Span `Status` is set to `ERROR` an `error` tag SHOULD be added with an | ||
empty string value (eg. `{"error":""}`). The added `error` tag MAY override any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused on the "override previous value" - does it mean that if the user set an error tag themselves, this may unset it if Status
is UNSET
? Or if a user sets error=cats
, we override to error=
? Just want to have an idea (maybe we can add some clarification)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to say that exporters can simply add the error
tag on the end of the tag list which will effectively override anything already there (with the same error
key). My motivation for that was 1) I didn't want to leave it undefined and 2) checking for an existing tag is particularly expensive (O(n)
) in the .NET implementation so I was hoping to avoid needing to do that. Open to suggestions for change or improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we setting an error
tag with empty string value and additionally add otel.status_description
instead of just setting the error
tag with a string value equal to Status message?
Doesn't Status message fit nicely in the error
tag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tigrannajaryan The only issue with that is error
tag must ONLY be sent if there was an actual error. So what we would need to do is set error
instead of otel.status_description
when otel.status_code=ERROR
otherwise send otel.status_description
(if it has a value for OK
or UNSET
cases). I don't have strong feelings either way. I will say that as-is is easier to implement (always sending otel.status_description
when it has a value and then adding the error
tag (empty) for otel.status_code=ERROR
) and it's non-breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe Status.message is not supposed to be set for OK or UNSET cases. We should explicitly call it out in the spec. This means error
tag will only be set equal to Spatus.message when Status.status==ERROR and for all other statuses there is no error message to record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tigrannajaryan I added some text on Status
to clarify Description
should only be used with Error
and then updated Zipkin & Jaeger based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just verifying that the otel-java
code follows the specification and I've given the above a read and may have noticed a slight difference between the jaeger
and zipkin
exporter specs regarding the error Status
.
In the jaeger
specification it says
otel.status_description
: Description of the Status if it has a value otherwise not set.
So this means that for jaeger
it sets the otel.status_description
if the status description is set - it does not mandate that it is ONLY set if there is an error. Additionally, if the status is an error, it shall add an error
tag.
However for zipkin
it says the following:
error
: Description of the Status. MUST be set if the code is ERROR, use an empty string if Description has no value. MUST NOT be set for OK and UNSET codes.
This describes that the otel.status_description
tag isn't set but instead the error
tag is the only thing that is set providing that there is an ERROR code and not OK or UNSET.
Is there a reason why for jaeger
we provide the otel.status_code
, otel.status_description
AND the error
tag but for zipkin
we only provide otel.status_code
and the error
tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChrisJBurns Slight differences in the implementations, basically. Zipkin error
is a string where you can put the description. Jaeger error
is a bool (true/false) so it has otel.status_description
for the description. The Zipkin text is a bit more specific because if you send something like error=OK
or error=false
to Zipkin, it's still going to treat that span as failed.
In general I'd advise looking at this as it has a lot of information about nuance of where and from who data is collected https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/handler/MutableSpan.java#L37-L87 |
@@ -127,9 +127,7 @@ convention](../semantic_conventions/README.md) | |||
document maps to the strongly-typed fields of Zipkin spans. | |||
|
|||
Primitive types MUST be converted to string using en-US culture settings. | |||
Boolean values must use lower case strings `"true"` and `"false"`, except an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this special case is still true. For example, you have an opentracing bridge which has in the past littered folks stuff with "error"->"false" data. can you please verify that false should never be sent as a value of "error" regardless of if being mapped from the grpc status or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I added it back with different words as part of the Status section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add this change to the changelog.
@tigrannajaryan CHANGELOG updated. |
Let's keep this open until tomorrow since there were substantial changes. If there are no further comments we can merge tomorrow. |
- An API to set the `Status`. This SHOULD be called `SetStatus`. This API takes | ||
the `StatusCode`, and an optional `Description`, either as individual | ||
parameters or as an immutable object encapsulating them, whichever is most | ||
appropriate for the language. `Description` MUST be IGNORED for `StatusCode` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either as individual parameters or as an immutable object encapsulating them, whichever is most appropriate for the language.
This sounds like a general guideline that would apply to any API function, we don't need to be cluttering individual function descriptions with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was there before my addition you want me to update here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be done in another PR
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
* Added mapping of Error Status to error flag in Zipkin & Jaeger exporters. * Zipkin error flag is now an empty string. * Restored uppercase for status enum values. * Removed the error text under attributes. * Casing fix. * Review feedback. * Added back information about removing error flag when false. * Updated CHANGELOG. * Update specification/trace/sdk_exporters/jaeger.md Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com> * Review feedback. Co-authored-by: Cijo Thomas <cithomas@microsoft.com> Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Changes
Status
mapping to Jaeger which is the same as Zipkin.Error
Status
toerror
tag for Zipkin & Jaeger.Error
mapping./cc @cijothomas @yurishkuro @tedsuo @adriancole