Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map Error Status to error tag in Zipkin & Jaeger #1257

Merged
merged 16 commits into from
Dec 10, 2020

Conversation

CodeBlanch
Copy link
Member

@CodeBlanch CodeBlanch commented Nov 23, 2020

Changes

  • Added Status mapping to Jaeger which is the same as Zipkin.
  • Defined mapping of Error Status to error tag for Zipkin & Jaeger.
  • Added a row in the compliance matrix for the Error mapping.

/cc @cijothomas @yurishkuro @tedsuo @adriancole

@CodeBlanch CodeBlanch requested review from a team November 23, 2020 18:42
@CodeBlanch CodeBlanch changed the title Map Error Status to error tag in Zipkin & Jaeger Map Error Status to error tag in Zipkin & Jaeger Nov 23, 2020
@Oberon00
Copy link
Member

I think we use all-lowercase in all other enums in semantic conventions. So I'd rather keep it as-is or use all-lowercase.

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR needs a better motivation. It is a breaking change after all. Also, it seems that the Zipkin change to error needs polishing.

In the latter case it MUST NOT be reported.

The following table defines the OpenTelemetry `Status` to Zipkin `tags` mapping.

| Status|Tag Key| Tag Value |
|--|--|--|
|Code | `otel.status_code` | Name of the code, either `OK` or `ERROR`. MUST NOT be set if the code is `UNSET`. |
|Code | `otel.status_code` | Name of the code, either `Ok` or `Error`. MUST NOT be set if the code is `Unset`. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How valuable is sending this? We can use it to distinguish OK and UNSET, but ERROR is already marked with the error attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Definitely redundant to have otel.status_code=ERROR with the error flag also present on either Jaeger or Zipkin. But having otel.status_code does keep things consistent and allows for easier translation to/from OTLP? Might also be better to leave with both in there for now so this is an additive change and not a breaking one?

@CodeBlanch
Copy link
Member Author

For motivation, check out open-telemetry/opentelemetry-dotnet#1579. There are 2 screenshots on there that show what Zipkin & Jaeger do with the error flag. Basically, the UI highlights the spans as errors when the flag is present. There's also some stuff that shows on the search results, but I don't have a screenshot handy for that.

@Oberon00
Copy link
Member

Oberon00 commented Nov 24, 2020

Sorry, my complaint was worded ambiguous: I am in favor of adding error when the status equals ERROR.

I meant a motivation to change the casing of otel.status_code.

@CodeBlanch
Copy link
Member Author

I think we use all-lowercase in all other enums in semantic conventions. So I'd rather keep it as-is or use all-lowercase.

I put them back to uppercase.

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small clarification comment but generally looks good

### Error flag

When Span `Status` is set to `ERROR` an `error` tag SHOULD be added with an
empty string value (eg. `{"error":""}`). The added `error` tag MAY override any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused on the "override previous value" - does it mean that if the user set an error tag themselves, this may unset it if Status is UNSET? Or if a user sets error=cats, we override to error=? Just want to have an idea (maybe we can add some clarification)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to say that exporters can simply add the error tag on the end of the tag list which will effectively override anything already there (with the same error key). My motivation for that was 1) I didn't want to leave it undefined and 2) checking for an existing tag is particularly expensive (O(n)) in the .NET implementation so I was hoping to avoid needing to do that. Open to suggestions for change or improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we setting an error tag with empty string value and additionally add otel.status_description instead of just setting the error tag with a string value equal to Status message?

Doesn't Status message fit nicely in the error tag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrannajaryan The only issue with that is error tag must ONLY be sent if there was an actual error. So what we would need to do is set error instead of otel.status_description when otel.status_code=ERROR otherwise send otel.status_description (if it has a value for OK or UNSET cases). I don't have strong feelings either way. I will say that as-is is easier to implement (always sending otel.status_description when it has a value and then adding the error tag (empty) for otel.status_code=ERROR) and it's non-breaking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Status.message is not supposed to be set for OK or UNSET cases. We should explicitly call it out in the spec. This means error tag will only be set equal to Spatus.message when Status.status==ERROR and for all other statuses there is no error message to record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrannajaryan I added some text on Status to clarify Description should only be used with Error and then updated Zipkin & Jaeger based on that.

Copy link

@ChrisJBurns ChrisJBurns Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just verifying that the otel-java code follows the specification and I've given the above a read and may have noticed a slight difference between the jaeger and zipkin exporter specs regarding the error Status.

In the jaeger specification it says

otel.status_description : Description of the Status if it has a value otherwise not set.

So this means that for jaeger it sets the otel.status_description if the status description is set - it does not mandate that it is ONLY set if there is an error. Additionally, if the status is an error, it shall add an error tag.

However for zipkin it says the following:

error : Description of the Status. MUST be set if the code is ERROR, use an empty string if Description has no value. MUST NOT be set for OK and UNSET codes.

This describes that the otel.status_description tag isn't set but instead the error tag is the only thing that is set providing that there is an ERROR code and not OK or UNSET.

Is there a reason why for jaeger we provide the otel.status_code, otel.status_description AND the error tag but for zipkin we only provide otel.status_code and the error tag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisJBurns Slight differences in the implementations, basically. Zipkin error is a string where you can put the description. Jaeger error is a bool (true/false) so it has otel.status_description for the description. The Zipkin text is a bit more specific because if you send something like error=OK or error=false to Zipkin, it's still going to treat that span as failed.

@codefromthecrypt
Copy link

In general I'd advise looking at this as it has a lot of information about nuance of where and from who data is collected https://github.com/openzipkin/brave/blob/master/brave/src/main/java/brave/handler/MutableSpan.java#L37-L87

@@ -127,9 +127,7 @@ convention](../semantic_conventions/README.md)
document maps to the strongly-typed fields of Zipkin spans.

Primitive types MUST be converted to string using en-US culture settings.
Boolean values must use lower case strings `"true"` and `"false"`, except an

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this special case is still true. For example, you have an opentracing bridge which has in the past littered folks stuff with "error"->"false" data. can you please verify that false should never be sent as a value of "error" regardless of if being mapped from the grpc status or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I added it back with different words as part of the Status section.

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this change to the changelog.

@CodeBlanch
Copy link
Member Author

@tigrannajaryan CHANGELOG updated.

@tigrannajaryan
Copy link
Member

Let's keep this open until tomorrow since there were substantial changes. If there are no further comments we can merge tomorrow.

- An API to set the `Status`. This SHOULD be called `SetStatus`. This API takes
the `StatusCode`, and an optional `Description`, either as individual
parameters or as an immutable object encapsulating them, whichever is most
appropriate for the language. `Description` MUST be IGNORED for `StatusCode`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either as individual parameters or as an immutable object encapsulating them, whichever is most appropriate for the language.

This sounds like a general guideline that would apply to any API function, we don't need to be cluttering individual function descriptions with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was there before my addition you want me to update here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be done in another PR

specification/trace/sdk_exporters/jaeger.md Outdated Show resolved Hide resolved
CodeBlanch and others added 2 commits December 8, 2020 22:36
@yurishkuro yurishkuro merged commit 0e25dd5 into open-telemetry:master Dec 10, 2020
@CodeBlanch CodeBlanch deleted the zipkin-jaeger-error-flag branch March 15, 2021 22:30
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this pull request Oct 31, 2024
* Added mapping of Error Status to error flag in Zipkin & Jaeger exporters.

* Zipkin error flag is now an empty string.

* Restored uppercase for status enum values.

* Removed the error text under attributes.

* Casing fix.

* Review feedback.

* Added back information about removing error flag when false.

* Updated CHANGELOG.

* Update specification/trace/sdk_exporters/jaeger.md

Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>

* Review feedback.

Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants