Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zipkin translator not respecting "error" tag correctly #16530

Closed
joe-elliott opened this issue Nov 29, 2022 · 23 comments
Closed

Zipkin translator not respecting "error" tag correctly #16530

joe-elliott opened this issue Nov 29, 2022 · 23 comments
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed never stale Issues marked with this label will be never staled and automatically removed pkg/translator/zipkin

Comments

@joe-elliott
Copy link
Contributor

Component(s)

receiver/zipkin

What happened?

Description

When consuming Zipkin data the translation layer does not correctly translate the "error" tag into a status of error unless the string is exactly "true". Conventionally if a Zipkin span has the "error" tag with any valid data it should be considered an errored span.

This logic likely needs to be updated to accept values other than "true".

if val, ok := tags[tracetranslator.TagError]; ok {
if val == "true" {

This should be updated to:

  • If the error attribute exists with any value other than "true" set the status code to error and leave the error attribute
  • If the error attribute exists with exactly the value "true" set the status code to error and remove the error attribute.

Unsure about the second line, but it preserves the existing behavior and seems reasonable to me.

Steps to Reproduce

Send zipkin data with a key/value tag "error" with any value other than "true".

Expected Result

The OTLP status code is correctly set to error.

Actual Result

The OTLP status code is set to ok.

Collector version

v0.66.0

Environment information

Environment

any

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@joe-elliott joe-elliott added bug Something isn't working needs triage New item requiring triage labels Nov 29, 2022
@jpkrohling jpkrohling added pkg/translator/zipkin and removed needs triage New item requiring triage labels Nov 30, 2022
@github-actions
Copy link
Contributor

Pinging code owners: @open-telemetry/collector-approvers. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling
Copy link
Member

It seems reasonable to me, and matches the logic we have in the Jaeger translator:

if errorVal, ok := attrs.Get(tracetranslator.TagError); ok && errorVal.Type() == pcommon.ValueTypeBool {
if errorVal.Bool() {
statusCode = ptrace.StatusCodeError
attrs.Remove(tracetranslator.TagError)
statusExists = true
if desc, ok := extractStatusDescFromAttr(attrs); ok {
statusMessage = desc
} else if descAttr, ok := attrs.Get(tracetranslator.TagHTTPStatusMsg); ok {
statusMessage = descAttr.Str()
}
}
}

In summary: if it's a boolean, and if it's true, then set the error flag to true and remove the attribute. Otherwise, leave it as is.

@jpkrohling jpkrohling added help wanted Extra attention is needed good first issue Good for newcomers labels Nov 30, 2022
@jonatan-ivanov
Copy link
Member

Fwiw, this is also defined in the zipkin exporter specs: https://opentelemetry.io/docs/reference/specification/trace/sdk_exporters/zipkin/#status

@jonatan-ivanov
Copy link
Member

If the error attribute exists with exactly the value "true" set the status code to error and remove the error attribute

I think this behavior would be weird. This means that if I throw an exception/error with an error message "true", it would not be an error anymore. Zipkin does not have any special treatment for this tag, if it has a value, the span failed, if it does not it was successful, the value does not matter after that.

@joe-elliott
Copy link
Contributor Author

This means that if I throw an exception/error with an error message "true", it would not be an error anymore.

The span would still be in error b/c the OTLP span would be have its status code set to error. Presumably on translating from OTLP back to Zipkin you would have to add error="true" if the otlp error status was set and there was no error attribute.

I proposed the behavior only to keep it backwards compatible with existing, but don't have strong opinions.

@jonatan-ivanov
Copy link
Member

I see your point. I'm also thinking that if the previous behavior is a bug (I'm not saying it is), it should be ok to break (fix) it.

@jpkrohling
Copy link
Member

I'm fine either way as well, being slightly in favor of @joe-elliott's proposal, as it matches the behavior of the Jaeger translator. Bonus points: it has the least attrition to current users of this component.

@nitedani
Copy link

nitedani commented Jan 20, 2023

JS:
https://github.com/open-telemetry/opentelemetry-js/blob/9589d541cef7c8d0227d66c0c4fc2ee32ac93dbe/packages/opentelemetry-exporter-zipkin/src/transform.ts#L70

export enum SpanStatusCode {
  UNSET = 0,
  OK = 1,
  ERROR = 2,
}

Produced tags by the JS library:

{
 "otel.status_code": "ERROR",
 "error": "error message"
}

python
https://github.com/open-telemetry/opentelemetry-python/blob/a99cc21152ef1a129fc28ec539e73cbe4ffa8e75/exporter/opentelemetry-exporter-zipkin-json/src/opentelemetry/exporter/zipkin/encoder/__init__.py#L197

class StatusCode(enum.Enum):
    UNSET = 0
    OK = 1
    ERROR = 2

go

func populateSpanStatus(tags map[string]string, status ptrace.Status) {

var statusCodeValue = map[string]int32{
	"STATUS_CODE_UNSET": 0,
	"STATUS_CODE_OK":    1,
	"STATUS_CODE_ERROR": 2,  <--
}

At this point status.SetCode(ptrace.StatusCode(statusCodeValue["ERROR"])) will run, looking up "ERROR" in the statusCodeValue enum, which is not an existing value.
This is a bug. statusCodeValue is wrong
Please make these libraries compatible with one another.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

  • pkg/translator/zipkin: @open-telemetry/collector-approvers

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Mar 22, 2023
@jonatan-ivanov
Copy link
Member

If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@open-telemetry/collector-approvers I think this is still very relevant, it makes the Zipkin receiver of the collector quite useless in scenarios where an error is attached to the span which is a rather important use-case.

@jpkrohling jpkrohling removed the Stale label Mar 27, 2023
@jpkrohling
Copy link
Member

I'm open to accepting a pull request changing this, especially if it means aligning with OTel JS. The difficulty here is figuring out the right thing to do, given we don't have Zipkin code owners in this repository. If you know anyone who'd be interested in taking care of that component, let me know.

@jonatan-ivanov
Copy link
Member

@jpkrohling Could you please clarify one thing to me? Are you seeking advice from Zipkin maintainers or are you saying that the Zipkin receiver does not have an owner so it is not maintained anymore?

@jpkrohling
Copy link
Member

Both :-) We do not have code owners for the Zipkin exporter/receiver/translator, and none of the current maintainers of this repository are Zipkin experts, from what I know.

@shakuzen
Copy link

@jpkrohling Feel free to ping me if there is some Zipkin-specific question I can answer. I used to be an active committer for Zipkin (not so active these days).

As for this issue, it sounds like there is an acceptable change to be made. Is the only thing remaining to do is for someone to make a pull request with the change? Or is there still some open question about how things should be changed?

@atoulme
Copy link
Contributor

atoulme commented Mar 31, 2023

@shakuzen can I interest you in a once in a lifetime opportunity? Head on to #20269 if you're looking for adventure :)

@jpkrohling
Copy link
Member

Is the only thing remaining to do is for someone to make a pull request with the change?

At this point, yes. I'll gladly review and merge a PR that does what this issue here proposes.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 5, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jun 5, 2023
@jonatan-ivanov
Copy link
Member

I'm not sure this will be less of an issue with time, can somebody label this issue so that we don't need to ping-pong with the issue bot?

@github-actions github-actions bot removed the Stale label Jun 13, 2023
@andrzej-stencel andrzej-stencel added the never stale Issues marked with this label will be never staled and automatically removed label Jun 13, 2023
jpkrohling pushed a commit that referenced this issue Jul 28, 2023
…4547)

**Description:** <Describe what has changed.>
Fixing a bug - Stop dropping error tags from Zipkin spans. The old code
removes all errors from those spans, rendering them useless if an actual
error happened. In addition, no longer delete error tags if they contain
useful information. This now only deletes them if they are "true", and
thus don't contain useful information. Also fixes the whole issue of

**Link to tracking Issue:** #16530

**Testing:** Absolutely none, I have written no Go code before this and
have not even run the code.

I'm hoping CI saves me the trouble of installing and running everything
locally

---------

Co-authored-by: Curtis Robert <crobert@splunk.com>
Co-authored-by: Curtis Robert <92119472+crobert-1@users.noreply.github.com>
Co-authored-by: Antoine Toulme <antoine@lunar-ocean.com>
@jcchavezs
Copy link
Contributor

I can be code owner for zipkin.

@basvanbeek
Copy link
Contributor

I can be one too... I think with @jcchavezs and myself we have two actual maintainers of Zipkin being able to help out keep things idiomatic and consistent.

@jpkrohling
Copy link
Member

That's great to hear!

@gord02
Copy link
Contributor

gord02 commented Oct 18, 2023

Hello Everyone! I am new to the community and looking to contribute where I can. Is there anything left to be done for this issue? If I am interpreting the code correctly, I see that the status is being set to error if there exists an error tag, and the tag is only removed if the tag value string is true.

@crobert-1
Copy link
Member

Thanks @gord02, you're right, #24547 fixed this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed never stale Issues marked with this label will be never staled and automatically removed pkg/translator/zipkin
Projects
None yet
Development

No branches or pull requests