Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refinery does not properly handle empty parentId strings when determining root spans #1234

Closed
julialamenza opened this issue Jul 18, 2024 · 5 comments · Fixed by #1236
Closed
Labels
type: bug Something isn't working

Comments

@julialamenza
Copy link

julialamenza commented Jul 18, 2024

The Refinery considers a span to not be a root if the parentId is a string, but it doesn't check if the string is empty. This can lead to incorrect identification of root spans.
https://github.com/honeycombio/refinery/blob/main/collect/collect.go#L604
lso this will cause delays on the span being transmitted, because Refinery will wait for the parent span to show up, for TraceTimeout (60s by default). And in this case it never will, so any of these real rootspans are actually delayed by 60s for sending to Honeycomb.

Steps to Reproduce:

  1. Create a span with an empty string as the parentId.
  2. Observe that the span is not considered a root, even though it should be.

Expected Behavior:

Spans with an empty parentId string should be considered root spans.

  • Refinery version: v2.6.1
  • Operating system: Linux
@julialamenza julialamenza added the type: bug Something isn't working label Jul 18, 2024
@MikeGoldsmith
Copy link
Contributor

Thanks for reporting the issue @julialamenza.

Can you describe the situation where you're seeing a span with an empty parentId? We'll look into considering empty values when determining whether a span is root.

@julialamenza
Copy link
Author

Sometimes we are sending traces with trace.parent_id as a blank string and these appear to be "paused" in Refinery until the TraceTimeout elapses (as if it considered a blank string as a valid span_id).
As a result, the graphs for the service that sends these traces that way is delayed by 60s (the value of the TraceTimeout) and metrics look like they are dropping even though the traces are only delayed.
This seems to be a change in behaviour compared to Refinery 1.x

@kentquirk
Copy link
Contributor

Are you using an OpenTelemetry library? If so, can you tell us which one? We can look into enabling this behavior, but it's counter to the OpenTelemetry specification.

@julialamenza
Copy link
Author

No we dont

@MikeGoldsmith
Copy link
Contributor

@julialamenza we've merged a fix and expect it to go out with our next release, likely in the next week or two.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants