-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: set span end time when it exits #124
Conversation
According to otel spec the start time should be set to the time when the span was created and it can be overridable (currently isn't but that's fine I guess). End time should be set when the operation ends which I belive is during the last |
|
In the case when there's no tracked inactivity, this change is pretty bad. When inactivity is tracked, it's not so bad. It is a regression, but I think it's worth pointing out that it's 1000 enters and exits on a span still under 100 us. I'm not sure if there's a way to get the time faster.
|
For the record, I do believe that all parents having an end time only after all their children have closed (except for inter-service boundaries) is a bug. I could also create an option on the subscriber for this that could skip this to avoid the overhead if the user doesn't mind the timing issue or is sure it doesn't matter for their workloads and the spans would be closed right after the last exit. If you have any idea to make this faster though, I'd be happy to try it out. Another option would be to add |
Why? It sounds intuitively reasonable. |
Because that behaviour is present only because that's how In their example span C is longer than its parent. I personally had a scenario where a request spawns a long running background task where the former will be much shorter. |
Yep, spec said:
So child spans can be |
Setting end time when a span is closed makes it impossible for child spans to live longer than their parents if we want to keep the relationship in `tracing` itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thanks. I think I agree that the performance impact is acceptable.
I'm a little worried about this change: I'd naively expect that the correct direction for "children outliving their parents" is |
@davidbarsky how do you square that with the spec referenced above? |
Personally, I see I've never seen any proposed rules about what should be considered parent-child and what should be just linked causal relationship, but currently for us, mostly visualization capabilities dictate our usage. |
I agree with this. My particular scenario is an incoming web request that then causes a message to be published to a queue. The publish happens asynchronously and may complete long after the web response is complete. However, there is a strong parent-child relationship here that should be maintained, that |
Sorry for the delay. My stance is that, at least in this respect, If there's a solution, it'd be something that handles span links or making the approach that @neoeinstein took more ergonomic, such as through a
(On an aside @neoeinstein, I appreciate the implication of what you're doing is necromancy!) |
We've been doing the same thing (and even created the macro you mentioned), but the main issue with that is that we lose the This change does not change the garbage collection at all (e.g. the request spans are still kept in the registry until the long running tasks finish), the only difference is the timestamp that is sent in the OpenTelemetry span as an end. Which is pretty much exactly the thing I would want it to do. The main drawback this has compared to detaching the child is that the OpenTelemetry spans are exported only after all the children have been closed (so the request spans are needlessly kept around). I did also think about whether maybe |
@@ -1165,9 +1169,7 @@ where | |||
} | |||
|
|||
// Assign end time, build and start span, drop span to export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Assign end time
I think that the comment should have been updated too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Want to send a PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=> #148
Closes #116
Motivation
Currently, the open telemetry span is considered to be running until the tracing span is dropped and closed. This can be an issue in cases where either the span is held onto in case the operation needs more processing which was not known beforehand or when a structure holds onto the span.
This is an issue especially if an instrumented task is spawned as
tracing
will hold onto all its ancestors. Therefore until the spawned task ends, the parent span will also be considered not closed.Solution
The time is set every time a span is exited. This incurs a lookup in extensions during
on_exit
.Open Questions
Start time is set during creation of the new span
It might also make sense to do it during the first
on_enter
call. I can't think of a scenario where someone creates spans beforehand and enters them only significantly later though (maybe that could happen for some instrumented asynchronous tasks with extremely high contention?)