-
Notifications
You must be signed in to change notification settings - Fork 844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sdk-trace-base): make span start times resistant to hrtime clock drift #3129
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3129 +/- ##
==========================================
- Coverage 93.10% 93.05% -0.06%
==========================================
Files 196 196
Lines 6400 6410 +10
Branches 1349 1350 +1
==========================================
+ Hits 5959 5965 +6
- Misses 441 445 +4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth abstracting this computation with a similar component to https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/trace/src/main/java/io/opentelemetry/sdk/trace/AnchoredClock.java?
I thought about that. What package do you think we would want it in? Core is the only place I can think of. Unless we make a separate clock package. |
) { | ||
this.name = spanName; | ||
this._spanContext = spanContext; | ||
this.parentSpanId = parentSpanId; | ||
this.kind = kind; | ||
this.links = links; | ||
this.startTime = timeInputToHrTime(startTime); | ||
this._hrStartTime = hrTime(); | ||
this.startTime = timeInputToHrTime(startTime ?? Date.now()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle it may happen that the process gets suspended for a few 100ms between line 81 and 82 resulting in two different start times - one used as start time one used to calc duration.
Not sure if it is worth to investigate a solution to avoid this. I could imagine that we could regularly update our own performance.timeOrigin
instead of relying on that on from node.js.
I'm also not sure if Date.now()
is worse performance wise as it might need to access the real time clock of the system and not just an high precision counter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we regularly update timeOrigin we lose the primary benefits of the hrtime anyway if it isn't monotonic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrtime should be monotonic. To my understanding it only drifts away.
But yes, if the correction happens between span start/end the result my be strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's what i meant. if we apply a correction regularly we might apply the correction during a span and end up with a strange result like end before start which was the primary reason to use the monotonic clock in the first place.
I think core is the right place for it as all of our clock-related stuff is located in the core. |
I tried to go for an anchored clock implementation in #3134 |
Seems like #3134 replaces this one, right? |
Yes |
Fixes #852
Replaces #1019
Because the hrtime can drift from the system time, a system that is up for a long time will eventually have incorrect timings if times are not anchored to the system clock. This PR uses the system clock to calculate span start times and the high resolution monotonic clock to calculate span end times. This allows a high level of precision to be maintained for span durations, but protects spans timings from drifting off the real clock time. User provided times always override calculated times.