Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the current poll must have started before it ended #286

Closed
gwik opened this issue Feb 17, 2022 · 1 comment · Fixed by #287 or #288
Closed

the current poll must have started before it ended #286

gwik opened this issue Feb 17, 2022 · 1 comment · Fixed by #287 or #288
Labels
C-subscriber Crate: console-subscriber. S-bug Severity: bug

Comments

@gwik
Copy link

gwik commented Feb 17, 2022

I keep hitting the debug_assert:

thread 'tokio-runtime-worker' panicked at 'the current poll must have started before it ended; this is a `console-subscriber` bug!', /home/gwik/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/console-subscriber-0.1.2/src/stats.rs:449:13

After some investigations, it's simply the system clock going backward.

thread 'tokio-runtime-worker' panicked at 'the current poll must have started before it ended; this is a `console-subscriber` bug!
at      = SystemTime { tv_sec: 1645112066, tv_nsec: 136896974 }
started = SystemTime { tv_sec: 1645112066, tv_nsec: 136906896 }
result  = Err(SystemTimeError(9.922µs))

This is probably due to clock skew adjustements (I run a NTP daemon systemd-timesyncd).
I guess there is a reason to use SystemTime instead of Instant ?

@hawkw
Copy link
Member

hawkw commented Feb 17, 2022

Yes, this is likely due to clock skew. Note that that assertion is only enabled in debug mode, in release mode, we will simply ignore any poll durations that are incorrect due to clock skew:

let elapsed = last_poll_started.and_then(|started| at.duration_since(started).ok());
debug_assert!(elapsed.is_some(), "the current poll must have started before it ended; this is a `console-subscriber` bug!");
if let Some(elapsed) = elapsed {

A short-term solution for the panic would be to simply remove that assertion. We would then just ignore any polls that have negative durations, which is not ideal, but fixes the panic.

The longer-term solution is to transition the console to use monotonic time. Currently, we use SystemTimes rather than Instants because it's necessary to send timestamps on the wire, so we have to be able to access their actual values in order to serialize them. Instant is opaque, so it cannot easily be used for this. However, using SystemTime is not really correct either; these timestamps are only used for calculating durations, so it is much more important for them to be monotonic than to be globally comparable outside of the process. What we should probably do is switch to an approach where we take a single SystemTime timestamp and corresponding Instant when the process is started as an anchor, and record all actual timestamps using monotonic Instants. Then, we can calculate the elapsed time between the anchor Instant and the timestamp of a particular event, and add that duration to the anchored SystemTime. This allows us to generate a SystemTime that can be sent over the wire, but is monotonic and not effected by clock skew adjustments.

This is loosely related to the changes discussed in #254.

@hawkw hawkw added C-subscriber Crate: console-subscriber. S-bug Severity: bug labels Feb 17, 2022
hawkw added a commit that referenced this issue Feb 17, 2022
## Motivation

Currently, if compiled in debug mode, the `console-subscriber` crate's
`PollStats::end_poll` function will panic if a poll duration is negative
(i.e. if the poll's end
timestamp occurred before its start timestamp). Because the console
currently uses non-monotonic `SystemTime`s rather than monotonic
`Instant`s (due to serialization requirements), this means we are
potentially succeptible to clock skew adjustments. If one occurs during
a poll, and the system clock goes backwards, we may panic here. This
isn't great!

## Solution

This branch fixes the panic by removing the assertions, and changing
`end_poll` to simply bail and print a warning rather than panicking when
encountering a negative poll duration. This is a short-term solution; a
better long-term solution would be to change `console-subscriber` to use
monotonic `Instant`s rather than non-monotonic `SystemTime`s (as
discussed [here][1] and [here][2]).

Fixes #286

[1]:#254
[2]: #286 (comment)
hawkw added a commit that referenced this issue Feb 17, 2022
…287)

## Motivation

Currently, if compiled in debug mode, the `console-subscriber` crate's
`PollStats::end_poll` function will panic if a poll duration is negative
(i.e. if the poll's end
timestamp occurred before its start timestamp). Because the console
currently uses non-monotonic `SystemTime`s rather than monotonic
`Instant`s (due to serialization requirements), this means we are
potentially succeptible to clock skew adjustments. If one occurs during
a poll, and the system clock goes backwards, we may panic here. This
isn't great!

## Solution

This branch fixes the panic by removing the assertions, and changing
`end_poll` to simply bail and print a warning rather than panicking when
encountering a negative poll duration. This is a short-term solution; a
better long-term solution would be to change `console-subscriber` to use
monotonic `Instant`s rather than non-monotonic `SystemTime`s (as
discussed [here][1] and [here][2]).

Fixes #286

[1]:#254
[2]: #286 (comment)
hawkw added a commit that referenced this issue Feb 17, 2022
## Motivation

Currently, the `console-subscriber` crate records all timestamps as
`SystemTime`s. This is because they are eventually sent over the wire as
protobuf `Timestamp`s, which can be constructed from a `SystemTime`.
They cannot be constructed from `Instant`s, because `Instant` is opaque,
and does not expose access to the underlying OS time.

However, using `SystemTime` is not really correct for our use case. We
only use timestamps for calculating durations; we only have to serialize
them because some durations are calculated in the console UI rather than
in-process. We *don't* need timestamps that are globally consistent with
a shared timebase, but we *do* need monotonicity --- using `SystemTime`
leaves us vulnerable to clock skew, if (for example), an NTP clock skew
adjustment causes the system clock to run backwards far enough that a
poll appears to end "before" it started (as in issue #286). If we were
using monotonic `Instant`s, all polls should always have positive
durations, but with `SystemTime`s, this isn't necessarily the case.

Furthermore, `Instant::now()` may have less performance overhead than
`SystemTime::now()`, at least on some platforms.

## Solution

This branch changes `console-subscriber` to always take timestamps using
`Instant::now()` rather than using `SystemTime::now()`, and store all
timestamps as `Instant`s. In order to convert these `Instant`s into
`SystemTime`s that can be sent over the wire, we construct a reference
`TimeAnchor`, consisting of a paired `Instant` and `SystemTime` recorded
at the same time when the `ConsoleLayer` is constructed. We can then
construct "system times" that are monotonic, by calculating the duration
between a given `Instant` and the anchor `Instant`, and adding that
duration to the anchor `SystemTime`. These are not *real* system
timestamps, as they will never run backwards if the system clock is
adjusted; they are relative only to the base process start time as
recorded by the anchor. However, they *are* monotonic, and all durations
calculated from them will be reasonable.

This is part of the change I proposed in #254. I'm not going to close
that issue yet, though, as it also described potentially switching to
use the `quanta` crate rather than `std::time::Instant` to reduce the
overhead of recording monotonic timestamps.

Fixes #286
hawkw added a commit that referenced this issue Feb 17, 2022
## Motivation

Currently, the `console-subscriber` crate records all timestamps as
`SystemTime`s. This is because they are eventually sent over the wire as
protobuf `Timestamp`s, which can be constructed from a `SystemTime`.
They cannot be constructed from `Instant`s, because `Instant` is opaque,
and does not expose access to the underlying OS time.

However, using `SystemTime` is not really correct for our use case. We
only use timestamps for calculating durations; we only have to serialize
them because some durations are calculated in the console UI rather than
in-process. We *don't* need timestamps that are globally consistent with
a shared timebase, but we *do* need monotonicity --- using `SystemTime`
leaves us vulnerable to clock skew, if (for example), an NTP clock skew
adjustment causes the system clock to run backwards far enough that a
poll appears to end "before" it started (as in issue #286). If we were
using monotonic `Instant`s, all polls should always have positive
durations, but with `SystemTime`s, this isn't necessarily the case.

Furthermore, `Instant::now()` may have less performance overhead than
`SystemTime::now()`, at least on some platforms.

## Solution

This branch changes `console-subscriber` to always take timestamps using
`Instant::now()` rather than using `SystemTime::now()`, and store all
timestamps as `Instant`s. In order to convert these `Instant`s into
`SystemTime`s that can be sent over the wire, we construct a reference
`TimeAnchor`, consisting of a paired `Instant` and `SystemTime` recorded
at the same time when the `ConsoleLayer` is constructed. We can then
construct "system times" that are monotonic, by calculating the duration
between a given `Instant` and the anchor `Instant`, and adding that
duration to the anchor `SystemTime`. These are not *real* system
timestamps, as they will never run backwards if the system clock is
adjusted; they are relative only to the base process start time as
recorded by the anchor. However, they *are* monotonic, and all durations
calculated from them will be reasonable.

This is part of the change I proposed in #254. I'm not going to close
that issue yet, though, as it also described potentially switching to
use the `quanta` crate rather than `std::time::Instant` to reduce the
overhead of recording monotonic timestamps.

Fixes #286
hawkw added a commit that referenced this issue Feb 18, 2022
## 0.1.3  (2022-02-18)

#### Bug Fixes

*  record timestamps for updates last (#289) ([703f1aa](703f1aa),
   closes [#266](266))
*  use monotonic `Instant`s for all timestamps (#288)
   ([abc0830](abc0830), closes [#286](286))
*  bail rather than panic when encountering clock skew (#287)
   ([24db8c6](24db8c6), closes [#286](286))
*  fix compilation on targets without 64-bit atomics (#282)
   ([5590fdb](5590fdb), closes [#279](279))
hawkw added a commit that referenced this issue Feb 18, 2022
#### Features

*  add `Builder::filter_env_var` builder parameter (#276)
   ([dbdb149](dbdb149), closes [#206](206))

#### Bug Fixes

*  record timestamps for updates last (#289) ([703f1aa](703f1aa),
   closes [#266](266))
*  use monotonic `Instant`s for all timestamps (#288)
   ([abc0830](abc0830), closes [#286](286))
*  bail rather than panic when encountering clock skew (#287)
   ([24db8c6](24db8c6), closes [#286](286))
*  fix compilation on targets without 64-bit atomics (#282)
   ([5590fdb](5590fdb), closes [#279](279))
hawkw added a commit that referenced this issue Feb 18, 2022
#### Features

*  add `Builder::filter_env_var` builder parameter (#276)
   ([dbdb149](dbdb149), closes [#206](206))

#### Bug Fixes

*  record timestamps for updates last (#289) ([703f1aa](703f1aa),
   closes [#266](266))
*  use monotonic `Instant`s for all timestamps (#288)
   ([abc0830](abc0830), closes [#286](286))
*  bail rather than panic when encountering clock skew (#287)
   ([24db8c6](24db8c6), closes [#286](286))
*  fix compilation on targets without 64-bit atomics (#282)
   ([5590fdb](5590fdb), closes [#279](279))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-subscriber Crate: console-subscriber. S-bug Severity: bug
Projects
None yet
2 participants