Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log feature #246

Merged
merged 2 commits into from
Jan 31, 2018
Merged

Add log feature #246

merged 2 commits into from
Jan 31, 2018

Conversation

dhardy
Copy link
Member

@dhardy dhardy commented Jan 26, 2018

Yet another optional feature (effectively 2×2×2×3 = 24 combinations, since nightly = i128_support and alloc only does anything without std).

The main motivation is to allow reporting of errors which are handled internally (via retry or fallback), and also to explain why OsRng delays (when NotReady).

This has the disadvantage of complicating the feature matrix and documentation, but probably on the whole is worth adding?

src/lib.rs Outdated
@@ -1080,6 +1091,14 @@ pub fn sample<T, I, R>(rng: &mut R, iterable: I, amount: usize) -> Vec<T>
.unwrap_or_else(|e| e)
}

#[cfg(feature="std")]
fn log_cause(mut e: &::std::error::Error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know enough about the error types to be able to format them using a Debug impl including the chain of causes instead of multiple separate logging statements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug is for dumping, Display for pretty-printing, as I understand. This code attempts to do the latter, which requires multiple lines.

And yes, mostly we're using the Error type from the rand::error module.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha. It might still be worth combining the multiple log statements into 1, otherwise you might end up with your chain of causes interleaved with other log messages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to find a good middle-ground. Compare:

ERROR 2018-01-30T12:57:19Z: rand: JitterRng failed: RNG error [permanent failure or unavailable]: timer jitter failed basic quality tests
 WARN 2018-01-30T12:57:19Z: rand: cause: no timer available

and

ERROR 2018-01-30T12:59:17Z: rand: JitterRng failed: RNG error [permanent failure or unavailable]: timer jitter failed basic quality tests; full error: Error { kind: Unavailable, msg: "timer jitter failed basic quality tests", cause: Some(NoTimer) }

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I prefer the second one even though it doesn't look as nice in that example because the log message holds all the details. In the same samples with other log messages interleaved the second is easier to follow:

ERROR 2018-01-30T12:57:19Z: rand: JitterRng failed: RNG error [permanent failure or unavailable]: timer jitter failed basic quality tests
 INFO 2018-01-30T12:57:19Z: some_crate::some_mod: request to `/`
DEBUG 2018-01-30T12:57:19Z: some_crate::some_mod::some_other_mod: getting a connection to the database
DEBUG 2018-01-30T12:57:19Z: some_crate::some_mod::some_other_mod: got a connection to the database
TRACE 2018-01-30T12:57:19Z: rand: Seeding new RNG
 WARN 2018-01-30T12:57:19Z: rand: cause: no timer available
 INFO 2018-01-30T12:57:19Z: some_crate::some_mod: processing request `/` with handler `Root`
ERROR 2018-01-30T12:57:19Z: rand: JitterRng failed: RNG error [permanent failure or unavailable]: timer jitter failed basic quality tests; full error: Error { kind: Unavailable, msg: "timer jitter failed basic quality tests", cause: Some(NoTimer) }
 INFO 2018-01-30T12:57:19Z: some_crate::some_mod: request to `/`
DEBUG 2018-01-30T12:57:19Z: some_crate::some_mod::some_other_mod: getting a connection to the database
DEBUG 2018-01-30T12:57:19Z: some_crate::some_mod::some_other_mod: got a connection to the database
TRACE 2018-01-30T12:57:19Z: rand: Seeding new RNG
 INFO 2018-01-30T12:57:19Z: some_crate::some_mod: processing request `/` with handler `Root`

@dhardy dhardy mentioned this pull request Jan 30, 2018
@dhardy
Copy link
Member Author

dhardy commented Jan 30, 2018

I added some logging. Example using rand-num:

12:34 dhardy@localhost:~/rust/rand-num$ export RUST_LOG=rand=debug
12:34 dhardy@localhost:~/rust/rand-num$ cargo run
   Compiling rand v0.4.2 (file:///home/dhardy/rust/rand)
   Compiling rand-num v0.1.0 (file:///home/dhardy/rust/rand-num)
    Finished dev [unoptimized + debuginfo] target(s) in 1.94 secs
     Running `target/debug/rand-num`
DEBUG 2018-01-30T12:34:49Z: rand::os::imp: OsRng: testing getrandom
 INFO 2018-01-30T12:34:49Z: rand::os::imp: OsRng: using getrandom
 WARN 2018-01-30T12:34:49Z: rand: OsRng failed: RNG error [permanent failure or unavailable]: OsRng is disabled; falling back to JitterRng
DEBUG 2018-01-30T12:34:49Z: rand::jitter: JitterRng: testing timer ...
 INFO 2018-01-30T12:34:49Z: rand::jitter: JitterRng: using 8 rounds per u64 output
Random u32: 0x8a28821 = 144869409
Random u64: 0x6f22be488176a794 = 8008172306037122964
12:34 dhardy@localhost:~/rust/rand-num$ cargo run
   Compiling rand v0.4.2 (file:///home/dhardy/rust/rand)
   Compiling rand-num v0.1.0 (file:///home/dhardy/rust/rand-num)
    Finished dev [unoptimized + debuginfo] target(s) in 2.1 secs
     Running `target/debug/rand-num`
DEBUG 2018-01-30T12:35:21Z: rand::os::imp: OsRng: testing getrandom
 INFO 2018-01-30T12:35:21Z: rand::os::imp: OsRng: using getrandom
Random u32: 0xa39a1e3a = 2744786490
Random u64: 0x4002bbe423b24c84 = 4612455556906634372
12:35 dhardy@localhost:~/rust/rand-num$ cargo run
   Compiling rand v0.4.2 (file:///home/dhardy/rust/rand)
   Compiling rand-num v0.1.0 (file:///home/dhardy/rust/rand-num)
    Finished dev [unoptimized + debuginfo] target(s) in 1.95 secs
     Running `target/debug/rand-num`
DEBUG 2018-01-30T12:37:11Z: rand::os::imp: OsRng: testing getrandom
 INFO 2018-01-30T12:37:11Z: rand::os::imp: OsRng: using /dev/urandom
DEBUG 2018-01-30T12:37:11Z: rand::os::imp: OsRng: opening random device /dev/urandom
Random u32: 0x95cbb374 = 2513154932
Random u64: 0x416968a431c3648a = 4713413539438748810
12:37 dhardy@localhost:~/rust/rand-num$ export RUST_LOG=rand=trace
12:38 dhardy@localhost:~/rust/rand-num$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/rand-num`
TRACE 2018-01-30T12:39:01Z: rand: Seeding new RNG
DEBUG 2018-01-30T12:39:01Z: rand::os::imp: OsRng: testing getrandom
 INFO 2018-01-30T12:39:01Z: rand::os::imp: OsRng: using /dev/urandom
DEBUG 2018-01-30T12:39:01Z: rand::os::imp: OsRng: opening random device /dev/urandom
TRACE 2018-01-30T12:39:01Z: rand::os: OsRng: reading 2048 bytes from random device
Random u32: 0xc534d5b7 = 3308574135
Random u64: 0x4ac22b76433314d2 = 5386915891222090962
12:39 dhardy@localhost:~/rust/rand-num$ cargo run
   Compiling rand v0.4.2 (file:///home/dhardy/rust/rand)
   Compiling rand-num v0.1.0 (file:///home/dhardy/rust/rand-num)
    Finished dev [unoptimized + debuginfo] target(s) in 1.95 secs
     Running `target/debug/rand-num`
TRACE 2018-01-30T12:39:12Z: rand: Seeding new RNG
DEBUG 2018-01-30T12:39:12Z: rand::os::imp: OsRng: testing getrandom
 INFO 2018-01-30T12:39:12Z: rand::os::imp: OsRng: using getrandom
TRACE 2018-01-30T12:39:12Z: rand::os::imp: OsRng: reading 2048 bytes via getrandom
Random u32: 0xc556af49 = 3310792521
Random u64: 0xf98414a3158e1654 = 17979518303043720788

Note that this will change a bit when I merge #239.

src/jitter.rs Outdated
@@ -422,6 +423,8 @@ impl JitterRng {
}

fn gen_entropy(&mut self) -> u64 {
trace!("JitterRng: generating entropy");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets called a lot when seeding StdRng — 256 times. @pitdicker and I already noted that initialising ISAAC via JitterRng can be slow; should we fix? Not sure whether it's better to adjust the seed_from_rng implementation or to try to make a special case for ISAAC seeded from JitterRng.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collecting entropy sounds better to me than generating entropy.

Copy link
Contributor

@pitdicker pitdicker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments, but I think this looks very good!
I think you found a good balance for the different log levels, and which things to log.

Does it make sense to use Warning as the highest log level, and let the application decide if it is important enough to log something as an error?

@@ -845,9 +854,13 @@ impl<R: SeedableRng> NewRng for R {
T::from_rng(&mut r)
}

trace!("Seeding new RNG");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to include the type of the RNG here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know a way of doing that. I believe only macros could do so; something similar to stringify but that one will not substitute R for the underlying type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O, sorry. I kind of expected it to be possible, but it is only possible using intrinsics on nightly. Not worth the effort if it is not easy in my opinion.

src/os.rs Outdated
if log_err == 0 {
warn!("OsRng failed: {}", e);
log_cause(&e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help to rewrite this function to match over all possible errors? It is becoming a little complex now...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the complexity comes from having limited retry and logging in "smart" ways, i.e. I don't think significantly simpler code can achieve the same behaviour. Whether all this complexity is warranted is another question (e.g. we could remove log_err and do logging each time, but that could create a huge number of messages).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this try?

    fn fill_bytes(&mut self, dest: &mut [u8]) {
        // We cannot return Err(..), so we try to handle before panicking.
        const MAX_RETRY_PERIOD: u32 = 10; // max 10s
        const WAIT_DUR_MS: u32 = 100; // retry every 100ms
        const RETRY_LIMIT: u32 = (MAX_RETRY_PERIOD * 1000) / WAIT_DUR_MS;
        const TRANSIENT_STEP: u32 = RETRY_LIMIT / 8;
        let mut err_count = 0;
        let mut error_logged = false;

        loop {
            if let Err(e) = self.try_fill_bytes(dest) {
                if err_count >= RETRY_LIMIT {
                    error!("OsRng failed too many times, last error: {}", e);
                    panic!("OsRng failed too many times, last error: {}",
                           e.msg());
                }

                match e.kind() {
                    ErrorKind::Transient => {
                        if !error_logged {
                            warn!("OsRng failed: {}. retrying {} times ...",
                                  e, TRANSIENT_STEP);
                            error_logged = true;
                        }
                        err_count += TRANSIENT_STEP;
                        continue;
                    }
                    ErrorKind::NotReady => {
                        if !error_logged {
                            warn!("OsRng failed: {}. waiting and retrying for {} ms ...",
                                  e, MAX_RETRY_PERIOD);
                            error_logged = true;
                        }
                        err_count += 1;
                        let dur = ::std::time::Duration::from_millis(WAIT_DUR_MS as u64);
                        ::std::thread::sleep(dur);
                        continue;
                    }
                    _ => {
                        error!("OsRng failed: {}", e);
                        panic!("OsRng fatal error: {}", e.msg());
                    }
                }
            }

            break;
        }
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played some more with it to log only the error cause, and added a cause for a few operating systems where it was missing. Take from it what you can use. https://github.com/pitdicker/rand/commits/log-feature

src/lib.rs Outdated
warn!("OsRng failed: {}; falling back to JitterRng", e1);
log_cause(&e1);
new_jitter().map_err(|e2| {
error!("JitterRng failed: {}", e2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to make this warn! also, and let the user code decide if it is important enough to also log as an error!.

src/jitter.rs Outdated
@@ -422,6 +423,8 @@ impl JitterRng {
}

fn gen_entropy(&mut self) -> u64 {
trace!("JitterRng: generating entropy");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collecting entropy sounds better to me than generating entropy.

src/reseeding.rs Outdated
@@ -47,6 +47,7 @@ impl<R: Rng, Rsdr: Reseeder<R>> ReseedingRng<R, Rsdr> {
/// generated exceed the threshold.
pub fn reseed_if_necessary(&mut self) {
if self.bytes_generated >= self.generation_threshold {
trace!("Reseeding RNG");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change this to something like "{} generated {} MiB, attempting to reseed"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change it to "Reseeding RNG after {} bytes", but I don't think I can do much better than that. Not sure it's really useful changing units?

@pitdicker
Copy link
Contributor

If I change NewRng::new() to only log the cause, this is the result (using your rand-num and by butchering OsRng and JitterRng to return an error.:

[pitdicker@localhost rand-num]$ export RUST_LOG=rand=debug
[pitdicker@localhost rand-num]$ cargo run
   Compiling rand v0.4.2 (file:///home/pitdicker/rust/rand)
   Compiling rand-num v0.1.0 (file:///home/pitdicker/rust/rand-num)
    Finished dev [unoptimized + debuginfo] target(s) in 2.51 secs
     Running `target/debug/rand-num`
DEBUG 2018-01-31T09:10:33Z: rand::os::imp: OsRng: testing getrandom
 INFO 2018-01-31T09:10:33Z: rand::os::imp: OsRng: using getrandom
 WARN 2018-01-31T09:10:33Z: rand: OsRng failed: Operation not permitted (os error 1); falling back to JitterRng
DEBUG 2018-01-31T09:10:33Z: rand::jitter: JitterRng: testing timer ...
 INFO 2018-01-31T09:10:33Z: rand::jitter: JitterRng: using 8 rounds per u64 output
 WARN 2018-01-31T09:10:33Z: rand: JitterRng failed: no timer available
thread 'main' panicked at 'could not initialize thread_rng: Error { kind: Unavailable, msg: "seeding a new RNG failed: both OS and Jitter entropy sources failed", cause: Some(Error { kind: Unavailable, msg: "random_get() system call failed", cause: Some(Error { repr: Os { code: 1, message: "Operation not permitted" } }) }) }', /home/pitdicker/rust/rand/src/lib.rs:978:22
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Code:

        trace!("Seeding new RNG");
        new_os().or_else(|e1| {
            warn!("OsRng failed: {}; falling back to JitterRng", e1.cause().unwrap());
            new_jitter().map_err(|e2| {
                warn!("JitterRng failed: {}", e2.cause().unwrap());
                // TODO: can we somehow return both error sources?
                Error::with_cause(
                    ErrorKind::Unavailable,
                    "seeding a new RNG failed: both OS and Jitter entropy sources failed",
                    e1)
            })
        })

(Unwrapping is safe because we know OsRng and JitterRng both always set a cause.)

@dhardy
Copy link
Member Author

dhardy commented Jan 31, 2018

Safe with current code, you mean, but unwrap there adds a trap future devs need to watch out for.

I think we should probably merge this and EntropyRng before worrying too much about the exact messages.

@pitdicker
Copy link
Contributor

Safe with current code, you mean, but unwrap there adds a trap future devs need to watch out for.

Good point. Probably should be a function that falls back to .description() or msg.

I think we should probably merge this and EntropyRng before worrying too much about the exact messages.

😄 Yes, that replaces this part anyway.

@dhardy
Copy link
Member Author

dhardy commented Jan 31, 2018

There's still room for tweaking messages, but response seems mostly positive, so I'll address the biggest concerns then merge; PRs to improve the details are welcome.

@dhardy dhardy merged commit 7d72ba9 into rust-random:master Jan 31, 2018
@dhardy dhardy deleted the log-feature branch January 31, 2018 18:42
pitdicker pushed a commit that referenced this pull request Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants