Error handling: error sources / kinds #9

dhardy · 2017-10-18T16:58:54Z

Possible errors (detectable during initialisation, i.e. only applicable to new() methods):

Failure to open random device (e.g. /dev/urandom)
Insufficient entropy (system not ready)

Possible errors (at any time, thus affecting new(), next_* and try_fill()):

Interrupt by user (Ctrl+C)
Hardware/external generator failure or disconnection
PRNG cycle
"Entropy exhaustion": PRNGs with artificially limited output (hypothetical and not needed even for periodic reseeding; suggested here)

Proposed error kinds:

/// Error kind which can be matched over.
#[derive(PartialEq, Eq, Debug, Copy, Clone)]
pub enum ErrorKind {
    /// Permanent failure: likely not recoverable without user action.
    Unavailable,
    /// Temporary failure: recommended to retry a few times, but may also be
    /// irrecoverable.
    Transient,
    /// Not ready yet: recommended to try again a little later.
    NotReady,
    /// Uncategorised error
    Other,
    // no hidden members: allow exclusive match
}

References:

Summary by @pitdicker — touches on most of these sources
Should PRNGs detect and report cycle errors? See discussion starting here.

The text was updated successfully, but these errors were encountered:

dhardy · 2017-10-18T17:25:12Z

Advised handling of errors (mostly copied from material linked above)

Problems opening `/dev/urandom`

Sometimes an alternative to OsRng can be used. For example, if you only need a weak RNG, initialize one using the system time.
For libraries: bubble up the error. For applications it means informing the user, possibly with a panic.

The system does not have enough entropy

Currently Rand sometimes uses blocking, and sometimes not-so-random numbers. I would suggest to always give an error, and if this is not supported, to block. Using the syscall interface, and if not available the process described above.

Interrupt by a signal handler

The common advise on how to handle it is to just retry.

Rand's fill_bytes interface can read bytes from OsRng to a memory buffer. On my system it can fill 1 GiB in 3 seconds. It seems to me that only from amounts like this it makes sense to pass on the interrupt.

The scenario of reading from the OS RNG into a huge gigabyte+ vector in memory seems a bit far-fetched to me. If you do things like that, expect not everything to work smoothly. So just retrying on EINTR is best for fill_bytes.

PRNG cycles

It is generally advised to use a PRNG whose cycle is at least as long as the square of the required output length; if this advice is followed (and short period PRNGs are never used for cryptography) then cycle problems are extremely unlikely.

Concerns about ChaCha are mentioned due to the relatively limited size of output according to some implementations (potentially just 256 GiB); however the existing rand implementation increments high-order parts of the counter (the "nonce") as necessary, giving it an extremely long cycle (2^134 bytes).

Also mentioned in the discussion above is that for many PRNGs, correct cycle detection is difficult.

This combined, my preference is not to even try to detect cycles, thus never reporting.

Entropy exhaustion

This idea is derived from the possibility that /dev/urandom might give "low entropy" output shortly after start-up, which compromises the security of any PRNG seeded from it. Periodic reseeding may help with this, but is not a good solution; where possible OsRng should block or error if the kernel still has insufficient entropy.

There is a second possible point to periodic reseeding: if a weakness is found in a PRNG algorithm used or a side-band attack is used to derive the PRNG state, reseeding limits the amount of "compromised output", forcing an attacker to do more work to regain information on the PRNG state.

What we should do: since entropy cannot be accurately tracked, there is little point in returning an error. On the other hand, it may sometimes make sense to wrap a PRNG with Reseeding to periodically reset it. Thus this does not affect error handling.

dhardy · 2017-10-19T09:05:06Z

Generalising from the above, we have the following kinds of error:

irrecoverable (without user action): missing /dev/urandom, hardware generator failed
temporary/unknown failure (possibly recoverable on retry, but not recommended to retry indefinitely): lost connection to external generator, RDRAND error
~~wait~~ not ready (retry after a time): OS generator not ready yet, hardware initialisation
~~interrupt by user (retry if no specific handler)~~

Not included in the above list:

other/unknown (do we need this?)
entropy exhausted / reseed — I maintain that it is difficult for a PRNG to know when it needs reseeding, so it is better for a wrapper simply to reseed periodically if this functionality is desired

pitdicker · 2017-10-19T09:53:23Z

One problem I now have with an Interrupt kind that can be returned for fill_bytes is that it would lose the bytes already generated. Unless it also returns a count of the amount of bytes before the interrupt.

And as discussed before, it just doesn't seem that useful. It is only relevant in situations that match all these points:

the application has signal handler,
it spends a long time (a second or more) waiting on fill_bytes,
the RNG does something 'exotic', like reading from a slow device, slow hard drive or network,
the user code can handle an interrupt error.

So normally it makes sense to just retry on interrupt.

What if an RNG really has a reason the forward an interrupt? If we were to include a cause, maybe it is just enough to return Other and let users that case inspect the cause.

pitdicker · 2017-10-19T10:17:21Z

other/unknown (do we need this?)

An Other kind seems moderately useful to me. It is a simple catch-all for when you wrap some RNG and don't know (yet) every error it can return. But maybe NotAvailable can be used just ass well as a catch-all.

entropy exhausted / reseed — I maintain that it is difficult for a PRNG to know when it needs reseeding

Completely agree. For me it only makes sense for something that is not pseudo-random, but 'really' random. And that normally seems to be only deep in the OS implementation of its RNG, not available to us.

Maybe some RNG that exposes /dev/random on Linux, that estimates its entropy.

With all the disagreement about reseeding and entropy we could just expose the error kind and not think to hard about it. Or combine it with 'Insufficient entropy', which is pretty close...

dhardy · 2017-10-19T10:25:25Z

Okay, good points about Interrupt and Other.

But if /dev/random needs reseeding, it's not the user who's responsible for doing so, so it should just block or fail with "wait and retry".

I don't like the name 'insufficient entropy' because it's not clear what the user should do about it. 'Wait' is better IMO (and other error kinds where different handling is expected).

pitdicker · 2017-10-19T10:39:40Z

But if /dev/random needs reseeding, it's not the user who's responsible for doing so, so it should just block or fail with "wait and retry".

It seems to be possible to sort-of reseed /dev/random by writing to it, assuming you somehow have access to a source of randomness :-). I completely agree it is basically "wait and retry".

I don't like the name 'insufficient entropy' because it's not clear what the user should do about it. 'Wait' is better IMO

This is true for OsRng, but what about other generators? For example ReadRng that reads random numbers from a 'code book' and hits EOF. Okay, far fetched. But I like the error to be a bit general so it is useful for more types of generators.

Also "wait" feels more like a prescription than an error to me.

dhardy · 2017-10-19T10:44:24Z

Yes, you can write to /dev/random, but (1) do you actually have an entropy source the OS doesn't, and (2) does the OS increment its "entropy estimate" for user-writes? I suspect for (2) the answer is no, because it might allow some kind of entropy-depletion-and-falsified-new-entropy attack.

ReadRng is a very good point — but since the handling should be different in this case (i.e. the generator will not recover on its own), Wait is not an appropriate error kind. Irrecoverable would fit better. Using InsufficientEntropy as you proposed implies confusion over how users should handle that.

pitdicker · 2017-10-19T10:55:47Z

Writing to /dev/random was more of a joke.

InsufficientEntropy sounds nicely random related. But I see you are going for errors that guide users of the library, and I can see the use in that.

Still I don't like the name Wait. What do you think about NotReady?

dhardy · 2017-10-19T11:01:25Z

As far as the names are concerned, I don't like Failure either. Maybe TemporaryFailure or just Temporary or Retry. Then Irrecoverable is okay-ish. They were just to get the idea across.

pitdicker · 2017-10-19T11:33:39Z

While writing the PR I did have a look at the names in std::io::ErrorKind, and the names used in other libraries. It seemed like a good idea to follow them, and they mostly describe the cause of the error, not how it should be handled.

I have no problem with Failure. But it is a bit unspecific, and that would make it a bit confusing when there also is Other. The wordy TemporaryFailure would then be better. Retry sounds helpful, but not really like an error to me.

Instead of Irrecoverable I like Unavailable. It describes nicely the OsRng or devices being unavailable. It also works with EOF in ReadRng: there are no more numbers available...

Unavailable still carries the meaning "just give up, irrecoverable". TemporaryFailure still carries the idea of "retry".

dhardy · 2017-10-19T11:51:36Z

Or TransientFailure... the problem is that this is basically an "unknown"/miscellaneous category, so it's arguable whether it should be distinct from Other.

Unavailable is fine IMO.

pitdicker · 2017-10-19T12:14:33Z

TransientFailure was exactly the word I was thinking about after using a thesaurus. For me as a non-native speaker it sounds a bit exotic, but it is fine.

the problem is that this is basically an "unknown"/miscellaneous category, so it's arguable whether it should be distinct from Other.

It costs us nothing to include it, and I can see it be useful sometimes.

pitdicker · 2017-10-19T14:56:23Z

// TODO: allow exclusive match?

It seems like a smart thing to do. We now think we have all error kinds pretty much covered. But it is common, and good advice to prevent breaking changes.

dhardy · 2017-11-05T09:59:19Z

I decided not to allow exclusive match for now. I don't see much benefit, and it can be done in the future without breakage anyway.

dhardy · 2018-03-11T11:24:28Z

This has been implemented and revised so I think this issue can be closed now.

dhardy added this to the rand-core RFC milestone Oct 18, 2017

dhardy mentioned this issue Oct 19, 2017

Error handling: Error type #10

Closed

dhardy changed the title ~~Error handling: error sources~~ Error handling: error sources / kinds Oct 19, 2017

pitdicker mentioned this issue Oct 19, 2017

Error handling #7

Merged

dhardy added the apparently solved label Oct 22, 2017

This was referenced Oct 30, 2017

Support for embedded and no_std #11

Closed

Implicit error handling #25

Closed

dhardy added the impl'd in RFC label Nov 5, 2017

dhardy mentioned this issue Nov 6, 2017

Expose Both /dev/random & /dev/urandom #33

Closed

dhardy mentioned this issue Feb 17, 2018

ErrorKind: rename Transient → Unexpected and revise uses of Unavailable rust-random/rand#255

Merged

dhardy mentioned this issue Feb 25, 2018

Generating random unique integers (i.e. non-repeating, random permutations of an integer type) rust-random/rand#270

Closed

dhardy closed this as completed Mar 11, 2018

dhardy mentioned this issue Jun 29, 2018

ErrorKind::Input rust-random/rand#535

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error handling: error sources / kinds #9

Error handling: error sources / kinds #9

dhardy commented Oct 18, 2017 •

edited

Loading

dhardy commented Oct 18, 2017 •

edited

Loading

dhardy commented Oct 19, 2017 •

edited

Loading

pitdicker commented Oct 19, 2017

pitdicker commented Oct 19, 2017

dhardy commented Oct 19, 2017 •

edited

Loading

pitdicker commented Oct 19, 2017

dhardy commented Oct 19, 2017

pitdicker commented Oct 19, 2017

dhardy commented Oct 19, 2017

pitdicker commented Oct 19, 2017 •

edited

Loading

dhardy commented Oct 19, 2017

pitdicker commented Oct 19, 2017

pitdicker commented Oct 19, 2017

dhardy commented Nov 5, 2017

dhardy commented Mar 11, 2018

Error handling: error sources / kinds #9

Error handling: error sources / kinds #9

Comments

dhardy commented Oct 18, 2017 • edited Loading

dhardy commented Oct 18, 2017 • edited Loading

Advised handling of errors (mostly copied from material linked above)

Problems opening /dev/urandom

The system does not have enough entropy

Interrupt by a signal handler

PRNG cycles

Entropy exhaustion

dhardy commented Oct 19, 2017 • edited Loading

pitdicker commented Oct 19, 2017

pitdicker commented Oct 19, 2017

dhardy commented Oct 19, 2017 • edited Loading

pitdicker commented Oct 19, 2017

dhardy commented Oct 19, 2017

pitdicker commented Oct 19, 2017

dhardy commented Oct 19, 2017

pitdicker commented Oct 19, 2017 • edited Loading

dhardy commented Oct 19, 2017

pitdicker commented Oct 19, 2017

pitdicker commented Oct 19, 2017

dhardy commented Nov 5, 2017

dhardy commented Mar 11, 2018

dhardy commented Oct 18, 2017 •

edited

Loading

dhardy commented Oct 18, 2017 •

edited

Loading

Problems opening `/dev/urandom`

dhardy commented Oct 19, 2017 •

edited

Loading

dhardy commented Oct 19, 2017 •

edited

Loading

pitdicker commented Oct 19, 2017 •

edited

Loading