Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling: error sources / kinds #9

Closed
dhardy opened this issue Oct 18, 2017 · 15 comments
Closed

Error handling: error sources / kinds #9

dhardy opened this issue Oct 18, 2017 · 15 comments

Comments

@dhardy
Copy link
Owner

dhardy commented Oct 18, 2017

Possible errors (detectable during initialisation, i.e. only applicable to new() methods):

  • Failure to open random device (e.g. /dev/urandom)
  • Insufficient entropy (system not ready)

Possible errors (at any time, thus affecting new(), next_* and try_fill()):

  • Interrupt by user (Ctrl+C)
  • Hardware/external generator failure or disconnection
  • PRNG cycle
  • "Entropy exhaustion": PRNGs with artificially limited output (hypothetical and not needed even for periodic reseeding; suggested here)

Proposed error kinds:

/// Error kind which can be matched over.
#[derive(PartialEq, Eq, Debug, Copy, Clone)]
pub enum ErrorKind {
    /// Permanent failure: likely not recoverable without user action.
    Unavailable,
    /// Temporary failure: recommended to retry a few times, but may also be
    /// irrecoverable.
    Transient,
    /// Not ready yet: recommended to try again a little later.
    NotReady,
    /// Uncategorised error
    Other,
    // no hidden members: allow exclusive match
}

References:

@dhardy dhardy added this to the rand-core RFC milestone Oct 18, 2017
@dhardy
Copy link
Owner Author

dhardy commented Oct 18, 2017

Advised handling of errors (mostly copied from material linked above)

Problems opening /dev/urandom
  • Sometimes an alternative to OsRng can be used. For example, if you only need a weak RNG, initialize one using the system time.
  • For libraries: bubble up the error. For applications it means informing the user, possibly with a panic.
The system does not have enough entropy

Currently Rand sometimes uses blocking, and sometimes not-so-random numbers. I would suggest to always give an error, and if this is not supported, to block. Using the syscall interface, and if not available the process described above.

Interrupt by a signal handler

The common advise on how to handle it is to just retry.

Rand's fill_bytes interface can read bytes from OsRng to a memory buffer. On my system it can fill 1 GiB in 3 seconds. It seems to me that only from amounts like this it makes sense to pass on the interrupt.

The scenario of reading from the OS RNG into a huge gigabyte+ vector in memory seems a bit far-fetched to me. If you do things like that, expect not everything to work smoothly. So just retrying on EINTR is best for fill_bytes.

PRNG cycles

It is generally advised to use a PRNG whose cycle is at least as long as the square of the required output length; if this advice is followed (and short period PRNGs are never used for cryptography) then cycle problems are extremely unlikely.

Concerns about ChaCha are mentioned due to the relatively limited size of output according to some implementations (potentially just 256 GiB); however the existing rand implementation increments high-order parts of the counter (the "nonce") as necessary, giving it an extremely long cycle (2^134 bytes).

Also mentioned in the discussion above is that for many PRNGs, correct cycle detection is difficult.

This combined, my preference is not to even try to detect cycles, thus never reporting.

Entropy exhaustion

This idea is derived from the possibility that /dev/urandom might give "low entropy" output shortly after start-up, which compromises the security of any PRNG seeded from it. Periodic reseeding may help with this, but is not a good solution; where possible OsRng should block or error if the kernel still has insufficient entropy.

There is a second possible point to periodic reseeding: if a weakness is found in a PRNG algorithm used or a side-band attack is used to derive the PRNG state, reseeding limits the amount of "compromised output", forcing an attacker to do more work to regain information on the PRNG state.

What we should do: since entropy cannot be accurately tracked, there is little point in returning an error. On the other hand, it may sometimes make sense to wrap a PRNG with Reseeding to periodically reset it. Thus this does not affect error handling.

@dhardy
Copy link
Owner Author

dhardy commented Oct 19, 2017

Generalising from the above, we have the following kinds of error:

  • irrecoverable (without user action): missing /dev/urandom, hardware generator failed
  • temporary/unknown failure (possibly recoverable on retry, but not recommended to retry indefinitely): lost connection to external generator, RDRAND error
  • wait not ready (retry after a time): OS generator not ready yet, hardware initialisation
  • interrupt by user (retry if no specific handler)

Not included in the above list:

  • other/unknown (do we need this?)
  • entropy exhausted / reseed — I maintain that it is difficult for a PRNG to know when it needs reseeding, so it is better for a wrapper simply to reseed periodically if this functionality is desired

@pitdicker
Copy link

One problem I now have with an Interrupt kind that can be returned for fill_bytes is that it would lose the bytes already generated. Unless it also returns a count of the amount of bytes before the interrupt.

And as discussed before, it just doesn't seem that useful. It is only relevant in situations that match all these points:

  • the application has signal handler,
  • it spends a long time (a second or more) waiting on fill_bytes,
  • the RNG does something 'exotic', like reading from a slow device, slow hard drive or network,
  • the user code can handle an interrupt error.

So normally it makes sense to just retry on interrupt.

What if an RNG really has a reason the forward an interrupt? If we were to include a cause, maybe it is just enough to return Other and let users that case inspect the cause.

@pitdicker
Copy link

  • other/unknown (do we need this?)

An Other kind seems moderately useful to me. It is a simple catch-all for when you wrap some RNG and don't know (yet) every error it can return. But maybe NotAvailable can be used just ass well as a catch-all.

  • entropy exhausted / reseed — I maintain that it is difficult for a PRNG to know when it needs reseeding

Completely agree. For me it only makes sense for something that is not pseudo-random, but 'really' random. And that normally seems to be only deep in the OS implementation of its RNG, not available to us.

Maybe some RNG that exposes /dev/random on Linux, that estimates its entropy.

With all the disagreement about reseeding and entropy we could just expose the error kind and not think to hard about it. Or combine it with 'Insufficient entropy', which is pretty close...

@dhardy
Copy link
Owner Author

dhardy commented Oct 19, 2017

Okay, good points about Interrupt and Other.

But if /dev/random needs reseeding, it's not the user who's responsible for doing so, so it should just block or fail with "wait and retry".

I don't like the name 'insufficient entropy' because it's not clear what the user should do about it. 'Wait' is better IMO (and other error kinds where different handling is expected).

@dhardy dhardy changed the title Error handling: error sources Error handling: error sources / kinds Oct 19, 2017
@pitdicker
Copy link

But if /dev/random needs reseeding, it's not the user who's responsible for doing so, so it should just block or fail with "wait and retry".

It seems to be possible to sort-of reseed /dev/random by writing to it, assuming you somehow have access to a source of randomness :-). I completely agree it is basically "wait and retry".

I don't like the name 'insufficient entropy' because it's not clear what the user should do about it. 'Wait' is better IMO

This is true for OsRng, but what about other generators? For example ReadRng that reads random numbers from a 'code book' and hits EOF. Okay, far fetched. But I like the error to be a bit general so it is useful for more types of generators.

Also "wait" feels more like a prescription than an error to me.

@dhardy
Copy link
Owner Author

dhardy commented Oct 19, 2017

Yes, you can write to /dev/random, but (1) do you actually have an entropy source the OS doesn't, and (2) does the OS increment its "entropy estimate" for user-writes? I suspect for (2) the answer is no, because it might allow some kind of entropy-depletion-and-falsified-new-entropy attack.

ReadRng is a very good point — but since the handling should be different in this case (i.e. the generator will not recover on its own), Wait is not an appropriate error kind. Irrecoverable would fit better. Using InsufficientEntropy as you proposed implies confusion over how users should handle that.

@pitdicker
Copy link

Writing to /dev/random was more of a joke.

InsufficientEntropy sounds nicely random related. But I see you are going for errors that guide users of the library, and I can see the use in that.

Still I don't like the name Wait. What do you think about NotReady?

@dhardy
Copy link
Owner Author

dhardy commented Oct 19, 2017

As far as the names are concerned, I don't like Failure either. Maybe TemporaryFailure or just Temporary or Retry. Then Irrecoverable is okay-ish. They were just to get the idea across.

@pitdicker
Copy link

pitdicker commented Oct 19, 2017

While writing the PR I did have a look at the names in std::io::ErrorKind, and the names used in other libraries. It seemed like a good idea to follow them, and they mostly describe the cause of the error, not how it should be handled.

I have no problem with Failure. But it is a bit unspecific, and that would make it a bit confusing when there also is Other. The wordy TemporaryFailure would then be better. Retry sounds helpful, but not really like an error to me.

Instead of Irrecoverable I like Unavailable. It describes nicely the OsRng or devices being unavailable. It also works with EOF in ReadRng: there are no more numbers available...

Unavailable still carries the meaning "just give up, irrecoverable". TemporaryFailure still carries the idea of "retry".

@dhardy
Copy link
Owner Author

dhardy commented Oct 19, 2017

Or TransientFailure... the problem is that this is basically an "unknown"/miscellaneous category, so it's arguable whether it should be distinct from Other.

Unavailable is fine IMO.

@pitdicker
Copy link

TransientFailure was exactly the word I was thinking about after using a thesaurus. For me as a non-native speaker it sounds a bit exotic, but it is fine.

the problem is that this is basically an "unknown"/miscellaneous category, so it's arguable whether it should be distinct from Other.

It costs us nothing to include it, and I can see it be useful sometimes.

@pitdicker
Copy link

// TODO: allow exclusive match?

It seems like a smart thing to do. We now think we have all error kinds pretty much covered. But it is common, and good advice to prevent breaking changes.

@dhardy
Copy link
Owner Author

dhardy commented Nov 5, 2017

I decided not to allow exclusive match for now. I don't see much benefit, and it can be done in the future without breakage anyway.

@dhardy
Copy link
Owner Author

dhardy commented Mar 11, 2018

This has been implemented and revised so I think this issue can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants