Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ErrorKind: rename Transient → Unexpected and revise uses of Unavailable #255

Merged
merged 2 commits into from
Feb 27, 2018

Conversation

dhardy
Copy link
Member

@dhardy dhardy commented Feb 8, 2018

I've not been happy with the Transient error kind for a while, notably because it was never used, and because Unavailable was used for errors which just might be resolvable on retry. This PR keeps Unavailable but replaces most uses with Unexpected.

Further thoughts:

  • we could keep Transient for cases where we legitimately know there is a good chance of success on retry — although we still have no actual use cases yet, and I suspect mostly it would be handled in the same way as Unexpected
  • Other is redundant with Unexpected, right? Remove?
  • Unavailable now has two uses — in JitterRng (see comment in PR) and for WASM (which has an auto-failing OsRng implementation) — so we really can assume it's irrecoverable when handling.

@dhardy
Copy link
Member Author

dhardy commented Feb 15, 2018

@pitdicker I take it life's found some other priority for you by now; no complaints there. It would however be nice to get your thoughts on this if you can find the time.

@vks
Copy link
Collaborator

vks commented Feb 16, 2018

I don't like the name Unexpected, because if a retry might succeed, isn't it an expected failure?

@dhardy
Copy link
Member Author

dhardy commented Feb 16, 2018

No, it's for unexpected failures; e.g. read failing because no file handles are available. There's a chance this will succeed later and its an unexpected failure. I don't understand your point?

@vks
Copy link
Collaborator

vks commented Feb 16, 2018

I was confused by the name and the description and what unexpected means:

Unexpected failure; there is a slim chance of recovery on retry.

If it is is recoverable by retrying, it sounds to me like it is an expected failure, but maybe that is just me.

@dhardy
Copy link
Member Author

dhardy commented Feb 16, 2018

Oh, by "expected failure" you mean a failure whose possibility has been forseen? Well, so surely that applies to just about any possible failure (including out-of-memory), excepting things like SEGFAULT which don't normally happen in Rust code anyway.

So okay, we're talking about foreseen failures, but not expected ones (statistically, these are all highly unlikely).

@pitdicker
Copy link
Contributor

@pitdicker I take it life's found some other priority for you by now; no complaints there. It would however be nice to get your thoughts on this if you can find the time.

A little, sorry. But balance is the trick 😄.

My first thought was: haven't we already been over all the options? But I like the change here.

@dhardy
Copy link
Member Author

dhardy commented Feb 17, 2018

Yes, we've been over the options, but somehow practice didn't work out as planned (especially since you mentioned wanting to retry on some Unavailable errors).

@pitdicker
Copy link
Contributor

Other is redundant with Unexpected, right? Remove?

Seems like a good idea to me. I can't think of any meaningful difference between them.

And using both Unavailable and Unexpected seems definitely better for OsRng.

So 👍 from me.

@dhardy
Copy link
Member Author

dhardy commented Feb 22, 2018

Rebased to merge with latest master.

@dhardy
Copy link
Member Author

dhardy commented Feb 22, 2018

@vks @pitdicker I made another significant change.

  • UnexpectedFailure, because we don't know anything about the probability
  • Added Transient back, because we do have a use
  • Added Custom
  • Reworked handling, especially in ReseedingRng

@pitdicker
Copy link
Contributor

To be honest I don't immediately like almost any of the new changes. Can you explain your reasoning behind it a bit more?

@vks
Copy link
Collaborator

vks commented Feb 22, 2018

From the description, the difference between Failure ("retry later"), Transient ("transient") and NotReady ("needs more time") is not clear. All three basically suggest retrying later? Looking at how they are handled in the code, maybe they should be called ShouldRetry and ShouldWait? This seems more precise.

The use case for Custom is not clear to me.

@dhardy
Copy link
Member Author

dhardy commented Feb 22, 2018

Thanks @pitdicker, review is definitely needed.

@vks didn't like the name Unexpected because often this is used for known unlikely errors; I reasoned that these are unlikely — but in general we don't know that. Still, I don't think Unexpected is too bad a name.

Transient I found useful for ReseedingRng, although the necessity of a separate error kind (and the utility of ReseedingRng) are questionable. It doesn't cost much though.

Custom was added to allow users to include other codes, but I don't actually have a use for it. We could remove it. It shouldn't increase the size of Error on 64-bit due to alignment; I did consider using only u16 to make the same true on 32-bit. Maybe I shouldn't be adding it without actual requirement; I don't know.

The changes to handling were simplifications, mostly. ReseedingRng::try_fill_bytes actually had an order dependence between generating via the internal RNG and reseeding due to the possibility of early return, although that may not matter at all due to branch prediction. We don't have an existing benchmark on that and I didn't bother adding one.

@dhardy
Copy link
Member Author

dhardy commented Feb 22, 2018

Actually @vks I put some effort into documenting exactly what all these different kinds are for. The names you suggested sound better than Transient and NotReady though, thanks.

Possible change: TransientShouldRetry, NotReadyShouldWait, remove Custom

@pitdicker
Copy link
Contributor

I will have a careful look, and sorry for the maybe hard words.

To drop two links in here: For dhardy#9 (comment) I looked at the error kinds of other libraries, and "they mostly describe the cause of the error, not how it should be handled". And also we discussed in that issue names like Failure. The design of the error enum started with rust-lang/rfcs#2152 (comment) (nice to see how it evolved).

/// Take the cause, if any. This allows the embedded cause to be extracted.
/// This uses `Option::take`, leaving `self` with no cause.
#[cfg(feature="std")]
pub fn take_cause(&mut self) -> Option<Box<stdError + Send + Sync>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, when would you use this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdError does not imply Clone and I didn't wish to add that, so this (or unwrap_cause) would be the only way to get the cause out, which I think might sometimes be useful.

I did consider just making cause a public field (with a hidden dummy field to prevent construction) but wasn't so sure about that.

src/error.rs Outdated
@@ -52,24 +73,41 @@ impl ErrorKind {
/// A description of this error kind
pub fn description(self) -> &'static str {
match self {
ErrorKind::Unavailable => "permanent failure",
ErrorKind::Unavailable => "permanently unavailable",
ErrorKind::Failure => "failure",
Copy link
Contributor

@pitdicker pitdicker Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is not any error a 'failure'? I like Unexpected, UnexpectedFailure, Other, Unforeseen more.

src/error.rs Outdated
ErrorKind::Transient => "transient failure",
ErrorKind::NotReady => "not ready yet",
ErrorKind::Other => "uncategorised",
ErrorKind::Custom(_) => "custom error code",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have the error cause for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can only return a static string and it's not guaranteed that we have a cause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an argument. Reminds me a lot of that experiment I did with keeping the OS errno we decided against, because we don't assume an OS with no_std (https://gist.github.com/pitdicker/2411a4e3aae245c217e4fdd8a0b99917).

Like you suggested and @vks said, I would wait until people start asking for this solution.

src/error.rs Outdated
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}

impl fmt::Display for ErrorKind {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly necessary because of Custom?

/// Error type of random number generators
///
/// This is a relatively simple error type, designed for compatibility with and
/// without the Rust `std` library. It embeds a "kind" code, a message (static
/// string only), and an optional chained cause (`std` only).
/// string only), and an optional chained cause (`std` only). The `kind` and
/// `msg` fields can be accessed directly; cause can be accessed via
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Private fields feel neater, but this is just practical and I can't imagine any disadvantages.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically wanted to replace kind then return the error, and realised it was difficult with the old API. Further, I want to emphasise that Error is a simple type which won't change much (at least the first two fields).

src/error.rs Outdated
#[test]
fn test_kind() {
use core::mem::size_of;
assert!(size_of::<ErrorKind>() <= 2 * size_of::<usize>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to test 👍. Why 2 * size_of::<usize>()?

Copy link
Member Author

@dhardy dhardy Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because on 32-bit that should be the size (or at least 32 + 3 bits), unless we change the payload to something < 32 bits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without Custom this can be size_of::<usize>() now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, without Custom this is pointless

src/jitter.rs Outdated
@@ -145,6 +145,8 @@ impl ::std::error::Error for TimerError {

impl From<TimerError> for Error {
fn from(err: TimerError) -> Error {
// Timer check is already quite permissive of failures so we can assume
// any errors reported are irrecoverable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand what you say with "Timer check is already quite permissive of failures".
Maybe it is safer to say that we use Unavailable because the timer jitter failed basic quality tests and we don't expect the hardware to change much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timer tests allow for things like non-monotonic (decreasing) behaviour several times; I checked that all the tests have plenty of scope for handling unusual but non-fatal output before throwing an error. If the test was less permissive I would use Failure/Unexpected here.

Does that explain my reasoning sufficiently? Perhaps changes are still required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you mean we don't really expect false positives. Ok. Would you mind rewording it a little?

/// from this method: `ErrorKind::Transient` and `ErrorKind::NotReady`.
/// in case of error we simply delay reseeding, allowing the generator to
/// continue its output of random data and try reseeding again later;
/// because of this we always return kind `ErrorKind::Transient`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think keeping Transient is a good idea. So basically we only changed ErrorKind::Other to some other name?

break; // Successfully reseeded, delayed, or given up.
}
// Behaviour is identical to `try_reseed`; we just squelch the error.
let _res = self.try_reseed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice clean-up, and I think I can get behind the choice to unify the two reseeding methods. But I think doing the retries immediately is something useful to keep. At least if the error is Transient, with RDRAND in mind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You think? I don't expect RDRAND would report an error often, and we're only delaying until the next call, so I thought it was reasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it wait until self.threshold >> 8, which could be large? But okay, I think this is a reasonable solution. The time to pick between reseeds is not exact science anyway...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the code sets bytes_until_reseed = 0 in case of Transient errors.

if let Err(e) = res1 {
// In the unlikely event the internal PRNG fails, we don't know
// whether this is resolvable; reseed immediately and return
// original error kind.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nicely found use of the reseeding wrapper. Does it impact benchmarks? Probably not because we benchmark with large slices to fill...

I think this is unnecessary because just about every PRNG (everything that implements SeedableRng) never fails. Personally I would leave it up to the user if he wants to handle it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't benchmark this method at all currently, but yes, I might revert the changes to this method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you think about this now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly I don't care much, especially since your "block rng" idea might significantly alter this. But eventually I guess we should investigate the performance of thread_rng more.

I don't care much about changes to this particular method; I can revert if you like but it's probably a small improvement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the performance is a big deal, but I am not sure adding this functionality is a good idea. Are there situations where reseeding the RNG when try_fill fails is not the right action, or maybe even insecure?

I can't think of any, actually I have a trouble imagining when a CSPRNG can fail...
But I can see some use in this: it allows a wrapped RNG to signal it wants to be reseeded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some trouble finishing that Block Rng experiment. It is a bit too big for me. Do you want to help / take over, probably after 0.5 is done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'll leave this for now then.

Can you create an issue regarding the BlockRng experiment? Sounds like it may not be in time for the 0.5 release but I'll try to take a look.

@vks
Copy link
Collaborator

vks commented Feb 22, 2018

Actually @vks I put some effort into documenting exactly what all these different kinds are for.

Sorry, I did not mean to imply otherwise. It was just not clear to me.

I looked at the error kinds of other libraries, and "they mostly describe the cause of the error, not how it should be handled".

Maybe OsError or FailedSyscall is more descriptive for Failure? Transient and NotReady are probably more descriptive than my suggestions. I think it makes sense to remove Custom without a use case, it can be added later in a backwards compatible way.

@dhardy
Copy link
Member Author

dhardy commented Feb 22, 2018

Failure was for expected-resolvable errors but now it covers most error causes... I like the new structure myself (other than names and Custom).

I looked at the error kinds of other libraries, and "they mostly describe the cause of the error, not how it should be handled".

This is exactly where I think many error libraries go wrong. I'm not against adding a few more categories, but ultimately the only purpose of ErrorKind is to hint at what a handler should do with the error, so it should make this part easy, not confusing. When a user wants to locate the cause, it is the msg and cause fields which help with that, not kind.

That's not to say this approach is right; e.g. io::ErrorKind strikes a compromise in having lots of categories listed but not making them too specific. This is okay, but handling usually comes down to aborting with a message no matter what the kind is. Since RNGs can sometimes overcome errors on retry I want to emphasise that.

@pitdicker
Copy link
Contributor

I'm not against adding a few more categories, but ultimately the only purpose of ErrorKind is to hint at what a handler should do with the error

Yes, you pushed for that in the original issue, and I agree. Doing something meaningful with errors is maybe a pattern that is not common enough in the Rust world...

We already picked the names in that direction, with a name that indicates both the 'reason' or 'category' of the error, and a hint of the right way to handle it. NotReady indicates it should retry over a while (or manually initialize etc.), Transient indicates an immediate retry could help, Unavailable indicates you should just give up, and Other or Unexpected says we have no idea what happened and what you should do about it, although a retry might work.

I just do not want to drop the 'meaning' part and only have an 'instruction' left.

@pitdicker
Copy link
Contributor

From your first comment:

Unavailable now has two uses — in JitterRng (see comment in PR) and for WASM (which has an auto-failing OsRng implementation) — so we really can assume it's irrecoverable when handling.

I thought this PR initially let OsRng keep returning Unavailable of NotReady if new() failed, and Unexpected for failures in try_fill_bytes. Did this change? I still think Unavailable is the best kind to return if new fails.

@dhardy
Copy link
Member Author

dhardy commented Feb 23, 2018

OsRng::new() can fail if, for example, there are no file descriptors available. You pointed out yourself that you thought this should retry at least occasionally, hence it seems to me that Unavailable is not appropriate? But also Transient is not appropriate because we don't know that the feature will work later, so Unexpected seems more appropriate?

@pitdicker
Copy link
Contributor

Yes, I wanted that, but that is mostly with long-running processes in mind. I think we can honestly say OsRng is unavailable if OsRng::new() fails. Especially when the modern system call interfaces fail, because that should only happen if the OS is somehow deeply broken/misconfigured. Still, if the problem is 'magically' fixed (probably by a human intervening), it is nice to eventually start using OsRng again. That was my reasoning in ReseedingRng and later EntropyRng. And the same is true with a problem like no available file descriptors because of an attack. That will probably also not fix itself, or at least not quickly.

While writing this comment I looked again at the possible errors that can occur when opening /dev/urandom:

  • EMFILE or ENFILE: No file descriptors or other 'resource exhaustion attack'.
  • EACCESS, ENOENT, ENODEV, ENXIO: No permission, random device deleted, some deep OS misconfiguration
  • EINTR: Interrupted while opening /dev/urandom
  • EAGAIN, EWOULDBLOCK: Not ready

I think we could map EINTR to Transient, and we already map the last ones to NotReady. For the others, they will probably not fix themselves without human intervention, so I would say Unavailable.

@pitdicker
Copy link
Contributor

What did you think about Unforeseen instead of Unexpected?

@dhardy
Copy link
Member Author

dhardy commented Feb 23, 2018

Unforseen would imply that no one foresaw this failure mode — but that would obviously not be the case if there was code to return an Unforseen error there. This is exactly what @vks didn't like about Unexpected, only more literal.

"Expectations" are usually about what is likely to happen, but the word also gets used in other ways, e.g. "expect the unexpected" (which should probably be "expect not to foresee everything").

So I guess we stick with Unexpected

Edit: so not everyone agrees about the meaning of expected: OED, Merriam-Webster, Wiktionary. Perhaps its just my maths/stats background making me assume expected = likely.

@@ -62,7 +62,7 @@ impl<R: Read> RngCore for ReadRng<R> {
if dest.len() == 0 { return Ok(()); }
// Use `std::io::read_exact`, which retries on `ErrorKind::Interrupted`.
self.reader.read_exact(dest).map_err(|err| {
Error::with_cause(ErrorKind::Unavailable, "ReadRng: read error", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to split this in two errors? To differentiate between unrecoverable errors and others? If I look at std::io::ErrorKind I would only map :io::ErrorKind::Other to ErrorKind::Unexpected. And all the others seem like ErrorKind::Unavailable to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, but I would think only NotFound or PermissionDenied would indicate Unavailable. If Read gets used over the network, any of the refused/reset/aborted might indicate that human input is needed to resolve, but it's harder to say; all the rest should be Unexpected as I see it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, your right. If we can't imagine the error to occur, best to map it to Unexpected. I think besides the 5/6 you mentioned also BrokenPipe could also mean Unavailable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how a broken pipe is possible when reading directly from a file so Unexpected is fine.

I updated the PR BTW.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can read from stdin, just like PractRand does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Okay, BrokenPipe maps → Unavailable

@pitdicker
Copy link
Contributor

It is strange to be discussing / advocating things I do not care strongly about 😄.

For things like reading random bytes in OsRng there are not really any documented errors once OsRng::new() succeeded. I do not foresee any errors here. But we did add an error path, so in some way we did foresee it? No problem, let's stick with Unexpected.

src/os.rs Outdated
let file = File::open(path).map_err(|err| {
use std::io::ErrorKind::*;
match err.kind() {
NotFound | PermissionDenied | ConnectionRefused =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not really match the same posix errors, see https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/mod.rs#L96. I don't want to be picky, but think it is better to match those (can be extracted).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get you? The names are all basically the same except for ENOENT and EAGAIN. You think we should match the constants defined in libc instead? Or you would choose a different selection of constants to map to Unavailable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think ConnectionRefused is an error we can encounter for OsRng, and am missing EMFILE, ENFILE, ENODEV and ENXIO. Interesting, if you search for "ENXIO random" python uses a similar list when opening /dev/random. Maybe using the libc constants is best here?

New Unexpected error kind
Error's kind and msg fields are now public
Some error handling simplifications and revisions
@dhardy
Copy link
Member Author

dhardy commented Feb 26, 2018

Removed test and rebased. I think this is ready now?

@pitdicker
Copy link
Contributor

Looks good to me. Only the check in OsRng that may be worth changing, but I'll leave the choice up to you.

@dhardy
Copy link
Member Author

dhardy commented Feb 26, 2018

Ah sorry, GitHub hid those comments.

@dhardy
Copy link
Member Author

dhardy commented Feb 27, 2018

@pitdicker going by your previous comment it doesn't look like matching all individual codes is the best option (also a problem because ReadRng is also used by Redox); better just to map most codes to Unavailable?

This means that Unexpected is now unused, equivalent to the previous Other... feels like we're going round in circles (although some of the error handling is definitely better).

@pitdicker
Copy link
Contributor

This means that Unexpected is now unused, equivalent to the previous Other... feels like we're going round in circles (although some of the error handling is definitely better).

I feel the same. But still you made good improvements here! Ready to merge?

@dhardy dhardy merged commit 7069151 into rust-random:master Feb 27, 2018
@dhardy dhardy deleted the failure branch February 27, 2018 14:02
pitdicker pushed a commit that referenced this pull request Apr 4, 2018
ErrorKind: rename Transient → Unexpected and revise uses of Unavailable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants