Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace use of crypto/rand with more direct use of OS RNG #58

Closed
briansmith opened this issue Dec 30, 2015 · 8 comments
Closed

Replace use of crypto/rand with more direct use of OS RNG #58

briansmith opened this issue Dec 30, 2015 · 8 comments

Comments

@briansmith
Copy link
Owner

In particular, consider rand::os::OsRng. If rand::os::OsRng is insufficient in some way then try to fix it. rand::os::OsRng is preferable because other Rust code is likely using it, and we can avoid duplicate code. Also, crypto/rand is pretty complicated and has a weird interface. To the extent that that complication is useful, we should implement it as a layer over rand::os::OsRng.

@briansmith
Copy link
Owner Author

One problem is that OsRng implements Rng and there are a LOT of methods, in particular methods that have floating-point parameters and lots of methods that we'd never, ever use. It's not clear how the presence of the floating point types in the trait interface affects platforms where we want to avoid floating point stuff. It's not clear how good a job does in removing the extraneous methods. Perhaps we should propose a new, simpler, interface, that is just rand::os::OsRng::fill_bytes.

@briansmith
Copy link
Owner Author

@briansmith
Copy link
Owner Author

See rust-lang/rust#27703 (comment)

@briansmith
Copy link
Owner Author

briansmith commented Apr 25, 2016

I've started a new implementation of ring::rand where more of the logic is in Rust.

It is now possible to explicitly manage when the file handle for /dev/urandom is open by constructing/destroying rand::SystemRandom instances appropriately.

Most applications should only have a single implementation of rand::SystemRandom instantiated to avoid the waste of having multiple file descriptors to /dev/urandom open and to avoid the overhead of opening/closing /dev/urandom. That was effectively how BoringSSL works and how ring worked before these changes. However, I didn't implement that because, generally, we shouldn't be using /dev/urandom anyway. Instead, we should be using RDRAND (or equivalents on ARM and other platforms) + PRP (e.g. ChaCha20), or getrandom syscall (or the equivalent on other platforms), for performance reasons. The /dev/urandom implementation should only exist as a least-common-denominator fallback.

I also dropped the RDRAND+ChaCha20 implementation.

So, we should:

As far as the RDRAND + PRP implementation goes: The BoringSSL implementation was kinda-sorta-fork/vm-duplication-safe, but in some ways it actually makes that kind of problem worse. Instead, we should encourage applications to use safer patterns. For example, it seems like applications would be best off using SystemRandom for everything by default, but whenever they need a PRNG for something associated to a socket, they can use a PRF (perhaps in conjunction with RDRAND or equivalents) from a SystemRandom-generated seed to get the performance boost. By tying the CSPRNG state to a socket, the application should be able to avoid all the VM duplication and forking issues safely. (I think Amazon's s2n does this.)

@briansmith
Copy link
Owner Author

Here's the relevant comment from #147:

The source of RAND_bytes notes:

/* Without a hardware RNG to save us from address-space duplication, the OS
* entropy is used directly. */

This protection is racy in the sense that the address space could be duplicated between the time that hwrand is called and the time the function returns. The more random data we're generating, the bigger this window of vulnerability becomes. To shrink the window of vulnerability, we should minimize the time between the call to hwrand and then time the function returns. One way to do this would be to make the call to hwrand the last thing that RAND_bytes does. The complication is that currently, when a large amount of data is used, RAND_bytes encrypts the output of hwrand. It would have to be changed to encrypt something else (a buffer of zeros, or state->partial_block) in that case.

@DemiMarie
Copy link

My thoughts:

  • /dev/urandom fails to block even before it has been properly seeded. Solution: poll() on /dev/random before reading anything from /dev/urandom.
  • RDRAND should not be trusted (possible NSA backdoor that cannot be checked except by someone who can reverse-engineer an IC).
  • getentropy() "does the right thing" w.r.t security, but I don't think it will solve the speed issue as the Linux kernel's algorithms (based on hashes, IIRC) are much slower than ChaCha20 (this should probably be reported as a kernel bug). On my system (i7 Haswell laptop) I can only read <14MB/s from /dev/urandom (vs ChaCha20 at ~1 cycle/byte).
  • pthread_atfork() isn't reliable – too many reasons to use the raw clone() syscall.

@briansmith
Copy link
Owner Author

/dev/urandom fails to block even before it has been properly seeded. Solution: poll() on /dev/random before reading anything from /dev/urandom.

If one doesn't trust /dev/urandom then the "disable_dev_urandom_fallback" feature can be used to disable the fallback. I don't want to add extra logic like poll to support /dev/urandom beyond what we currently do.

RDRAND should not be trusted (possible NSA backdoor that cannot be checked except by someone who can reverse-engineer an IC).

Without agreeing or disagreeing, now the only way we get random numbers if from the OS, so we're good here.

getentropy() "does the right thing" w.r.t security, but I don't think it will solve the speed issue as the Linux kernel's algorithms (based on hashes, IIRC) are much slower than ChaCha20 (this should probably be reported as a kernel bug). On my system (i7 Haswell laptop) I can only read <14MB/s from /dev/urandom (vs ChaCha20 at ~1 cycle/byte).

This is being fixed now in Linux kernels after 4.7 (I don't remember the exact version). It is unclear that we have any need for a faster RNG anyway. Would like to see use cases before we worry about it.

pthread_atfork() isn't reliable – too many reasons to use the raw clone() syscall.

Agreed.

Anyway, I believe everything actionable here was already done and/or has its own issue. Please file new issues if something has been overlooked.

@briansmith
Copy link
Owner Author

Also, thanks for the comments!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants