Export and document cryptographically secure random numbers #32954

Closed
chethega opened this issue Aug 18, 2019 · 32 comments
Labels
randomness Random number generation and the Random stdlib

Comments

@chethega
Contributor

Various applications need a source of cryptographically secure random numbers. We currently do not provide a CSPRNG.

Secure random is a very basic primitive, and imo as widely important as secure hash (which we do provide). Further, we have a very nice AbstractRNG interface. In other words, there are very few API decisions (mainly: whether to define Random.seed! for the CSPRNG). Secure random could live in packages, but this has lots of disadvantages: First, it is inconvenient, which will lead to people misusing the default random generator. Second, a central place for maintaining such basic functionality is preferable to many competing implementations, both security-wise and in terms of simplified audits of packages.

There was some discussion here that inspired me to grep through a lot of packages to gauge how they use randomness. I'll link all of the individual issues here, to make the point that this is important functionality, and that there is confusion about secure random in the package ecosystem. Considerations about the default RNG are separate.

A sensible solution could be to use RandomDevice(). In this case, we would simply update the docs.

@JeffBezanson
Member

Secure random could live in packages, but this has lots of disadvantages:

On the other hand, putting it in a package makes releasing bug fixes or improvements much faster.

@ViralBShah
Member

To start with, it should certainly be a package. As it gains more widespread usage, we can figure out if it should live in stdlib. Given that we are more likely to move things out of stdlib rather than into it, having an external package seems better.

@ViralBShah ViralBShah added the randomness Random number generation and the Random stdlib label Aug 18, 2019
@chethega
Contributor Author

I disagree.

Security is still the toxic waste dump of computer science. People are rightfully reluctant to add external dependencies. Many people are naive with respect to crypto and security. People lack the resources to properly vet packages that provide secure random. We want to support a culture of "when in doubt, then be secure". We absolutely do not want to force people to make ill-informed trade-offs about whether they "can get away" with using insecure rand. It must be easy to do the right thing; otherwise people will invariably mess up. Bad random leads to perfectly functional test-passing vulnerable packages.

At the very least, I want the Random stdlib to document a best-practice for secure random. This is basic functionality and there must be an officially endorsed way of getting secure random.

If we do not want to implement a CSPRNG, then we could export a wrapper around one of the mbedtls CSPRNGs. mbedtls is already a binary dependency of julia.

We could also fix up the random wrapper in MbedTLS.jl and be opinionated to point people there. MbedTLS.jl is already a dependency for large parts of the ecosystem, to the point that it is "effectively stdlib".

We could also point people at the RandomDevice, as proposed here. That has thread-safety issues we'd need to resolve. But pointing people there, with the magic words "use this for secure random", relieves them of reading the julia source, then reading the docs of the called Windows SystemFunction036, and then searching for cryptanalysis papers (due diligence I did for my PR, and that would need to be replicated before officially endorsing RandomDevice for this role).

CC @hustf @essenciary @sbromberger for a user perspective.

@StefanKarpinski
Member

Question: is this intended to produce truly unpredictable random data like /dev/random or a reproducible stream of pseudo-random data?

@sbromberger
Contributor

On the other hand, putting it in a package makes releasing bug fixes or improvements much faster.

I might have more to say later, but I want to address @JeffBezanson's comment here from a different perspective: while "releasing bug fixes or improvements" might be a valid reason to move non-secure code out to a 3rd-party package, I'm not sure this is a great strategy for things like CSPRNGs. There are a couple of reasons for my belief:

  1. Secure primitives should be available in stdlib/base. This is the case for a number of modern languages: Go, Python, Java, Rust.

  2. Even if that weren't compelling enough, it is important that bug fixes to crypto libraries be as widely announced and deployed as possible, as quickly as possible. This argues for a centralized model where a new version of the base/stdlib is released in response to a security/crypto problem. There's no better way to advertise "Hey, if you continue to use an older version of this code you're going to be insecure" than saying "Hey, we've just released Julia 1.3.1 with security fixes. All users are encouraged and advised to upgrade as soon as possible". It is wishful thinking to believe that users of third-party libs - especially when they're indirect (transitive) dependencies - will keep their fingers on the pulse of their downstreams.

  3. Because crypto libs tend to be static in terms of their development/enhancement, there's little benefit to splitting them out into a fast release process. In fact, I'd argue that stability of crypto libs is a requirement. I don't want to outsource my security to a third-party lib. I want it baked into the language, and vetted/updated by the language maintainers. It's that important, especially if we want to see Julia move from "here's an interesting language that we can play around with interactively" to "here's a service we've built on this language and are making available to some subset of the general public".

Finally, and possibly a distraction: extraordinary claims require extraordinary evidence. I am unconvinced that moving code out to third-party libraries has actually resulted in more frequent releases than what we see in Julia Base/stdlib. Just in 2019, we've had 3 releases of Julia, with two more imminent. I don't know that third-party packages have seen that sort of iteration: picking some popular packages, I see that in the same timeframe,

  • DiffEq has had 2 releases
  • Flux has had 1
  • Gadfly has had 1
  • Distributions has had 5 (still 0.x)
  • OrderedCollections has had 1

I'm certainly not arguing that these should be moved (back) into Base. (Actually, for a few of them perhaps I am, but that's a different discussion.) To reiterate my point more simply, and to wrap it up:

TL/DR MHO, YMMV, etc.: Crypto functionality should be primitive and stable, and should be important enough to be versioned as part of the language itself.

@StefanKarpinski
Member

StefanKarpinski commented Aug 19, 2019

As the person merging the releases, I can assure you that DiffEq* has had a lot more than 2 releases. Doing git log --since=2019-1-1 D/DiffEq* | grep 'New version' | wc -l in the registry repo shows that the DiffEq collection of packages have had 90 releases. Are you talking about DiffEqBase? That alone has had 27 releases. That would have been completely impossible if DiffEq was a standard library.

Suppose that Gadfly were made into a standard library. Does that mean any more work would have been done on it? No. It would get the same amount of maintenance as it does now. Even if it would have technically been included in 3 different Julia releases, if the actual code was unchanged, it's hard to argue that this is really a release.

@StefanKarpinski
Member

StefanKarpinski commented Aug 19, 2019

Again, I'd like to ask whether the P part of CSPRNG is significant. Because if people just want a good reliable stream of random data, that's a pretty simple, stable API that I'd be happy to commit to. If people don't want reproducibility, then we can change algorithms at any time and not worry about breaking people's programs. If people want reproducibility, then that's a whole different story because then we need to pick and commit to maintaining and supporting a given algorithm. The original suggestion of using RandomDevice() suggests that a CSPRNG is not actually what's wanted, but a good, standard way to get secure true randomness.

@JeffBezanson
Member

I also think looking at the release schedules of other existing packages is not relevant. The point was just that it's possible to release new versions independent of julia/stdlib. If you want to make a new release, you can at any time. Even if it were true that packages have on average fewer releases than julia, there is no causal relationship. Is it really so crazy to consider a compiler and a crypto library separate projects?

Long term, the structure I'd like to have is everything in stdlib moved to separate repos, but with a linux-distro-like bundling mechanism where we make it easy to get a (possibly custom) set of packages in one download.

But, if we can make the existing RandomDevice do this (and of course make it as easy to use as possible and document it), that would be great since it's within easy reach.

@sbromberger
Contributor

Yup, as I feared, that last point became a distraction. However, as mentioned (perhaps ad nauseam) elsewhere,

Long term, the structure I'd like to have is everything in stdlib moved to separate repos

will kill practical use of Julia in many environments. If this is indeed the direction The Project is headed, confirmation now will allow me to start looking at alternatives in my organization. This preference has been repeated often enough by enough people that it is now, in my mind, a Thing To Be Taken Seriously. I'll follow up off github.

@JeffBezanson
Member

Does the rest of that sentence

but with a linux-distro-like bundling mechanism where we make it easy to get a (possibly custom) set of packages in one download.

help?

I simply don't understand how a language and libraries being separate packages is some perverse and awful new idea that we invented. Isn't that how everything works? How many useful libraries are bundled with gcc?

And there is not that much in our stdlib folder today. If it's not possible to use e.g. DataFrames, Plots, Distributions, etc. I would think it's already not practical to use julia. So my point would be that a more comprehensive solution is needed anyway, not just trying to make the entire julia ecosystem a monorepo.

@sbromberger
Contributor

Coming back to the discussion at hand:

Again, I'd like to ask whether the P part of CSPRNG is significant. Because if people just want a good reliable stream of random data, that's a pretty simple, stable API that I'd be happy to commit to.

From my perspective, this should be sufficient. That is, I can use a PRNG in testing and development to ensure that my code is doing what it should using reproducible inputs, and then I can easily switch out to a CSRNG for production.
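
A minimal sketch of that workflow, assuming the code under test takes its generator as an explicit AbstractRNG argument. The helper name make_nonce is hypothetical and not part of any existing API. A seeded MersenneTwister gives reproducible inputs during testing, and RandomDevice() is passed in instead for production:

```julia
using Random

# Hypothetical helper: the caller decides which generator backs the output.
make_nonce(rng::AbstractRNG, n::Integer = 12) = rand(rng, UInt8, n)

# Testing and development: identically seeded PRNGs give identical output.
a = make_nonce(MersenneTwister(1234))
b = make_nonce(MersenneTwister(1234))

# Production: the same call, backed by the operating system's entropy source.
nonce = make_nonce(RandomDevice())
```

Because the generator is a parameter rather than a global, switching between the two needs no change to the function body.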

@KristofferC
Member

KristofferC commented Aug 19, 2019

Secure primitives should be available in stdlib/base. This is the case for a number of modern languages: Go, Python, Java, Rust.

For Rust, you linked to an external crate https://crates.io/crates/rand, developed in a separate repo https://github.com/rust-random/rand.

@chethega
Contributor Author

Sorry for being unclear in the initial feature request and inadvertently derailing the discussion.

In this specific issue, I am simply asking for an idiot-proof way of generating secure random (which is documented in an idiot-proof way).

Making RandomDevice easier to use and/or improving its documentation fits the bill.

@StefanKarpinski
Member

Ok that's a pretty straightforward ask. We have also talked about making MbedTLS a standard library (the current situation is quite bad where you end up having multiple different copies of it) and HTTP as well, using MbedTLS for secure web connections.

@StefanKarpinski
Member

Given that all you need to generate a secure random array is rand(RandomDevice(), UInt64, 10) is there anything else that needs to be done here? Improved documentation?

@sbromberger
Contributor

is there anything else that needs to be done here? Improved documentation?

It might be too breaking/inconvenient a change, but I'd like to get folks' takes as to whether there's any reason not to make secure randomness the default. I think this would obviate most of the issues @chethega found in his initial post.

@chethega
Contributor Author

Given that all you need to generate a secure random array is rand(RandomDevice(), UInt64, 10) is there anything else that needs to be done here? Improved documentation?

If called in a loop on unix-like systems, then you will run out of file descriptors. Correct use is to initialize a module-level global const CSPRNG = RandomDevice(); and use rand(CSPRNG, UInt64, 10). It is an unfortunate fact that the RandomDevice is not initialized by stdlib and each module therefore needs to open /dev/urandom individually.

But yes, documentation that includes copy-pastable examples of correct use can be an appropriate solution.
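
For illustration, a copy-pastable sketch of the pattern described above, written as a plain script. The helper secure_token is a hypothetical name, and the sketch assumes only that RandomDevice supports the usual rand methods of the AbstractRNG interface:

```julia
using Random

# Construct the device once and reuse it, instead of calling
# RandomDevice() inside a loop.
const CSPRNG = RandomDevice()

# Hypothetical helper: a hex-encoded token of `nbytes` random bytes.
secure_token(nbytes::Integer = 16) = bytes2hex(rand(CSPRNG, UInt8, nbytes))

# The call from the discussion, using the shared device.
words = rand(CSPRNG, UInt64, 10)
token = secure_token()
```

In a precompiled package, constructing the device at load time (for example in the module's __init__ function) may be the safer variant, since this thread also mentions precompilation problems with RandomDevice.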

@rfourquet
Member

It might be too breaking/inconvenient a change, but I'd like to get folks' takes as to whether there's any reason not to make secure randomness the default.

Just as a data-point: Primes.jl. It's sometimes practical to use the default RNG, have the possibility to use Random.seed!(123) (e.g. to debug the Primes.jl code), while not needing security.

@tpapp
Contributor

tpapp commented Aug 21, 2019

Secure random could live in packages, but this has lots of disadvantages: First, it is inconvenient, which will lead to people misusing the default random generator. Second, a central place for maintaining such basic functionality is preferable to many competing implementations, both security-wise and in terms of simplified audits of packages.

I don't understand either of these points. Packages are not inconvenient at all — presumably a CSPRNG would also live in a package, not Base, so I am assuming you are talking about making it a standard library. But that's more of a hindrance in this context: if one needs to make a new release for a security-related bug, it is much less of a hassle to update a package than release a new version of Julia.

I am not sure that an ecosystem of competing CSPRNG libraries is such a bad thing (competition has its advantages), but in any case, is it a practical concern? The only CSPRNG I could find for Julia is https://github.com/pik/isaac-jl, which appears to be unmaintained. Are there many others?

@JeffreySarnoff
Contributor

fwiw (/dev/random vs /dev/urandom) https://www.2uo.de/myths-about-urandom/

@StefanKarpinski
Member

StefanKarpinski commented Aug 27, 2019

The term CSPRNG has been misused in this thread: what's actually wanted is a secure source of genuine randomness, regardless of how it's generated—usually partially from random events, partly from using a CSPRNG to enhance the amount of bits you can get from those random events.

@Keno
Member

Keno commented Aug 29, 2019

I think it makes sense to have a good, fast CSPRNG in base alongside the default non-CS PRNG. Additional RNGs can always live in packages, but we should bless a good default.

@chethega
Contributor Author

chethega commented Aug 29, 2019

I cannot resist adding stdlib/uuid to the list of Mersenne Twister users that MAY WISH TO use a better random source by default, per rfc4122/uuid and rfc6919 / further requirement levels.

@StefanKarpinski
Member

Using a secure RNG for uuids is a good idea 👍

@StefanKarpinski
Member

There seem to be two major things to do here:

  1. Address the issue that calling RandomDevice() opens a file each time it's called
  2. Pick a name by which to export the once-opened secure random stream

For (1), if we can open /dev/random on startup then we can just do that; if we want to avoid doing that until it's needed, then we need to do something a bit more complicated where we open it the first time it's used and then keep the open device around.

For (2), I would propose that we provide Random.secure_rng() as an analogue to Random.default_rng() which is the official public interface to getting the default insecure RNG. We could also have Random.secure_prng() if one needs a secure pseudorandom RNG, but it's unclear from the conversation if that's needed at this point.
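
A rough sketch of the "open on first use, then keep it around" variant of (1), combined with an accessor in the spirit of (2). Everything here is an assumption for illustration: secure_rng is defined as a local function and not as part of Random, and the names SECURE_RNG and SECURE_RNG_LOCK are made up for this sketch.

```julia
using Random

# Holds the device once it has been created; `nothing` until first use.
const SECURE_RNG = Ref{Union{Nothing,RandomDevice}}(nothing)

# Guards the lazy initialization so concurrent first calls create one device.
const SECURE_RNG_LOCK = ReentrantLock()

# Hypothetical accessor: returns the shared device, creating it on first call.
function secure_rng()
    lock(SECURE_RNG_LOCK) do
        rng = SECURE_RNG[]
        if rng === nothing
            rng = RandomDevice()
            SECURE_RNG[] = rng
        end
        return rng
    end
end

bytes = rand(secure_rng(), UInt8, 32)
```

The lock only protects creation of the shared device. Whether concurrent rand calls on that device are safe is a separate question that this sketch does not address.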

@JeffreySarnoff
Contributor

Random.secure_prng() is important wherever e.g. /dev/urandom or its OS-specific equivalent may be/become (temporarily) unavailable or may be/become compromised.

@chethega
Contributor Author

if one needs a secure pseudorandom RNG

As far as I understood, your random/pseudo-random terminology difference is that pseudo-random additionally exposes a seed!-API? I think seedable secure random is only really needed for uses that we don't want to encourage anyway (e.g. make your own stream cipher by xor).

Random.secure_prng() is important wherever e.g. /dev/urandom or its OS-specific equivalent may be/become (temporarily) unavailable

If the kernel fails to offer /dev/urandom then we can give up and should panic or throw. The OS failing to provide secure random is mostly a non-issue. Even if we believed that the current kernel RNG state is compromised (e.g. we are on a VM that just was migrated, we have been informed of this but suspect that the kernel did not get the message) then we could just write some entropy to /dev/urandom and reopen it.

Once we make RandomDevice usable (open once on startup, thread-safe, fix precompilation woes) the remaining main advantages of a different implementation are performance (plus the windows story that is SystemFunction036; we could check whether we are running a non-ancient OS and use a more modern API). Performance is kinda important, though!

If we plan to make the default use a secure algorithm, then we must have seed! (reproducibility) and we must be within ~2x speed of our Mersenne Twister on most hardware, preferably reaching almost parity (otherwise people will complain too much). Regardless of implementation, I think that Random.secure_rng() should not be seedable (footgun). But we can of course share all other code between Random.default_rng() and Random.secure_rng(), and should still encourage all users who actually need secure random to use Random.secure_rng instead of Random.default_rng (both for hygiene and to permit us to backport minor security upgrades that otherwise change random stream or impact speed).

Using a secure RNG for uuids is a good idea +1

rfc6919#10 perfectly describes the kind of security considerations and requirement level that uuid generation recommends in rfc4122#4.5. If you haven't already, crack open a can of beer and read that. "a BETTER solution" my ass.

@StefanKarpinski
Member

If we plan to make the default use a secure algorithm,

I really don't think this is a viable option. This will lead to hundreds of posts on discourse (that I will be answering for years to come) with people comparing Julia's rand() with Python, Matlab and R and demanding to know why Julia is so slow at generating random numbers. I get the desire to be defensive against people using rand() to generate sensitive randomness, but I think that's an education problem. If someone is naïvely using rand() to generate a sensitive random value, then the chances that the rest of their code is actually secure are pretty negligible.

In short, the default RNG can stay as is, and the secure RNG needs to be:

  • non-seedable, so it is not a PRNG, at least from the user's point of view
  • as genuinely random as we can make it—using /dev/urandom is a good option
  • does not need to be especially fast, but faster is generally better

To satisfy this, I think that my proposal to export Random.secure_rng() to match Random.default_rng() is fine and that's what we should go ahead with. It should initially just use RandomDevice but open it only once. If that actually turns out to be too slow for anyone, then we can consider faster options, but I doubt it will be an issue.

@Keno
Member

Keno commented Aug 30, 2019

does not need to be especially fast, but faster is generally better

Mostly true, but the applications I have do suck secure randomness at a very large rate, so reading from /dev/urandom is not fast enough.

@jiahao
Member

jiahao commented Dec 19, 2019

FWIW I have wrapped the RdRand and RdSeed instructions here in SecureComputation.jl, an experimental implementation of homomorphically encrypted arithmetic (the package is essentially superseded by ToyFHE.jl), with help from @maleadt.

Maybe the easiest thing to do is to spin off this little wrapper into a tiny package that can be called on demand.

@ghost

ghost commented Feb 2, 2020

Would it be possible to optionally pass RandomDevice a buffer size? I have an application where I must use a CSPRNG (urandom) for any and all random elements in a project.

Latency becomes a problem and I had to hack together a buffer containing normally distributed random values of all things. It doesn't help that urandom is a single-threaded process.

edit: urandom is usually 200+ MB/s on a decent CPU
time head -c 16000m </dev/urandom >/dev/null
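
One way such a buffer could be layered on top of RandomDevice in user code is sketched below. This is only an illustration under stated assumptions: BufferedSecureBytes and next_bytes! are hypothetical names, RandomDevice itself takes no buffer-size argument here, and the wrapper is not thread-safe.

```julia
using Random

# Hypothetical wrapper: fetches bytes from the OS in large chunks and hands
# them out in small pieces, so most requests avoid a call into the OS.
mutable struct BufferedSecureBytes
    rng::RandomDevice
    buf::Vector{UInt8}
    pos::Int              # number of bytes of `buf` already handed out
end

# Start with pos == length(buf), i.e. "empty", so the first request refills.
BufferedSecureBytes(nbytes::Integer = 1 << 16) =
    BufferedSecureBytes(RandomDevice(), Vector{UInt8}(undef, nbytes), nbytes)

function next_bytes!(b::BufferedSecureBytes, n::Integer)
    n <= length(b.buf) || throw(ArgumentError("request exceeds buffer size"))
    if b.pos + n > length(b.buf)
        rand!(b.rng, b.buf)   # refill the whole buffer in one call
        b.pos = 0             # any unused tail bytes are simply discarded
    end
    out = b.buf[b.pos+1:b.pos+n]   # copy, so each byte is handed out only once
    b.pos += n
    return out
end

source = BufferedSecureBytes(64)
chunk = next_bytes!(source, 16)
```

Whether this actually reduces latency depends on the platform and on how RandomDevice is implemented in the Julia version at hand, so it would need measuring.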

@oscardssmith
Member

This is now exported and documented.
