Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crate refactor #161

Closed
wants to merge 302 commits into from
Closed

Crate refactor #161

wants to merge 302 commits into from

Conversation

dhardy
Copy link
Member

@dhardy dhardy commented Jul 28, 2017

This is a rough preview of what these RFCs may result in:

Current state in my repo

rustdoc for refactor

This is a major break-anything-I-like refactor of rand, in an attempt to fix a bunch of issues I and others have found. I don't expect it to be merged as-is, but hopefully a view of what rand could be will help answer questions about how it should be.

There are a few bits unfinished (see TODO and FIXME); also some changes from other PRs should be integrated. Possibly there are a couple of things which could be cleaned up further. But mostly this is ready for review now.

Comments welcome. Each commit should be fairly clean with a low number of changes, but put together there are a lot of changes.

README.md Outdated
use rand::thread_rng;

let x: u32 = thread_rng().next_u32();
println!("{}", u32)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

println!("{}", x)


use Rng;
use dist::Distribution;
use dist::uniform::{SampleUniform, /*SampleUniform01,*/ codepoint};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid an unused item warning (see FIXME further down)

@Lokathor
Copy link
Contributor

Lokathor commented Jul 29, 2017

Why is Rand gone. That was the best part about things.

src/sequence.rs Outdated
/// println!("{:?}", choices[..].choose(&mut rng));
/// assert_eq!(choices[..0].choose(&mut rng), None);
/// ```
fn choose<R: Rng>(self, rng: &mut R) -> Option<T>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About dynamic dispatch: The new trait looks a lot better prepared for trait objects already. Wide use of <R: ?Sized + Rng> through the crate should make it possible to use dynamic dispatch, it's just a major breaking change of Rand and other trait implementors, but if that's not an issue, we can do it.

(There's two ways to go about it, using R: ?Sized + Rng with a &mut R parameter or using R: Rng and an R parameter, both work the same way if you have a blanket implementation of Rng for &mut R where R: Rng; I think the &mut R parameter is just as good a choice, but the second form could make it easier if you have Rng "adaptors" or whatever..)

Either way, better usability of Rng from trait objects is something I've wished for, but it's not relevant for all use cases.

@withoutboats
Copy link

withoutboats commented Jul 30, 2017

As someone not deeply invested in the differences between different random number generators & distributions and so forth, the way I want to use the rand crate is this:

  • #[derive(Rand)]
  • rand::random()

This seems ot have removed both of these features, and its not clear how they were replaced. How do I prepare a random value of my type with this API? What was wrong with these two features?

It would be helpful, with a huge change like this, to host the rustdoc output so users can easily see the API change.

EDIT: To be a bit clearer, the main way I use rand is to create values when writing unit tests, for which I want a "reasonable default" source of randomness & to avoid thinking about it.

@bluss
Copy link
Contributor

bluss commented Jul 30, 2017

@withoutboats The Rand trait and its derive seems like an independent part, that can easily be its own crate IMO

@Lokathor
Copy link
Contributor

It would be extremely silly for the rand crate to not include the trait for making random values of arbitrary types. Having some sort of sub-crate for just the generators might be fine, but the main crate "rand" should be the one-stop place for all of the common randomness needs.

@portal-chan
Copy link
Contributor

I'd also much rather prefer for the Rand trait to stay as opposed to being arbitrarily split off into it's own crate. I don't think it would be much of an improvement to start having to rely on 2 (or 3, if you use rand-derive) crates for pretty basic functionality.

@withoutboats
Copy link

withoutboats commented Jul 30, 2017

Rand and Rng are a handshake protocol, analogous to Serialize and Serializer or Hash and Hasher. Rand should be in rand not only for convenience but to establish the protocol as normative.

@bluss
Copy link
Contributor

bluss commented Jul 30, 2017

Sure, it's silly to toss out Rand from the rand crate, clearly, but it's an add-on concept that stands separately. The rng is the core. The current Rand trait has not been useful in numerical applications, it's not the part of the crate that gets used IME.

(The Rand in the proposal on internals could be better.)

Edit: See, no Rand trait in the rand-core rfc

@burdges
Copy link
Contributor

burdges commented Jul 30, 2017

Internals threads are a mess to read due to internals screwing up the browser's search functionality, but that proposal gives Rand a type parameter for the range, yes? You'd presumably change the gen method like :

pub trait Rand<Dist> : Sized {
    fn rand<R: Rng>(rng: &mut R, dist: Dist) -> Result<Self, RngError>;
}

pub trait Rng {
    type RngError : Sized;
    fn fill(&mut self, dest: &mut [u8]) -> Result<(), RngError>;
    ...
    fn next_u32(&mut self) -> Result<u32, RngError> { ... }
    fn next_u64(&mut self) -> Result<u64, RngError> { ... }
    ...
    fn gen<D,T>(&mut self,d: D) -> Result<T,RngError> where T: Rand<D> { Rand::rand(self,d) }
    ...
}

@Lokathor
Copy link
Contributor

I think it's unfortunate that we need to worry about RNGError in all cases. Some RNGs simply will never error, and it's a shame that they're having to take on that weight. Can we arrange things into RNGs that will sometimes error (I guess certain crypto RNGs?) and RNGs that will never error?

@nagisa
Copy link
Contributor

nagisa commented Jul 31, 2017 via email

@dhardy
Copy link
Member Author

dhardy commented Jul 31, 2017

Lots of comments, good...

Thanks @bluss , I will look at dynamic dispatch.

DefaultDist is supposed to be the replacement for Rand (currently it doesn't support f32/f64 because of a compiler bug). If so many of you want type-specific generation, maybe Rand deserves to stay as a first class citizen; my beef with it is that it has no good way of documenting what the distribution is for simple types (e.g. char in the crate master generates any code point, even unprinting ones — is this actually what you want?).

It seems better from my POV to write a little more code to specify the distribution of each value when interacting with the generator.

rand::random() went because it depended on gen() which I removed. But it could probably be implemented using DefaultDist — however, as above, should this implement an arbitrary distribution for char, Option, etc?

@dhardy
Copy link
Member Author

dhardy commented Jul 31, 2017

It would be helpful, with a huge change like this, to host the rustdoc output so users can easily see the API change.

@withoutboats yeah, I wanted to. Any suggestion how?

@withoutboats
Copy link

I'd recommend just hosting it in a Github Pages repo https://pages.github.com/

@dhardy
Copy link
Member Author

dhardy commented Jul 31, 2017

Done: https://dhardy.github.io/rand/rand/index.html

@burdges
Copy link
Contributor

burdges commented Jul 31, 2017

Another issue : Is an associated RngError type sufficient to support an Rng that operates as a future?

Afaik, there are no Rgns that never error @Lokathor : OsRng sets RngError to io::Error. If you've a single task, like say running some KDF to make a packets along with a Ziggurat algorithm to give a delay, then you could seed a ChaChaRng with output from OsRng, but technically ChaChaRng could still error if you exceed the ChaCha stream length. You can unwrap() if you know you'll never do this. In fact, this should always be the case since since ChaChaRng should only be used for a "single task", but maybe it's safer to make the developer read instructions on ChaChaRng that warn them not to use it too broadly.

Also, if I understand, we must keep the next_u32, next_u64, etc. because these help us build Rngs with different output types. We cannot use fancier tricks involving specialization on both the Rng and the T: Rand because doing so means some trait for the tuple (Rng,Rand) which likely breaks object safety. Yes?

@Lokathor
Copy link
Contributor

@burdges I am not an RNG expert, but I would expect most seedable RNGs that are not for cryptography to just never error out once they've been properly seeded. The seeding process might have a problem somehow, but once that works you've just got some state and some step function and that's it. What errors are left? I mean your number stream will loop, but that's not an error.

I guess what I'm trying to get at is that the rand crate needs to support the idea that you get some tiny amount of entropy from the OS (say, one or two u64 values) and then you make your own generator and you run it as often as you want without ever talking to the OS about it again, and you just accept the fact that it'll loop eventually. You want this to program video games with. I think we all understand that "you can program video games with it" is one of the biggest drivers of a programming language's growth.

@burdges
Copy link
Contributor

burdges commented Jul 31, 2017

I'd say even non-cryptographic random number generators should error rather than loop, provided their design permits detecting the loop, although not all designs permit this.

We do want Rng to be usable for cryptography regardless, so that requires a Result<T,RngError>. Anyone else can just unwrap() or maybe figure out some scheme by which Result<T,!> could actually be T.

We could just add an InfallibleRng that one impls like a marker trait, or even automagically impls for RngError = !, but actually includes a bunch of default methods that do the unwrap()s.

@burdges
Copy link
Contributor

burdges commented Jul 31, 2017

I've said this elsewhere but Rng includes some methods that involve comparisons, like gen_range. These are not ideal for cryptography unless those comparisons are constant time, and many applications do not need that, so Rng cannot be ideal for cryptography throughout.

I think the right solutions might be : Rng's raw-ish methods like next_u64, fill, etc. should be as suitable for cryptography as the underlying impl permits, which should be good enough in the case of OsRng. Above that, there should be a crate, maybe called subtle-rand, that provides a wrapper struct CTRng<R>(R) where R: Rng that fixes all the higher level default methods. Any crates that provide an Rng suitable for cryptography should either fix these default methods themselves, check that CTRng<TheirRng> does, or use specialization to fix CTRng<TheirRng>.

@dhardy
Copy link
Member Author

dhardy commented Jul 31, 2017

Wait, I didn't notice the error part before?

fn next_u32(&mut self) -> Result<u32, RngError>

Most PRNGs are pretty simple functions that won't error. Detecting a cycle isn't as simple as detecting the same output value because internal state is bigger than that; the only trivial way to do it would be to copy the entire internal state. But for games and simulators you wouldn't want it to error, because most of the time it wouldn't matter: the game/sim state would be quite different by this point, and likely the repeated values would get used for different things, so it wouldn't be noticeable. For scientific simulations it might be useful to get a warning when an RNG repeats itself, but that's about it.

For me, making generators return a result is a complete no-go. If crypto people really want this, maybe there should be separate crypto and numeric generators — but that would be a huge pain with duplicate code (some generators maybe, some distributions), and randomised algorithms could probably work with either — except in a few cases predictable numbers in randomised algs might allow DoS attacks.

Maybe what could work instead is a trait SecureRng: Rng with a method fn error_state -> Option<Error>, and a wrapper function fn try_fill(..) -> Result<(), Error>.

@dhardy
Copy link
Member Author

dhardy commented Jul 31, 2017

@bluss: There's two ways to go about it, using R: ?Sized + Rng with a &mut R parameter or using R: Rng and an R parameter ...

I don't think they are equivalent: an R parameter isn't Copy and so can't be passed into a function expecting R while also being retained. On the other hand, the compiler has no problems with &mut R being passed into a function and retained even though it's not Copy.

On the other hand the +?Sized everywhere approach seems to work.

@Lokathor
Copy link
Contributor

Lokathor commented Jul 31, 2017

This is reminding me of the Out Of Memory discussions happening elsewhere in the rust world lately. The "normal" case not wanting to be burdened with all the result checking that would come from being absolutely technically correct.

I think that splitting things up probably is the best route

  • Generator provides next_u32(&mut self) -> u32 and friends
  • FallibleGenerator provides try_next_u32(&mut self) -> Result<u32,RNGError> and friends
  • CryptoGenerator is a marker trait that you add to your generator when it is believed that an implementation is cryptographically secure (if you read the manual and seed it right and all that). Probably only applies to various kinds of FallibleGenerator, but I'll let the crypto folks make the call on that one.

Then the Rand trait would (probably?) just be written for Generator. If all you have on hand is a FallibleGenerator you can simply run it however many times to make a Generator and use that to execute your Rand building function. We could probably even provide a helper function or default method somewhere to do this. I suspect that most people willing to put up with using a FallibleGenerator just want the raw numbers though.

I don't recall it getting much attention, but right now there's also a SeedableRng, but I'm not entirely sure on what the point of that trait is compared to just making a new Rng from scratch. Unless it's just providing a uniform way to make generators instead of each one having their own new function? We don't have to throw it out, but maybe it could get a little docs love to explain what it's doing and/or signifying. The StdRng is seedable, and the ThreadRng is not seedable. Why is it that way? That should maybe go in the docs.

EDIT: Spelling I guess, whoops.

@burdges
Copy link
Contributor

burdges commented Jul 31, 2017

I just copied RngError from the internals thread, but made it an associated type. I'm not familiar with the OsRng implementation for different systems, but some systems provide random number generation options that can fail or block.

As an aside, I doubt even all CSPRNGs error eventually, maybe even SHAKE fails that. We'll obviously want an Rng for SHAKE eventually, so it's not like even all cryptographic Rngs should be responsible for detecting exhaustion. I only meant that Rngs that use counters like ChaCha do error eventually, but even if they never error you still cannot use them indefinitely. Looping is bad.

Anyways..

You cannot create an infallable PRNG from a failable one @Lokathor except by panicing because the infallable PRNG cannot know how many times some probabilistic algorithm, like the Ziggurat used in every distribution, will call the failable PRNG. You'd damage the entropy pool by sucking out megabytes.

In principle, you could've method that asks if the PRNG can provide enough output. In that way, a developer can give an upper bound on the entropy they require without consuming anything until needed. I suppose the trait might look like

pub trait Rng {
    fn available_entropy(&self, usize)  -> bool { true }
    fn fill(&mut self, dest: &mut [u8]);
    ...
    fn next_u32(&mut self) -> u32 { ... }
    fn next_u64(&mut self) -> u64 { ... }
    ...
    fn gen<D,T>(&mut self,d: D) -> T where T: Rand<D> { Rand::rand(self,d) }
    ...
}

so the available_entropy function can simply default to true, but an OsRng or ChaChaRng can replace it. I donno if all implementations of OsRng can actually do this check, but at worse they can return true, and warn about being used in startup, or even check the system uptime. I removed the RngError entirely because any Rng that needs it can provide it via an inherent method.

About the only issue I forsee here would be if developers start writing rng.available_entropy(1000000000) everywhere, but some OsRng is too honest and says not enough entropy remains. Imho, this works well for temporary SCPRNGs like ChaChaRng because if you even get close to exhausting one of those then you need to redesign your protocol.

@dhardy
Copy link
Member Author

dhardy commented Jul 31, 2017

@burdges please read this post on the internals thread about why user-space CSPRNGs are a bad idea and the side notes about how "available entropy" is at best a guess.

By the way, fallible is spelt with an i...

... but I still prefer the design I mentioned earlier:

  • Rng (or Generator) never fails
  • SecureRng (or CryptoGenerator) has a try_next_.. or try_fill method which can fail, but also implements the Rng trait (probably panicking on error)

@burdges
Copy link
Contributor

burdges commented Jul 31, 2017

That's an unrelated conversation. Yes, user-space CSPRNGs used for many or unrelated tasks suck, but that irrelevant. CSPRNGs play an essential role in any protocol with commitments.

Actually virtually every cryptographic protocol incorporates a CSPRNGs, namely its KDFs, usually it merely needs to convert non-uniform randomness into uniform randomness, but sometimes it needs much more. And sometimes it needs to sample non-uniform distributions, like anything lattice-base or anything that must generate an agreed upon delay (anonymity protocols).

Anyway, the reason properly used CSPRNGs need to be recreated isn't to limit the quantity of output, but because they only serve a specific role where some protocol needs determinism.

Also, if OsRng really needs to block, then folks may need futures to manage it eventually.

@burdges
Copy link
Contributor

burdges commented Jul 31, 2017

Anyways, if I understand, the worries over CSPRNGs mean OsRng must be present, and ideally the default, but OsRng needs error handling on some platforms, and arguably should on all platforms. In principle, these constraints rule out the fundamental Rng trait being infallible, no? The question should be how to do the error handling, no?

Now backwards compatibility may require InfallibleRng be named Rng, while all PRNGs actually implement another trait, say RawRng. A SecureRng marker trait doesn't work great if the rand crate uses the infallible trait everywhere internally though. Also, you should not assert an Rng is secure just because it wants to return errors.

@dhardy
Copy link
Member Author

dhardy commented Jul 31, 2017

Yeah, I get that from the crypto perspective, RNGs should be able to return errors by default.

But: can that be implemented without forcing numerical users using infallible RNGs to unwrap everywhere, and without significant overhead?

Otherwise it seems very difficult to find a design without significant compromise from one point of view.

Backwards compatibility isn't the issue; compatibility between these two types of RNG is. My proposal was to make fallible RNGs compatible with the (infallible) Rng trait by moving error handling to a separate method, but I know that's not ideal either.

Anyway: I've implemented the new Rand and restored random(). This is close to the API @aturon suggested aside from error handling now. API doc updated (see opening message).

@Lokathor
Copy link
Contributor

@burdges I think you misunderstood what I meant when I suggested that a FallibleGenerator be able to just make a Generator any time it needs to run Rand code. I wasn't suggesting some sort of bridge that calls the FallibleGenerator internally. I'm suggesting that, for example, a ChaChaRng can make a XorShiftRng using 4x u32 as the seed, and (assuming that the ChaChaRng has enough entropy for that) from there it can create the Rand value based on the as many calls to the XorShiftRng as the type needs. At the end of it you just throw out the XorShiftRng and return the Rand value as if it only used up 16 bytes of random output generation.

@dhardy your updated docs show random as a top level function of the crate, but don't show the Rand trait as a top level trait. Might want to put a pub on the use that brings Rand into scope. That's a minor thing though.

I'm not fully up on how the thread local storage works, but could thread_rng() and things that depend on it be made generic, so that programs can pick the generator type that they want to have be used? It looks like "no" but I thought I'd ask. It would be nice to be able to pick the RNG used for that, if possible.

pitdicker and others added 27 commits December 14, 2017 19:46
That is kind of the goal here :-)
Rename `AsciiWordChar` and optimize `Codepoint`
And a little cleanup around the init functions
Replace `convert_slice_{32,64}` with `read_u{32,64}_into`
Restrict the seed type to a few more array sizes
Make u128 range use widening multiply
* Move the check if it is time to reseed out of the `try_reseed_if_necessary`
  and make sure that function does not get inlined.
* Invert the counter direction. This way we can compare against 0 instead of
  `self.threshold`
* Doing the reseed check after generating a value turns out to be a bit faster.`
@pitdicker
Copy link
Contributor

This PR is not the way you want to merge the work from your branch, right? Do you want to keep it open? I get a mail for every commit that happens in https://github.com/dhardy/rand because of it 😄

@dhardy
Copy link
Member Author

dhardy commented Jan 8, 2018

Sorry, didn't realise I was spamming everyone via this PR! You're right, lets close this.

@dhardy dhardy closed this Jan 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.