Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SeedableRng::seed_from_u64 #537

Merged
merged 3 commits into from
Sep 3, 2018
Merged

Conversation

dhardy
Copy link
Member

@dhardy dhardy commented Jun 29, 2018

Implements #522

We never quite came to a conclusion on the algorithm to use, but I don't see any reason not to use PCG32. It's reasonably good quality, reasonably fast and should work fine on 32-bit CPUs too.

@dhardy dhardy added E-question Participation: opinions wanted F-new-int Functionality: new, within Rand D-review Do: needs review labels Jul 1, 2018

for (i1, r1) in results.iter().enumerate() {
let weight = r1.count_ones();
assert!(weight >= 20);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment how likely this is to fail if the seeds were random.

Copy link
Member Author

@dhardy dhardy Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the binomial distribution B(64, 0.5), so chance of failure is F(20; 64, 0.5) using the notation on wikipedia, which is 0.000781 (0.07%) according to some calculator I found online.

@vks
Copy link
Collaborator

vks commented Jul 5, 2018

Maybe we should evaluate the different choices of RNG to see how they recover from a seed with lots of zeros?

@dhardy
Copy link
Member Author

dhardy commented Jul 5, 2018

Go ahead if you like. Since the basis of this is new_state = state * 6364136223846793005 + 11634580027462260723 I think it should get away from zero fast 😄

@vks
Copy link
Collaborator

vks commented Jul 5, 2018

Go ahead if you like.

I'm working on it. 😄

@pitdicker
Copy link
Contributor

I agree PCG is a better choice than SplitMix here.

I added 0x9E3779B97F4A7C15 (golden ratio * 2⁶⁴) to the initial seed to avoid week seeds with SplitMix, using PCG instead is just cleaner.

If you see SplitMix as repeatedly using a hash with good avalanche (MurmurHash) to extend the seed it seems theoretically a little nicer. But I can't imagine a situation where the results from PCG aren't good enough.

@dhardy What do you think of including something like a fill_slice_by_repeating helper method in rand_core, and using that for ChaChaRng and Hc128Rng?

And can you update ISAAC to move their seed_from_u64 methods to SeedableRng?

@vks
Copy link
Collaborator

vks commented Jul 5, 2018

@pitdicker Which seeds for SplitMix are weak?

@pitdicker
Copy link
Contributor

@pitdicker Which seeds for SplitMix are weak?

I'll take it back, that was probably an unnecessary step (wrote to much from memory).

@vks
Copy link
Collaborator

vks commented Jul 6, 2018

I'll take it back, that was probably an unnecessary step (wrote to much from memory).

Okay, are you still saying PCG is a better choice? I'm curious for what reason.

@dhardy
Copy link
Member Author

dhardy commented Jul 6, 2018

If you see SplitMix as repeatedly using a hash with good avalanche (MurmurHash) to extend the seed it seems theoretically a little nicer.

Not sure what you mean; using MurmurHash repeatedly sounds like a completely different PRNG.

Essentially SplitMix is simply a Weyl sequence (new_state = old + A) plus an output function, and PCG is an LCG (new_state = old * A + B) plus an output function. I have a feeling SplitMix is recommended for the Xo* families of PRNGs because it is simple and behaves quite differently from those PRNGs; I see no reason why PCG won't work just as well, and in general it is a somewhat better PRNG. (I also don't think there would be any issues seeding PCG from itself, though I don't know for sure.) Getting bored of this topic though; I don't think it matters much here.

What do you think of including something like a fill_slice_by_repeating helper method in rand_core, and using that for ChaChaRng and Hc128Rng?

Simply repeating the seed should work fine for those PRNGs but so should just using the default implementation of seed_from_u64. I don't think either option has any real advantage over the other (both are useless for crypto because we start with only 64 bits; other than that quality should be good either way), so my question is why bother? Potentially it does let us change seed_from_u64 later without value-breaking these PRNGs, but we don't want to change it anyway.

And can you update ISAAC to move their seed_from_u64 methods to SeedableRng?

👍 (And I guess in this case it makes sense to avoid value-breakage.)

@vks
Copy link
Collaborator

vks commented Jul 6, 2018

I also don't think there would be any issues seeding PCG from itself, though I don't know for sure.

It was shown that using the same RNG for seeding as for generating can introduce correlations. I would not feel comfortable doing this without testing.

@vks
Copy link
Collaborator

vks commented Jul 6, 2018

Not sure what you mean; using MurmurHash repeatedly sounds like a completely different PRNG.

I think he meant that splitmix64 uses a MurmurHash avalanche function as its output function.

@dhardy
I counted the average number of zeros in the first output after seeding with a random sparse integer, and both pcg32 and splitmix64 seem to get rid of the zeros in the initial seed reliably. I could not see any significant difference between the two. (I think pcg32 was slower, but I don't believe this matters much for this application.) Both are good choices in my opinion.

I still would not implement Pcg::seed_from_u64 using pcg32 though, and I would prefer to not implement it at all for CryptoRng. It is true that you might want to use a CryptoRng just for its statistical quality, but I can't think of a case where you don't care about predictability and couldn't use one of the weaker RNGs. (This ship has sailed for IsaacRng::new_from_u64, but it is deprecated anyway.) My suggestion is to implement seed_from_u64 for the RNG directly instead of for SeedableRng.

@dhardy
Copy link
Member Author

dhardy commented Jul 6, 2018

@vks you propose not adding to rand_core then but just recommending that non-crypto PRNGs implement this (like Isaac*Rng)? This is another option, but I still think a trait function with default implementation is better.

I can't think of a case where you don't care about predictability and couldn't use one of the weaker RNGs

Not couldn't, but may not (always) want to. Scientific simulations (stochastic models) are models with obvious limitations, but it is of course desirable to rule out as many sources of bias as possible — therefore it may be desirable to use a cryptographic generator (or multiple generators) just for quality and/or diversity. Of course the seed used doesn't matter, but it may be desirable to make results reproducible (e.g. a project I was working on used volunteer computing and replication of some work units as a measure of protection against bad results).

@dhardy dhardy mentioned this pull request Jul 7, 2018
@dhardy
Copy link
Member Author

dhardy commented Jul 7, 2018

I added specific implementations for the ISAAC generators as suggested. General notes:

  • BlockRng implementations require three levels of definitions of SeedableRng functions. Not really a concern here, just a limitation of the design.
  • documentation on the implementation of seed_from_u64 is included in the generated documentation but is somewhat hidden; this is probably fine however (in this case just a note that "seed 0" corresponds to the reference implementation)
  • The ISSAC implementations simply set the first 64 bits of the key and leave the rest 0, which for this PRNG (and any crypto PRNG) should be fine.

Another option for seed_from_u64 would be to have no default implementation and force every PRNG to implement the method itself, while providing one or two implementation helpers in rand_core::impls. (This would be a breaking change, but we can add a compatibility release bridging to rand_core 0.3.)

In a couple of ways, not having a default implementation is better:

  • forces each PRNG to choose its own version
  • avoids implicitly adding seed_from_u64 to PRNGs which must then have a value-breaking change if a custom implementation is used

The disadvantage is that we then need to add implementation helpers (perhaps fn stretch_seed_pcg32(seed: u64, key: &mut [u8]) and SplitMix equivalent, or perhaps actual RngCore implementations — though we didn't really want to add those to rand_core).

I would prefer to not implement it at all for CryptoRng

See #543

@dhardy dhardy force-pushed the from_u64_stable branch 2 times, most recently from 02750aa to 52563b0 Compare July 7, 2018 10:15
@vks
Copy link
Collaborator

vks commented Jul 7, 2018

@dhardy

you propose not adding to rand_core then but just recommending that non-crypto PRNGs implement this (like Isaac*Rng)? This is another option, but I still think a trait function with default implementation is better.

Yes, this is what I was suggesting. It does not support using them for just their statistical quality, as you mentioned.

Another option for seed_from_u64 would be to have no default implementation and force every PRNG to implement the method itself, while providing one or two implementation helpers in rand_core::impls.

This is similar to my suggestion, except that it would not force them to implement it.

@dhardy
Copy link
Member Author

dhardy commented Jul 8, 2018

I still think we should make all seedable RNGs implement seed_from_u64, but not having a default implementation sounds like the way to go.

Interesting point in your other thread about not implementing SeedableRng for StdRng; that's another breaking change but makes a lot of sense (the only real use of SeedableRng without reproducibility is to allow faster seeding by using a PRNG instead of OsRng; this may be useful for SmallRng but not so much for StdRng I think).

@dhardy
Copy link
Member Author

dhardy commented Jul 8, 2018

Implementation strategy for adding as a required function without default:

  • rand_core 0.3 adds the function to SeedableRng
  • rand_core 0.2 gets a compatibility release which implements all 0.3 traits for objects supporting the 0.2 traits (or just re-exports if identical); seed_from_u64 gets implemented via unimplemented!() (i.e. run-time failure on use; this allows PRNGs implementing 0.2 to continue functioning as they do now)
  • rand 0.6 depends on rand_core 0.3 and adds real implementations

@vks
Copy link
Collaborator

vks commented Jul 9, 2018

If we don't provide a default implementation, why do we want to force implementors of SeedableRng to implement seed_from_u64? We could just recommend to implement it on the struct and provide helper functions/macros.

There might be use cases for manually seeding StdRng, but those could possibly be deferred to using the underlying RNG directly (which doesn't have the associated reproducibility hazards). If we don't implement SeedableRng for StdRng, does it make sense not to provide SmallRng at all, given we don't guarantee reproducibility and it's not very useful without seeding? Probably not, SmallRng seems to be used by quite a lot of code on Github, so removing it would cause a lot of churn.

I think we might want to offer reproducibility guarantees for minor versions of Rand.

@dhardy
Copy link
Member Author

dhardy commented Jul 13, 2018

Maybe there's just insufficient use-cases to justify the existence of this method.

The better option may be a "universal seeder" based on a hash function, e.g. SipHash. The hash itself can be used to extend the result as necessary. But rather than push that functionality into another hash function implementation or into Rand proper, it may be better to create a sub-crate for it (rand_seeder).

Usage could then be something like:

let rng = Seeder::from(x).make_rng::<Xoshiro128>();

// or, with less "magic":
let mut seeder = Seeder::new();
seeder.hash(x);
let mut seed = <Xoshiro128 as SeedableRng>::Seed::default();
seeder.fill(&mut seed[..]);
let rng = Xoshiro128::from_seed(seed);

@dhardy dhardy closed this Jul 13, 2018
@burdges
Copy link
Contributor

burdges commented Jul 13, 2018

Another option would be seed_from_u128 no?

@dhardy dhardy mentioned this pull request Jul 14, 2018
@sicking
Copy link
Contributor

sicking commented Jul 15, 2018

I think seed_from_u64 is fine as-is. I don't think the risk of people using seed_from_u64 and then using it for crypto-critical stuff is any higher than someone doing

let seed = [0u8; 32];
for i in 0..4 {
    seed[i] = myseed[i];
}
let mut rng = StdRng::from_seed(seed);

or someone doing

let mut rng = StdRng::from_entropy();
let my_secure_token = Alphanumeric.sample_iter(&mut rng).take(6).collect::<String>();

I.e. we can't prevent people from doing stupid stuff. The best thing we can do is make it easier to do the right thing than the wrong thing. I.e. have an API which make it easy to get a CSPRNG which has been seeded from a good source of entropy. Then point to that API in the documentation for the CryptoRng trait and the SeedableRng trait.

I think the "recipes" in the documentation and the simplicity of the secrets object here does a lot more for security than any risk introduced by having no requirements on seed size here.

@pitdicker
Copy link
Contributor

I much liked this PR and would like to see it merged as it is now (please don't close 😄). It is a win for users porting from rand 0.3 and 0.4, for simple tests, games, and possibly simulations.

When it comes to security, we already make it easy to do the right thing with FromEntropy.

And I don't think we can really design Rand with the motto 'make it hard to do the insecure thing'. Good security practices are just not always possible to make convenient. And for many use cases of Rand convenience is an important aspect.

Since 0.4 we made good improvements for security and correctness when it comes to seeding: easy seeding from the OS, error reporting, handling of endianness, not silently accepting small sizes. But we do not have to go overboard with it, and use 'security' as a reason to not add simple, reproducible seeding that can benefit many use cases.

@pitdicker pitdicker reopened this Jul 15, 2018
@pitdicker
Copy link
Contributor

What is this discussion around removing SeedableRng from StdRng?
We had it before, and it was the reason to have a split of SeedableRng into two traits. One for from_seed and one for from_rng. Didn't we conclude at the time that one trait was more convenient, and encoding reproducibility guarantees in the type system was not really worth it?

@dhardy
Copy link
Member Author

dhardy commented Jul 15, 2018

No love for #554 then? It covers the "convenient seeding" thing nicely enough IMO, but is complicated enough to warrant being a separate crate.

One strange thing about adding seed_from_u64 to SeedableRng is that it allows impl's to override it with more "optimal" implementations — only, there's not really any reason to optimise. Because of that this addition is not as simple as it could have been (i.e. it potentially caries more costs for new RNGs and helpers like BlockRng). Because of that I'm less sure about it.

The other thing to consider is use-cases. Many users don't care about reproducibility; they don't need this. Then there are those wanting a random secret generator; they can't use it. That leaves only simulations and a few games wanting reproducibility. Using an external crate like #554 should be good enough for these people, and that one is significantly more capable (I don't want to say it's suitable for cryptography because I haven't tried doing any crypto-analysis, but it carries 256 bits of state and uses a fairly strong algorithm, so it should be good enough — though not for password hashing).

@sicking
Copy link
Contributor

sicking commented Jul 15, 2018

I think #554 is awesome. I could definitely see making a separate crate for it for the use cases that would benefit from it. But I think most use cases that wants reproducibility doesn't need something that powerful.

I agree with you that seed_from_u64 is only needed for the set "simulations and games needing reproducibility". But I think that's quite a large set. If we didn't think reproducibility was that important, I'd say that we should structure SeedableRng quite differently and only allow seeding from entropy.

But I really like the SeedableRng design as it is now, and I think the PR here really makes rand significantly more pleasant to use when reproducibility is needed, which I think is quite common.

@pitdicker
Copy link
Contributor

No love for #554 then? It covers the "convenient seeding" thing nicely enough IMO, but is complicated enough to warrant being a separate crate.

Sorry, didn't look at the open PRs yet... You have resurrected the hashing approach!
On the one hand it has potential. coming up with some 'random' seed string (let mut rng = Seeder::from("my_crate::test_run::string_test").make_rng::<XorShiftRng>()) is easier than coming up with a u64.

I still feel a bit uneasy about adding hashing to a core randomness trait, the idea we had in dhardy#18. There was no clear algorithm that fit best. But you made it an external crate, so that's good.

One strange thing about adding seed_from_u64 to SeedableRng is that it allows impl's to override it with more "optimal" implementations — only, there's not really any reason to optimise.

I see this as a nice property. Many PRNG designs come with a method to seed from a smaller integer. This extra method will give implementations the chance to implement the method chosen by the author. Seems like a nice thing to me. In theory it could also help with reproducibility from other languages, but I don't believe that is easy or worthwhile.

@pitdicker
Copy link
Contributor

I was surprised this PR didn't need rebasing after the ISAAC PRNGs were split out, but the base branch is 0.5. Probably doesn't matter much, this just means users get the feature sooner.

What needs to happen here? I see no reason why we would have to choose between seed_from_u64 and #554, they can both be options.

@dhardy
Copy link
Member Author

dhardy commented Jul 27, 2018

Yes, this should affect rand_core only, so I thought better to use the 0.5 branch (stable) to make a new rand_core release.

There is absolutely no reason we cannot have both this and #554, though #554 makes this redundant.

You are correct that allowing PRNGs to override the seed_from_u64 implementation allows compliance with the original RNG's specification and potentially faster, simpler code. There are two drawbacks to this: (1) it is more likely that SomeRng::seed_from_u64 will change in the future (although perhaps this is not a big deal), and (2) some RNGs may not handle some common seeds well with the default implementation (what would XorShiftRng::seed_from_u64(0) do? panic?); in contrast with our own implementation we can state that seed_from_u64(0) is fine and (seed_from_u64(n), seed_from_u64(n+1)) yields two fully independent RNGs.

@vks
Copy link
Collaborator

vks commented Jul 27, 2018

what would XorShiftRng::seed_from_u64(0) do? panic?

According to Rand's docs, SeedableRng::from_seed should never panic, so I don't see why seed_from_u64 should either.

@TheIronBorn
Copy link
Collaborator

It's a little late to change anything but this may be useful http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html

@dhardy
Copy link
Member Author

dhardy commented Jul 27, 2018

SeedableRng::from_seed should never panic, so I don't see why seed_from_u64 should either.

Aside from the two being independent, what should it do?

The key point here is:

  • should we have an all round good quality seeding mechanism (as linked to by @TheIronBorn and as Universal seeder #554 is, I hope),
  • or should we make it a hook to implement seeding compatible with the original PRNG specification (if specified) / most common seeding method?

Because @pitdicker appears to be proposing the latter, but I'm not keen. On the other hand, the former is redundant with #554, but simpler and less capable.

@dhardy
Copy link
Member Author

dhardy commented Aug 16, 2018

I much liked this PR and would like to see it merged as it is now (please don't close smile). It is a win for users porting from rand 0.3 and 0.4, for simple tests, games, and possibly simulations.

I guess this deserves fleshing out: unit tests often want to be deterministic (so cannot use from_entropy) but don't have much else in the way of requirements (many currently use new_unseeded). from_seed is less easy to use here, though perhaps XorShiftRng::from_seed(Default::default()) is sufficient.

Allowing XorShiftRng::seed_from_u64(0) is a decent replacement for XorShiftRng::new_unseeded.
#554 is less useful in this regard: it is an extra dependency that is otherwise unnecessary.

@dhardy dhardy mentioned this pull request Aug 23, 2018
28 tasks
@dhardy
Copy link
Member Author

dhardy commented Aug 23, 2018

Note on licence: PCG is Apache 2.0 only, so we would have to make a slight exception here instead of the dual MIT / Apache 2.0 we use elsewhere (unless @imneme agrees otherwise).

@imneme
Copy link

imneme commented Aug 23, 2018

Note on licence: PCG is Apache 2.0 only, so we would have to make a slight exception here instead of the dual MIT / Apache 2.0 we use elsewhere (unless @imneme agrees otherwise).

To make various people happy, I relicensed the C++ version of the code to MIT/Apache 2.0 a while back. I have no problem with a similar relicense of the C code; the only reason I've not done so is the hassle of making a patch to update all the notices. If someone submits a pull request to update the license for the C code, I'd happily accept it (check out the equivalent patch for the C++ version for what's needed).

But if you just want my okay, it's okay.

@dhardy
Copy link
Member Author

dhardy commented Aug 24, 2018

Your word is good enough for me. This is actually based off the snipped on your download page.

Since I got your attention @imneme, is there any particular reason for preferring the hash "function" constructed in your blog post over PCG for seeding RNGs? And is there any reason for concern when using PCG to seed PCG?

As far as I can see PCG has a little better mixing but both are probably sufficient.

@dhardy
Copy link
Member Author

dhardy commented Sep 3, 2018

Well, this PR has been open for comment for two months now, and response is mostly positive. I've been hesitant because once people start using it, almost any tweaking becomes a breaking change. This thing has two use-cases (unit tests and reproducible simulations) and meets the requirements for those use-cases.

The remaining question is whether SeedableRng implementations should provide their own implementations of this function or not. There is sometimes a little rationale for this (compatibility with other seeding routines), but I will suggest that custom implementations are not used for non-crypto PRNGs unless (a) "simple" input values like 0 and 1 produce good sequences and (b) any two "close" inputs (like x, x+1) should produce fully independent sequences. Note that this is a non-issue for CSPRNGs but can be an issue for some PRNGs.

@dhardy dhardy merged commit 24fbd1f into rust-random:0.5 Sep 3, 2018
@dhardy dhardy deleted the from_u64_stable branch February 15, 2019 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D-review Do: needs review E-question Participation: opinions wanted F-new-int Functionality: new, within Rand
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants