Implement SeedableRng::seed_from_u64 #537

dhardy · 2018-06-29T19:56:51Z

Implements #522

We never quite came to a conclusion on the algorithm to use, but I don't see any reason not to use PCG32. It's reasonably good quality, reasonably fast and should work fine on 32-bit CPUs too.

vks · 2018-07-05T08:39:24Z

rand_core/src/lib.rs

+
+        for (i1, r1) in results.iter().enumerate() {
+            let weight = r1.count_ones();
+            assert!(weight >= 20);


Maybe add a comment how likely this is to fail if the seeds were random.

This is the binomial distribution B(64, 0.5), so chance of failure is F(20; 64, 0.5) using the notation on wikipedia, which is 0.000781 (0.07%) according to some calculator I found online.

vks · 2018-07-05T08:43:40Z

Maybe we should evaluate the different choices of RNG to see how they recover from a seed with lots of zeros?

dhardy · 2018-07-05T09:53:56Z

Go ahead if you like. Since the basis of this is new_state = state * 6364136223846793005 + 11634580027462260723 I think it should get away from zero fast 😄

vks · 2018-07-05T13:14:05Z

Go ahead if you like.

I'm working on it. 😄

pitdicker · 2018-07-05T15:00:20Z

I agree PCG is a better choice than SplitMix here.

I added 0x9E3779B97F4A7C15 (golden ratio * 2⁶⁴) to the initial seed to avoid week seeds with SplitMix, using PCG instead is just cleaner.

If you see SplitMix as repeatedly using a hash with good avalanche (MurmurHash) to extend the seed it seems theoretically a little nicer. But I can't imagine a situation where the results from PCG aren't good enough.

@dhardy What do you think of including something like a fill_slice_by_repeating helper method in rand_core, and using that for ChaChaRng and Hc128Rng?

And can you update ISAAC to move their seed_from_u64 methods to SeedableRng?

vks · 2018-07-05T15:07:51Z

@pitdicker Which seeds for SplitMix are weak?

pitdicker · 2018-07-05T15:24:25Z

@pitdicker Which seeds for SplitMix are weak?

I'll take it back, that was probably an unnecessary step (wrote to much from memory).

vks · 2018-07-06T08:30:52Z

I'll take it back, that was probably an unnecessary step (wrote to much from memory).

Okay, are you still saying PCG is a better choice? I'm curious for what reason.

dhardy · 2018-07-06T09:24:52Z

If you see SplitMix as repeatedly using a hash with good avalanche (MurmurHash) to extend the seed it seems theoretically a little nicer.

Not sure what you mean; using MurmurHash repeatedly sounds like a completely different PRNG.

Essentially SplitMix is simply a Weyl sequence (new_state = old + A) plus an output function, and PCG is an LCG (new_state = old * A + B) plus an output function. I have a feeling SplitMix is recommended for the Xo* families of PRNGs because it is simple and behaves quite differently from those PRNGs; I see no reason why PCG won't work just as well, and in general it is a somewhat better PRNG. (I also don't think there would be any issues seeding PCG from itself, though I don't know for sure.) Getting bored of this topic though; I don't think it matters much here.

What do you think of including something like a fill_slice_by_repeating helper method in rand_core, and using that for ChaChaRng and Hc128Rng?

Simply repeating the seed should work fine for those PRNGs but so should just using the default implementation of seed_from_u64. I don't think either option has any real advantage over the other (both are useless for crypto because we start with only 64 bits; other than that quality should be good either way), so my question is why bother? Potentially it does let us change seed_from_u64 later without value-breaking these PRNGs, but we don't want to change it anyway.

And can you update ISAAC to move their seed_from_u64 methods to SeedableRng?

👍 (And I guess in this case it makes sense to avoid value-breakage.)

vks · 2018-07-06T11:42:05Z

I also don't think there would be any issues seeding PCG from itself, though I don't know for sure.

It was shown that using the same RNG for seeding as for generating can introduce correlations. I would not feel comfortable doing this without testing.

vks · 2018-07-06T12:31:32Z

Not sure what you mean; using MurmurHash repeatedly sounds like a completely different PRNG.

I think he meant that splitmix64 uses a MurmurHash avalanche function as its output function.

@dhardy
I counted the average number of zeros in the first output after seeding with a random sparse integer, and both pcg32 and splitmix64 seem to get rid of the zeros in the initial seed reliably. I could not see any significant difference between the two. (I think pcg32 was slower, but I don't believe this matters much for this application.) Both are good choices in my opinion.

I still would not implement Pcg::seed_from_u64 using pcg32 though, and I would prefer to not implement it at all for CryptoRng. It is true that you might want to use a CryptoRng just for its statistical quality, but I can't think of a case where you don't care about predictability and couldn't use one of the weaker RNGs. (This ship has sailed for IsaacRng::new_from_u64, but it is deprecated anyway.) My suggestion is to implement seed_from_u64 for the RNG directly instead of for SeedableRng.

dhardy · 2018-07-06T16:47:12Z

@vks you propose not adding to rand_core then but just recommending that non-crypto PRNGs implement this (like Isaac*Rng)? This is another option, but I still think a trait function with default implementation is better.

I can't think of a case where you don't care about predictability and couldn't use one of the weaker RNGs

Not couldn't, but may not (always) want to. Scientific simulations (stochastic models) are models with obvious limitations, but it is of course desirable to rule out as many sources of bias as possible — therefore it may be desirable to use a cryptographic generator (or multiple generators) just for quality and/or diversity. Of course the seed used doesn't matter, but it may be desirable to make results reproducible (e.g. a project I was working on used volunteer computing and replication of some work units as a measure of protection against bad results).

dhardy · 2018-07-07T09:50:36Z

I added specific implementations for the ISAAC generators as suggested. General notes:

BlockRng implementations require three levels of definitions of SeedableRng functions. Not really a concern here, just a limitation of the design.
documentation on the implementation of seed_from_u64 is included in the generated documentation but is somewhat hidden; this is probably fine however (in this case just a note that "seed 0" corresponds to the reference implementation)
The ISSAC implementations simply set the first 64 bits of the key and leave the rest 0, which for this PRNG (and any crypto PRNG) should be fine.

Another option for seed_from_u64 would be to have no default implementation and force every PRNG to implement the method itself, while providing one or two implementation helpers in rand_core::impls. (This would be a breaking change, but we can add a compatibility release bridging to rand_core 0.3.)

In a couple of ways, not having a default implementation is better:

forces each PRNG to choose its own version
avoids implicitly adding seed_from_u64 to PRNGs which must then have a value-breaking change if a custom implementation is used

The disadvantage is that we then need to add implementation helpers (perhaps fn stretch_seed_pcg32(seed: u64, key: &mut [u8]) and SplitMix equivalent, or perhaps actual RngCore implementations — though we didn't really want to add those to rand_core).

I would prefer to not implement it at all for CryptoRng

See #543

vks · 2018-07-07T13:44:32Z

@dhardy

you propose not adding to rand_core then but just recommending that non-crypto PRNGs implement this (like Isaac*Rng)? This is another option, but I still think a trait function with default implementation is better.

Yes, this is what I was suggesting. It does not support using them for just their statistical quality, as you mentioned.

Another option for seed_from_u64 would be to have no default implementation and force every PRNG to implement the method itself, while providing one or two implementation helpers in rand_core::impls.

This is similar to my suggestion, except that it would not force them to implement it.

dhardy · 2018-07-08T13:08:41Z

I still think we should make all seedable RNGs implement seed_from_u64, but not having a default implementation sounds like the way to go.

Interesting point in your other thread about not implementing SeedableRng for StdRng; that's another breaking change but makes a lot of sense (the only real use of SeedableRng without reproducibility is to allow faster seeding by using a PRNG instead of OsRng; this may be useful for SmallRng but not so much for StdRng I think).

dhardy · 2018-07-08T13:37:23Z

Implementation strategy for adding as a required function without default:

rand_core 0.3 adds the function to SeedableRng
rand_core 0.2 gets a compatibility release which implements all 0.3 traits for objects supporting the 0.2 traits (or just re-exports if identical); seed_from_u64 gets implemented via unimplemented!() (i.e. run-time failure on use; this allows PRNGs implementing 0.2 to continue functioning as they do now)
rand 0.6 depends on rand_core 0.3 and adds real implementations

vks · 2018-07-09T14:56:21Z

If we don't provide a default implementation, why do we want to force implementors of SeedableRng to implement seed_from_u64? We could just recommend to implement it on the struct and provide helper functions/macros.

There might be use cases for manually seeding StdRng, but those could possibly be deferred to using the underlying RNG directly (which doesn't have the associated reproducibility hazards). If we don't implement SeedableRng for StdRng, does it make sense not to provide SmallRng at all, given we don't guarantee reproducibility and it's not very useful without seeding? Probably not, SmallRng seems to be used by quite a lot of code on Github, so removing it would cause a lot of churn.

I think we might want to offer reproducibility guarantees for minor versions of Rand.

dhardy · 2018-07-13T15:20:30Z

Maybe there's just insufficient use-cases to justify the existence of this method.

The better option may be a "universal seeder" based on a hash function, e.g. SipHash. The hash itself can be used to extend the result as necessary. But rather than push that functionality into another hash function implementation or into Rand proper, it may be better to create a sub-crate for it (rand_seeder).

Usage could then be something like:

let rng = Seeder::from(x).make_rng::<Xoshiro128>();

// or, with less "magic":
let mut seeder = Seeder::new();
seeder.hash(x);
let mut seed = <Xoshiro128 as SeedableRng>::Seed::default();
seeder.fill(&mut seed[..]);
let rng = Xoshiro128::from_seed(seed);

burdges · 2018-07-13T20:08:09Z

Another option would be seed_from_u128 no?

sicking · 2018-07-15T03:42:16Z

I think seed_from_u64 is fine as-is. I don't think the risk of people using seed_from_u64 and then using it for crypto-critical stuff is any higher than someone doing

let seed = [0u8; 32];
for i in 0..4 {
    seed[i] = myseed[i];
}
let mut rng = StdRng::from_seed(seed);

or someone doing

let mut rng = StdRng::from_entropy();
let my_secure_token = Alphanumeric.sample_iter(&mut rng).take(6).collect::<String>();

I.e. we can't prevent people from doing stupid stuff. The best thing we can do is make it easier to do the right thing than the wrong thing. I.e. have an API which make it easy to get a CSPRNG which has been seeded from a good source of entropy. Then point to that API in the documentation for the CryptoRng trait and the SeedableRng trait.

I think the "recipes" in the documentation and the simplicity of the secrets object here does a lot more for security than any risk introduced by having no requirements on seed size here.

pitdicker · 2018-07-15T06:53:15Z

I much liked this PR and would like to see it merged as it is now (please don't close 😄). It is a win for users porting from rand 0.3 and 0.4, for simple tests, games, and possibly simulations.

When it comes to security, we already make it easy to do the right thing with FromEntropy.

And I don't think we can really design Rand with the motto 'make it hard to do the insecure thing'. Good security practices are just not always possible to make convenient. And for many use cases of Rand convenience is an important aspect.

Since 0.4 we made good improvements for security and correctness when it comes to seeding: easy seeding from the OS, error reporting, handling of endianness, not silently accepting small sizes. But we do not have to go overboard with it, and use 'security' as a reason to not add simple, reproducible seeding that can benefit many use cases.

pitdicker · 2018-07-15T06:56:42Z

What is this discussion around removing SeedableRng from StdRng?
We had it before, and it was the reason to have a split of SeedableRng into two traits. One for from_seed and one for from_rng. Didn't we conclude at the time that one trait was more convenient, and encoding reproducibility guarantees in the type system was not really worth it?

dhardy · 2018-07-15T07:49:40Z

No love for #554 then? It covers the "convenient seeding" thing nicely enough IMO, but is complicated enough to warrant being a separate crate.

One strange thing about adding seed_from_u64 to SeedableRng is that it allows impl's to override it with more "optimal" implementations — only, there's not really any reason to optimise. Because of that this addition is not as simple as it could have been (i.e. it potentially caries more costs for new RNGs and helpers like BlockRng). Because of that I'm less sure about it.

The other thing to consider is use-cases. Many users don't care about reproducibility; they don't need this. Then there are those wanting a random secret generator; they can't use it. That leaves only simulations and a few games wanting reproducibility. Using an external crate like #554 should be good enough for these people, and that one is significantly more capable (I don't want to say it's suitable for cryptography because I haven't tried doing any crypto-analysis, but it carries 256 bits of state and uses a fairly strong algorithm, so it should be good enough — though not for password hashing).

sicking · 2018-07-15T08:59:09Z

I think #554 is awesome. I could definitely see making a separate crate for it for the use cases that would benefit from it. But I think most use cases that wants reproducibility doesn't need something that powerful.

I agree with you that seed_from_u64 is only needed for the set "simulations and games needing reproducibility". But I think that's quite a large set. If we didn't think reproducibility was that important, I'd say that we should structure SeedableRng quite differently and only allow seeding from entropy.

But I really like the SeedableRng design as it is now, and I think the PR here really makes rand significantly more pleasant to use when reproducibility is needed, which I think is quite common.

pitdicker · 2018-07-15T13:17:30Z

No love for #554 then? It covers the "convenient seeding" thing nicely enough IMO, but is complicated enough to warrant being a separate crate.

Sorry, didn't look at the open PRs yet... You have resurrected the hashing approach!
On the one hand it has potential. coming up with some 'random' seed string (let mut rng = Seeder::from("my_crate::test_run::string_test").make_rng::<XorShiftRng>()) is easier than coming up with a u64.

I still feel a bit uneasy about adding hashing to a core randomness trait, the idea we had in dhardy#18. There was no clear algorithm that fit best. But you made it an external crate, so that's good.

One strange thing about adding seed_from_u64 to SeedableRng is that it allows impl's to override it with more "optimal" implementations — only, there's not really any reason to optimise.

I see this as a nice property. Many PRNG designs come with a method to seed from a smaller integer. This extra method will give implementations the chance to implement the method chosen by the author. Seems like a nice thing to me. In theory it could also help with reproducibility from other languages, but I don't believe that is easy or worthwhile.

pitdicker · 2018-07-18T20:41:08Z

I was surprised this PR didn't need rebasing after the ISAAC PRNGs were split out, but the base branch is 0.5. Probably doesn't matter much, this just means users get the feature sooner.

What needs to happen here? I see no reason why we would have to choose between seed_from_u64 and #554, they can both be options.

dhardy · 2018-07-27T15:39:47Z

Yes, this should affect rand_core only, so I thought better to use the 0.5 branch (stable) to make a new rand_core release.

There is absolutely no reason we cannot have both this and #554, though #554 makes this redundant.

You are correct that allowing PRNGs to override the seed_from_u64 implementation allows compliance with the original RNG's specification and potentially faster, simpler code. There are two drawbacks to this: (1) it is more likely that SomeRng::seed_from_u64 will change in the future (although perhaps this is not a big deal), and (2) some RNGs may not handle some common seeds well with the default implementation (what would XorShiftRng::seed_from_u64(0) do? panic?); in contrast with our own implementation we can state that seed_from_u64(0) is fine and (seed_from_u64(n), seed_from_u64(n+1)) yields two fully independent RNGs.

vks · 2018-07-27T17:02:49Z

what would XorShiftRng::seed_from_u64(0) do? panic?

According to Rand's docs, SeedableRng::from_seed should never panic, so I don't see why seed_from_u64 should either.

TheIronBorn · 2018-07-27T18:38:58Z

It's a little late to change anything but this may be useful http://www.pcg-random.org/posts/developing-a-seed_seq-alternative.html

dhardy · 2018-07-27T19:22:40Z

SeedableRng::from_seed should never panic, so I don't see why seed_from_u64 should either.

Aside from the two being independent, what should it do?

The key point here is:

should we have an all round good quality seeding mechanism (as linked to by @TheIronBorn and as Universal seeder #554 is, I hope),
or should we make it a hook to implement seeding compatible with the original PRNG specification (if specified) / most common seeding method?

Because @pitdicker appears to be proposing the latter, but I'm not keen. On the other hand, the former is redundant with #554, but simpler and less capable.

dhardy · 2018-08-16T13:26:10Z

I much liked this PR and would like to see it merged as it is now (please don't close smile). It is a win for users porting from rand 0.3 and 0.4, for simple tests, games, and possibly simulations.

I guess this deserves fleshing out: unit tests often want to be deterministic (so cannot use from_entropy) but don't have much else in the way of requirements (many currently use new_unseeded). from_seed is less easy to use here, though perhaps XorShiftRng::from_seed(Default::default()) is sufficient.

Allowing XorShiftRng::seed_from_u64(0) is a decent replacement for XorShiftRng::new_unseeded.
#554 is less useful in this regard: it is an extra dependency that is otherwise unnecessary.

dhardy · 2018-08-23T18:08:48Z

Note on licence: PCG is Apache 2.0 only, so we would have to make a slight exception here instead of the dual MIT / Apache 2.0 we use elsewhere (unless @imneme agrees otherwise).

imneme · 2018-08-23T18:25:41Z

Note on licence: PCG is Apache 2.0 only, so we would have to make a slight exception here instead of the dual MIT / Apache 2.0 we use elsewhere (unless @imneme agrees otherwise).

To make various people happy, I relicensed the C++ version of the code to MIT/Apache 2.0 a while back. I have no problem with a similar relicense of the C code; the only reason I've not done so is the hassle of making a patch to update all the notices. If someone submits a pull request to update the license for the C code, I'd happily accept it (check out the equivalent patch for the C++ version for what's needed).

But if you just want my okay, it's okay.

dhardy · 2018-08-24T08:05:46Z

Your word is good enough for me. This is actually based off the snipped on your download page.

Since I got your attention @imneme, is there any particular reason for preferring the hash "function" constructed in your blog post over PCG for seeding RNGs? And is there any reason for concern when using PCG to seed PCG?

As far as I can see PCG has a little better mixing but both are probably sufficient.

dhardy · 2018-09-03T16:42:18Z

Well, this PR has been open for comment for two months now, and response is mostly positive. I've been hesitant because once people start using it, almost any tweaking becomes a breaking change. This thing has two use-cases (unit tests and reproducible simulations) and meets the requirements for those use-cases.

The remaining question is whether SeedableRng implementations should provide their own implementations of this function or not. There is sometimes a little rationale for this (compatibility with other seeding routines), but I will suggest that custom implementations are not used for non-crypto PRNGs unless (a) "simple" input values like 0 and 1 produce good sequences and (b) any two "close" inputs (like x, x+1) should produce fully independent sequences. Note that this is a non-issue for CSPRNGs but can be an issue for some PRNGs.

dhardy force-pushed the from_u64_stable branch from 8dda820 to 1818b99 Compare July 1, 2018 09:20

dhardy added E-question Participation: opinions wanted F-new-int Functionality: new, within Rand D-review Do: needs review labels Jul 1, 2018

vks reviewed Jul 5, 2018

View reviewed changes

dhardy mentioned this pull request Jul 7, 2018

Is CryptoRng useful? #543

Closed

dhardy force-pushed the from_u64_stable branch 2 times, most recently from 02750aa to 52563b0 Compare July 7, 2018 10:15

dhardy closed this Jul 13, 2018

dhardy mentioned this pull request Jul 14, 2018

Universal seeder #554

Closed

pitdicker reopened this Jul 15, 2018

pitdicker approved these changes Jul 15, 2018

View reviewed changes

dhardy added 2 commits July 16, 2018 11:13

Implement SeedableRng::seed_from_u64

d38fd3e

Isaac*Rng: replace new_from_u64 with seed_from_u64

f2d813e

dhardy force-pushed the from_u64_stable branch from 52563b0 to f2d813e Compare July 16, 2018 10:14

dhardy mentioned this pull request Aug 23, 2018

Tracker: 0.6 release #520

Closed

28 tasks

seed_from_u64: tweak documentation and test

16291b6

dhardy merged commit 24fbd1f into rust-random:0.5 Sep 3, 2018

dhardy deleted the from_u64_stable branch February 15, 2019 10:58

Implement SeedableRng::seed_from_u64 #537

Implement SeedableRng::seed_from_u64 #537

Conversation

dhardy commented Jun 29, 2018

vks Jul 5, 2018

Choose a reason for hiding this comment

dhardy Jul 5, 2018 • edited Loading

Choose a reason for hiding this comment

vks commented Jul 5, 2018

dhardy commented Jul 5, 2018

vks commented Jul 5, 2018

pitdicker commented Jul 5, 2018

vks commented Jul 5, 2018

pitdicker commented Jul 5, 2018

vks commented Jul 6, 2018

dhardy commented Jul 6, 2018

vks commented Jul 6, 2018

vks commented Jul 6, 2018 • edited Loading

dhardy commented Jul 6, 2018

dhardy commented Jul 7, 2018

vks commented Jul 7, 2018

dhardy commented Jul 8, 2018

dhardy commented Jul 8, 2018

vks commented Jul 9, 2018 • edited Loading

dhardy commented Jul 13, 2018

burdges commented Jul 13, 2018

sicking commented Jul 15, 2018

pitdicker commented Jul 15, 2018

pitdicker commented Jul 15, 2018

dhardy commented Jul 15, 2018

sicking commented Jul 15, 2018

pitdicker commented Jul 15, 2018

pitdicker commented Jul 18, 2018

dhardy commented Jul 27, 2018 • edited Loading

vks commented Jul 27, 2018

TheIronBorn commented Jul 27, 2018

dhardy commented Jul 27, 2018

dhardy commented Aug 16, 2018

dhardy commented Aug 23, 2018

imneme commented Aug 23, 2018

dhardy commented Aug 24, 2018

dhardy commented Sep 3, 2018

dhardy Jul 5, 2018 •

edited

Loading

vks commented Jul 6, 2018 •

edited

Loading

vks commented Jul 9, 2018 •

edited

Loading

dhardy commented Jul 27, 2018 •

edited

Loading