Universal seeder #554

Closed
wants to merge 5 commits into from
dhardy
Member

@dhardy dhardy commented Jul 14, 2018

This is much more powerful than seed_from_u64 (#537).

let mut rng = Seeder::from("stripy zebra").make_rng::<XorShiftRng>();

SipRng code

SipHash is a keyed hash function optimised for speed on short messages. I adapted this to support unlimited length output with SipRng.

Quality should be roughly crypto-grade, though I haven't attempted any kind of cryptanalysis and would not like to bet on its strength without at least some further review.

Unsurprisingly, PractRand hasn't picked up any issues (two "unusual" items under 64 GiB on this run, none up to 128 GiB with a slightly different construction).

Performance is reasonable though of course not close to fast RNGs (I didn't bother using caching for next_u32 here; for seeding at least it's not useful):

test gen_bytes_sip         ... bench:     701,138 ns/iter (+/- 15,817) = 1460 MB/s
test gen_u32_sip           ... bench:       5,167 ns/iter (+/- 145) = 774 MB/s
test gen_u64_sip           ... bench:       5,212 ns/iter (+/- 98) = 1534 MB/s
test init_sip              ... bench:          25 ns/iter (+/- 1)
# for comparison:
test gen_bytes_hc128       ... bench:     448,607 ns/iter (+/- 35,948) = 2282 MB/s
test gen_u32_hc128         ... bench:       2,196 ns/iter (+/- 214) = 1821 MB/s
test gen_u64_hc128         ... bench:       3,938 ns/iter (+/- 240) = 2031 MB/s
test init_hc128            ... bench:       4,863 ns/iter (+/- 382)
test gen_bytes_xorshift    ... bench:     298,913 ns/iter (+/- 8,830) = 3425 MB/s
test gen_u32_xorshift      ... bench:       1,206 ns/iter (+/- 54) = 3316 MB/s
test gen_u64_xorshift      ... bench:       1,804 ns/iter (+/- 107) = 4434 MB/s
test init_xorshift         ... bench:          13 ns/iter (+/- 0)

For the intended usage, seeding other RNGs, this performance is perfectly adequate.

Note that the Seeder type is mostly for convenience. We could just recommend using from_rng, except that this method is currently documented as not being value-stable (which thus defeats the whole point of this code).

I may still make some tweaks to this, but so far it looks quite nice to me. Comments welcome.

@dhardy
Member Author

dhardy commented Jul 14, 2018

Small issue: this requires u128 since it converts usize to u128 for portability. So it needs Rust 1.26 (though we could just convert to two u64 values instead for now, since I don't think Rust targets any platforms with 128 bit pointers yet).

@dhardy
Member Author

dhardy commented Jul 15, 2018

This proves there are no stupid mistakes, I guess. No surprises really:

~/rust/rand/rand_sip$ ../target/release/examples/cat | practrand stdin64
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0xf050c744
test set = normal, folding = standard (64 bit)

rng=RNG_stdin64, seed=0xf050c744
length= 256 megabytes (2^28 bytes), time= 3.4 seconds
  no anomalies in 159 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 512 megabytes (2^29 bytes), time= 7.1 seconds
  no anomalies in 169 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 1 gigabyte (2^30 bytes), time= 14.3 seconds
  no anomalies in 180 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 2 gigabytes (2^31 bytes), time= 28.0 seconds
  no anomalies in 191 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 4 gigabytes (2^32 bytes), time= 55.7 seconds
  no anomalies in 201 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 8 gigabytes (2^33 bytes), time= 110 seconds
  no anomalies in 212 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 16 gigabytes (2^34 bytes), time= 217 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low16/64]FPF-14+6/16:cross       R=  +3.9  p =  1.1e-3   unusual          
  ...and 222 test result(s) without anomalies

rng=RNG_stdin64, seed=0xf050c744
length= 32 gigabytes (2^35 bytes), time= 428 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/64]BCFN(2+0,13-1,T)         R=  -7.0  p =1-1.3e-3   unusual          
  ...and 232 test result(s) without anomalies

rng=RNG_stdin64, seed=0xf050c744
length= 64 gigabytes (2^36 bytes), time= 855 seconds
  no anomalies in 244 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 128 gigabytes (2^37 bytes), time= 8532 seconds
  no anomalies in 255 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 256 gigabytes (2^38 bytes), time= 10195 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low4/64]BCFN(2+2,13-0,T)         R=  +8.0  p =  8.6e-4   unusual          
  ...and 264 test result(s) without anomalies

rng=RNG_stdin64, seed=0xf050c744
length= 512 gigabytes (2^39 bytes), time= 13601 seconds
  no anomalies in 276 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 1 terabyte (2^40 bytes), time= 20634 seconds
  no anomalies in 287 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 2 terabytes (2^41 bytes), time= 34936 seconds
  no anomalies in 297 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 4 terabytes (2^42 bytes), time= 61827 seconds
  no anomalies in 308 test result(s)

@sicking
Contributor

sicking commented Jul 15, 2018

In general I think this looks great. Two very minor comments that I don't feel strongly about.

Could we use a more discoverable name, like rand_seeder or hash_seeder or some such? At least I make no association between rand_sip and seeding.

Syntax like XorShiftRng::seed_from_hashable(...) might fit better with other "constructors". But I haven't thought through if that's implementable.

I haven't reviewed in detail, but happy to do so.

@dhardy
Member Author

dhardy commented Jul 15, 2018

I originally named it rand_seeder but figured it's also a SIP crate... but we can use that name.

Using an extension trait is technically possible but not my preferred syntax.

@dhardy
Member Author

dhardy commented Jul 27, 2018

Updated. We don't need to use turbo-fish syntax:

-let mut rng = Seeder::from("stripy zebra").make_rng::<XorShiftRng>();
+let mut rng: XorShiftRng = Seeder::from("stripy zebra").make_rng();

> Syntax like XorShiftRng::seed_from_hashable(...) might fit better with other "constructors".

This is possible with an extension trait (like FromEntropy). But is it desirable? The advantage of the current approach is that one seeder can happily produce multiple RNGs — though this may not be used very often.
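To make the trade-off concrete, here is a minimal, self-contained sketch of the seeder pattern. Everything in it is an assumption made for illustration: the `FromSeeder` trait, the toy `XorShift64` and `Lcg64` generators, and the FNV-1a plus SplitMix64 mixing stand in for the PR's `SeedableRng` types and `SipRng`, and are not the code under review. It only shows that the target type can be chosen by annotation or by turbofish, and that one seeder can feed several RNGs.

```rust
/// Hypothetical seeder: hashes arbitrary input, then emits a stream of u64s.
/// (Stand-in for the PR's SipRng-based Seeder; the mixing here is NOT SipHash.)
struct Seeder {
    state: u64,
}

impl<'a> From<&'a str> for Seeder {
    fn from(input: &'a str) -> Self {
        // FNV-1a over the input bytes (assumed stand-in hash).
        let mut h: u64 = 0xcbf2_9ce4_8422_2325;
        for b in input.bytes() {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
        Seeder { state: h }
    }
}

impl Seeder {
    /// SplitMix64 step: value-stable because it depends only on `state`.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9e37_79b9_7f4a_7c15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
        z ^ (z >> 31)
    }

    /// The return type selects which RNG gets built.
    fn make_rng<R: FromSeeder>(&mut self) -> R {
        R::from_seeder(self)
    }
}

/// Hypothetical trait playing the role of SeedableRng in this sketch.
trait FromSeeder {
    fn from_seeder(seeder: &mut Seeder) -> Self;
}

struct XorShift64(u64);

impl FromSeeder for XorShift64 {
    fn from_seeder(seeder: &mut Seeder) -> Self {
        // Force a non-zero state; xorshift is stuck at zero otherwise.
        XorShift64(seeder.next_u64() | 1)
    }
}

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

struct Lcg64(u64);

impl FromSeeder for Lcg64 {
    fn from_seeder(seeder: &mut Seeder) -> Self {
        Lcg64(seeder.next_u64())
    }
}

impl Lcg64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// Both call syntaxes, plus one seeder feeding two different RNG types.
fn usage() -> (u64, u64, u64, u64) {
    let mut a: XorShift64 = Seeder::from("stripy zebra").make_rng();
    let mut b = Seeder::from("stripy zebra").make_rng::<XorShift64>();
    let mut seeder = Seeder::from("stripy zebra");
    let mut c: XorShift64 = seeder.make_rng();
    let mut d: Lcg64 = seeder.make_rng();
    (a.next_u64(), b.next_u64(), c.next_u64(), d.next_u64())
}
```

In this sketch the same input string always yields the same first RNG, whichever syntax is used. An extension-trait constructor would hide the seeder, and with it the ability to draw a second, independent RNG from the same input.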

I'm also wondering whether we should re-think SeedableRng::from_rng (#572) to be "value stable", in which case there's less need for Seeder since XorShiftRng::from_rng(SipRng::from(123)) would also work (with another From implementation).

@dhardy
Member Author

dhardy commented Jul 27, 2018

Still needs a fix for Rust < 1.26 (i128); wait on #571

@dhardy dhardy mentioned this pull request Jul 31, 2018
@vks
Collaborator

vks commented Aug 14, 2018

> Small issue: this requires u128 since it converts usize to u128 for portability. So it needs Rust 1.26 (though we could just convert to two u64 values instead for now, since I don't think Rust targets any platforms with 128 bit pointers yet).

I think converting usize to u64 is fine for now. We can add a test that fails if usize doesn't fit into a u64.
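A rough sketch of what that could look like, assuming the goal is that a `usize` value contributes the same bytes to the hash on 32-bit and 64-bit targets. The helper names `portable_usize` and `write_usize` are hypothetical and do not come from the PR. The guard here is a compile-time assertion rather than a unit test, which needs a newer compiler than the one discussed in this thread.

```rust
use std::mem::size_of;

// Build fails on any target where usize is wider than 64 bits.
const _: () = assert!(size_of::<usize>() <= size_of::<u64>());

/// Widen a usize to u64; lossless given the assertion above.
fn portable_usize(x: usize) -> u64 {
    x as u64
}

/// Append a usize to a byte buffer as a fixed-width little-endian u64,
/// so the encoding does not depend on the platform's pointer width.
fn write_usize(out: &mut Vec<u8>, x: usize) {
    out.extend_from_slice(&portable_usize(x).to_le_bytes());
}
```

The fixed-width encoding is what matters for value stability: hashing the native representation of `usize` would produce different seeds on different targets.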

> Still needs a fix for Rust < 1.26 (i128); wait on #571

#571 was merged.

From the code:

/// Although the SipHash algorithm is considered to be generally strong,
/// it is not intended for cryptographic purposes. As such, all
/// cryptographic uses of this implementation are _strongly discouraged_.

Maybe we want to use a stronger hash algorithm? However, you discourage using this method of seeding for cryptographic purposes anyway. (Seeding with a low-entropy password might require an expensive key-derivation function, making it unsuitable for seed_from_u64 use cases.)

0x8f6092dd2692af57, 0xbdf362ab8e29260b];
// for _ in 0..8 {
// println!("0x{:x}, 0x{:x},", rng.next_u64(), rng.next_u64());
// }
Collaborator

@vks vks Aug 14, 2018

You can probably uncomment this code. The output will only be shown if the test fails.

Member Author

In theory it makes the tests slower to run.


Sip24Rounds::c_rounds(&mut self.state);

self.state.v0 ^ self.state.v1 ^ self.state.v2 ^ self.state.v3
Collaborator

Your comment above refers to this code, right? What is the motivation for this construction?

Member Author

@dhardy dhardy Aug 14, 2018

XORing the state together? No, I guess you mean using two rounds here?

Well, the default construction of SIP hash uses two rounds between each input consumed and four rounds after consuming all input (although this is configurable, and the authors invite others to try attacking the hash function with different constructions, even with zero rounds at the end).

Using two rounds here, and two rounds when switching from the input-consuming phase to the output phase, mirrors the input design.

I'm not qualified to say anything about the security, but it mirrors the input design and seems a reasonable construction. I was wondering about asking the SIP hash authors for their input on this.
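For readers unfamiliar with the round structure being discussed, the following is a compact sketch of plain SipHash-2-4 (c = 2 rounds per input word, d = 4 rounds at the end), written from the published algorithm. It is not the PR's code, and the names `State`, `sip_round`, `absorb` and `siphash_2_4` are chosen here for illustration.

```rust
/// SipHash internal state: four 64-bit words.
struct State {
    v0: u64,
    v1: u64,
    v2: u64,
    v3: u64,
}

/// One SipRound: add-rotate-xor mixing across the four words.
fn sip_round(s: &mut State) {
    s.v0 = s.v0.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(13);
    s.v1 ^= s.v0;
    s.v0 = s.v0.rotate_left(32);
    s.v2 = s.v2.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(16);
    s.v3 ^= s.v2;
    s.v0 = s.v0.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(21);
    s.v3 ^= s.v0;
    s.v2 = s.v2.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(17);
    s.v1 ^= s.v2;
    s.v2 = s.v2.rotate_left(32);
}

/// Absorb one 64-bit message word using `c` compression rounds.
fn absorb(s: &mut State, m: u64, c: usize) {
    s.v3 ^= m;
    for _ in 0..c {
        sip_round(s);
    }
    s.v0 ^= m;
}

/// SipHash-2-4 of `msg` under the 128-bit key (k0, k1).
fn siphash_2_4(k0: u64, k1: u64, msg: &[u8]) -> u64 {
    const C: usize = 2; // rounds after each input word
    const D: usize = 4; // rounds after all input is consumed
    let mut s = State {
        v0: k0 ^ 0x736f_6d65_7073_6575,
        v1: k1 ^ 0x646f_7261_6e64_6f6d,
        v2: k0 ^ 0x6c79_6765_6e65_7261,
        v3: k1 ^ 0x7465_6462_7974_6573,
    };
    // Full 8-byte words, read little-endian.
    let mut chunks = msg.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        absorb(&mut s, u64::from_le_bytes(buf), C);
    }
    // Final word: leftover bytes, with the length (mod 256) in the top byte.
    let mut last = (msg.len() as u64) << 56;
    for (i, &b) in chunks.remainder().iter().enumerate() {
        last |= u64::from(b) << (8 * i);
    }
    absorb(&mut s, last, C);
    // State tweak before the finalization rounds, then XOR the words together.
    s.v2 ^= 0xff;
    for _ in 0..D {
        sip_round(&mut s);
    }
    s.v0 ^ s.v1 ^ s.v2 ^ s.v3
}
```

The output phase under review replaces the single finalization step with a repeated "two rounds, then XOR the four words" step, which is why it is described as mirroring the input side.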

Member Author

I should add that all the variants of SIP hash mutate the state somehow between each set of rounds, in addition to mutation by input. Mutating the adjustor (adj) is my invention to add unique input each time; as I understand it, the bit propagation of the "rounds" is perhaps good enough alone, but I believe this increases the strength against any kind of analysis.

Of course, making SIP rng stronger against cryptanalysis is beyond our requirements, but it still seems prudent to do what's simple.
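The output step described in this thread can be sketched roughly as follows. The two rounds per output and the XOR of the four state words come from the snippet under review. How the adjustor is folded in (XORed into `v2` and then incremented) is purely a guess made for illustration, as are the names `SipState` and `SipOutput`; the PR's actual `SipRng` may differ.

```rust
/// Four-word SipHash state, as left behind by the input phase.
#[derive(Clone, Debug)]
struct SipState {
    v0: u64,
    v1: u64,
    v2: u64,
    v3: u64,
}

/// One SipRound (the standard SipHash mixing step).
fn sip_round(s: &mut SipState) {
    s.v0 = s.v0.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(13);
    s.v1 ^= s.v0;
    s.v0 = s.v0.rotate_left(32);
    s.v2 = s.v2.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(16);
    s.v3 ^= s.v2;
    s.v0 = s.v0.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(21);
    s.v3 ^= s.v0;
    s.v2 = s.v2.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(17);
    s.v1 ^= s.v2;
    s.v2 = s.v2.rotate_left(32);
}

/// Hypothetical output-phase generator (not the PR's exact SipRng).
struct SipOutput {
    state: SipState,
    adj: u64,
}

impl SipOutput {
    fn from_state(state: SipState) -> Self {
        SipOutput { state, adj: 0 }
    }

    fn next_u64(&mut self) -> u64 {
        // ASSUMPTION: the adjustor is XORed into v2 and then incremented,
        // so each step sees a distinct extra input. The real update rule
        // is not shown in this thread.
        self.state.v2 ^= self.adj;
        self.adj = self.adj.wrapping_add(1);
        // Two rounds per output word, mirroring the c = 2 input rounds.
        sip_round(&mut self.state);
        sip_round(&mut self.state);
        // Collapse the state to one word, as SipHash does at the end.
        self.state.v0 ^ self.state.v1 ^ self.state.v2 ^ self.state.v3
    }
}
```

Because each output depends only on the state and the adjustor, two generators built from the same state produce the same stream, which is the value-stability property a seeder needs.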

// This is supposed to be d - c rounds (here 4 - 2 = 2)
S::c_rounds(&mut state);

SipRng::from_state(state)
Collaborator

Where does this come from and how does it relate to the original algorithm?

Member Author

This is the original: https://github.com/rust-lang/rust/blob/master/src/libcore/hash/sip.rs#L334

It looks like I missed the tweak to state.v2 before the end rounds — though the state is tweaked before the next two rounds anyway. Might be better with an extra tweak then but not too important probably.

I originally tried to make the first output match the initial one (or the first two from the 128-bit output version), but this wasn't so easy to turn into a nice RNG (because one version mutates two different parts of the state, and the other doesn't mutate any state between these four rounds).

@vks
Collaborator

vks commented Aug 14, 2018

If this is to replace #537, I think we should add some prominent examples in Rand's documentation for the seed_from_u64 use cases.

@dhardy dhardy mentioned this pull request Aug 23, 2018
@dhardy dhardy added the P-postpone Waiting on something else label Sep 22, 2018
@dhardy
Member Author

dhardy commented Sep 22, 2018

#537 (seed_from_u64) is now merged.

I still think this PR is interesting but not a priority, so postponing for now. If anyone wants to take this further, you could start by reviewing this PR.

@dhardy dhardy mentioned this pull request Oct 8, 2018
@dhardy
Member Author

dhardy commented Oct 8, 2018

Note: with four u64 outputs, there is essentially a bijection from internal state to output which may make it easier to reverse the hash to recover the input (though since mixing also happens during the input phase and between input and output phases, there is still significant protection).

@dhardy
Member Author

dhardy commented Sep 16, 2019

It doesn't make sense to merge this new crate into this repo; we are already discussing removing other content to reduce the size of this repo (#885). This code is also not a core part of the Rand project.

However, this code may have some use for somebody and seems to be in reasonable shape. Therefore it seems sensible to create a new rust-random/seeder repo and publish this as rand_seeder.

@dhardy
Member Author

dhardy commented Oct 12, 2019

This now lives at https://github.com/rust-random/seeder

@dhardy dhardy closed this Oct 12, 2019
@dhardy dhardy mentioned this pull request Oct 12, 2019