Universal seeder #554

dhardy · 2018-07-14T12:22:09Z

This is much more powerful than seed_from_u64 (#537).

let mut rng = Seeder::from("stripy zebra").make_rng::<XorShiftRng>();

SipRng code

SipHash is a keyed hash function optimised for speed on short messages. I adapted this to support unlimited length output with SipRng.

Quality should be roughly crypto-grade, though I haven't attempted any kind of cryptanalysis and would not bet on its strength without at least some further review.

Unsurprisingly, PractRand hasn't picked up any issues (two "unusual" items under 64 GiB on this run, none up to 128 GiB with a slightly different construction).

Performance is reasonable though of course not close to fast RNGs (I didn't bother using caching for next_u32 here; for seeding at least it's not useful):

test gen_bytes_sip         ... bench:     701,138 ns/iter (+/- 15,817) = 1460 MB/s
test gen_u32_sip           ... bench:       5,167 ns/iter (+/- 145) = 774 MB/s
test gen_u64_sip           ... bench:       5,212 ns/iter (+/- 98) = 1534 MB/s
test init_sip              ... bench:          25 ns/iter (+/- 1)
# for comparison:
test gen_bytes_hc128       ... bench:     448,607 ns/iter (+/- 35,948) = 2282 MB/s
test gen_u32_hc128         ... bench:       2,196 ns/iter (+/- 214) = 1821 MB/s
test gen_u64_hc128         ... bench:       3,938 ns/iter (+/- 240) = 2031 MB/s
test init_hc128            ... bench:       4,863 ns/iter (+/- 382)
test gen_bytes_xorshift    ... bench:     298,913 ns/iter (+/- 8,830) = 3425 MB/s
test gen_u32_xorshift      ... bench:       1,206 ns/iter (+/- 54) = 3316 MB/s
test gen_u64_xorshift      ... bench:       1,804 ns/iter (+/- 107) = 4434 MB/s
test init_xorshift         ... bench:          13 ns/iter (+/- 0)

For the intended usage, seeding other RNGs, this performance is perfectly adequate.

Note that the Seeder type is mostly for convenience. We could just recommend using from_rng, except that this method is currently documented as not being value-stable (which thus defeats the whole point of this code).

I may still make some tweaks to this, but so far it looks quite nice to me. Comments welcome.

dhardy · 2018-07-14T14:32:03Z

Small issue: this requires u128 since it converts usize to u128 for portability. So it needs Rust 1.26 (though we could just convert to two u64 values instead for now, since I don't think Rust targets any platforms with 128 bit pointers yet).

dhardy · 2018-07-15T08:07:26Z

This proves there are no stupid mistakes, I guess. No surprises really:

~/rust/rand/rand_sip$ ../target/release/examples/cat | practrand stdin64
RNG_test using PractRand version 0.93
RNG = RNG_stdin64, seed = 0xf050c744
test set = normal, folding = standard (64 bit)

rng=RNG_stdin64, seed=0xf050c744
length= 256 megabytes (2^28 bytes), time= 3.4 seconds
  no anomalies in 159 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 512 megabytes (2^29 bytes), time= 7.1 seconds
  no anomalies in 169 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 1 gigabyte (2^30 bytes), time= 14.3 seconds
  no anomalies in 180 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 2 gigabytes (2^31 bytes), time= 28.0 seconds
  no anomalies in 191 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 4 gigabytes (2^32 bytes), time= 55.7 seconds
  no anomalies in 201 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 8 gigabytes (2^33 bytes), time= 110 seconds
  no anomalies in 212 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 16 gigabytes (2^34 bytes), time= 217 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low16/64]FPF-14+6/16:cross       R=  +3.9  p =  1.1e-3   unusual          
  ...and 222 test result(s) without anomalies

rng=RNG_stdin64, seed=0xf050c744
length= 32 gigabytes (2^35 bytes), time= 428 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/64]BCFN(2+0,13-1,T)         R=  -7.0  p =1-1.3e-3   unusual          
  ...and 232 test result(s) without anomalies

rng=RNG_stdin64, seed=0xf050c744
length= 64 gigabytes (2^36 bytes), time= 855 seconds
  no anomalies in 244 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 128 gigabytes (2^37 bytes), time= 8532 seconds
  no anomalies in 255 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 256 gigabytes (2^38 bytes), time= 10195 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low4/64]BCFN(2+2,13-0,T)         R=  +8.0  p =  8.6e-4   unusual          
  ...and 264 test result(s) without anomalies

rng=RNG_stdin64, seed=0xf050c744
length= 512 gigabytes (2^39 bytes), time= 13601 seconds
  no anomalies in 276 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 1 terabyte (2^40 bytes), time= 20634 seconds
  no anomalies in 287 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 2 terabytes (2^41 bytes), time= 34936 seconds
  no anomalies in 297 test result(s)

rng=RNG_stdin64, seed=0xf050c744
length= 4 terabytes (2^42 bytes), time= 61827 seconds
  no anomalies in 308 test result(s)

sicking · 2018-07-15T09:08:40Z

In general I think this looks great. Two very minor comments that I don't feel strongly about.

Could we use a more discoverable name, like rand_seeder or hash_seeder or some such? At least, I make no association between rand_sip and seeding.

Syntax like XorShiftRng::seed_from_hashable(...) might fit better with other "constructors". But I haven't thought through if that's implementable.

I haven't reviewed in detail, but happy to do so.

dhardy · 2018-07-15T13:33:30Z

I originally named it rand_seeder but figured it's also a SIP crate... but we can use that name.

Using an extension trait is technically possible but not my preferred syntax.

dhardy · 2018-07-27T15:22:49Z

Updated. We don't need to use turbo-fish syntax:

-let mut rng = Seeder::from("stripy zebra").make_rng::<XorShiftRng>();
+let mut rng: XorShiftRng = Seeder::from("stripy zebra").make_rng();

> Syntax like XorShiftRng::seed_from_hashable(...) might fit better with other "constructors".

This is possible with an extension trait (like FromEntropy). But is it desirable? The advantage of the current approach is that one seeder can happily produce multiple RNGs — though this may not be used very often.
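To make the two syntaxes concrete, here is a minimal sketch that does not depend on rand. Everything in it is a stand-in: `FromSeedWords` is a hypothetical replacement for `SeedableRng`, FNV-1a replaces SipHash, a SplitMix64 step replaces SipRng's output function, and `ToyRng` only stores its seed. It is meant to show the shape of the API, not the code in this PR.

```rust
use std::hash::{Hash, Hasher};

/// Stand-in hasher (64-bit FNV-1a). The PR uses SipHash; FNV is used here
/// only to keep the sketch short.
struct Fnv1a(u64);

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical stand-in for `SeedableRng`: anything that can be built
/// from two 64-bit seed words.
trait FromSeedWords: Sized {
    fn from_seed_words(a: u64, b: u64) -> Self;
}

/// Hypothetical seeder: hashes the input once, then hands out seed words.
struct Seeder {
    state: u64,
}

impl Seeder {
    fn from_hashable<T: Hash>(input: T) -> Self {
        let mut hasher = Fnv1a(0xcbf2_9ce4_8422_2325); // FNV offset basis
        input.hash(&mut hasher);
        Seeder { state: hasher.finish() }
    }

    /// SplitMix64 step, standing in for SipRng's output function.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9e37_79b9_7f4a_7c15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
        z ^ (z >> 31)
    }

    /// Each call consumes fresh output, so one seeder can seed many RNGs.
    fn make_rng<R: FromSeedWords>(&mut self) -> R {
        let a = self.next_u64();
        let b = self.next_u64();
        R::from_seed_words(a, b)
    }
}

/// The alternative syntax: an extension trait with a blanket impl, so
/// every `FromSeedWords` type gains a constructor-style method.
trait SeedFromHashable: FromSeedWords {
    fn seed_from_hashable<T: Hash>(input: T) -> Self {
        Seeder::from_hashable(input).make_rng()
    }
}

impl<R: FromSeedWords> SeedFromHashable for R {}

/// Toy "RNG" that only records the seed words it was given.
#[derive(Debug, PartialEq)]
struct ToyRng {
    s: [u64; 2],
}

impl FromSeedWords for ToyRng {
    fn from_seed_words(a: u64, b: u64) -> Self {
        ToyRng { s: [a, b] }
    }
}
```

With these definitions, `ToyRng::seed_from_hashable("stripy zebra")` yields the same value as the first `make_rng()` call on `Seeder::from_hashable("stripy zebra")`, while a second `make_rng()` call on the same seeder yields a different one. The extension trait cannot express that second case without keeping a seeder around.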

I'm also wondering whether we should re-think SeedableRng::from_rng (#572) to be "value stable", in which case there's less need for Seeder since XorShiftRng::from_rng(SipRng::from(123)) would also work (with another From implementation).

dhardy · 2018-07-27T15:26:11Z

Still needs a fix for Rust < 1.26 (i128); wait on #571

vks · 2018-08-14T12:05:39Z

> Small issue: this requires u128 since it converts usize to u128 for portability. So it needs Rust 1.26 (though we could just convert to two u64 values instead for now, since I don't think Rust targets any platforms with 128 bit pointers yet).

I think converting usize to u64 is fine for now. We can add a test that fails if usize doesn't fit into a u64.
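A rough sketch of what that could look like, assuming lengths are fed to the hasher as fixed-width little-endian bytes. The helper names `portable_len`, `len_bytes` and `usize_fits_in_u64` are made up for illustration and are not taken from the PR.

```rust
use std::mem::size_of;

/// Widen a `usize` to `u64` so that a hashed length is the same value on
/// 32-bit and 64-bit targets.
fn portable_len(len: usize) -> u64 {
    len as u64
}

/// Hypothetical helper: encode a length as fixed-width little-endian
/// bytes, independent of the target's pointer width and endianness.
fn len_bytes(len: usize) -> [u8; 8] {
    portable_len(len).to_le_bytes()
}

/// True if every `usize` fits into a `u64` on this target. A unit test
/// asserting this would fail on a (so far hypothetical) 128-bit target.
fn usize_fits_in_u64() -> bool {
    size_of::<usize>() <= size_of::<u64>()
}
```

Putting `assert!(usize_fits_in_u64())` in a unit test gives the failing test suggested above without requiring u128.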

> Still needs a fix for Rust < 1.26 (i128); wait on #571

#571 was merged.

From the code:

/// Although the SipHash algorithm is considered to be generally strong,
/// it is not intended for cryptographic purposes. As such, all
/// cryptographic uses of this implementation are _strongly discouraged_.

Maybe we want to use a stronger hash algorithm? However, you discourage using this method of seeding for cryptographic purposes anyway. (Seeding with a low-entropy password might require an expensive key-derivation function, making it unsuitable for seed_from_u64 use cases.)

vks · 2018-08-14T12:04:49Z

rand_seeder/src/sip.rs

+            0x8f6092dd2692af57, 0xbdf362ab8e29260b];
+        // for _ in 0..8 {
+        //     println!("0x{:x}, 0x{:x},", rng.next_u64(), rng.next_u64());
+        // }


You can probably uncomment this code. The output will only be shown if the test fails.

In theory it makes the tests slower to run.

vks · 2018-08-14T12:09:43Z

rand_seeder/src/sip.rs

+
+        Sip24Rounds::c_rounds(&mut self.state);
+
+        self.state.v0 ^ self.state.v1 ^ self.state.v2 ^ self.state.v3


Your comment above refers to this code, right? What is the motivation for this construction?

XORing the state together? No, I guess you mean using two rounds here?

Well, the default construction of SIP hash uses two rounds between each input consumed and four rounds after consuming all input (although this is configurable, and the authors invite others to try attacking the hash function with different constructions, even with zero rounds at the end).

Using two rounds here, and two rounds when switching from the input-consuming phase to the output phase, mirrors the input.

I'm not qualified to say anything about the security, but it mirrors the input design and seems a reasonable construction. I was wondering about asking the SIP hash authors for their input on this.

I should add that all the variants of SIP hash mutate the state somehow between each set of rounds, in addition to mutation by input. Mutating the adjustor (adj) is my invention to add unique input each time; as I understand it, the bit propagation of the "rounds" is perhaps good enough alone, but I believe this increases the strength against any kind of analysis.

Of course, making SIP rng stronger against cryptanalysis is beyond our requirements, but it still seems prudent to do what's simple.
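For readers without the diff open, the output step discussed here can be sketched roughly as follows. The SipRound and the initialisation constants are the standard SipHash ones; how the adjustor is mixed into the state (XOR into v2, then increment) is an assumption made for this sketch, since the thread does not show that part of the code. `SketchRng` is a made-up name.

```rust
/// SipHash internal state: four 64-bit words.
#[derive(Clone, Debug, PartialEq)]
struct State {
    v0: u64,
    v1: u64,
    v2: u64,
    v3: u64,
}

/// One SipRound, as defined for SipHash.
fn sip_round(s: &mut State) {
    s.v0 = s.v0.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(13);
    s.v1 ^= s.v0;
    s.v0 = s.v0.rotate_left(32);
    s.v2 = s.v2.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(16);
    s.v3 ^= s.v2;
    s.v0 = s.v0.wrapping_add(s.v3);
    s.v3 = s.v3.rotate_left(21);
    s.v3 ^= s.v0;
    s.v2 = s.v2.wrapping_add(s.v1);
    s.v1 = s.v1.rotate_left(17);
    s.v1 ^= s.v2;
    s.v2 = s.v2.rotate_left(32);
}

/// Illustrative generator, not the PR's `SipRng`.
struct SketchRng {
    state: State,
    adj: u64,
}

impl SketchRng {
    /// Standard SipHash key setup; no input is absorbed in this sketch.
    fn from_keys(k0: u64, k1: u64) -> Self {
        SketchRng {
            state: State {
                v0: k0 ^ 0x736f_6d65_7073_6575,
                v1: k1 ^ 0x646f_7261_6e64_6f6d,
                v2: k0 ^ 0x6c79_6765_6e65_7261,
                v3: k1 ^ 0x7465_6462_7974_6573,
            },
            adj: 0,
        }
    }

    fn next_u64(&mut self) -> u64 {
        // ASSUMPTION: the adjustor is XORed into v2 and then incremented,
        // so that every output block sees a unique tweak.
        self.state.v2 ^= self.adj;
        self.adj = self.adj.wrapping_add(1);

        // c = 2 rounds per output, mirroring the 2 rounds per input word.
        sip_round(&mut self.state);
        sip_round(&mut self.state);

        // Fold the state into one word, as SipHash does for its result.
        self.state.v0 ^ self.state.v1 ^ self.state.v2 ^ self.state.v3
    }
}
```

Each SipRound is invertible, so the state itself never loses information; only the final XOR fold discards bits on the way out.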

vks · 2018-08-14T12:10:11Z

rand_seeder/src/sip.rs

+        // This is supposed to be d - c rounds (here 4 - 2 = 2)
+        S::c_rounds(&mut state);
+
+        SipRng::from_state(state)


Where does this come from and how does it relate to the original algorithm?

This is the original: https://github.com/rust-lang/rust/blob/master/src/libcore/hash/sip.rs#L334

It looks like I missed the tweak to state.v2 before the end rounds — though the state is tweaked before the next two rounds anyway. Might be better with an extra tweak then but not too important probably.

I originally tried to make the first output match the initial one (or the first two from the 128-bit output version), but this wasn't so easy to turn into a nice RNG (because one version mutates two different parts of the state, and the other doesn't mutate any state between these four rounds).

vks · 2018-08-14T12:13:13Z

If this is to replace #537, I think we should add some prominent examples in Rand's documentation for the seed_from_u64 use cases.

dhardy · 2018-09-22T11:19:40Z

#537 (seed_from_u64) is now merged.

I still think this PR is interesting but not a priority, so postponing for now. If anyone wants to take this further, you could start by reviewing this PR.

dhardy · 2018-10-08T11:10:19Z

Note: with four u64 outputs, there is essentially a bijection from internal state to output which may make it easier to reverse the hash to recover the input (though since mixing also happens during the input phase and between input and output phases, there is still significant protection).

dhardy · 2019-09-16T09:16:49Z

It doesn't make sense to merge this new crate into this repo; we are already discussing removing other content to reduce the size of this repo (#885). This code is also not a core part of the Rand project.

However, this code may have some use for somebody and seems to be in reasonable shape. Therefore it seems sensible to create a new rust-random/seeder repo and publish this as rand_seeder.

dhardy · 2019-10-12T10:59:50Z

This now lives at https://github.com/rust-random/seeder

dhardy mentioned this pull request Jul 15, 2018

Implement SeedableRng::seed_from_u64 #537

Merged

dhardy mentioned this pull request Jul 15, 2018

Add finish_buf output function for 128-bit hasher jedisct1/rust-siphash#8

Closed

dhardy added 5 commits July 27, 2018 15:38

rand_seeder: add crate shell

731e255

rand_seeder: add portable SipHash implementation

8794017

rand_seeder: add SipRng

9b1fafa

rand_seeder: add Seeder type

b37c70d

rand_seeder: add benchmarks

7b7dc79

dhardy force-pushed the seeder branch from b71223d to 7b7dc79 Compare July 27, 2018 15:08

dhardy mentioned this pull request Jul 31, 2018

Tracker: 0.6 release #520

Closed

28 tasks

vks reviewed Aug 14, 2018

dhardy mentioned this pull request Aug 23, 2018

Sip RNG veorq/SipHash#10

Closed

dhardy added the P-postpone Waiting on something else label Sep 22, 2018

vks mentioned this pull request Sep 25, 2018

ThreadRng / EntropyRng improvements #579

Closed

dhardy mentioned this pull request Oct 8, 2018

Hasher2Rng #627

Closed

dhardy closed this Oct 12, 2019

dhardy mentioned this pull request Oct 12, 2019

Review rust-random/seeder#1

Closed
