Skip to content

Commit

Permalink
Merge pull request #52 from rust-random/work
Browse files Browse the repository at this point in the history
Add src/guide-parallel.md
  • Loading branch information
dhardy authored Jul 8, 2022
2 parents a64f805 + 58b97ce commit dba3b50
Show file tree
Hide file tree
Showing 3 changed files with 146 additions and 0 deletions.
1 change: 1 addition & 0 deletions src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
- [Types of generators](guide-gen.md)
- [Our RNGs](guide-rngs.md)
- [Seeding RNGs](guide-seeding.md)
- [Parallel RNGs](guide-parallel.md)
- [Random values](guide-values.md)
- [Random distributions](guide-dist.md)
- [Random processess](guide-process.md)
Expand Down
144 changes: 144 additions & 0 deletions src/guide-parallel.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
# Parallel RNGs

## Theory: multiple RNGs

If you want to use random generators in multiple worker threads simultaneously,
then you will want to use multiple RNGs. A few suggested approaches:

1. Use [`thread_rng`] in each worker thread. This is seeded automatically
(lazily and uniquely) on each thread where it is used.
2. Use [`thread_rng`] (or another master RNG) to seed a custom RNG on each
worker thread. The main advantage here is flexibility over the RNG used.
3. Use a custom RNG per *work unit*, not per *worker thread*. If these RNGs
are seeded in a deterministic fashion, then deterministic results are
possible. Unfortunately, seeding a new RNG for each work unit from a master
generator cannot be done in parallel, thus may be slow.
4. Use a single master seed. For each work unit, seed an RNG using the master
seed and set the RNG's stream to the work unit number. This is potentially a
faster than (3) while still deterministic, but not supported by all RNGs.

Note: do not simply clone RNGs for worker threads/units. Clones return the same
sequence of output as the original. You may however use clones if you then set
a unique stream on each.

### Streams

Which RNG families support multiple streams?

- [ChaCha](https://docs.rs/rand_chacha/latest/rand_chacha/): the ChaCha RNGs
support 256-bit seed, 64-bit stream and 64-bit counter (per 16-word block),
thus supporting 2<sup>64</sup> streams of 2<sup>68</sup> words each.
- [Hc128](https://docs.rs/rand_hc/latest/rand_hc/) is a cryptographic RNG
supporting a 256-bit seed; one could construct this seed from (e.g.) a
smaller 192-bit key plus a 64-bit stream.

Note that the above approach of constructing the seed from a smaller key plus a
stream counter can only be recommended with cryptographic PRNGs since simpler
RNGs often have correlations in the RNG's output using two similar keys, and
may also require "random looking" seeds to produce high quality output.

Non-cryptographic PRNGs may still support multiple streams, but likely with
significant limitations (especially noting that a common recommendation with
such PRNGs is not to consume more than the square root of the generator's
period).

- [Xoshiro](https://docs.rs/rand_xoshiro/latest/rand_xoshiro/): the Xoshiro
family of RNGs support `jump` and `long_jump` methods which may effectively
be used to divide the output of a single RNG into multiple streams. In
practice this is only useful with a small number of streams, since `jump`
must be called `n` times to select the nth "stream".
- [Pcg](https://docs.rs/rand_pcg/latest/rand_pcg/): these RNGs support
construction with `state` and `stream` parameters. Note, however, that the
RNGs have been critiqued in that multiple streams using the same key are
often strongly correlated. See the [author's own comments](https://www.pcg-random.org/posts/critiquing-pcg-streams.html).

The PCG RNGs *also* support an `fn advance(delta)` method, which might be
used to divide a single stream into multiple sub-streams as with Xoshiro's
`jump` (but better since the offset can be specified).

## Practice: non-deterministic multi-threaded

We use [Rayon]'s parallel iterators, using [`map_init`] to initialize an RNG in
each worker thread. Note: this RNG may be re-used across multiple work units,
which may be split between worker threads in non-deterministic fashion.

```rust
use rand::distributions::{Distribution, Uniform};
use rayon::prelude::*;

static SAMPLES: u64 = 1_000_000;

fn main() {
let range = Uniform::new(-1.0f64, 1.0);

let in_circle = (0..SAMPLES)
.into_par_iter()
.map_init(|| rand::thread_rng(), |rng, _| {
let a = range.sample(rng);
let b = range.sample(rng);
if a * a + b * b <= 1.0 {
1
} else {
0
}
})
.reduce(|| 0usize, |a, b| a + b);

// prints something close to 3.14159...
println!(
"π is approximately {}",
4. * (in_circle as f64) / (SAMPLES as f64)
);
}
```

## Practice: deterministic multi-threaded

We use approach (4) above to achieve a deterministic result: initialize all RNGs
from a single seed, but using multiple streams.
We use [`ChaCha8Rng::set_stream`] to achieve this.

Note further that we manually batch multiple work-units according to
`BATCH_SIZE`. This is important since the cost of initializing an RNG is large
compared to the cost of our work unit (generating two random samples plus some
trivial calculations). Manual batching could improve performance of the above
non-deterministic simulation too.

(Note: this example is <https://github.com/rust-random/rand/blob/master/examples/rayon-monte-carlo.rs>.)

```rust
use rand::distributions::{Distribution, Uniform};
use rand_chacha::{rand_core::SeedableRng, ChaCha8Rng};
use rayon::prelude::*;

static SEED: u64 = 0;
static BATCH_SIZE: u64 = 10_000;
static BATCHES: u64 = 1000;

fn main() {
let range = Uniform::new(-1.0f64, 1.0);

let in_circle = (0..BATCHES)
.into_par_iter()
.map(|i| {
let mut rng = ChaCha8Rng::seed_from_u64(SEED);
rng.set_stream(i);
let mut count = 0;
for _ in 0..BATCH_SIZE {
let a = range.sample(&mut rng);
let b = range.sample(&mut rng);
if a * a + b * b <= 1.0 {
count += 1;
}
}
count
})
.reduce(|| 0usize, |a, b| a + b);

// prints 3.1409052 (determinstic and reproducible result)
println!(
"π is approximately {}",
4. * (in_circle as f64) / ((BATCH_SIZE * BATCHES) as f64)
);
}
```
1 change: 1 addition & 0 deletions tests/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,4 @@ rand_chacha = "0.3"
rand_distr = "0.4"
rand_distr_0_2 = { package = "rand_distr", version = "0.2" }
rand_seeder = "0.2"
rayon = "1.5.3"

0 comments on commit dba3b50

Please sign in to comment.