Add src/guide-parallel.md #52

Merged
merged 3 commits into from
Jul 8, 2022
Changes from 2 commits
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -14,6 +14,7 @@
- [Types of generators](guide-gen.md)
- [Our RNGs](guide-rngs.md)
- [Seeding RNGs](guide-seeding.md)
- [Parallel RNGs](guide-parallel.md)
- [Random values](guide-values.md)
- [Random distributions](guide-dist.md)
- [Random processess](guide-process.md)
144 changes: 144 additions & 0 deletions src/guide-parallel.md
@@ -0,0 +1,144 @@
# Parallel RNGs

## Theory: multiple RNGs

If you want to use random generators in multiple worker threads simultaneously,
then you will want to use multiple RNGs. A few suggested approaches:

1. Use [`thread_rng`] in each worker thread. This is seeded automatically
(lazily and uniquely) on each thread where it is used.
2. Use [`thread_rng`] (or another master RNG) to seed a custom RNG on each
worker thread. The main advantage here is flexibility over the RNG used.
3. Use a custom RNG per *work unit*, not per *worker thread*. If these RNGs
are seeded in a deterministic fashion, then deterministic results are
possible. Unfortunately, seeding a new RNG for each work unit from a master
generator cannot be done in parallel, thus may be slow.
4. Use a single master seed. For each work unit, seed an RNG using the master
seed and set the RNG's stream to the work unit number. This is potentially
faster than (3) while still deterministic, but not supported by all RNGs.

Note: do not simply clone RNGs for worker threads/units. Clones return the same
sequence of output as the original. You may however use clones if you then set
a unique stream on each.
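
The clone warning and approach (3) can be illustrated with a minimal sketch. It assumes a toy SplitMix64 generator (the hypothetical `ToyRng` below, not a type from `rand`) so that it stays free of dependencies; the helper names are also invented for illustration.

```rust
/// Toy SplitMix64 generator. Assumption: a stand-in for a real RNG, used
/// only to keep this sketch dependency-free.
#[derive(Clone, Debug)]
struct ToyRng {
    state: u64,
}

impl ToyRng {
    fn new(seed: u64) -> Self {
        ToyRng { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Approach (3): draw one seed per work unit from a master RNG.
/// This loop is inherently sequential, which is the cost noted above.
fn seed_work_units(master_seed: u64, units: usize) -> Vec<ToyRng> {
    let mut master = ToyRng::new(master_seed);
    (0..units).map(|_| ToyRng::new(master.next_u64())).collect()
}

/// Returns true if a clone replays exactly the same outputs as the original.
fn clone_repeats_original(seed: u64) -> bool {
    let mut original = ToyRng::new(seed);
    let mut copy = original.clone();
    (0..8).all(|_| original.next_u64() == copy.next_u64())
}
```

Here `clone_repeats_original` returns `true` for any seed, which is exactly why plain clones are unsuitable for worker threads, while `seed_work_units` gives each work unit its own generator and yields the same set of generators on every run with the same master seed.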

### Streams

Which RNG families support multiple streams?

- [ChaCha](https://docs.rs/rand_chacha/latest/rand_chacha/): the ChaCha RNGs
support a 256-bit seed, a 64-bit stream and a 64-bit counter (per 16-word block),
thus supporting 2<sup>64</sup> streams of 2<sup>68</sup> words each.
- [Hc128](https://docs.rs/rand_hc/latest/rand_hc/) is a cryptographic RNG
supporting a 256-bit seed; one could construct this seed from (e.g.) a
smaller 192-bit key plus a 64-bit stream.

Note that the above approach of constructing the seed from a smaller key plus a
stream counter can only be recommended with cryptographic PRNGs, since simpler
RNGs often produce correlated output when given two similar keys, and
may also require "random looking" seeds to produce high quality output.
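
As a rough sketch of the key-plus-stream construction, the function below packs a 192-bit key and a 64-bit stream number into a 256-bit seed. The function name and the byte layout (key first, then the stream as little-endian bytes) are assumptions made for illustration, not something prescribed by any of the RNG crates.

```rust
/// Build a 256-bit seed from a 192-bit key and a 64-bit stream number.
/// Assumption: the layout (key bytes, then the stream in little-endian
/// order) is an arbitrary choice for this sketch.
fn seed_from_key_and_stream(key: [u8; 24], stream: u64) -> [u8; 32] {
    let mut seed = [0u8; 32];
    seed[..24].copy_from_slice(&key);
    seed[24..].copy_from_slice(&stream.to_le_bytes());
    seed
}
```

The resulting array could then be passed to `SeedableRng::from_seed` for an RNG whose seed type is `[u8; 32]`; as noted above, this is only advisable with a cryptographic PRNG.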

Non-cryptographic PRNGs may still support multiple streams, but likely with
significant limitations (especially noting that a common recommendation with
such PRNGs is not to consume more than the square root of the generator's
period).

- [Xoshiro](https://docs.rs/rand_xoshiro/latest/rand_xoshiro/): the Xoshiro
family of RNGs support `jump` and `long_jump` methods which may effectively
be used to divide the output of a single RNG into multiple streams. In
practice this is only useful with a small number of streams, since `jump`
must be called `n` times to select the nth "stream".
- [Pcg](https://docs.rs/rand_pcg/latest/rand_pcg/): these RNGs support
construction with `state` and `stream` parameters. Note, however, that the
RNGs have been critiqued in that multiple streams using the same key are
often strongly correlated. See the [author's own comments](https://www.pcg-random.org/posts/critiquing-pcg-streams.html).

The PCG RNGs *also* support an `fn advance(delta)` method, which might be
used to divide a single stream into multiple sub-streams as with Xoshiro's
`jump` (but better since the offset can be specified).
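
To show why an `advance`-style method makes sub-streams cheap, here is a minimal sketch of logarithmic-time jump-ahead on a plain 64-bit LCG. `ToyLcg` and `sub_stream` are hypothetical names, this is not the `rand_pcg` implementation, and the raw LCG state is returned directly, so it should not be used as a real generator.

```rust
/// Multiplier of the toy LCG: state <- state * MUL + inc (mod 2^64).
const MUL: u64 = 6364136223846793005;

/// Toy LCG with a selectable increment. Assumption: illustration only.
#[derive(Clone, Debug, PartialEq)]
struct ToyLcg {
    state: u64,
    inc: u64,
}

impl ToyLcg {
    fn new(state: u64, stream: u64) -> Self {
        // The increment must be odd for the LCG to have full period.
        ToyLcg { state, inc: (stream << 1) | 1 }
    }

    /// Advance by one step and return the new raw state.
    fn step(&mut self) -> u64 {
        self.state = self.state.wrapping_mul(MUL).wrapping_add(self.inc);
        self.state
    }

    /// Advance by `delta` steps in O(log delta) time by repeatedly
    /// squaring the affine map x -> MUL * x + inc.
    fn advance(&mut self, mut delta: u64) {
        let (mut acc_mul, mut acc_add) = (1u64, 0u64);
        let (mut cur_mul, mut cur_add) = (MUL, self.inc);
        while delta > 0 {
            if delta & 1 == 1 {
                acc_mul = acc_mul.wrapping_mul(cur_mul);
                acc_add = acc_add.wrapping_mul(cur_mul).wrapping_add(cur_add);
            }
            cur_add = cur_mul.wrapping_add(1).wrapping_mul(cur_add);
            cur_mul = cur_mul.wrapping_mul(cur_mul);
            delta >>= 1;
        }
        self.state = acc_mul.wrapping_mul(self.state).wrapping_add(acc_add);
    }
}

/// Select sub-stream `index`, where each sub-stream spans `stride` steps.
fn sub_stream(base: &ToyLcg, index: u64, stride: u64) -> ToyLcg {
    let mut rng = base.clone();
    rng.advance(index.wrapping_mul(stride));
    rng
}
```

Unlike calling `jump` `n` times, selecting the nth sub-stream this way costs the same regardless of `n`, provided each work unit consumes fewer than `stride` steps.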

## Practice: non-deterministic multi-threaded

We use [Rayon]'s parallel iterators, using [`map_init`] to initialize an RNG in
each worker thread. Note: this RNG may be re-used across multiple work units,
which may be split between worker threads in non-deterministic fashion.

```rust
use rand::distributions::{Distribution, Uniform};
use rayon::prelude::*;

static SAMPLES: u64 = 1_000_000;

fn main() {
    let range = Uniform::new(-1.0f64, 1.0);

    let in_circle = (0..SAMPLES)
        .into_par_iter()
        // The first closure initializes the RNG; the second handles one sample.
        .map_init(|| rand::thread_rng(), |rng, _| {
            let a = range.sample(rng);
            let b = range.sample(rng);
            if a * a + b * b <= 1.0 {
                1
            } else {
                0
            }
        })
        .reduce(|| 0usize, |a, b| a + b);

    // prints something close to 3.14159...
    println!(
        "π is approximately {}",
        4. * (in_circle as f64) / (SAMPLES as f64)
    );
}
```

## Practice: deterministic multi-threaded

We use approach (4) above to achieve a deterministic result: initialize all RNGs
from a single seed, but using multiple streams.
We use [`ChaCha8Rng::set_stream`] to achieve this.

Note further that we manually batch multiple work-units according to
`BATCH_SIZE`. This is important since the cost of initializing an RNG is large
compared to the cost of our work unit (generating two random samples plus some
trivial calculations). Manual batching could improve performance of the above
non-deterministic simulation too.

(Note: this example is <https://github.com/rust-random/rand/blob/master/examples/rayon-monte-carlo.rs>.)

```rust
use rand::distributions::{Distribution, Uniform};
use rand_chacha::{rand_core::SeedableRng, ChaCha8Rng};
use rayon::prelude::*;

static SEED: u64 = 0;
static BATCH_SIZE: u64 = 10_000;
static BATCHES: u64 = 1000;

fn main() {
    let range = Uniform::new(-1.0f64, 1.0);

    let in_circle = (0..BATCHES)
        .into_par_iter()
        .map(|i| {
            // Every batch starts from the same seed but selects its own stream.
            let mut rng = ChaCha8Rng::seed_from_u64(SEED);
            rng.set_stream(i);
            let mut count = 0;
            for _ in 0..BATCH_SIZE {
                let a = range.sample(&mut rng);
                let b = range.sample(&mut rng);
                if a * a + b * b <= 1.0 {
                    count += 1;
                }
            }
            count
        })
        .reduce(|| 0usize, |a, b| a + b);

    // prints something close to 3.14159...
    println!(
        "π is approximately {}",
        4. * (in_circle as f64) / ((BATCH_SIZE * BATCHES) as f64)
    );
}
```
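
The reason this is deterministic is that each batch's RNG depends only on the seed and the batch index, never on which thread runs it. The following sketch checks that property using only the standard library. It assumes a toy PCG32-style generator (`ToyStreamRng`, a hypothetical type, not `ChaCha8Rng` and not from `rand_pcg`) and invented helper names, so it illustrates the scheduling argument rather than good RNG practice.

```rust
use std::thread;

/// Toy PCG32-style generator with a selectable stream.
/// Assumption: illustration only; see the caveats on PCG streams above.
#[derive(Clone)]
struct ToyStreamRng {
    state: u64,
    inc: u64,
}

impl ToyStreamRng {
    fn new(seed: u64, stream: u64) -> Self {
        let mut rng = ToyStreamRng { state: 0, inc: (stream << 1) | 1 };
        rng.next_u32();
        rng.state = rng.state.wrapping_add(seed);
        rng.next_u32();
        rng
    }

    fn next_u32(&mut self) -> u32 {
        let old = self.state;
        self.state = old.wrapping_mul(6364136223846793005).wrapping_add(self.inc);
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        xorshifted.rotate_right((old >> 59) as u32)
    }

    /// Uniform f64 in [-1, 1).
    fn next_unit(&mut self) -> f64 {
        (self.next_u32() as f64) / (1u64 << 31) as f64 - 1.0
    }
}

/// Count samples inside the unit circle for one batch (one work unit).
fn batch_count(seed: u64, batch: u64, batch_size: u64) -> u64 {
    let mut rng = ToyStreamRng::new(seed, batch);
    let mut count = 0;
    for _ in 0..batch_size {
        let a = rng.next_unit();
        let b = rng.next_unit();
        if a * a + b * b <= 1.0 {
            count += 1;
        }
    }
    count
}

/// Split the batches over `threads` OS threads and sum the counts.
fn in_circle(seed: u64, batches: u64, batch_size: u64, threads: u64) -> u64 {
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            thread::spawn(move || {
                // Thread t handles batches t, t + threads, t + 2 * threads, ...
                (t..batches)
                    .step_by(threads as usize)
                    .map(|i| batch_count(seed, i, batch_size))
                    .sum::<u64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling `in_circle(0, 64, 1000, 1)` and `in_circle(0, 64, 1000, 4)` returns the same count, because integer addition is order-independent and every batch is generated from the same (seed, stream) pair however the work is divided.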
1 change: 1 addition & 0 deletions tests/Cargo.toml
@@ -14,3 +14,4 @@ rand_chacha = "0.3"
rand_distr = "0.4"
rand_distr_0_2 = { package = "rand_distr", version = "0.2" }
rand_seeder = "0.2"
rayon = "1.5.3"