Skip to content

Commit

Permalink
Improve documentation of value/distribution sampling
Browse files Browse the repository at this point in the history
Addresses rust-random#21
  • Loading branch information
dhardy committed Nov 14, 2019
1 parent 97fb54e commit 37f01c4
Show file tree
Hide file tree
Showing 3 changed files with 148 additions and 69 deletions.
10 changes: 4 additions & 6 deletions src/guide-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,12 +63,10 @@ random value (an unbiased random value) is in several contexts:
random value (it would be infinitely large, and use infinite amounts of memory)
- `f64`: we treat this as an approximation of the real numbers, and,
*by convention*, restrict to the range 0 to 1 (if not otherwise specified).
Note that this type has finite precision, so we use the coin-flipping method
above (but with random bits instead of coins) until we get as much precision
as the type can represent; however, since floating-point numbers are much
more precise close to 0 than they are near 1, we typically simplify here and
stop once we have enough precision to differentiate between 1 and the next
smallest value representable (`1 - ε/2`).
We will come back to the conversions used later;
for now note that these produce 52-53 bits of precision (depending on which
conversion is used, output will be in steps of `ε` or `ε/2`, where `1+ε` is
the smallest representable value greater than `1`).

## Random data

Expand Down
171 changes: 123 additions & 48 deletions src/guide-dist.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,47 +10,137 @@ pub trait Distribution<T> {
fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T;

// a convenience function defined using sample:
fn sample_iter<'a, R>(&'a self, rng: &'a mut R) -> DistIter<'a, Self, R, T>
fn sample_iter<R>(self, rng: R) -> DistIter<Self, R, T>
where
Self: Sized,
R: Rng,
{ ... }
}
```

Rand provides implementations of many different distributions; for the full
list see the [`distributions`] module; the most common are highlighted below.
Rand provides implementations of many different distributions; we cover the most
common of these here, but for full details refer to the [`distributions`] module
and the [`rand_distr`] crate.

## Uniform distributions
# Uniform distributions

The most obvious type of distribution is the one we already discussed: one
without pattern, where each value or range of values is equally likely. This is
known as *uniform*.

Rand actually has several variants of this:

- [`Standard`] requires no parameters and produces uniformly distributed
values over the entire range of the output type (for `bool` and integers)
or over the range from 0 to 1 (for floats) or over valid Unicode code
points. It also has extensions to tuples, array types and `Option`.
- [`Uniform`] is parametrised with `low` and `high` points, and produces
values uniformly distributed within this range.
- [`Alphanumeric`] is uniform over the values `0-9A-Za-z`
- [`Open01`] and [`OpenClosed01`] are variations of [`Standard`] for floating
point numbers between 0 and 1 (partially) exclusive of end points.

For convenience, [`Rng::gen`] and [`random`] are short-cuts to [`Standard`],
and [`Rng::gen_range`] is a short-cut to [`Uniform`], allowing things like:

```rust
# extern crate rand;
# use rand::prelude::*;
let mut rng = thread_rng();
let coord: (f64, f64) = rng.gen();
let die_roll = rng.gen_range(1, 7);
```

## More continuous distributions
Rand actually has several variants of this, repesenting different ranges:

- [`Standard`] requires no parameters and samples values uniformly according
to the type. [`Rng::gen`] provides a short-cut to this distribution.
- [`Uniform`] is parametrised by `Uniform::new(low, high)` (including `low`,
excluding `high`) or `Uniform::new_inclusive(low, high)` (including both),
and samples values uniformly within this range.
[`Rng::gen_range`] is a convenience method defined over
[`Uniform::sample_single`], optimised for single-sample usage.
- [`Alphanumeric`] is uniform over the `char` values `0-9A-Za-z`.
- [`Open01`] and [`OpenClosed01`] are provide alternate sampling ranges for
floating-point types (see below).

## Uniform sampling by type

Lets go over the distributions by type:

- For `bool`, [`Standard`] samples each value with probability 50%.
- For `Option<T>`, the [`Standard`] distribution samples `None` with
probability 50%, otherwise `Some(value)` is sampled, according to its type.
- For integers (`u8` through to `u128`, `usize`, and `i*` variants),
[`Standard`] samples from all possible values while
[`Uniform`] samples from the parameterised range.
- For `NonZeroU8` and other "non-zero" types, [`Standard`] samples uniformly
from all non-zero values (rejection method).
- `Wrapping<T>` integer types are sampled as for the corresponding integer
type by the [`Standard`] distribution.
- For floats (`f32`, `f64`),

- [`Standard`] samples from the half-open range `[0, 1)` with 24 or 53
bits of precision (for `f32` and `f64` respectively)
- [`OpenClosed01`] samples from the half-open range `(0, 1]` with 24 or
53 bits of precision
- [`Open01`] samples from the open range `(0, 1)` with 23 or 52 bits of
precision
- [`Uniform`] samples from a given range with 23 or 52 bits of precision
- For the `char` type, the [`Standard`] distribution samples from all
available Unicode code points, uniformly; many of these values may not be
printable (depending on font support). The [`Alphanumeric`] samples from
only a-z, A-Z and 0-9 uniformly.
- For tuples and arrays, each element is sampled as above, where supported.
The [`Standard`] and [`Uniform`] distributions each support a selection of
these types (up to 12-tuples and 32-element arrays).
This includes the empty tuple `()` and array.
- For SIMD types, each element is sampled as above, for [`Standard`] and
[`Uniform`] (for the latter, `low` and `high` parameters are *also* SIMD
types, effectively sampling from multiple ranges simultaneously). SIMD
support is gated behind a [feature flag](../features.html#simd-support).

# Non-uniform distributions

Non-uniform distributions can be divided into two categories, as follows.
Some of these discrete and all of the continuous distributions have been moved
from the main [`rand`] crate to a dedicated [`rand_distr`] crate.

## Discrete non-uniform distributions

Discrete distributions sample from boolean or integer types. As above, these
can be sampled uniformly, or, as below, via a non-uniform distribution.

Potentially a discrete distribution could sample directly from a set of discrete
values such as a slice or an `enum`. See the section on [Sequences] regarding
Rand's traits for slice and iterator types. Rand does not provide direct
sampling from `enum`s, with the exception of `Option` (see above).

### Booleans

The [`Bernoulli`] distribution is a fancy name for generating a boolean
with a given a probability `p` of being `true`, or defined via a
`success : failure` ratio. Often this is described as a *trial* with
probability `p` of *success* (`true`).

The methods [`Rng::gen_bool`] and [`Rng::gen_ratio`] are short-cuts to this
distribution.

### Integers

The [`Binomial`] distribution is related to the [`Bernoulli`] in that it
models running `n` independent trials each with probability `p` of success,
then counts the number of successes.

Note that for large `n` the [`Binomial`] distribution's implementation is
much faster than sampling `n` trials individually.

The [`Poisson`] distribution expresses the expected number of events
occurring within a fixed interval, given that events occur with fixed rate λ.

### Weighted sequences

The [`WeightedIndex`] distribution samples an index from sequence of weights.
See the [Sequences] section for convenience wrappers directly sampling a slice
element.

For example, weighted sampling could be used to model the colour of a marble
sampled from a bucket containing 5 green, 15 red and 80 blue marbles.

Currently the Rand lib only implements *sampling with replacement*, i.e.
repeated sampling assumes the same distribution (that any sampled marble
has been replaced). An alternative distribution implementing
*sampling without replacement* has been
[requested](https://github.com/rust-random/rand/issues/596).

Note also that two implementations of [`WeightedIndex`] are available; the
first is optimised for a small number of samples while
[`alias_method::WeightedIndex`] is optimised for a large number of samples
(where "large" may mean "> 1000"; benchmarks recommended).

## Continuous non-uniform distributions

Continuous distributions model samples drawn from the real number line ℝ, or in
some cases a point from a higher dimension (ℝ², ℝ³, etc.). We provide
implementations for `f64` and for `f32` output in most cases, although currently
the `f32` implementations simply reduce the precision of an `f64` sample.

The exponential distribution, [`Exp`], simulates time until decay, assuming a
fixed rate of decay (i.e. exponential decay).
Expand All @@ -72,28 +162,11 @@ The [`Beta`] distribution is a two-parameter probability distribution, whose
output values lie between 0 and 1. The [`Dirichlet`] distribution is a
generalisation to any positive number of parameters.

## Discrete distributions

The [`Bernoulli`] distribution is very simple: given a probability `p` (or a
ratio `num / denom`), a boolean value is produced with the given probability
of being `true` (simulating a trial with probability `p` of success).

For convenience, [`Rng::gen_bool`] and [`Rng::gen_ratio`] are short-cuts to [`Bernoulli`].

The [`Binomial`] distribution is related: given a probability `p` and a number
`n`, this distribution simulates running `n` Bernoulli trials and tells you the
number which were successful.

The [`Poisson`] distribution expresses the expected number of events occurring
within a fixed interval, given that events occur with fixed rate λ.

## Weighted sampling

Finally, [`WeightedIndex`] is a discrete distribution sampling from a finite
selection of choices each with given weight.

[Sequences]: ../guide-seq.html
[`Distribution`]: ../rand/rand/distributions/trait.Distribution.html
[`distributions`]: ../rand/rand/distributions/index.html
[`rand`]: ../rand/rand/index.html
[`rand_distr`]: ../rand/rand_distr/index.html
[`Rng::gen_range`]: ../rand/rand/trait.Rng.html#method.gen_range
[`random`]: ../rand/rand/fn.random.html
[`Rng::gen_bool`]: ../rand/rand/trait.Rng.html#method.gen_bool
Expand All @@ -102,6 +175,7 @@ selection of choices each with given weight.
[`Rng`]: ../rand/rand/trait.Rng.html
[`Standard`]: ../rand/rand/distributions/struct.Standard.html
[`Uniform`]: ../rand/rand/distributions/struct.Uniform.html
[`Uniform::sample_single`]: ../rand/rand/distributions/struct.Uniform.html#method.sample_single
[`Alphanumeric`]: ../rand/rand/distributions/struct.Alphanumeric.html
[`Open01`]: ../rand/rand/distributions/struct.Open01.html
[`OpenClosed01`]: ../rand/rand/distributions/struct.OpenClosed01.html
Expand All @@ -117,3 +191,4 @@ selection of choices each with given weight.
[`Beta`]: ../rand/rand/distributions/struct.Beta.html
[`Dirichlet`]: ../rand/rand/distributions/struct.Dirichlet.html
[`WeightedIndex`]: ../rand/rand/distributions/weighted/struct.WeightedIndex.html
[`alias_method::WeightedIndex`]: ../rand/rand/distributions/weighted/alias_method/struct.WeightedIndex.html
36 changes: 21 additions & 15 deletions src/guide-values.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,29 +7,34 @@ This is a trick question: we need to know both the *range* we want and the type
of *distribution* of this value (which is what the [`next`](guide-dist.md) section
is all about).

## The `Rng` trait

For convenience, all generators automatically implement the [`Rng`] trait,
which provides short-cuts to a few ways of generating values. This has several
convenience functions for producing uniformly distributed values:

- [`gen`] generates an unbiased random value from a range appropriate for the
- [`Rng::gen`] generates an unbiased (uniform) random value from a range
appropriate for the
type. For integers this is normally the full representable range
(e.g. from `0u32` to `std::u32::MAX`), for floats this is between 0 and 1,
and some other types are supported, including arrays and tuples.
More on this in the [next section](guide-dist.html#uniform-distributions).
- [`gen_range`] generates an unbiased random value with given bounds

This method is a convenience wrapper around the [`Standard`] distribution,
as documented in the [next section](guide-dist.html#uniform-distributions).
- [`Rng::gen_range`] generates an unbiased random value with given bounds
`low` (inclusive) and `high` (exclusive)
- [`fill`] and [`try_fill`] are optimised functions for filling any byte or
- [`Rng::fill`] and [`Rng::try_fill`] are optimised functions for filling any byte or
integer slice with random values

It also has convenience functions for producing non-uniform boolean values:

- [`gen_bool`] generates a boolean with the given probability
- [`gen_ratio`] also generates a boolean, where the probability is defined
- [`Rng::gen_bool`] generates a boolean with the given probability
- [`Rng::gen_ratio`] also generates a boolean, where the probability is defined
via a fraction

Finally, it has a function to sample from arbitrary distributions:

- [`sample`] samples directly from some [distribution](guide-dist.md)
- [`Rng::sample`] samples directly from some [distribution](guide-dist.md)

Examples:

Expand All @@ -48,20 +53,21 @@ let x: f64 = rng.gen();
let roll = rng.gen_range(1, 7);
```

Additionally, the [`random`] function is a short-cut to [`gen`] on the [`thread_rng`]:
Additionally, the [`random`] function is a short-cut to [`Rng::gen`] on the [`thread_rng`]:
```rust
if rand::random() {
println!("we got lucky!");
}
```

[`Rng`]: ../rand/rand/trait.Rng.html
[`gen`]: ../rand/rand/trait.Rng.html#method.gen
[`gen_range`]: ../rand/rand/trait.Rng.html#method.gen_range
[`sample`]: ../rand/rand/trait.Rng.html#method.sample
[`gen_bool`]: ../rand/rand/trait.Rng.html#method.gen_bool
[`gen_ratio`]: ../rand/rand/trait.Rng.html#method.gen_ratio
[`fill`]: ../rand/rand/trait.Rng.html#method.fill
[`try_fill`]: ../rand/rand/trait.Rng.html#method.try_fill
[`Rng::gen`]: ../rand/rand/trait.Rng.html#method.gen
[`Rng::gen_range`]: ../rand/rand/trait.Rng.html#method.gen_range
[`Rng::sample`]: ../rand/rand/trait.Rng.html#method.sample
[`Rng::gen_bool`]: ../rand/rand/trait.Rng.html#method.gen_bool
[`Rng::gen_ratio`]: ../rand/rand/trait.Rng.html#method.gen_ratio
[`Rng::fill`]: ../rand/rand/trait.Rng.html#method.fill
[`Rng::try_fill`]: ../rand/rand/trait.Rng.html#method.try_fill
[`random`]: ../rand/rand/fn.random.htm
[`thread_rng`]: ../rand/rand/fn.thread_rng.html
[`Standard`]: ../rand/rand/distributions/struct.Standard.html

0 comments on commit 37f01c4

Please sign in to comment.