Define and compose random variables.
First, we need a way to generate random numbers. Here, a random number generator is defined dynamically as an object having a method that returns a uniform number in [0,1]:
type Random = object
random: proc(): float
One can obtain instances of Random
by wrapping the RNG defined in
nim-random, such as in
import random/urandom, random/mersenne
import alea
var rng = wrap(initMersenneTwister(urandom(16)))
The reason why we need to wrap them is that random number generators in nim-random are defined as a concept, while it will be simpler in the sequel to represent them as a single type.
A random variable of type A
is just something that can take a random number
generator and provide an instance of type A
:
type RandomVar[A] = concept x
var rng: Random
rng.sample(x) is A
In other word, the only operation that we need to define on a type T
to
make it an instance of RandomVar[A]
is
proc sample(rng: var Random, t: T): A = ...
Here we require the first parameter to be of type var Random
because drawing
a random number mutates the internal state of the RNG. It may be more clear to
return a new state together with the value of type A
, much like in the
state monad
but we avoid doing so for performance reason.
If we think of the internal state space of the random number generator as the
probability space Ω
, the similarity between our definition and the mathematical
definition of random variable is apparent.
A few core random variables are defined:
ConstantVar[A]
is a just a trivial random variable that always samples the same valueUniform
is a uniform variable over a real intervalChoice[A]
is a discrete random variable that can take a finite number of values with equal probabilityClosureVar[A]
is a wrapper over aproc(rng: var Random): A
Most random variables that arise by manipulating other variables are of
ClosureVar
type.
Here is an example showing how to costruct instances of these variables. Types are inferred, and are there just for explanatory purposes:
import alea
proc f(rng: var Random): float = 2 * rng.random()
let
c: ConstantVar[string] = constant("hello")
u: Uniform = uniform(2, 14)
d: Choice[int] = choice(@[1, 2, 3, 4, 5])
x: ClosureVar[float] = closure(f)
A few common operations on random variables are supported - in particular mapping and filtering:
import alea, future
let
a = uniform(3, 12)
b = a.map((x: float) => 3 - x)
c = a.filter((x: float) => x > 5)
Mapping, in particular, is a common operation, and there is a macro lift
that takes a function A => B
and declares a function of the same name of
type RandomVar[A] => ClosureVar[B]
that is obtained by mapping. It can be
used with a type hint in case the function is overloaded, as in
import math
proc sq(x: float): float = x * x
lift(abs, float)
lift(sq) # No ambiguity here
let
a = uniform(3, 12)
b = sq(abs(a))
There is also a version of two arguments map2
, that takes two random variables
and a binary function. Generalization for more than two arguments can be done
using the fact that random variables form
a monad
but the relevant functions are still to be implemented.
Many mathematical functions, as well as the arithmetic operations, are already lifted, so the following is valid:
let
a = uniform(3, 12)
b = choice(@[1.0, 2.5, 3.7])
c = abs(a - b) * sqrt(a)
Random variables can also be conditioned with respect to each other. For
instance, if a
and b
are real random variables and we want to condition
a
to the occurrence that b
is positive, we can do:
let c = a.where(b, (x: float) => x > 0)
where the last parameter to where
is a predicate that should be satisfied
by samples from b
.
How to make this work? Drawing from b
will change the status of the random
number generator, which in theory prevents us from sampling a
at the same
point.
To avoid this issue, we use a fake random number generator that wraps another
instance of Random
, but repeats its result twice. That is, internally we
use an auxiliary (fake) RNG defined like
var repeated = rng.repeat(2)
You can use the same trick whenever there is the need to draw more than a single sample from the same point of the probability space.
A few common statistics are implemented on RandomVar[float]
, such as the mean,
variance and so on. There is a generic implementation that will work for any
random variable, but particular types of random variables can use more specialized
methods.
An example of their usage is:
let x = uniform(2, 5) + choice(@[1.2, 3.3, 4.5])
var rng = ...
echo rng.sample(x)
echo rng.mean(x)
echo rng.variance(x)
echo rng.stddev(x)
The covariance is also implemented, again by using the trick of a repeating random number generator to draw from the two distributions at the same time.
All there operations admit an optional parameter which is the number of samples to compute the statistics with more or less accuracy:
echo rng.mean(x, samples = 1000000)
Finally, complex random variables, that are represented by chains of closures,
can be approximated by sampling enough times. There is a function discretize
that will take any RandomVar[A]
and produce an instance of Choice[A]
that
will wrap a certain number of samples:
let f = ... # Some complex random variable
let d = f.discretize(samples = 20000)
A few common real random variables are implemented:
let
g = gaussian(mu = 0, sigma = 1)
p = poisson(3.5)
b = bernoulli(0.7)
Usually, the statistics for these notable random variables are known, so we have overloads, in such a way that, for instance, the mean of a Gaussian variable will always be exact.
To define your own random variables, you can take inspiration, say, from
bernoulli.nim
. The only mandatory operation for a type T
to be an instance
of RandomVar[A]
is
proc sample(rng: var Random, t: T): A
If other statistics are known (such as mean, variance and so on), one can also define overloads such as
proc mean(rng: var Random, t: T, samples = 100000)
Here is a small example that combines all of the above:
import future
import random/urandom, random/mersenne
import alea
var rng = wrap(initMersenneTwister(urandom(16)))
let
a = uniform(0, 9)
b = choice([1, 2, 3, 4, 5]).map((x: int) => x.float)
c = poisson(13)
d = gaussian(mu = 3, sigma = 5).filter((x: float) => x > 3)
s = ln(abs((sqrt(a) * b) - (a.floor / log10(c)))) + d
t = c.where(s, (x: float) => x > 5)
u = rng.discretize(t)
echo rng.mean(s)
echo rng.stddev(u)
- improve the DSL for conditioning
- higher moments and other statistics
- monad composition
- histograms
- add more standard distributions (beta, gamma, geometric...)
- entropy etc.