- Probability is a measure of the likelihood of an event occurring;
- $P(A)$ is the probability of event $A$;
- Classical definition: $P(A) = \frac{N_A}{N}$, where $N_A$ is the number of ways event $A$ can occur and $N$ is the total number of possible outcomes;
- Frequentist definition: $P(A) = \lim_{N \to \infty} \frac{N_A}{N}$, i.e. the probability of an event is the limit of its relative frequency over a large number of trials (see the sketch after this list);
- Sample space is the set of all possible outcomes of an experiment;
- Event is a subset of the sample space.
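A minimal sketch of the frequentist definition, assuming a fair six-sided die and the illustrative event "the die shows more than 4" (so $P(A) = 1/3$): the relative frequency $N_A/N$ approaches $P(A)$ as $N$ grows.

```python
import random

random.seed(0)
for n in (100, 10_000, 1_000_000):
    # Count how often the event A = "die shows more than 4" occurs in n rolls.
    hits = sum(1 for _ in range(n) if random.randint(1, 6) > 4)
    print(f"N = {n:>9}: N_A / N = {hits / n:.4f}   (P(A) = {1/3:.4f})")
```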
The axioms of probability:
- $P(A) \geq 0$ for all events $A$;
- $P(X) = 1$, where $X$ is the sample space;
- $P(A \cup B) = P(A) + P(B)$ for all disjoint events $A$ and $B$.

From these axioms, we can derive the following:
- $P(\emptyset) = 0$;
- $C \subseteq D \implies P(C) \leq P(D)$;
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (inclusion-exclusion; checked numerically below).
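The inclusion-exclusion formula can be checked by brute-force enumeration. A small sketch, assuming two fair dice and the illustrative events $A$ = "first die is even" and $B$ = "the sum is 8":

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # sample space: two fair dice
A = {w for w in omega if w[0] % 2 == 0}        # first die is even
B = {w for w in omega if sum(w) == 8}          # sum equals 8

def P(E):
    # Classical probability: favourable outcomes over total outcomes.
    return len(E) / len(omega)

print(P(A | B))                    # 0.5555...
print(P(A) + P(B) - P(A & B))      # same value, as the identity predicts
```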
- Conditional probability of event $A$ given event $B$ is $P(A|B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) > 0$;
- Events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$;
- If $A$ and $B$ are independent, then $P(A|B) = P(A)$ and $P(B|A) = P(B)$ (see the sketch after this list).
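A minimal sketch of independence and its consequence for conditional probability, assuming two fair dice and the illustrative events $A$ = "first die is even" and $B$ = "second die shows 6":

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # sample space: two fair dice
A = {w for w in omega if w[0] % 2 == 0}        # depends only on the first die
B = {w for w in omega if w[1] == 6}            # depends only on the second die

def P(E):
    return len(E) / len(omega)

print(P(A & B), P(A) * P(B))   # 0.0833... twice: A and B are independent
print(P(A & B) / P(B), P(A))   # hence P(A|B) = P(A) = 0.5
```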
Law of Total Probability:
$P(A) = \sum_i P(A|B_i)P(B_i)$ where$B_i$ are disjoint events such that$\cup_i B_i = X$ ; -
Bayes Theorem:
$P(B|A) = \frac{P(A|B)P(B)}{P(A)}$ .
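A worked example combining the two results, with illustrative numbers: a test has $P(\text{pos}|\text{sick}) = 0.99$ and $P(\text{pos}|\text{healthy}) = 0.05$, and the prior is $P(\text{sick}) = 0.01$.

```python
p_sick = 0.01                 # prior P(sick)
p_pos_given_sick = 0.99       # test sensitivity
p_pos_given_healthy = 0.05    # false positive rate

# Law of total probability: the partition is {sick, healthy}.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Bayes' theorem.
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(f"P(sick | pos) = {p_sick_given_pos:.3f}")   # ~0.167, despite 99% sensitivity
```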
- Random variable is a function that maps each outcome of the sample space to a real number;
- Discrete random variable is a random variable that takes on a finite or countably infinite number of values;
- Distribution function of a discrete random variable $X$ is $F_X(x) = P(X \leq x)$;
- Probability mass function of a discrete random variable $X$ is $p_X(x) = P(X = x)$.
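A minimal sketch of a pmf and the distribution function it induces, assuming a fair six-sided die (exact arithmetic via fractions to avoid float rounding):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # p_X(x) = P(X = x)

def cdf(x):
    # F_X(x) = P(X <= x): a step function for a discrete variable.
    return sum(p for v, p in pmf.items() if v <= x)

print(cdf(3))   # 1/2
print(cdf(6))   # 1
```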
There are many discrete probability distributions, including:
- Uniform: $f_X(x_i) = \frac{1}{n}$ for $i = 1, \dots, n$;
- Bernoulli: $f_X(x) = p^x(1-p)^{1-x}$ for $x \in \{0, 1\}$, or equivalently: $f_X(1) = p$ and $f_X(0) = 1 - p$;
- Binomial is the sum of $n$ independent Bernoulli trials: $f_X(x) = \mathrm{Binomial}(x;n,p) = \binom{n}{x}p^x(1-p)^{n-x}$ for $x \in \{0, 1, \dots, n\}$;
- The binomial coefficient $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the number of ways to choose $x$ items from $n$ items.
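A minimal sketch checking that a sum of $n$ independent Bernoulli trials follows the binomial pmf (plain Python; $n$, $p$, and the trial count are illustrative):

```python
import random
from math import comb

random.seed(0)
n, p, trials = 10, 0.3, 100_000

counts = [0] * (n + 1)
for _ in range(trials):
    # One binomial draw: the sum of n independent Bernoulli(p) trials.
    x = sum(1 for _ in range(n) if random.random() < p)
    counts[x] += 1

for x in range(4):
    exact = comb(n, x) * p**x * (1 - p)**(n - x)   # binomial pmf
    print(f"x = {x}: empirical {counts[x] / trials:.4f}, exact {exact:.4f}")
```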
- Continuous random variable is a random variable that takes on an uncountably infinite number of values;
- Distribution function of a continuous random variable $X$ is $F_X(x) = P(X \leq x) = \int_{-\infty}^x f_X(t)dt$;
- Probability density function of a continuous random variable $X$ is $f_X(x)$ such that $P(a \leq X \leq b) = \int_a^b f_X(x)dx$.
There are many continuous probability distributions, including:
- Uniform: $f_X(x) = \frac{1}{b-a}$ for $x \in [a, b]$, and $0$ otherwise;
- Normal (or Gaussian): $f_X(x) = N(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$;
- Exponential: $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, and $0$ otherwise.
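A minimal sketch relating a density to a probability by numerical integration, using the standard normal pdf from above (the Riemann-sum step count is arbitrary):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    # N(x; mu, sigma^2) as defined above.
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

# P(a <= X <= b) = integral of f_X over [a, b], via a midpoint Riemann sum.
a, b, steps = -1.0, 1.0, 10_000
dx = (b - a) / steps
prob = sum(normal_pdf(a + (i + 0.5) * dx) * dx for i in range(steps))
print(f"P(-1 <= X <= 1) ~= {prob:.4f}")   # ~0.6827, the familiar one-sigma mass
```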
- Expectation of a random variable $X$ is: $E[X] = \sum_x x\, p_X(x)$ in the discrete case and $E[X] = \int_{-\infty}^{\infty} x f_X(x) dx$ in the continuous case;
- Linearity of expectation: $E[X + Y] = E[X] + E[Y]$;
- $E[aX + b] = aE[X] + b$ for constants $a$ and $b$;
- The expectation of a function of a random variable $g(X)$ is: $E[g(X)] = \sum_x g(x)\, p_X(x)$, or $\int_{-\infty}^{\infty} g(x) f_X(x) dx$ in the continuous case (see the sketch after this list).
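A minimal sketch of $E[X]$ and $E[g(X)]$ for a fair die with the illustrative choice $g(x) = x^2$:

```python
pmf = {x: 1 / 6 for x in range(1, 7)}   # fair die

e_x = sum(x * p for x, p in pmf.items())       # E[X] = 3.5
e_g = sum(x**2 * p for x, p in pmf.items())    # E[X^2] = 91/6 ~= 15.17

print(e_x, e_g)
# Note E[g(X)] != g(E[X]) in general: 91/6 ~= 15.17, while 3.5**2 = 12.25.
```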
- Joint distribution of two random variables $X$ and $Y$ is $F_{XY}(x,y) = P(X \leq x, Y \leq y)$;
- Joint probability mass function of two discrete random variables $X$ and $Y$ is $p_{XY}(x,y) = P(X = x, Y = y)$;
- Joint probability density function of two continuous random variables $X$ and $Y$ is $f_{XY}(x,y)$ such that $P((X,Y) \in A) = \iint_A f_{XY}(x,y)dxdy$;
- Marginalization is the process of obtaining the distribution of one variable from the joint distribution of two variables: $p_X(x) = \sum_y p_{XY}(x,y)$, or $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)dy$ in the continuous case;
- Independence of two random variables $X$ and $Y$ means $F_{XY}(x,y) = F_X(x)F_Y(y)$ for all $x$ and $y$ (see the sketch after this list).
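A minimal sketch of marginalization and the independence check on a small joint pmf table (the numbers are illustrative and chosen to factorize):

```python
import numpy as np

p_xy = np.array([[0.3, 0.2],    # rows: x = 0, 1
                 [0.3, 0.2]])   # cols: y = 0, 1

p_x = p_xy.sum(axis=1)          # marginal of X: sum the joint over y
p_y = p_xy.sum(axis=0)          # marginal of Y: sum the joint over x
print(p_x, p_y)                 # [0.5 0.5] [0.6 0.4]

# X and Y are independent iff p_XY(x, y) = p_X(x) p_Y(y) for every cell.
print(np.allclose(p_xy, np.outer(p_x, p_y)))   # True for these numbers
```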
There are many joint distributions, including:
- Multinomial is the generalization of the binomial distribution to more than two outcomes: $f(x_1, \dots, x_k; n, p_1, \dots, p_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$, where $\sum_i x_i = n$ and $\sum_i p_i = 1$;
- Multivariate Gaussian is the generalization of the normal distribution to more than one dimension: $N(x;\mu,\Sigma) = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$.
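A minimal sketch sampling from a 2-D Gaussian with numpy; the mean vector and covariance matrix below are illustrative, and the empirical moments should match them:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # must be symmetric positive semi-definite

samples = rng.multivariate_normal(mu, sigma, size=100_000)
print(samples.mean(axis=0))             # ~mu
print(np.cov(samples, rowvar=False))    # ~Sigma
```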
- Conditional pmf of $X$ given $Y$ is $p_{X|Y}(x|y) = P(X = x|Y = y) = \frac{p_{XY}(x,y)}{p_Y(y)}$;
- Conditional pdf of $X$ given $Y$ is $f_{X|Y}(x|y) = \frac{f_{XY}(x,y)}{f_Y(y)}$;
- Bayes' Theorem for two random variables $X$ and $Y$ is: $p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)}$, and analogously for densities.
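A minimal sketch of a conditional pmf computed from a joint pmf table (illustrative numbers; $X, Y \in \{0, 1\}$):

```python
import numpy as np

p_xy = np.array([[0.1, 0.3],    # rows: x = 0, 1
                 [0.2, 0.4]])   # cols: y = 0, 1

p_y = p_xy.sum(axis=0)          # marginal of Y
p_x_given_y = p_xy / p_y        # p_{X|Y}(x|y): divide each column by p_Y(y)

print(p_x_given_y)
print(p_x_given_y.sum(axis=0))  # each column sums to 1, as a pmf must
```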
- Covariance of two random variables $X$ and $Y$ is $cov(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$;
- $cov(X,X) = var(X)$;
- Covariance matrix of $X = (X_1, \dots, X_k)$ is: $\Sigma = E[(X - E[X])(X - E[X])^T]$, i.e. $\Sigma_{ij} = cov(X_i, X_j)$;
- The covariance matrix of the Gaussian $N(x;\mu,\Sigma)$ is $\Sigma$.
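A minimal sketch checking the identity $cov(X,Y) = E[XY] - E[X]E[Y]$ on simulated data (illustrative: $Y = 2X + \text{noise}$, so $cov(X,Y) \approx 2$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)    # linearly related plus noise

cov_def = ((x - x.mean()) * (y - y.mean())).mean()   # E[(X-E[X])(Y-E[Y])]
cov_alt = (x * y).mean() - x.mean() * y.mean()       # E[XY] - E[X]E[Y]
print(cov_def, cov_alt)                  # both ~2

print(np.cov(np.stack([x, y])))          # 2x2 covariance matrix; diagonal = variances
```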
- Entropy of a discrete random variable $X$ is the expected value of the information content of $X$, i.e. the uncertainty/randomness of $X$: $H(X) = -\sum_x p_X(x) \log_2 p_X(x)$;
- Positivity: $H(X) \geq 0$;
- Maximum entropy: $H(X) \leq \log_2 n$, where $n$ is the number of possible values of $X$, with equality for the uniform distribution.

The entropy of a continuous random variable is the differential entropy $h(X) = -\int f_X(x) \log f_X(x) dx$.
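A minimal sketch of the discrete entropy and the maximum-entropy bound, with two illustrative pmfs over $n = 4$ values:

```python
from math import log2

def entropy(pmf):
    # H(X) = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute nothing.
    return -sum(p * log2(p) for p in pmf if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))   # 2.0 = log2(4): the maximum for n = 4
print(entropy(skewed))    # ~1.357: less uncertainty
```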
The Kullback-Leibler divergence of two discrete distributions $p$ and $q$ is $D_{KL}(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$. It has the following properties:
- Non-negativity: $D_{KL}(p||q) \geq 0$;
- $D_{KL}(p||q) = 0$ if and only if $p = q$.
For continuous distributions, the KL divergence is $D_{KL}(p||q) = \int p(x) \log \frac{p(x)}{q(x)} dx$, with the same properties: $D_{KL}(p||q) \geq 0$, with equality if and only if $p = q$.
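A minimal sketch of the discrete KL divergence on two illustrative pmfs, showing non-negativity and that the divergence is not symmetric:

```python
from math import log2

def kl(p, q):
    # D_KL(p||q) = sum_x p(x) log2(p(x)/q(x)); terms with p(x) = 0 vanish.
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl(p, q))   # > 0
print(kl(q, p))   # a different positive value: D_KL is not symmetric
print(kl(p, p))   # 0.0: identical distributions
```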