The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.
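Let $x \in \mathbb{R}$ be a real-valued random variable. If $x$ is normally distributed with mean $\mu$ and variance $\sigma^2$, this is written as:

$$x \sim \mathcal{N}(\mu, \sigma^2)$$

- "~" means "is distributed as".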
The Gaussian probability density of a value $x$ under a distribution with mean $\mu$ and variance $\sigma^2$ is given by:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
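A minimal Python sketch of the univariate Gaussian density may make the formula concrete. The repository's own code targets Octave; the function name `gaussian_pdf` and the sample values below are assumptions made for this illustration only.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2 at x.

    sigma2 is the variance (not the standard deviation) and must be positive.
    """
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma2)
    return coefficient * math.exp(exponent)


# Illustrative values: a standard normal evaluated at its mean and
# three standard deviations away from it.
peak = gaussian_pdf(0.0, 0.0, 1.0)
tail = gaussian_pdf(3.0, 0.0, 1.0)
```

The density is highest at the mean and falls off symmetrically on both sides, which is what makes low density a usable signal for unusual values.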
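The Gaussian parameters (mean and variance) of the $i$-th feature can be estimated from the training set with the following formulas:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$

- $m$ - number of training examples.
- $x_i^{(j)}$ - value of the $i$-th feature in the $j$-th training example.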
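The per-feature estimates can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's `estimate_gaussian.m`: the training set is assumed to be a list of equally long rows (one row per example), and the variance divides by the number of examples rather than by that number minus one.

```python
def estimate_gaussian(X):
    """Return (mu, sigma2): per-feature means and variances of the rows in X.

    X is a non-empty list of examples, each a list of n feature values.
    """
    m = len(X)        # number of training examples
    n = len(X[0])     # number of features
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2


# Hypothetical toy data: three examples with two features each.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mu, sigma2 = estimate_gaussian(X)
```

Each feature is handled independently, so the result is one mean and one variance per column of the data.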
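So we have a training set of $m$ examples:

$$\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$$

We assume that each of the $n$ features of the training set is normally distributed:

$$x_i \sim \mathcal{N}(\mu_i, \sigma_i^2), \quad i = 1, \dots, n$$

Then, treating the features as independent, the density of an example is the product of the per-feature densities:

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$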
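- Given a new example $x$, compute $p(x)$ and flag $x$ as an anomaly if $p(x) < \varepsilon$, where $\varepsilon$ is a chosen probability threshold.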
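Scoring a new example can be sketched in Python as the product of per-feature Gaussian densities, compared against a threshold. The helper names `density` and `is_anomaly`, the parameter values, and the threshold `epsilon = 0.01` are all hypothetical choices for this illustration; the features are assumed to be independent.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Univariate normal density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def density(x, mu, sigma2):
    """p(x): product of the per-feature densities of example x."""
    p = 1.0
    for x_i, mu_i, sigma2_i in zip(x, mu, sigma2):
        p *= gaussian_pdf(x_i, mu_i, sigma2_i)
    return p


def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x when its density falls below the threshold epsilon."""
    return density(x, mu, sigma2) < epsilon


# Hypothetical fitted parameters for two features, and an assumed threshold.
mu = [0.0, 0.0]
sigma2 = [1.0, 1.0]
epsilon = 0.01

typical = is_anomaly([0.1, -0.2], mu, sigma2, epsilon)
outlier = is_anomaly([4.0, 4.0], mu, sigma2, epsilon)
```

With many features the product can underflow to zero; summing log-densities and comparing against the logarithm of the threshold is a common alternative.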
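The algorithm may be evaluated using the F1 score.
The F1 score is the harmonic mean of precision and recall. It reaches its best value at 1 (perfect precision and recall) and its worst at 0.

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

$$\text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn}$$

Where:
- tp - number of true positives.
- fp - number of false positives.
- fn - number of false negatives.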
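The F1 score, and one way it could be used to pick a threshold, can be sketched in Python as below. This is an illustration rather than the repository's `select_threshold.m`: the function names, the linear scan over candidate thresholds, the step count, and the labelled validation data (1 = anomaly, 0 = normal) are all assumptions.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def pick_threshold(labels, densities, steps=1000):
    """Scan thresholds between the smallest and largest density, keep the best F1."""
    lo, hi = min(densities), max(densities)
    best_epsilon, best_f1 = lo, 0.0
    for k in range(1, steps + 1):
        epsilon = lo + (hi - lo) * k / steps
        tp = fp = fn = 0
        for y, p in zip(labels, densities):
            predicted_anomaly = p < epsilon
            if predicted_anomaly and y == 1:
                tp += 1
            elif predicted_anomaly and y == 0:
                fp += 1
            elif not predicted_anomaly and y == 1:
                fn += 1
        score = f1_score(tp, fp, fn)
        if score > best_f1:  # strict comparison keeps the smallest best threshold
            best_epsilon, best_f1 = epsilon, score
    return best_epsilon, best_f1


# Hypothetical validation data: the last two examples are labelled anomalies.
labels = [0, 0, 0, 1, 1]
densities = [0.20, 0.15, 0.18, 0.001, 0.002]
best_epsilon, best_f1 = pick_threshold(labels, densities)
```

F1 is preferred over plain accuracy here because anomalies are usually rare, so a detector that never flags anything would still score high accuracy.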
- demo.m - main file that you should run from the Octave console to see the demo.
- server_params.mat - training data set.
- estimate_gaussian.m - function that estimates the parameters of a Gaussian distribution using the data in X.
- multivariate_gaussian.m - function that computes the probability density function of the multivariate Gaussian distribution.
- select_threshold.m - function that finds the best threshold (epsilon) to use for selecting outliers.
- visualize_fit.m - function that visualizes the data set and its estimated distribution.