Anomaly Detection Using Gaussian Distribution

Gaussian (Normal) Distribution

The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.

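Let's say:

$$x \in \mathbb{R}$$

If $x$ is normally distributed, this is written as:

$$x \sim \mathcal{N}(\mu, \sigma^2)$$

$\mu$ - mean value,

$\sigma^2$ - variance,

"~" means "is distributed as".

Then the Gaussian probability density (how likely a value $x$ is under a distribution with the given mean and variance) is given by:

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$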
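The univariate Gaussian density can be evaluated directly from its standard formula. The snippet below is a minimal Python sketch, not code from this repository; the function name `gaussian_pdf` and the argument name `sigma2` (the variance) are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x, where sigma2 is the variance."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# Standard normal: density at the mean and three standard deviations away.
peak = gaussian_pdf(0.0, 0.0, 1.0)
tail = gaussian_pdf(3.0, 0.0, 1.0)
print(peak, tail)
```

The value at the mean is the largest the density takes, and it falls off quickly as $x$ moves away from $\mu$. This falloff is what makes a low density a useful signal for anomalies.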

Estimating Parameters for a Gaussian

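We may use the following formulas to estimate the Gaussian parameters (mean and variance) for the $i$-th feature:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}$$

$$\sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$

$m$ - number of training examples.

$n$ - number of features.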
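Per-feature estimation amounts to a column-wise mean and a column-wise variance. The following is a hedged Python sketch rather than a port of `estimate_gaussian.m`. It assumes the data is a list of `m` rows with `n` values each, and that the variance is normalized by `m` (the maximum-likelihood estimate) rather than `m - 1`.

```python
def estimate_gaussian(X):
    """Return (mu, sigma2): per-feature means and variances of the rows in X."""
    m = len(X)       # number of training examples
    n = len(X[0])    # number of features
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    # Variance normalized by m (assumption: maximum-likelihood estimate).
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2

# Hypothetical toy data: 3 examples, 2 features.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
mu, sigma2 = estimate_gaussian(X_train)
print(mu, sigma2)
```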

Density Estimation

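So we have a training set:

$$\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}, \quad x^{(j)} \in \mathbb{R}^n$$

We assume that each feature of the training set is normally distributed:

$$x_i \sim \mathcal{N}(\mu_i, \sigma_i^2), \quad i = 1, \dots, n$$

Then, treating the features as independent:

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2)$$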
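Under the per-feature normality assumption, the density of an example is the product of its per-feature Gaussian densities. Here is a small Python sketch of that product; the names `gaussian_pdf` and `density`, and the parameter values, are hypothetical and chosen only for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def density(x, mu, sigma2):
    """p(x) as the product of per-feature densities (features treated as independent)."""
    p = 1.0
    for x_i, mu_i, s2_i in zip(x, mu, sigma2):
        p *= gaussian_pdf(x_i, mu_i, s2_i)
    return p

# Hypothetical parameters for two features.
mu = [0.0, 5.0]
sigma2 = [1.0, 4.0]
p_typical = density([0.0, 5.0], mu, sigma2)   # example at the mean of both features
p_unusual = density([3.0, 11.0], mu, sigma2)  # example far from both means
print(p_typical, p_unusual)
```

For data sets with many features the product can underflow, so summing log-densities instead is a common alternative.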

Anomaly Detection Algorithm

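Choose features $x_i$ that might be indicative of anomalous examples.
Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ using formulas:

$$\mu_i = \frac{1}{m} \sum_{j=1}^{m} x_i^{(j)}, \qquad \sigma_i^2 = \frac{1}{m} \sum_{j=1}^{m} \left(x_i^{(j)} - \mu_i\right)^2$$

Given a new example $x$, compute $p(x)$:

$$p(x) = \prod_{i=1}^{n} p(x_i; \mu_i, \sigma_i^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right)$$

Anomaly if

$$p(x) < \varepsilon$$

$\varepsilon$ - probability threshold.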
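The steps above (fit, score, compare against a threshold) can be sketched end to end in a few lines of Python. This illustrates the procedure under stated assumptions and is not the repository's Octave implementation: the helper names `fit`, `p` and `is_anomaly`, the toy data, and the threshold value are all hypothetical.

```python
import math

def fit(X):
    """Estimate per-feature mean and variance (normalized by m) from rows of X."""
    m, n = len(X), len(X[0])
    mu = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2 = [sum((row[i] - mu[i]) ** 2 for row in X) / m for i in range(n)]
    return mu, sigma2

def p(x, mu, sigma2):
    """Product of per-feature Gaussian densities for one example x."""
    result = 1.0
    for x_i, mu_i, s2_i in zip(x, mu, sigma2):
        result *= math.exp(-((x_i - mu_i) ** 2) / (2.0 * s2_i)) / math.sqrt(2.0 * math.pi * s2_i)
    return result

def is_anomaly(x, mu, sigma2, epsilon):
    """Flag x as anomalous when its density falls below the threshold epsilon."""
    return p(x, mu, sigma2) < epsilon

# Hypothetical training data: two features per example.
X_train = [[14.0, 15.0], [13.5, 14.5], [14.5, 15.5], [14.2, 14.8], [13.8, 15.2]]
mu, sigma2 = fit(X_train)
epsilon = 1e-3  # hypothetical threshold; in practice chosen on labeled validation data

normal_flag = is_anomaly([14.1, 15.0], mu, sigma2, epsilon)
outlier_flag = is_anomaly([20.0, 5.0], mu, sigma2, epsilon)
print(normal_flag, outlier_flag)
```

Note that $p(x)$ is a density, so it can exceed 1 when the variances are small. Only its value relative to $\varepsilon$ matters.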

Algorithm Evaluation

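The algorithm may be evaluated using the F1 score.

The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 (perfect precision and recall) and its worst at 0.

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

Where:

$$\text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn}$$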

tp - number of true positives.

fp - number of false positives.

fn - number of false negatives.
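The F1 score can be computed directly from the three counts, and it can then guide the choice of threshold. The Python sketch below shows one way to do both, assuming a labeled validation set where 1 marks an anomaly and a simple linear sweep over candidate thresholds. The function names, the `steps` parameter, and the sample data are hypothetical, and the sweep is not necessarily how `select_threshold.m` works.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels, where 1 marks an anomaly."""
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def select_threshold(y_val, p_val, steps=1000):
    """Sweep candidate thresholds between min and max density; keep the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    lo, hi = min(p_val), max(p_val)
    step = (hi - lo) / steps
    for k in range(1, steps + 1):
        eps = lo + k * step
        y_pred = [1 if q < eps else 0 for q in p_val]
        score = f1_score(y_val, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Hypothetical validation labels and densities p(x) for six examples.
y_val = [0, 0, 0, 1, 0, 1]
p_val = [0.30, 0.25, 0.40, 0.01, 0.35, 0.02]
eps, f1 = select_threshold(y_val, p_val)
print(eps, f1)
```

F1 suits this task better than plain accuracy because anomalies are usually rare: a classifier that never flags anything can score high accuracy while having zero recall.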

Files

demo.m - main file that you should run from the Octave console to see the demo.
servers_params.mat - training data set.
estimate_gaussian.m - function that estimates the parameters of a Gaussian distribution using the data in X.
multivariate_gaussian.m - function that computes the probability density function of the multivariate Gaussian distribution.
select_threshold.m - function that finds the best threshold (epsilon) to use for selecting outliers.
visualize_fit.m - function that visualizes the data set and its estimated distribution.
