GDA is a generative learning algorithm: it learns P(x|y) rather than the direct mapping from features (x) to labels (y), which is what discriminative algorithms learn. In this type of algorithm, we model the distribution of the features given their source (the label), assuming that the features of each class come from a Gaussian distribution.
What is a Gaussian Distribution?
It is a classic distribution over a single scalar random variable x, parameterized
by the mean (mu) and the standard deviation (sigma). It looks like the typical bell
curve. Its probability density function is as follows:
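For reference, the standard textbook density of a univariate Gaussian with mean $\mu$ and standard deviation $\sigma$ can be written as below. The notation $p(x; \mu, \sigma)$ is an assumption here and may differ from the original figure.

```latex
p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```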
What is a Multivariate Gaussian?
A multivariate Gaussian generalizes the Gaussian defined over a one-dimensional
random variable to several random variables at the same time. The random
variable is vector-valued rather than scalar.
Its probability density function is as follows:
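For reference, the standard density of a multivariate Gaussian over a vector $x \in \mathbb{R}^d$ can be written as below. The symbol $d$ for the dimension is an assumption introduced here; $\mu$ is the mean vector and $\Sigma$ the covariance matrix.

```latex
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
    \exp\left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right)
```

With $d = 1$ and $\Sigma = \sigma^2$ this reduces to the familiar univariate bell curve.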
A multivariate Gaussian in 2 dimensions would look something like this, where the right-hand image shows the contour plot of the Gaussian.
A multivariate Gaussian is parameterized by the mean (mu), which controls the location of the Gaussian, and the covariance matrix (Sigma), which controls its shape.
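To make the roles of the mean and the covariance matrix concrete, here is a minimal NumPy sketch that evaluates the multivariate Gaussian density at a point. The function name `multivariate_gaussian_pdf` and the example values are hypothetical and chosen only for illustration.

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, sigma):
    """Density of a d-dimensional Gaussian N(mu, sigma) evaluated at point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff instead of forming the explicit inverse.
    mahalanobis_sq = diff @ np.linalg.solve(sigma, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return float(np.exp(-0.5 * mahalanobis_sq) / norm_const)

# Illustrative (assumed) parameters: same mean, two different shapes.
mu = np.array([0.0, 0.0])
identity_cov = np.eye(2)
stretched_cov = np.array([[3.0, 0.0], [0.0, 1.0]])  # wider along the first axis

point = np.array([2.0, 0.0])
density_identity = multivariate_gaussian_pdf(point, mu, identity_cov)
density_stretched = multivariate_gaussian_pdf(point, mu, stretched_cov)
peak_density = multivariate_gaussian_pdf(mu, mu, identity_cov)
```

Stretching the covariance along the first axis spreads mass in that direction, so the point (2, 0) receives a higher density under `stretched_cov` than under the identity covariance, even though the mean is unchanged.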
How to fit the training set?
To fit these parameters, we maximize the joint likelihood of the training
data. Doing so gives us the maximum likelihood estimates for mu and Sigma.
Refer to this link for the derivation
of maximizing the joint likelihood.
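For a single Gaussian, the maximum likelihood estimates have a well-known closed form: the sample mean, and the average outer product of the centered points (divided by n, not n - 1). The sketch below assumes these estimates are computed on the points of one class at a time; the helper `fit_gaussian` and the toy array `X_demo` are hypothetical.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum likelihood estimates of the mean and covariance for the rows of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    # The MLE divides by n; dividing by n - 1 would give the unbiased estimate instead.
    sigma = centered.T @ centered / n
    return mu, sigma

# Tiny made-up dataset, one row per data point.
X_demo = np.array([[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [3.0, 2.0]])
mu_hat, sigma_hat = fit_gaussian(X_demo)
```

The same result can be cross-checked with `np.cov(X_demo.T, bias=True)`, which also normalizes by n.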
In the above code, GDA is used to perform two-class classification by modelling class A and class B separately. Once we fit a multivariate Gaussian to each class individually, we can evaluate the likelihood of any new data point under each class from that class's probability density function.
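As a rough illustration of this idea, the sketch below fits one Gaussian per class and assigns a new point to the class with the larger value of log p(x|y) + log p(y). It is not the original code: the synthetic data, the equal class priors, and the helpers `fit_gaussian`, `log_density` and `predict` are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian(X):
    """MLE mean and covariance for the rows of X."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    return mu, centered.T @ centered / X.shape[0]

def log_density(x, mu, sigma):
    """Log of the multivariate Gaussian density at a single point x."""
    d = mu.shape[0]
    diff = np.asarray(x, dtype=float) - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def predict(x, params, priors):
    """Return the label maximizing log p(x|y) + log p(y)."""
    scores = {
        label: log_density(x, *params[label]) + np.log(priors[label])
        for label in params
    }
    return max(scores, key=scores.get)

# Synthetic stand-in data (assumed): 100 points per class, well separated.
rng = np.random.default_rng(0)
X_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X_b = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))

params = {"A": fit_gaussian(X_a), "B": fit_gaussian(X_b)}
priors = {"A": 0.5, "B": 0.5}  # equal priors, since the classes are the same size

label_near_a = predict([0.0, 0.0], params, priors)
label_near_b = predict([4.0, 4.0], params, priors)
```

Working with log densities avoids numerical underflow when a point lies far from a class mean.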
A two-dimensional dataset with 100 data points from class A and 100 data points from class B, where the red points show the class means.
After fitting Gaussians to both classes independently, we can get an approximation of the decision boundary, as shown in the contour plot.
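One way to locate such a boundary numerically is to evaluate the difference of the two class log densities on a grid; with equal priors, the boundary is where that difference crosses zero. The sketch below uses made-up fitted parameters (`mu_a`, `sigma_a`, `mu_b`, `sigma_b`) rather than the dataset from the figure.

```python
import numpy as np

def log_density(points, mu, sigma):
    """Log density of N(mu, sigma) for each row of points."""
    d = mu.shape[0]
    diff = points - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Row-wise squared Mahalanobis distance.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

# Hypothetical fitted parameters for the two classes.
mu_a, sigma_a = np.array([0.0, 0.0]), np.eye(2)
mu_b, sigma_b = np.array([2.0, 2.0]), np.eye(2)

# Evaluate the log-odds of class A versus class B on a regular grid.
xs, ys = np.meshgrid(np.linspace(-3, 5, 81), np.linspace(-3, 5, 81))
grid = np.column_stack([xs.ravel(), ys.ravel()])
log_odds = (
    log_density(grid, mu_a, sigma_a) - log_density(grid, mu_b, sigma_b)
).reshape(xs.shape)

# With equal priors, class A is predicted wherever the log-odds are positive.
predicted_a = log_odds > 0
```

With these particular parameters the covariances are equal, so the boundary is the straight line x1 + x2 = 2; unequal covariances would give a curved boundary. Drawing the zero level set, for example with matplotlib's `plt.contour(xs, ys, log_odds, levels=[0])`, shows the boundary.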
A better visualization of how GDA fits Gaussians to a dataset is as follows.
Note: This plot is not for the provided dataset. It is only meant to give a picture of how GDA fits Gaussians to a distribution.