This is an exercise in the C programming language.
I recommend a professional library for production use.
Make sure to copy the command in env.sh and run it in the shell before running main.
The learning rate, alpha, is dynamically calculated in main.c. The speed of linear regression depends on the learning rate and the number of iterations.
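Exactly how alpha is adjusted is an implementation detail of main.c; purely as an illustration, here is a minimal C sketch of one common heuristic (the "bold driver"): grow alpha slightly while the cost keeps decreasing, and shrink it sharply when the cost increases. The function `adapt_alpha`, its constants, and the toy cost values below are hypothetical and are not taken from main.c.

```c
#include <stdio.h>

/* Hypothetical "bold driver" heuristic, NOT the scheme used in main.c:
 * grow alpha a little while the cost improves, back off hard otherwise. */
static double adapt_alpha(double alpha, double prev_cost, double cost)
{
    if (cost < prev_cost)
        return alpha * 1.05;   /* cost improved: speed up slightly */
    return alpha * 0.5;        /* cost got worse: back off strongly */
}

int main(void)
{
    double alpha = 0.01, prev_cost = 1e30;
    /* Toy stand-in for a training loop; each value plays the role of J(theta). */
    double costs[] = { 10.0, 6.0, 4.5, 5.2, 3.9 };
    for (int i = 0; i < 5; i++) {
        alpha = adapt_alpha(alpha, prev_cost, costs[i]);
        prev_cost = costs[i];
        printf("iteration %d: alpha = %f\n", i, alpha);
    }
    return 0;
}
```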
To establish notation for future use, we'll use $x^{(i)}$ to denote the input variables (features) and $y^{(i)}$ to denote the output or target variable that we are trying to predict. A pair $(x^{(i)}, y^{(i)})$ is called a training example, and a list of $m$ training examples is called a training set.
To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function $h : X \to Y$ so that $h(x)$ is a good predictor for the corresponding value of $y$. For historical reasons, this function $h$ is called a hypothesis.
When the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When $y$ can take on only a small number of discrete values (for example, whether a dwelling is a house or an apartment), we call it a classification problem.
We can measure the accuracy of our hypothesis function by using a cost function. This takes an average difference (actually a fancier version of an average) between all the results of the hypothesis with inputs from the x's and the actual outputs, the y's.
This is an example of a cost function for a hypothesis with two parameters, $\theta_0$ and $\theta_1$:

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
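As a concrete sketch, assuming the single-feature hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$, the cost above can be computed in C like this (the names `cost`, `x`, and `y` are illustrative, not taken from main.c):

```c
#include <stdio.h>
#include <stddef.h>

/* Squared-error cost J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2
 * for the single-feature hypothesis h(x) = theta0 + theta1 * x. */
static double cost(const double *x, const double *y, size_t m,
                   double theta0, double theta1)
{
    double sum = 0.0;
    for (size_t i = 0; i < m; i++) {
        double err = theta0 + theta1 * x[i] - y[i];  /* h(x_i) - y_i */
        sum += err * err;
    }
    return sum / (2.0 * (double)m);
}

int main(void)
{
    double x[] = { 1.0, 2.0, 3.0 };
    double y[] = { 1.0, 2.0, 3.0 };
    /* A perfect fit (theta0 = 0, theta1 = 1) should give J = 0. */
    printf("J = %f\n", cost(x, y, 3, 0.0, 1.0));
    return 0;
}
```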
When specifically applied to the case of linear regression, a new form of the gradient descent equation can be derived. We can substitute our actual cost function and our actual hypothesis function and modify the equation to:

repeat until convergence: {

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$$

}

in other words:

repeat until convergence: {

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \qquad \text{for } j = 0, 1 \text{ (with } x_0^{(i)} = 1\text{)}$$

}
Where $m$ is the size of the training set, $\theta_0$ is a constant that changes simultaneously with $\theta_1$, and $x^{(i)}, y^{(i)}$ are values of the given training set (data).

Note that we have separated out the two cases for $\theta_j$ into separate equations for $\theta_0$ and $\theta_1$; and that for $\theta_1$ we are multiplying $x^{(i)}$ at the end due to the derivative.
The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
So, this is simply gradient descent on the original cost function $J$.
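A minimal C sketch of one such update, again assuming the single-feature hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ (names are illustrative, not taken from main.c). Both parameters are updated simultaneously, from gradients computed with the old values:

```c
#include <stdio.h>
#include <stddef.h>

/* One simultaneous gradient descent step:
 *   theta0 := theta0 - alpha * (1/m) * sum_i (h(x_i) - y_i)
 *   theta1 := theta1 - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_i */
static void gd_step(const double *x, const double *y, size_t m,
                    double *theta0, double *theta1, double alpha)
{
    double grad0 = 0.0, grad1 = 0.0;
    for (size_t i = 0; i < m; i++) {
        double err = *theta0 + *theta1 * x[i] - y[i];  /* h(x_i) - y_i */
        grad0 += err;
        grad1 += err * x[i];
    }
    /* Both gradients are accumulated before either parameter is touched,
     * so the update is simultaneous. */
    *theta0 -= alpha * grad0 / (double)m;
    *theta1 -= alpha * grad1 / (double)m;
}

int main(void)
{
    double x[] = { 1.0, 2.0, 3.0, 4.0 };
    double y[] = { 2.0, 4.0, 6.0, 8.0 };  /* exact fit: y = 2x */
    double theta0 = 0.0, theta1 = 0.0;
    for (int iter = 0; iter < 1000; iter++)
        gd_step(x, y, 4, &theta0, &theta1, 0.05);
    printf("theta0 = %f, theta1 = %f\n", theta0, theta1);  /* approaches 0 and 2 */
    return 0;
}
```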
The ellipses shown above are the contours of a quadratic function. Also shown is the trajectory taken by gradient descent, which was initialized at (48,30). The x's in the figure (joined by straight lines) mark the successive values of $\theta$ that gradient descent went through as it converged to its minimum.
Gradient descent gives one way of minimizing $J$. A second way, which performs the minimization explicitly and without resorting to an iterative algorithm, is the normal equation method: we minimize $J$ by taking its derivatives with respect to the $\theta_j$'s and setting them to zero, which yields

$$\theta = (X^T X)^{-1} X^T y$$
There is no need to do feature scaling with the normal equation.
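In the single-feature case, $X^TX$ is a 2x2 matrix, so the normal equation can be solved in closed form without a general matrix-inversion routine. Here is a minimal C sketch under that assumption (the name `normal_equation` is illustrative, not taken from main.c); if $X^TX$ is singular, for example when all x values are equal, there is no unique solution and the sketch reports failure:

```c
#include <stdio.h>
#include <stddef.h>

/* Solve theta = (X^T X)^(-1) X^T y for X = [1, x_i] (intercept + one
 * feature), inverting the 2x2 matrix X^T X in closed form. */
static int normal_equation(const double *x, const double *y, size_t m,
                           double *theta0, double *theta1)
{
    double sx = 0.0, sxx = 0.0, sy = 0.0, sxy = 0.0;
    for (size_t i = 0; i < m; i++) {
        sx  += x[i];
        sxx += x[i] * x[i];
        sy  += y[i];
        sxy += x[i] * y[i];
    }
    /* X^T X = [[m, sx], [sx, sxx]],  X^T y = [sy, sxy] */
    double det = (double)m * sxx - sx * sx;
    if (det == 0.0)
        return -1;  /* X^T X is singular: no unique solution */
    *theta0 = (sxx * sy - sx * sxy) / det;
    *theta1 = ((double)m * sxy - sx * sy) / det;
    return 0;
}

int main(void)
{
    double x[] = { 1.0, 2.0, 3.0, 4.0 };
    double y[] = { 3.0, 5.0, 7.0, 9.0 };  /* exact fit: y = 2x + 1 */
    double t0, t1;
    if (normal_equation(x, y, 4, &t0, &t1) == 0)
        printf("theta0 = %f, theta1 = %f\n", t0, t1);  /* 1 and 2 */
    return 0;
}
```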
The following is a comparison of gradient descent and the normal equation:
Gradient Descent | Normal Equation |
---|---|
Need to choose alpha | No need to choose alpha |
Needs many iterations | No need to iterate |
$\mathcal{O}(kn^2)$ | $\mathcal{O}(n^3)$, need to calculate the inverse of $X^TX$ |
Works well when $n$ is large | Slow if $n$ is very large |
With the normal equation, computing the inversion has complexity $\mathcal{O}(n^3)$, so if we have a very large number of features $n$, the normal equation will be slow. In practice, once $n$ is on the order of 10,000 it may be better to switch to an iterative method such as gradient descent.