Logistic Regression using R

Logistic Regression:

Logistic regression is a machine learning classification algorithm used to assign observations to a discrete set of classes. Given a feature vector X and a qualitative response Y taking values in the set C = {C1, ..., Ck}, a classification task is to build a function f : X → Y (a classifier) that takes the feature vector X as input and predicts the value of Y, i.e. Y ∈ C. The model (function, classifier) f is built using a set of training observations (X1, Y1), ..., (Xn, Yn) for a given n.

In logistic regression, we relate p(y), the probability of Y belonging to a certain class (which ranges between 0 and 1), to the features X1 , . . . , Xk via the logistic (or logit) transformation given by

p(y) = S(β0 + β1 * x1 + ... + βk * xk), where S(w) is the logistic sigmoid function given by S(w) = 1/(1 + e^(−w)).
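The sigmoid can be written directly in R; the function name `sigmoid` is illustrative:

```r
# Logistic sigmoid: maps any real number to the interval (0, 1)
sigmoid <- function(w) {
  1 / (1 + exp(-w))
}

sigmoid(0)   # 0.5: a linear score of 0 corresponds to probability 0.5
```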

Maximum Likelihood Estimation (MLE) of the Model

In logistic regression, our goal is to learn a set of parameters β = (β0, β1, ..., βk) using the available training data. For linear regression, the typical method is least squares estimation. Although we could use (non-linear) least squares to fit the logistic regression model, the more general method of maximum likelihood estimation (MLE) is preferred, since it has better statistical properties. The idea behind MLE is to choose the most likely values of the parameters β0, ..., βk given the observed sample

{(Xi1 , . . . , Xik , Yi), 1 ≤ i ≤ n}.

In logistic regression, the probability model is based on the binomial distribution:

Yi | Xi = xi ~ Binomial(1, pi), 1 ≤ i ≤ n,

where xi = (xi1, ..., xik) is the vector of features and 0 < pi < 1 are the probabilities associated with the binomials in the model. In other words, the feature vector xi specifies the class yi = 1 with probability pi, that is

pi = P(Yi = 1 | Xi = xi) = S(β0 + β1 * xi1 + ... + βk * xik).

Given a dataset with n training examples and k features, the conditional likelihood L(β) is given by

L(β) = Π (i = 1 to n) pi^yi * (1 − pi)^(1 − yi).

Cost/Objective function/Log Likelihood

The cost function for logistic regression is the log of the conditional likelihood, given by

l(β) = log L(β) = Σ (i = 1 to n) [ yi * log(pi) + (1 − yi) * log(1 − pi) ].

Gradient function

The gradient used to find the local maximum of l(β) has components

∂l(β)/∂βj = Σ (i = 1 to n) (yi − pi) * xij, for j = 0, 1, ..., k (with xi0 = 1).
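The log-likelihood and its gradient can be sketched in R in matrix form, where X is the n×(k+1) design matrix with a leading column of ones (function names are illustrative):

```r
sigmoid <- function(w) 1 / (1 + exp(-w))

# l(beta) = sum over i of [ y*log(p) + (1-y)*log(1-p) ]
log_likelihood <- function(beta, X, y) {
  p <- sigmoid(X %*% beta)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Gradient component j: sum over i of (y_i - p_i) * x_ij,
# written compactly as t(X) %*% (y - p)
gradient <- function(beta, X, y) {
  p <- sigmoid(X %*% beta)
  t(X) %*% (y - p)
}
```

At beta = 0 every pi is 0.5, so the log-likelihood is n*log(0.5), which is a convenient sanity check.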

Implementation in R:

The goal of the project is to implement a logistic regression classifier using gradient ascent. Gradient ascent is used to find the best weights and bias. The following algorithm is used to find the optimal weights.

Gradient_Ascent()
1. Set α ∈ (0,1] (learning rate)
2. Set ε > 0 (tolerance term)
3. β(0) <- initial value
4. for t = 0, 1, ... do
5.   Compute the gradient: gt = ∇l(β(t))
6.   Update the coefficients: β(t+1) <- β(t) + α * gt
7.   Stop when ||β(t+1) − β(t)|| < ε
8. end for
9. Return the final coefficients β(t_final)
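The pseudocode above can be sketched in R; the default parameter values mirror those used later in this README, and variable names are illustrative:

```r
sigmoid <- function(w) 1 / (1 + exp(-w))

gradient_ascent <- function(X, y, alpha = 0.001, eps = 1e-5, max_iter = 10000) {
  beta <- rep(0, ncol(X))                       # step 3: initialize coefficients
  for (t in seq_len(max_iter)) {                # step 4
    p <- sigmoid(X %*% beta)
    g <- t(X) %*% (y - p)                       # step 5: gradient of log-likelihood
    beta_new <- beta + alpha * g                # step 6: ascent update
    if (sqrt(sum((beta_new - beta)^2)) < eps) { # step 7: stopping rule
      beta <- beta_new
      break
    }
    beta <- beta_new
  }
  as.vector(beta)                               # step 9: final coefficients
}
```

On a toy dataset where the class flips with the sign of x, the fitted slope should come out positive and the model should separate the classes.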

The feature variable x1 is normalized before the weights are calculated.
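The original normalization formula is not reproduced here; a common choice, assumed for illustration, is z-score standardization:

```r
# z-score standardization (assumed scheme): x' = (x - mean(x)) / sd(x)
normalize <- function(x) {
  (x - mean(x)) / sd(x)
}

ldl <- c(3.5, 4.1, 2.8, 6.0, 5.2)   # illustrative values
normalize(ldl)                       # result has mean 0 and sd 1
```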

DataSet

Data available at https://web.stanford.edu/~hastie/ElemStatLearn/datasets/SAheart.data. This dataset is a retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa. Many of the coronary heart disease (CHD) positive men underwent blood pressure reduction treatment and other programs to reduce their risk factors after their CHD event, so in some cases the measurements were made after these treatments. The class label indicates whether the person has coronary heart disease (negative or positive) and is hidden for our analysis. Individuals are described by the following nine variables. The continuous variables are systolic blood pressure (sbp), cumulative tobacco (tobacco), low-density lipoprotein cholesterol (ldl), adiposity, obesity, and current alcohol consumption (alcohol). The integer variables are type-A behavior (typea) and age at onset (age). Finally, the binary variable famhist indicates the presence or absence of heart disease in the family history.

Accuracy

The gradient ascent algorithm to find the optimal weights is run on the SAheart dataset. Out of the 9 available features, low-density lipoprotein cholesterol (ldl) is selected as the single feature to train the model, and coronary heart disease (chd) is predicted. The first 100 observations are used to train the model and the remaining 362 observations are used to test its accuracy.

PARAMETER VALUES

  • Learning Rate (α) = 0.001
  • Tolerance Term (ε) = 1e-5
  • Max Iteration = 10000

An accuracy of 63% is obtained on the training data (100 observations).

An accuracy of 67.67% is obtained on the test data (362 observations).
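Accuracy can be computed by thresholding the predicted probability at 0.5 and comparing against the true labels (function and variable names are illustrative):

```r
sigmoid <- function(w) 1 / (1 + exp(-w))

# Fraction of observations whose thresholded prediction matches the label
accuracy <- function(beta, X, y) {
  p <- sigmoid(X %*% beta)
  pred <- ifelse(p > 0.5, 1, 0)
  mean(pred == y)
}
```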

Graph with different learning rates

The regression plot is drawn on the training data for different values of the learning rate: 1, 0.9, 0.1, 0.001, 1e-5 and 1e-10.

REGRESSION PLOT

Gradient Convergence Analysis

The convergence of gradient ascent is assessed using the log-likelihood function. Convergence is tested for various values of the learning rate (1, 0.9, 0.1, 0.001, 1e-5 and 1e-10), with the maximum number of iterations set to 100000. The log-likelihood curve over iterations is plotted below; for learning rates of 0.001 and 1e-5 the curve flattens after some iterations, indicating convergence. The plots are as follows:

Confusion Matrix

Confusion matrix on train data

Confusion matrix on test data
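In R, a confusion matrix can be tabulated with the base `table` function; the `actual` and `pred` vectors below are illustrative:

```r
actual <- c(0, 0, 1, 1, 1, 0)   # true class labels (illustrative)
pred   <- c(0, 1, 1, 0, 1, 0)   # thresholded model predictions (illustrative)

# Rows: actual class, columns: predicted class
table(Actual = actual, Predicted = pred)
```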

Contributors:

  1. Bhuvaneshwaran Ravi
  2. Jayashree Srinivasan
  3. Kameswaran Rangasamy
  4. Serlin Tamilselvam
