Classification of Benign and Malignant Breast Cancer using Supervised Machine Learning Algorithm Logistic Regression
Phase 0 — Data Preparation - The dataset comes from Kaggle. It contains 569 rows and 32 columns describing tumor shape and other measurements. Each tumor is classified as benign or malignant based on its geometry and shape.
Phase 1 — Data Exploration - The loaded dataset has 569 rows and 33 columns. All values are non-null.
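The checks described in this phase can be sketched with pandas. This is only an illustration under stated assumptions: the small DataFrame below is a made-up stand-in for the Kaggle file, and every column name except `diagnosis` is hypothetical. With the real data, the frame would come from `pd.read_csv` on wherever the CSV was saved.

```python
import pandas as pd

# Toy stand-in for the Kaggle data (values are invented for illustration).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "diagnosis": ["M", "B", "B", "M"],
    "feature_a": [1.5, 0.7, 0.9, 1.8],   # hypothetical feature column
    "feature_b": [10.0, 4.0, 5.0, 12.0],  # hypothetical feature column
})

# Shape gives (rows, columns); isnull().sum() counts missing values per column.
n_rows, n_cols = df.shape
nulls_per_column = df.isnull().sum()
total_nulls = int(nulls_per_column.sum())

print(f"{n_rows} rows, {n_cols} columns")
print(nulls_per_column)
print("all values non-null:", total_nulls == 0)
```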
Phase 2 — Encoding Categorical Data - The categorical column (diagnosis) is converted to a numeric type using sklearn’s LabelEncoder. The labels M (malignant) and B (benign) are encoded as 1 and 0, respectively.
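A minimal sketch of this encoding step, using a short made-up list of labels in place of the real diagnosis column. LabelEncoder assigns integer codes in sorted order of the class names, which is why B maps to 0 and M to 1.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of the diagnosis column (M = malignant, B = benign).
diagnosis = ["M", "B", "B", "M", "B"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(diagnosis)

# classes_ holds the original labels; the index of each label is its code.
print(list(encoder.classes_))
print(encoded.tolist())
```

The fitted encoder can map codes back to labels with `encoder.inverse_transform`, which is handy when reporting predictions.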
Phase 3 — Feature Scaling - Feature scaling rescales the input features to a common range, such as 0–1 or 0–100, so that no feature dominates the model simply because of its units.
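As an illustration of scaling to the 0–1 range, the sketch below uses sklearn's MinMaxScaler. Which scaler the project actually uses is not stated here, so treat this choice as an assumption. The small array is invented data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented feature matrix: two columns on very different scales.
X = np.array([
    [10.0, 200.0],
    [15.0, 400.0],
    [20.0, 1000.0],
])

# Each column is mapped to [0, 1] via (x - min) / (max - min).
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)

print(X_scaled)
```

In a train/test workflow, the scaler is usually fitted on the training split only and then applied to the test split with `scaler.transform`.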
Phase 4 — Model Selection - sklearn’s LogisticRegression is used to classify each tumor as benign or malignant. Logistic Regression is also implemented from scratch on the same dataset, in a separate file, with the following steps:
- Defining a sigmoid function
- Defining the loss function
- Gradient descent
- A fit method that takes the learning rate and the number of iterations as input arguments
- A method to predict the output
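The steps above can be sketched in NumPy as follows. This is not the project's own implementation, only one plausible shape for it. The class name, method names, default values, and the use of binary cross-entropy as the loss are all assumptions, and the toy data is invented.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy averaged over samples (assumed loss function)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))


class ScratchLogisticRegression:
    """Hypothetical from-scratch model trained with batch gradient descent."""

    def __init__(self):
        self.weights = None
        self.bias = 0.0
        self.losses = []

    def fit(self, X, y, learning_rate=0.1, n_iterations=1000):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        self.losses = []
        for _ in range(n_iterations):
            y_prob = sigmoid(X @ self.weights + self.bias)
            self.losses.append(log_loss(y, y_prob))
            # Gradient of the cross-entropy loss w.r.t. weights and bias.
            error = y_prob - y
            grad_w = X.T @ error / n_samples
            grad_b = error.mean()
            # Gradient descent update.
            self.weights -= learning_rate * grad_w
            self.bias -= learning_rate * grad_b
        return self

    def predict_proba(self, X):
        return sigmoid(X @ self.weights + self.bias)

    def predict(self, X, threshold=0.5):
        # Probabilities at or above the threshold map to class 1.
        return (self.predict_proba(X) >= threshold).astype(int)


# Invented, already-centered 1-D data: negative values are class 0.
X_toy = np.array([[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = ScratchLogisticRegression().fit(X_toy, y_toy, learning_rate=0.1, n_iterations=1000)
preds = model.predict(X_toy)
print(preds.tolist())
```

Recording the loss at every iteration (`model.losses`) makes it easy to check that gradient descent is actually reducing the loss for a given learning rate.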
Phase 5 — Prediction
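A hedged sketch of how prediction with sklearn's LogisticRegression might look. Several things here are assumptions rather than the project's setup: it uses the copy of the breast cancer data bundled with sklearn instead of the Kaggle CSV, an 80/20 stratified split, MinMaxScaler for scaling, and accuracy as the metric. The bundled data codes malignant as 0 and benign as 1, so the target is flipped to match the M = 1, B = 0 encoding used above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Bundled stand-in for the Kaggle file: 569 samples, 30 numeric features.
X, y_bundled = load_breast_cancer(return_X_y=True)
y = 1 - y_bundled  # flip so that malignant = 1 and benign = 0

# Hold out 20% of the rows for testing (split sizes are an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling and the classifier are chained so the scaler is fitted on training data only.
clf = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.3f}")
```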
Phase 6 — Visualization