"Dinosaurs that failed to adapt went extinct. The same thing will happen to data scientists who think that training ML models inside Jupyter notebooks is enough." - Pau Labarta Bajo.
- Overview
- Objective
- Customer Churn and What it's all about
- Dataset
- MlFlow Integration
- Data Version Control (DVC)
- Azure
- Running Locally
This repository is an end-to-end machine learning project that focuses on predicting customer churn. It follows a comprehensive workflow that includes data ingestion, validation, transformation, model training, and model evaluation. The project aims to develop a predictive model that can identify customers who are likely to churn, allowing businesses to take proactive measures to retain them.
Building on the foundational end-to-end workflow used in my previous project, "Prediction of Mohs Hardness", the objective of this project is to integrate MLflow and DVC into my workflow. MLflow is a machine learning lifecycle management platform that enables tracking experiments, packaging code, and managing models. DVC (Data Version Control) is a version control system for machine learning projects that allows for efficient data and model versioning. By integrating MLflow and DVC, I aim to improve code reproducibility and maintain efficient version control of my datasets and models.
Customer churn refers to the phenomenon where customers stop doing business with a company or stop using its products or services. It is a critical metric for businesses, especially in industries with subscription-based models or recurring revenue streams.
Identifying customers who are likely to churn can help businesses take proactive measures to retain them, thereby reducing revenue loss and improving customer satisfaction.
The dataset used for this project is obtained from Kaggle. It contains the following attributes:
- Customer ID: A unique identifier for each customer
- Surname: The customer's surname or last name
- Credit Score: A numerical value representing the customer's credit score
- Geography: The country where the customer resides (France, Spain, or Germany)
- Gender: The customer's gender (Male or Female)
- Age: The customer's age
- Tenure: The number of years the customer has been with the bank
- Balance: The customer's account balance
- NumOfProducts: The number of bank products the customer uses (e.g., savings account, credit card)
- HasCrCard: Whether the customer has a credit card (1 = yes, 0 = no)
- IsActiveMember: Whether the customer is an active member (1 = yes, 0 = no)
- EstimatedSalary: The estimated salary of the customer
- Exited: Whether the customer has churned (1 = yes, 0 = no)
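For a quick look at the data, a minimal sketch of loading the dataset with pandas and separating the `Exited` target is shown below. The exact column names (e.g., `CustomerId`) and the choice of columns to drop are assumptions about the Kaggle file, not a copy of the project's preprocessing code.

```python
import pandas as pd

# Load the raw dataset (file name matches the output of the data ingestion stage)
df = pd.read_csv("Churn_Modelling.csv")

# Identifier-like columns such as the customer ID and surname carry no
# predictive signal, so they are commonly dropped before modelling.
# errors="ignore" keeps this robust if the column names differ slightly.
features = df.drop(columns=["CustomerId", "Surname", "Exited"], errors="ignore")
target = df["Exited"]  # 1 = churned, 0 = retained

print(features.dtypes)
print(target.value_counts(normalize=True))  # approximate churn rate
```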
To integrate MLflow into the project, I used Dagshub as my remote tracking server, where I can easily log and compare experiments and track the performance of my model.
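As an illustration, a minimal sketch of logging a run against a Dagshub-hosted MLflow tracking server might look like the following. The tracking URI placeholders, experiment name, and the parameter and metric values are illustrative assumptions, not the project's actual configuration.

```python
import mlflow

# Placeholder Dagshub tracking URI; in practice this (and the access
# credentials) would come from configuration or environment variables
mlflow.set_tracking_uri("https://dagshub.com/<username>/<repo>.mlflow")
mlflow.set_experiment("customer-churn")

with mlflow.start_run():
    # Hypothetical hyperparameters and evaluation metrics
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("roc_auc", 0.86)
    mlflow.log_metric("f1_score", 0.61)
```

Runs logged this way can then be compared side by side in the Dagshub/MLflow experiments UI.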
To integrate Data Version Control (DVC) into the project, I defined a YAML file that specifies the different stages of the pipeline. Each stage has a command (`cmd`) that runs a Python script, dependencies (`deps`) that are required for the script to execute, and outputs (`outs`) that are generated by the script. Additionally, some stages have parameters (`params`) and metrics (`metrics`) that are used for model training and evaluation, respectively.
Here is an overview of the stages defined in the YAML file (a minimal sketch of such a file follows the stage list):
- `data_ingestion`: Runs the `stage_01_data_ingestion.py` script, which is responsible for ingesting the data. The dependencies include the script itself, the `data_ingestion.py` component, and the `config.yaml` file; the output is the `Churn_Modelling.csv` CSV file.
- `data_validation`: Runs the `stage_02_data_validation.py` script, which validates the ingested data. The dependencies include the script, the `data_validation.py` component, the output CSV file from the previous stage, the `config.yaml` file, and the `schema.yaml` file. The output is a `status.txt` file indicating the status of the validation.
- `data_transformation`: Runs the `stage_03_data_transformation.py` script, which transforms the validated data. The dependencies include the script, the `data_transformation.py` component, the `status.txt` file from the previous stage, and the `config.yaml` file. The outputs include a preprocessor joblib file and train and test CSV files.
- `model_training`: Runs the `stage_04_model_trainer.py` script, which trains a machine learning model. The dependencies include the script, the `model_trainer.py` component, the train CSV file from the previous stage, and the `config.yaml` file. The parameters for model training are specified in the YAML file. The output is a trained model joblib file.
- `model_evaluation`: Runs the `stage_05_model_evaluation.py` script, which evaluates the trained model. The dependencies include the script, the `model_evaluation.py` component, the test CSV file from the previous stage, the trained model joblib file, and the `config.yaml` file. The metrics generated during the evaluation are stored in a `metrics.json` file.
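To make the structure concrete, here is a minimal sketch of what such a `dvc.yaml` could look like. The file paths, directory layout, and parameter names below are illustrative assumptions and cover only a subset of the stages; they do not reproduce the project's actual file.

```yaml
stages:
  data_ingestion:
    cmd: python src/pipeline/stage_01_data_ingestion.py
    deps:
      - src/pipeline/stage_01_data_ingestion.py
      - src/components/data_ingestion.py
      - config/config.yaml
    outs:
      - artifacts/data_ingestion/Churn_Modelling.csv

  model_training:
    cmd: python src/pipeline/stage_04_model_trainer.py
    deps:
      - src/pipeline/stage_04_model_trainer.py
      - src/components/model_trainer.py
      - artifacts/data_transformation/train.csv
      - config/config.yaml
    params:
      - n_estimators
      - max_depth
    outs:
      - artifacts/model_trainer/model.joblib

  model_evaluation:
    cmd: python src/pipeline/stage_05_model_evaluation.py
    deps:
      - src/pipeline/stage_05_model_evaluation.py
      - src/components/model_evaluation.py
      - artifacts/data_transformation/test.csv
      - artifacts/model_trainer/model.joblib
      - config/config.yaml
    metrics:
      - metrics.json:
          cache: false
```

With the stages defined, `dvc repro` re-runs only the stages whose dependencies have changed, and `dvc dag` prints the resulting pipeline graph.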
The last step is to actually deploy the project. However, I could not deploy it to Azure because my student Azure subscription has expired. If you have an Azure subscription, you can follow the steps below to deploy the project:
- Create an Azure Machine Learning workspace.
- Set up the necessary resources such as compute instances, storage accounts, and container registries.
- Build a Docker image of the project.
- Deploy the Docker image to Azure Container Instances or Azure Kubernetes Service.
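For the Docker step, a minimal Dockerfile sketch is shown below. It assumes the repository's `requirements.txt` and `app.py` (as used in the local setup instructions) and that the app serves on port 5000; inside a container the Flask app would also need to bind to 0.0.0.0 rather than 127.0.0.1.

```dockerfile
# Minimal sketch of a Dockerfile for this project (illustrative, not the
# project's actual deployment configuration)
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the project
COPY . .

EXPOSE 5000
CMD ["python", "app.py"]
```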
The project was, however, deployed to Heroku instead. You can access the code here.
Clone the repository

```
git clone https://github.com/Oyebamiji-Micheal/End-to-End-Customer-Churn-Prediction-using-MLflow-and-DVC
```

Create a virtual environment

Windows (cmd)

```
cd End-to-End-Customer-Churn-Prediction-using-MLflow-and-DVC
pip install virtualenv
python -m virtualenv venv
```

or

```
python3 -m venv venv
```

macOS/Linux

```
cd End-to-End-Customer-Churn-Prediction-using-MLflow-and-DVC
pip install virtualenv
python -m virtualenv venv
```

Activate the virtual environment

Windows (cmd)

```
venv\scripts\activate
```

macOS/Linux

```
. venv/bin/activate
```

or

```
source venv/bin/activate
```

Install the requirements and run the app

Windows/macOS/Linux

```
pip install -r requirements.txt
python app.py
```

Now, open the URL http://127.0.0.1:5000/ in your browser.