Predict Customer Churn

"Predict Customer Churn" project of the Udacity ML DevOps Engineer Nanodegree.

Project Description

This project was built for the Udacity ML DevOps Engineer Nanodegree.
The aim of this part of the course is to learn good practices for writing clean code.
The project itself is a data science analysis that predicts customer churn at a bank.

Inputs and outputs

Raw data must be provided as a .csv file named bank_data.csv in the DATA_FOLDER.
Folders used for inputs and outputs can be specified in constants.py:
- DATA_FOLDER: (default ./data) raw data
- IMG_FOLDER: (default ./images) exploratory data analysis (EDA) plots
- MODEL_FOLDER: (default ./models) pickled models
- RESULT_FOLDER: (default ./results) model reports, feature importance and ROC curves
- LOG_FOLDER: (default ./logs) logs
constants.py also lets you specify:
- KEEP_COLS: features used for modeling
- RESULTS_LOG: (default ./logs/churn_library.log) the file where the progress is logged
  (intended to be located within the LOG_FOLDER)
- TMP_TEST_FOLDER: (default ./tmp) folder to be used for selected tests involving file creation
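
As an illustration, a constants.py using the defaults listed above might look roughly like the sketch below. The folder and log defaults are taken from this README; the KEEP_COLS entries are hypothetical placeholders, since the actual feature list is not given here.

```python
# constants.py (illustrative sketch, not the project's actual file)
import os

DATA_FOLDER = "./data"        # raw data (expects bank_data.csv)
IMG_FOLDER = "./images"       # exploratory data analysis (EDA) plots
MODEL_FOLDER = "./models"     # pickled models
RESULT_FOLDER = "./results"   # model reports, feature importance, ROC curves
LOG_FOLDER = "./logs"         # logs
TMP_TEST_FOLDER = "./tmp"     # scratch folder for tests that create files

# Progress log, intended to live inside LOG_FOLDER
RESULTS_LOG = os.path.join(LOG_FOLDER, "churn_library.log")

# Hypothetical placeholder names: replace with the real feature columns
KEEP_COLS = ["feature_a", "feature_b"]
```

Deriving RESULTS_LOG from LOG_FOLDER keeps the log file inside the log folder even if the folder is changed.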

Analysis

The following data science analysis is performed:

- Data are loaded from the DATA_FOLDER.
- EDA is performed and the resulting plots are saved in the IMG_FOLDER.
- Features are engineered, including encoding each categorical column as the proportion of churned customers in each category.
- Data are split into a train set and a test set.
- A cross-validated random forest and a logistic regression model are trained and saved in the MODEL_FOLDER.
- Predictions are generated.
- Model performance is evaluated and reports are saved in the RESULT_FOLDER.
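
The categorical encoding step can be sketched in plain Python as follows. This is a minimal illustration of the idea only, not the code in churn_library.py: the function names, the "Gender" and "Churn" column names, and the toy rows are all assumptions made for the example.

```python
from collections import defaultdict


def churn_proportion_by_category(rows, category_col, churn_col="Churn"):
    """Map each category value to the fraction of its rows that churned."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[category_col]] += 1
        churned[row[category_col]] += row[churn_col]  # churn flag is 0 or 1
    return {cat: churned[cat] / totals[cat] for cat in totals}


def encode_column(rows, category_col, churn_col="Churn"):
    """Return copies of the rows with an added '<category_col>_Churn' feature."""
    proportions = churn_proportion_by_category(rows, category_col, churn_col)
    new_col = f"{category_col}_Churn"
    return [{**row, new_col: proportions[row[category_col]]} for row in rows]


# Made-up toy data, for illustration only
rows = [
    {"Gender": "F", "Churn": 1},
    {"Gender": "F", "Churn": 0},
    {"Gender": "M", "Churn": 0},
    {"Gender": "M", "Churn": 0},
]
encoded = encode_column(rows, "Gender")
```

Each row gains a numeric feature holding the churn rate of its category, so the models can consume the column without one-hot encoding.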

A log of the progress of this analysis is written to the RESULTS_LOG file.
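
One way to set up such a progress log with the standard logging module is sketched below. The get_logger helper, the logger name, and the message format are assumptions for illustration; only the default log path comes from this README.

```python
import logging
import os

LOG_FOLDER = "./logs"
RESULTS_LOG = os.path.join(LOG_FOLDER, "churn_library.log")


def get_logger(log_path=RESULTS_LOG):
    """Return a logger that appends progress messages to log_path."""
    # Create the log folder if it does not exist yet
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    logger = logging.getLogger("churn_library")
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, even if called repeatedly
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A step in the analysis could then call get_logger().info("Data loaded") to record its progress.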

Running Files

Make sure the raw data are available as a .csv file called bank_data.csv in the DATA_FOLDER (by default ./data/bank_data.csv).
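
A quick pre-flight check for the data file could look like the sketch below. The find_raw_data helper is hypothetical and not part of the project; it only assumes the default folder and file name stated above.

```python
from pathlib import Path

DATA_FOLDER = "./data"  # default value described for constants.py


def find_raw_data(data_folder=DATA_FOLDER, filename="bank_data.csv"):
    """Return the path to the raw data file, or raise if it is missing."""
    path = Path(data_folder) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Expected raw data at {path}")
    return path
```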

The analysis described above is performed by running:
python churn_library.py

Unit tests for all the functions in churn_library.py are run with:
python churn_script_logging_and_tests.py

Repository: Davide-Ragazzon/udac_proj_1_churn
