GOAL: predict customer churn behavior in order to retain customers.
The features:
- gender (Male, Female)
- SeniorCitizen (Yes, No)
- Partner (Yes, No)
- Dependents (Yes, No)
- tenure (number of months the customer has stayed with the company)
- PhoneService (Yes, No)
- MultipleLines (Yes, No, No phone service)
- InternetService (DSL, No, Fiber optic)
- OnlineSecurity (Yes, No, No internet service)
- OnlineBackup (Yes, No, No internet service)
- DeviceProtection (Yes, No, No internet service)
- TechSupport (Yes, No, No internet service)
- StreamingTV (Yes, No, No internet service)
- StreamingMovies (Yes, No, No internet service)
- Contract (Month-to-month, One year, Two year)
- PaperlessBilling (Yes, No)
- PaymentMethod (Bank transfer (automatic), Mailed check, Electronic check, Credit card (automatic))
- MonthlyCharges
- TotalCharges
- Churn (target: Yes, No)
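The modelling code below assumes the data has already been loaded and split into X_train/X_test. A minimal preparation sketch, assuming the standard Kaggle Telco churn CSV; the file name, seed value, and split parameters are assumptions, not taken from the original notebook:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

RANDOM_SEED = 42  # value assumed here; the notebook defines it elsewhere

# File name is an assumption (the standard Kaggle Telco churn CSV).
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

# TotalCharges loads as strings because brand-new customers have blank values.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0)

X = df.drop(columns=["customerID", "Churn"])
y = (df["Churn"] == "Yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=RANDOM_SEED)
```

If the raw string columns are kept, their names must also be passed to CatBoost via cat_features when fitting; alternatively they can be one-hot encoded first.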
gender
- gender does not affect the client's decision

SeniorCitizen
- older clients cancel the service more often

Partner & Dependents
- clients with a partner, as well as clients with children, cancel less often; perhaps the company could offer favorable family tariffs

InternetService
- fiber-optic customers cancel more often; customers who do not use the Internet cancel very rarely

OnlineSecurity, OnlineBackup & DeviceProtection
- clients who use protection services, as well as those who use cloud backup, cancel more often; competitors also have attractive package offers with such additional services

TechSupport
- customers who do not contact technical support are more likely to cancel

Contract
- logically, clients on a short-term (month-to-month) contract leave more often

PaperlessBilling & PaymentMethod
- customers who receive and pay bills in a conservative way (paper bills, mailed checks) are less likely to change providers

- Clients with about 3-6 connected services churn most often (see the sketch after this list).
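A sketch of how the per-category churn rates behind these observations can be computed with pandas (df is the raw DataFrame from the loading sketch above; the notebook's actual plots are not reproduced here):

```python
# Churn rate per category for a few of the features discussed above.
for col in ["Contract", "InternetService", "PaymentMethod", "TechSupport"]:
    print(df.groupby(col)["Churn"].apply(lambda s: (s == "Yes").mean()))

# Number of connected services per client, to check the 3-6 services claim.
service_cols = ["PhoneService", "MultipleLines", "OnlineSecurity", "OnlineBackup",
                "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies"]
n_services = (df[service_cols] == "Yes").sum(axis=1)
print(df.assign(n_services=n_services)
        .groupby("n_services")["Churn"]
        .apply(lambda s: (s == "Yes").mean()))
```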
Plot created with:

```python
from catboost import CatBoostClassifier

# RANDOM_SEED is defined earlier in the notebook.
cat_model = CatBoostClassifier(
    verbose=False,
    random_state=RANDOM_SEED,
    custom_loss=['AUC', 'Accuracy', 'Precision', 'Recall', 'F1'])

cat_model.fit(X_train, y_train,
              eval_set=(X_test, y_test),
              plot=True)  # renders the interactive metric plot in a notebook
```
- After the 92nd iteration overfitting begins; the logloss value there is 0.42.
- The best AUC, 0.8373, is reached at the 108th iteration; after about the 300th iteration the AUC drops below 0.8.
- Accuracy peaks at the 330th iteration (0.79); between roughly iterations 50 and 500 it stays around 0.79, then decreases slightly.
- Precision peaks at the 7th iteration (0.66); during the first 500 iterations it hovers around 0.63, then drops rapidly.
- Recall reaches 0.52 at the 450th iteration.
- The F1-score reaches 0.57 at the 479th iteration.
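Instead of reading these points off the interactive plot, the per-iteration metric values can be pulled from the trained model; a short sketch using CatBoost's get_evals_result and get_best_iteration:

```python
import numpy as np

evals = cat_model.get_evals_result()  # per-iteration metric curves
val_auc = evals["validation"]["AUC"]  # AUC on the eval_set
print("best AUC:", max(val_auc), "at iteration", int(np.argmax(val_auc)))

# Best iteration according to the eval metric (Logloss here).
print("best iteration:", cat_model.get_best_iteration())
```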
Default values:
- learning_rate: 0.04697500169277191
- subsample: 0.800000011920929
- depth: 6
- min_data_in_leaf: 1
- max_leaves: 64
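These defaults can be read back from the fitted model; a sketch using get_all_params, which returns the resolved training parameters:

```python
params = cat_model.get_all_params()  # resolved parameters, defaults included
for key in ["learning_rate", "subsample", "depth",
            "min_data_in_leaf", "max_leaves", "iterations"]:
    print(key, "=", params.get(key))  # .get: some keys depend on grow policy
```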
Clearly, the default of 1000 iterations is too many for this data; the optimal number seems to be no more than 500.
Since the model starts to overfit, a learning_rate in the range [0.01, 0.02, 0.03, 0.04, 0.05] looks optimal for tuning.
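A minimal sketch of the Optuna search behind the "tuned" column below, using the iteration cap and learning_rate range argued above; the depth range, trial count, and choice of F1 as the objective are assumptions, not necessarily what the notebook used:

```python
import optuna
from catboost import CatBoostClassifier
from sklearn.metrics import f1_score

def objective(trial):
    params = {
        # learning_rate range from the analysis above.
        "learning_rate": trial.suggest_categorical(
            "learning_rate", [0.01, 0.02, 0.03, 0.04, 0.05]),
        # Cap iterations at 500, per the overfitting observation above.
        "iterations": trial.suggest_int("iterations", 100, 500),
        "depth": trial.suggest_int("depth", 4, 8),  # assumed range
        "random_state": RANDOM_SEED,
        "verbose": False,
    }
    model = CatBoostClassifier(**params)
    model.fit(X_train, y_train, eval_set=(X_test, y_test))
    return f1_score(y_test, model.predict(X_test))

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)  # trial count assumed
print(study.best_params)
```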
Total metrics:
| Metric | Default CatBoost | CatBoost tuned with Optuna |
|---|---|---|
| Accuracy | 0.7886 | 0.7929 |
| Precision | 0.4938 | 0.4991 |
| Recall | 0.6310 | 0.6422 |
| F1-score | 0.5540 | 0.5617 |
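These numbers come from predictions on the test set; a sketch of how such a table is computed with scikit-learn (tuned_model is a hypothetical name for the Optuna-tuned classifier):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# tuned_model: hypothetical name for the model refit with study.best_params.
for name, model in [("Default CatBoost", cat_model),
                    ("CatBoost tuned with Optuna", tuned_model)]:
    pred = model.predict(X_test)
    print(f"{name}: "
          f"accuracy={accuracy_score(y_test, pred):.4f}, "
          f"precision={precision_score(y_test, pred):.4f}, "
          f"recall={recall_score(y_test, pred):.4f}, "
          f"F1={f1_score(y_test, pred):.4f}")
```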