Case Study - Classification

What customer will accept a credit card? 🏦

Mid-course project: Data Analytics Ironhack Bootcamp, Berlin - 2021

Dolores Umbridge Team:

Karina Condeixa
Davis Pudans
Prebitha Abraham

Scenario

"You are working as a risk analyst with a bank. Apart from the other banking and loan services, the bank also provides credit card services that are a very important source of revenue for the bank. The bank wants to understand the demographics and other characteristics of its customers that accept a credit card offer and that do not accept a credit card.

Usually, the observational data for these kinds of problems is somewhat limited in that often the company sees only those who respond to an offer. To get around this, the bank designs a focused marketing study, with 18,000 current bank customers. This focused approach allows the bank to know who does and does not respond to the offer, and to use existing demographic data that is already available on each customer."

Objective

Build a model that will provide insight into why some bank customers accept credit card offers. There are also other potential areas of opportunities that the bank wants to understand from the data.

Your senior management has also posted these other questions that will help them better understand their customers.

Dataset

All the necessary files, as well as the dataset, can be found in the following repository: Mid-bootcamp project - Classification.

The data set consists of information on 18,000 current bank customers in the study. These are the definitions of data points provided:

Customer Number: A sequential number assigned to the customers (this column is hidden and excluded – this unique identifier will not be used directly).
Offer Accepted: Did the customer accept (Yes) or reject (No) the offer.
Reward: The type of reward program offered for the card.
Mailer Type: Letter or postcard.
Income Level: Low, Medium, or High.
Bank Accounts Open: How many non-credit-card accounts are held by the customer.
Overdraft Protection: Does the customer have overdraft protection on their checking account(s) (Yes or No).
Credit Rating: Low, Medium, or High.
Credit Cards Held: The number of credit cards held at the bank.
Homes Owned: The number of homes owned by the customer.
Household Size: The number of individuals in the family.
Own Your Home: Does the customer own their home? (Yes or No).
Average Balance: Average account balance (across all accounts over time).
Q1, Q2, Q3, and Q4 Balance: The average balance for each quarter in the last

Statements

See the statements.

Deliverables

Slide deck
Oral presentation
SQL queries
A Tableau
Python code
A Tableau dashboard

What did we do?

We used a logistic regression model to predict which customers accept or reject credit card offers, using the bank credit card holder data set.

Python

Read data from xlsx files
Cleaned the data
EDA: exploratory data analysis with seaborn
Categorized some of the numerical data
Wrangled the data
Created a model (normalizing numerical data, splitting train and test data, applying the logistic regression model, evaluating accuracy and testing)
Ran and evaluated the model
Applied SMOTE to reduce the imbalance
Engineered features (handling outliers, binning, dropping unnecessary data, multicollinearity)
Ran and evaluated the model again
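
The modelling steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook from this repository: it assumes scikit-learn's `LogisticRegression` as the classifier, and the data is a synthetic stand-in for the bank data set, with made-up features and an imbalanced target.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank data (hypothetical): two numeric features
# and an imbalanced binary target, where 1 means "offer accepted".
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 1.6).astype(int)

# Split first, stratifying so both sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Train the classifier and predict on the held-out set.
model = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
print(f"accuracy: {accuracy:.3f}")
print(cm)
```

The sketch splits before scaling so that the test set does not influence the scaler. Scaling first, as the step order above suggests, also runs, but it lets information from the test rows leak into the preprocessing.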

We explored different options in search of a better score.

SMOTE: "SMOTE is an oversampling technique where the synthetic samples are generated for the minority class. This algorithm helps to overcome the overfitting problem posed by random oversampling." (source)
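
To make the idea of synthetic minority samples concrete, here is a simplified, SMOTE-style sketch in NumPy. It is not the imbalanced-learn implementation and not necessarily what this project ran. The function name `smote_like` and the toy points are hypothetical. Each new row is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbors.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # k nearest neighbors per row

    base = rng.integers(0, len(X_min), size=n_new)            # starting points
    picked = neighbors[base, rng.integers(0, k, size=n_new)]  # one neighbor each
    gap = rng.random((n_new, 1))                              # position on segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class with four samples (made-up values).
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_like(X_minority, n_new=6)
print(synthetic.shape)
```

Because every synthetic row is an interpolation between two real minority rows, the new points stay inside the region the minority class already occupies, unlike plain duplication, which repeats identical rows.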

Tableau

Visualized the existing data
Created calculated fields to explore a deeper subset of data
Created a dashboard visualizing data about customers who accepted the bank offer

SQL

Imported the database and created the table
Ran queries and created views to analyse the data
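
As a rough illustration of the table, query, and view workflow, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The database engine, the table name `credit_card_data`, the view name `balance_by_offer`, the column names, and the four rows are all assumptions made for this example. The project's actual SQL lives in the repository's SQL files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with a few of the columns described in the data set.
conn.execute(
    "CREATE TABLE credit_card_data ("
    "customer_number INTEGER PRIMARY KEY, offer_accepted TEXT, "
    "income_level TEXT, average_balance REAL)"
)

# Made-up rows, only to have something to query.
conn.executemany(
    "INSERT INTO credit_card_data VALUES (?, ?, ?, ?)",
    [(1, "No", "High", 1160.75), (2, "No", "Medium", 147.25),
     (3, "Yes", "Low", 276.50), (4, "No", "Medium", 1219.00)],
)

# A view that summarizes customers and mean balance per offer outcome.
conn.execute(
    "CREATE VIEW balance_by_offer AS "
    "SELECT offer_accepted, COUNT(*) AS customers, "
    "AVG(average_balance) AS mean_balance "
    "FROM credit_card_data GROUP BY offer_accepted"
)

rows = conn.execute(
    "SELECT * FROM balance_by_offer ORDER BY offer_accepted"
).fetchall()
print(rows)
conn.close()
```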

Contents

Slide deck
SQL queries
Tableau
Tableau dashboard
Tableau questions
Python
Model iteration 1: data cleaning and data wrangling
Model iteration 2: feature engineering: handling outliers based on box plots for the balance columns, dropping columns with unrelated data (Mailer Type), dropping columns with insufficient data (Q1 Balance), dropping columns with multicollinear data (Q2 Balance)
Model iteration 3: upsampling by manually copying the YES records in the data set
Model iteration 4: binning of Household Size
Model iteration 5: binning and dropping Average Balance
Model iteration 6: binning and data wrangling (difference between Q1 and Q2, and between Q3 and Q4)
Model iteration 7: binning and data wrangling (sum of Q1, Q2, Q3, and Q4)
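
The outlier handling in the iterations above relied on box plots, whose whiskers conventionally sit 1.5 times the interquartile range (IQR) beyond the quartiles. A minimal pandas sketch of that rule follows. The helper name `drop_iqr_outliers`, the column names, and the values are hypothetical, and the 1.5 factor is the usual convention rather than a setting confirmed by this project.

```python
import pandas as pd

def drop_iqr_outliers(df, columns, factor=1.5):
    """Keep rows whose values lie within [Q1 - factor*IQR, Q3 + factor*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # A row survives only if it is inside the fences for every column.
        mask &= df[col].between(q1 - factor * iqr, q3 + factor * iqr)
    return df[mask]

# Toy balances (made-up); 5000 is far above the rest of q3_balance.
df = pd.DataFrame({
    "q3_balance": [900, 950, 1000, 1050, 1100, 5000],
    "q4_balance": [800, 820, 790, 810, 805, 815],
})
cleaned = drop_iqr_outliers(df, ["q3_balance", "q4_balance"])
print(len(df), "->", len(cleaned))
```

The fences are computed on the full data for each column before any row is dropped, so the result does not depend on the order in which the columns are listed.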

General conclusions

The best experiment was hypothesis 5, in which we applied binning and data wrangling to the Q1, Q2, Q3, and Q4 balances. This experiment achieved the best accuracy score.

It dealt with the class imbalance using SMOTE and removed outliers using the IQR. In this fifth hypothesis, we binned the Household Size column with quantile cuts (qcut) and wrangled the quarterly balance columns into half-yearly balances.
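
A small pandas sketch of the two transformations named here, quantile binning and combining quarters into half-years, might look like this. The column names, the three bin labels, and the values are made up, and averaging the two quarters is an assumption: the text does not say whether the half-yearly figure was a mean, a sum, or a difference.

```python
import pandas as pd

# Toy data (made-up values) with the columns involved in this step.
df = pd.DataFrame({
    "household_size": [1, 1, 2, 3, 3, 4, 5, 6, 8],
    "q1_balance": [100, 200, 300, 400, 500, 600, 700, 800, 900],
    "q2_balance": [120, 180, 320, 380, 520, 580, 720, 780, 920],
    "q3_balance": [90, 210, 290, 410, 490, 610, 690, 810, 890],
    "q4_balance": [110, 190, 310, 390, 510, 590, 710, 790, 910],
})

# Quantile-based bins: cut points are chosen from the data's own quantiles,
# so each bin holds a similar number of customers.
df["household_bin"] = pd.qcut(
    df["household_size"], q=3, labels=["small", "medium", "large"]
)

# Collapse four quarterly balances into two half-year averages (assumed mean).
df["h1_balance"] = df[["q1_balance", "q2_balance"]].mean(axis=1)
df["h2_balance"] = df[["q3_balance", "q4_balance"]].mean(axis=1)
df = df.drop(columns=["q1_balance", "q2_balance", "q3_balance", "q4_balance"])

print(df["household_bin"].value_counts().to_dict())
```

Replacing the four quarterly columns with two combined ones is one way to reduce the overlap between highly correlated balance features while keeping some of the within-year movement.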

Limitations

The data set had a high imbalance in the target variable (95%), which led to rather low accuracy and had to be handled using SMOTE. Dropping outliers and creating bins slightly improved accuracy, but dropping multicollinear data mostly did not. Overall, accuracy ranged from 69% to 72% across the different feature engineering techniques.
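
For context when reading accuracy figures on imbalanced data, the arithmetic below shows what a trivial baseline would score. It assumes the 95% figure means that 95% of the 18,000 customers rejected the offer, which is an interpretation rather than something stated explicitly above.

```python
# Assumed split for illustration: 95% "No" and 5% "Yes" among 18,000 customers.
n_total = 18_000
n_yes = n_total * 5 // 100   # acceptances under this assumption
n_no = n_total - n_yes       # rejections under this assumption

# A baseline that predicts "No" for everyone gets every rejection right
# and every acceptance wrong.
baseline_accuracy = n_no / n_total
baseline_recall_yes = 0 / n_yes

print(f"baseline accuracy: {baseline_accuracy:.2%}")
print(f"baseline recall for 'Yes': {baseline_recall_yes:.2%}")
```

Under this assumption, accuracy alone says little about how well the acceptances are identified, so recall or a confusion matrix for the "Yes" class is worth reporting alongside it.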

In Tableau, almost no distinct pattern separated customers who accepted the credit card offer from those who did not. Customers with low income and low credit ratings appeared more likely to accept the offer, whereas customers with higher average balances and higher credit ratings were very unlikely to accept it.

KC2016/Case-Study-Classification
