This repository contains a machine learning model (predictive analysis) for predicting borrower default on Lending Club, an American peer-to-peer lending platform that connects investors with borrowers. The dataset has 396,000 observations from 2007 to 2016 and is imbalanced roughly 1:5 in favor of Fully Paid loans. In 2019, defaulting borrowers wiped out roughly $811 million USD from Lending Club's investors.
Because this is a classification problem on imbalanced data, preprocessing included robust scaling, standardization, quantile transformation, SMOTENC, ADASYN, and under-sampling. The processed data was fed to the following predictive models: LogisticRegression, Adaptive Boosting, RandomForest, a neural network, and Extreme Gradient Boosting. Overall, Adaptive Boosting seems to perform better than the other models on recall (that is, correctly classifying defaulting borrowers and minimising false negatives). However, hyperparameter tuning and probability calibration gave better results in terms of F1 score and ROC curve.

End notes: The million-dollar question is which side the company should favor in the trade-off. For this model, the trade-off is between investors' returns and company profitability.

Key findings: Revolving line utilization rate, DTI (debt-to-income ratio), interest rate, grade, employment length, and total number of credit lines are key indicators of default.
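
The recall versus F1 trade-off described above can be illustrated with a minimal sketch in plain Python. This is not the repository's code: the function names, the toy labels and scores, and the two thresholds are all hypothetical, chosen only to mirror the roughly 1:5 imbalance (1 = default, 0 = Fully Paid). It shows how lowering the decision threshold catches more defaulters (higher recall, fewer false negatives) at the cost of flagging more good borrowers (lower precision).

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), predicting default when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1  # a missed defaulter: the costly case for investors
        else:
            tn += 1
    return tp, fp, fn, tn


def recall_precision_f1(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Compute recall, precision and F1 for the default class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_score, threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


# Hypothetical toy data: 2 defaults and 10 fully paid loans (about 1:5).
Y_TRUE = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical predicted default probabilities from some classifier.
Y_SCORE = [0.8, 0.4, 0.6, 0.45, 0.38, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02, 0.01]

# Assumed thresholds for illustration: a default cut-off and a lower one.
strict = recall_precision_f1(Y_TRUE, Y_SCORE, threshold=0.5)
lenient = recall_precision_f1(Y_TRUE, Y_SCORE, threshold=0.35)
```

On this toy data the lower threshold recovers both defaulters but also flags three fully paid loans, so recall rises while precision falls. Which operating point is preferable depends on how the cost of a missed default compares with the cost of a rejected good borrower.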