This repository contains a machine learning model (predictive analysis) that predicts whether Lending Club borrowers will default. Lending Club is an American peer-to-peer lending platform that connects investors with borrowers. The dataset has 396,000 observations spanning 2007 to 2016, and its classes are imbalanced roughly 1:5 in favour of Fully Paid loans. In 2019, defaulting borrowers cost Lending Club's investors roughly $811 million USD.
Because this is an imbalanced classification problem, the data preprocessing used robust scaling, standardisation, quantile transformation, SMOTENC, ADASYN and under-sampling. The processed data was then fed to the following predictive models: Logistic Regression, Adaptive Boosting (AdaBoost), Random Forest, a Neural Network and Extreme Gradient Boosting (XGBoost). Overall, Adaptive Boosting appears to perform better than the other models on recall (i.e. correctly classifying default borrowers, which minimises false negatives). However, hyperparameter tuning and probability calibration improved the results in terms of F1 score and ROC curve.

End Notes: The million-dollar question is which side of the trade-off the company should endorse. For this model, it is the trade-off between investors' returns and the company's profitability.

Key findings: revolving line utilisation rate, debt-to-income ratio (DTI), interest rate, grade, employment length and total number of credit lines are key indicators of default.
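To illustrate why the models are compared on recall rather than plain accuracy, and what the trade-off in the end notes refers to, here is a minimal, stdlib-only Python sketch. It is not the repository's code: the helper names `confusion_counts`, `recall_precision_f1` and `sweep_thresholds` and the toy labels and scores are hypothetical. It assumes label 1 encodes a defaulted loan and flags a loan as a default whenever its predicted probability is at or above a chosen threshold.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 as 'default' (assumed encoding)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # default correctly flagged
        elif actual == 0 and predicted == 1:
            fp += 1  # fully paid loan wrongly flagged as default
        elif actual == 1 and predicted == 0:
            fn += 1  # missed default: the error that recall penalises
        else:
            tn += 1  # fully paid loan correctly passed
    return tp, fp, fn, tn


def recall_precision_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    """Compute recall, precision and F1 for the default class (label 1)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of actual defaults caught
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of flagged loans that really defaulted
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def sweep_thresholds(y_true: Sequence[int], scores: Sequence[float], thresholds: Sequence[float]):
    """Flag a loan as default when score >= threshold; return (threshold, recall, precision, f1) rows."""
    rows = []
    for threshold in thresholds:
        y_pred = [1 if s >= threshold else 0 for s in scores]
        rows.append((threshold, *recall_precision_f1(y_true, y_pred)))
    return rows


if __name__ == "__main__":
    # Hypothetical toy data (NOT from the Lending Club dataset): 2 defaults vs 10 fully paid, i.e. 1:5.
    y_true = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.2, 0.4, 0.1, 0.6, 0.3, 0.35, 0.05, 0.15, 0.5, 0.25, 0.45]

    # Majority-class baseline: predict "Fully Paid" (0) for every loan.
    print("all Fully Paid:", recall_precision_f1(y_true, [0] * len(y_true)))

    for threshold, recall, precision, f1 in sweep_thresholds(y_true, scores, [0.3, 0.5, 0.7]):
        print(f"threshold={threshold:.1f} recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
```

Predicting Fully Paid for every loan is right on 10 of the 12 toy loans yet has zero recall, which is why accuracy alone is misleading on imbalanced data. On the toy scores, a 0.3 threshold catches both defaulters (recall 1.00, precision 0.29), while 0.7 catches only one (recall 0.50, precision 1.00): lowering the threshold trades more wrongly flagged good borrowers for fewer missed defaults. In a real pipeline, scikit-learn's `recall_score`, `f1_score` and `precision_recall_curve` compute the same quantities.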
