Kaggle Competition:https://www.kaggle.com/c/GiveMeSomeCredit/overview
Aim: Improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years.
What do:
-
Preprocessed data and used random forest to impute missing values.
-
Applied optimal segmentation for continuous variables, calculated WOE and IV.
-
Replaced the data of top five important variables with WOE and constructed Logistics Regression and lgb model.
-
The AUC on test dataset arrived 86%.