CREDIT RISK MODEL

Live Credit Risk Predictor

https://credit-risk-brianic.herokuapp.com/

Business Problems

As a financing company, the user wants to build a credit scoring model that predicts, from a loan application, whether the client will default.

Business Goals

Research and develop a model that predicts whether an applicant will default, and identify the most suitable evaluation metrics, since the dataset has imbalanced classes.

Data Dictionary

Feature Name	Description
person_age	Age
person_income	Annual income
person_home_ownership	Home ownership
person_emp_length	Employment length (in years)
loan_intent	Loan intent
loan_grade	Loan grade
loan_amnt	Loan amount
loan_int_rate	Interest rate
loan_percent_income	Loan amount as a fraction of annual income
cb_person_default_on_file	Historical default on file (Y/N)
cb_person_cred_hist_length	Credit history length
loan_status	Loan status (0 = non-default, 1 = default)

Results

    +----------------------+--------------------------------+--------------------------------+--------------------------------+
    |                      | Train                          | Test                           | Holdout Sample                 |
    | Model                +----------+----------+----------+----------+----------+----------+----------+----------+----------+
    |                      | Recall   | F1-Score | AUC      | Recall   | F1-Score | AUC      | Recall   | F1-Score | AUC      |
    +----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+
    | Logistic Regression  | 0.525296 | 0.618740 | 0.737900 | 0.524548 | 0.621112 | 0.738705 | 0.470407 | 0.584248 | 0.717754 |
    | RandomForest         | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.500000 |
    | XGBoost              | 0.695587 | 0.813239 | 0.845633 | 0.689061 | 0.798403 | 0.839225 | 0.696387 | 0.802480 | 0.843304 |
    +----------------------+----------+----------+----------+----------+----------+----------+----------+----------+----------+

Conclusions

Because the dataset is imbalanced (non-default: 77.7%; default: 22.3%), it is worth looking at AUC and Recall rather than accuracy. Recall matters most here: for business purposes, the goal is to minimize Type II errors (false negatives, i.e. predicting non-default (0) when the actual outcome is default (1)). Hence Recall is used as the primary metric.
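To make the metric choice concrete, here is a minimal sketch that computes accuracy and Recall from confusion-matrix counts. The counts are hypothetical, chosen only to mirror the 77.7% / 22.3% class split; they are not results from this project.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Fraction of actual defaults (class 1) that the model catches."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical sample of 1000 applicants: 777 non-default, 223 default.
# A model that always predicts non-default (0) makes no false positives,
# but every actual default becomes a false negative (Type II error).
always_non_default = {"tp": 0, "fp": 0, "tn": 777, "fn": 223}

acc = accuracy(**always_non_default)
rec = recall(always_non_default["tp"], always_non_default["fn"])
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

Such a model looks 77.7% accurate while missing every default, which is why accuracy is set aside here. A constant non-default predictor also scores Recall 0, F1-Score 0 and, when AUC is computed from hard labels, AUC 0.5, which matches the RandomForest row in the Results table.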

As the table above shows, XGBoost has the highest and most stable AUC and Recall of the three models (test AUC: 0.839225; test Recall: 0.689061).

In addition, its Recall and AUC scores on the train and test sets differ only slightly. This suggests that the model is 'just right' for classifying target 1 and target 0, neither overfitting nor underfitting.
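The train/test comparison can be checked with simple arithmetic. The sketch below copies the train and test scores from the Results table and prints the gap for each metric; the helper name `train_test_gap` is illustrative and not part of the project code.

```python
# (train, test) scores copied from the Results table.
scores = {
    "Logistic Regression": {"recall": (0.525296, 0.524548), "auc": (0.737900, 0.738705)},
    "XGBoost": {"recall": (0.695587, 0.689061), "auc": (0.845633, 0.839225)},
}


def train_test_gap(train, test):
    """Positive gap: the model scores higher on train than on test."""
    return train - test


gaps = {
    model: {metric: round(train_test_gap(*pair), 6) for metric, pair in metrics.items()}
    for model, metrics in scores.items()
}

for model, model_gaps in gaps.items():
    print(model, model_gaps)
```

For XGBoost both gaps are below 0.01, which supports the reading that the model generalizes from train to test.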

Looking back at the feature importance from Logistic Regression with Lasso regularization, the selected features make sense. The features that affect loan_status are:

Percentage of Income ('loan_percent_income'),
Loan Amount ('loan_amnt_WOE'),
Employment Length ('person_emp_length_WOE'),
Owning Home ('person_home_ownership_OWN'),
Loan Grade ('loan_grade'),
Intention for Venture ('loan_intent_VENTURE'),
Intention for Education ('loan_intent_EDUCATION'),
Renting home ('person_home_ownership_RENT'),
Age ('person_age_WOE'),
Credit History Length ('cb_person_cred_hist_length_WOE'),
Intention for personal purposes ('loan_intent_PERSONAL'),
Intention for home improvement ('loan_intent_HOMEIMPROVEMENT').
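The `_WOE` suffix on several features suggests a weight-of-evidence encoding, although this README does not describe how it was computed. As a rough sketch only, assuming the common credit-scoring convention WOE = ln(share of non-defaults in a bin / share of defaults in a bin), the encoding could look like this. The function name, bin labels, and toy data are all hypothetical.

```python
import math
from collections import Counter


def weight_of_evidence(bins, targets):
    """Return {bin: WOE}, with WOE = ln(share of non-defaults / share of defaults)."""
    good = Counter(b for b, t in zip(bins, targets) if t == 0)
    bad = Counter(b for b, t in zip(bins, targets) if t == 1)
    total_good, total_bad = sum(good.values()), sum(bad.values())
    woe = {}
    for b in set(bins):
        # Assumes every bin holds at least one default and one non-default;
        # real pipelines usually smooth or merge bins to avoid log(0).
        woe[b] = math.log((good[b] / total_good) / (bad[b] / total_bad))
    return woe


# Hypothetical toy data: binned loan amounts and loan_status labels.
bins = ["low", "low", "low", "low", "high", "high", "high", "high"]
targets = [0, 0, 0, 1, 0, 1, 1, 1]

woe = weight_of_evidence(bins, targets)
print(woe)
```

Under this convention a positive WOE marks a bin with relatively more non-defaults and a negative WOE a bin with relatively more defaults. Some libraries use the opposite sign.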

After tuning the models and computing each metric, we used them to predict the holdout sample. XGBoost again performs best: on the holdout sample it reaches an AUC of 0.843304 and a Recall of 0.696387. This suggests that XGBoost is a suitable model for production, because it is not overfitted and predicts the holdout sample well.

GUIDANCE FOR INPUT AND OUTPUT FORMAT

Guidance for the input and output format when accessing the model on the web.

1. Input Format

Endpoint: https://credit-risk-brianic.herokuapp.com/predict-api

Send a 'POST' request whose body contains the input variables from the data dictionary above, except loan_status. The body must be JSON, for example:

{
    "person_age": 24,
    "person_income": 168000,
    "person_home_ownership": "MORTGAGE",
    "person_emp_length": 0.0,
    "loan_intent": "PERSONAL",
    "loan_grade": "E",
    "loan_amnt": 25000,
    "loan_int_rate": 16.45,
    "loan_percent_income": 0.15,
    "cb_person_default_on_file": "N",
    "cb_person_cred_hist_length": 3
}
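A request with this body could be built as in the sketch below, which uses only the Python standard library. The `Content-Type: application/json` header and the helper name `build_request` are assumptions; the README only states that the body must be JSON. The hosted service may no longer be running, so the network call itself is left commented out.

```python
import json
import urllib.request

# Endpoint taken from the "Input Format" section.
ENDPOINT = "https://credit-risk-brianic.herokuapp.com/predict-api"


def build_request(applicant, url=ENDPOINT):
    """Build (but do not send) a JSON POST request for one applicant."""
    body = json.dumps(applicant).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


applicant = {
    "person_age": 24,
    "person_income": 168000,
    "person_home_ownership": "MORTGAGE",
    "person_emp_length": 0.0,
    "loan_intent": "PERSONAL",
    "loan_grade": "E",
    "loan_amnt": 25000,
    "loan_int_rate": 16.45,
    "loan_percent_income": 0.15,
    "cb_person_default_on_file": "N",
    "cb_person_cred_hist_length": 3,
}

request = build_request(applicant)

# Uncomment to send the request and print the decoded JSON response.
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```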

2. Output Format

The expected output looks like this:

{
    "model": "XGB-Credit-Risk",
    "prediction": "87.76% Non-default",
    "version": "1.0.0"
}
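A client will usually want the probability and the label as separate values. The sketch below splits the "prediction" string, assuming it always follows the "<number>% <label>" pattern seen in the example above; the function name `parse_prediction` is hypothetical and the pattern is not guaranteed by the service.

```python
import json


def parse_prediction(response_text):
    """Split a prediction such as '87.76% Non-default' into (probability, label)."""
    payload = json.loads(response_text)
    # Assumes the format "<number>% <label>", as in the example response.
    percent, label = payload["prediction"].split("% ", 1)
    return float(percent) / 100, label


sample = '{"model": "XGB-Credit-Risk", "prediction": "87.76% Non-default", "version": "1.0.0"}'
probability, label = parse_prediction(sample)
print(probability, label)
```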
