Skip to content

A Classification Model which predicts whether a loan will be approved based on applicant information, it leverages various classification algorithms to do so, featuring data preprocessing, exploratory data analysis, and model comparison to provide insights into the loan approval process

Notifications You must be signed in to change notification settings

dvelkow/loan_prediction_classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Loan Classification Project

Welcome to my loan classification project! This repository contains my work on predicting loan approval outcomes using various machine learning techniques. This project is based on a dataset from Kaggle.

Table of Contents

Overview

In this project, I aim to predict whether a loan application will be approved based on a set of features provided in the dataset. This involves several steps including data cleaning, exploratory data analysis, feature engineering, model building, and evaluation.

Dataset Description

The dataset consists of various features related to loan applicants. Here's a quick rundown of the files:

loan.csv: The main dataset containing applicant information and loan approval status. Data_Dictionary.xlsx: An Excel file that provides detailed descriptions of the features in the dataset.

Setup Instructions

To get started with the project, you'll need to set up your Python environment with the required libraries. Here's how you can do it:

Clone the repository:

git clone https://github.com/dvelkow/loan_prediction_classification.git
cd Solution
pip install pandas numpy matplotlib seaborn scikit-learn xgboost

How to use

Place the dataset files (loan.csv and Data_Dictionary.xlsx) in the project directory. Then open and run the Jupyter notebook.

jupyter notebook Loan_Classification_Solution.ipynb

In the notebook, you'll find all the steps for data preprocessing, model training, and evaluation.

Approach

Here's a brief overview of my approach to solving the loan classification problem:

Data Preprocessing:

-Checked for and handled missing values appropriately.

-Encoded categorical variables using one-hot encoding.

-Scaled numerical features to ensure uniformity.

Exploratory Data Analysis:

-Visualized the distribution of various features.

-Identified correlations and relationships between features.

Feature Engineering:

-Created new features to improve model performance.

-Selected the most relevant features based on exploratory analysis.

Model Building:

-Experimented with several algorithms including RandomForestClassifier.

-Performed hyperparameter tuning using GridSearchCV.

-Used cross-validation to ensure model stability.

Model Evaluation:

-Evaluated models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.

-Analyzed the confusion matrix and ROC curves to assess performance.

Results

The final model achieved an accuracy of 92% on the test dataset. For more detailed results and insights, please refer to the Jupyter notebook.

About

A Classification Model which predicts whether a loan will be approved based on applicant information, it leverages various classification algorithms to do so, featuring data preprocessing, exploratory data analysis, and model comparison to provide insights into the loan approval process

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published