dvelkow/loan_prediction_classification: A classification model that predicts whether a loan will be approved based on applicant information. It uses several classification algorithms and covers data preprocessing, exploratory data analysis, and model comparison to provide insights into the loan approval process.

Loan Classification Project

Welcome to my loan classification project! This repository contains my work on predicting loan approval outcomes using various machine learning techniques. This project is based on a dataset from Kaggle.

Overview

In this project, I aim to predict whether a loan application will be approved based on a set of features provided in the dataset. This involves several steps, including data cleaning, exploratory data analysis, feature engineering, model building, and evaluation.

Dataset Description

The dataset consists of various features related to loan applicants. Here's a quick rundown of the files:

-loan.csv: The main dataset containing applicant information and loan approval status.

-Data_Dictionary.xlsx: An Excel file that provides detailed descriptions of the features in the dataset.
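A minimal sketch of loading both files with pandas and taking a first look at column types and missing values. It assumes the files sit in the working directory and that the feature descriptions are on the first sheet of the workbook; `load_loan_data`, `load_data_dictionary` and `summarize_dataset` are hypothetical helper names, not functions from the notebook. Reading `.xlsx` files with `pandas.read_excel` also requires the `openpyxl` package.

```python
import pandas as pd


def load_loan_data(csv_path="loan.csv"):
    """Load the main applicant table (path assumed relative to the notebook)."""
    return pd.read_csv(csv_path)


def load_data_dictionary(xlsx_path="Data_Dictionary.xlsx"):
    """Load the feature descriptions from the first sheet (needs openpyxl)."""
    return pd.read_excel(xlsx_path)


def summarize_dataset(df):
    """One row per column: dtype, count of missing values, and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )
```

In the notebook, `summarize_dataset(load_loan_data())` would give a compact overview to read alongside the data dictionary.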

Setup Instructions

To get started with the project, you'll need to set up your Python environment with the required libraries. Here's how you can do it:

Clone the repository:

git clone https://github.com/dvelkow/loan_prediction_classification.git
cd loan_prediction_classification/Solution

Install the required libraries:

pip install pandas numpy matplotlib seaborn scikit-learn xgboost
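After installing, a quick stdlib-only check can confirm that the libraries are importable before opening the notebook. `missing_modules` is a hypothetical helper, not part of the repository. Note that some pip package names differ from their import names: `scikit-learn` is imported as `sklearn`.

```python
import importlib.util

# Import names for the packages installed above; the pip package
# "scikit-learn" is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "xgboost"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the subset of `modules` that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when every required library is installed.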

How to use

Place the dataset files (loan.csv and Data_Dictionary.xlsx) in the project directory. Then open and run the Jupyter notebook.

jupyter notebook Loan_Classification_Solution.ipynb

In the notebook, you'll find all the steps for data preprocessing, model training, and evaluation.

Approach

Here's a brief overview of my approach to solving the loan classification problem:

Data Preprocessing:

-Checked for and handled missing values appropriately.

-Encoded categorical variables using one-hot encoding.

-Scaled numerical features so they share a comparable range.
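The three preprocessing steps above can be combined into a single scikit-learn `ColumnTransformer`. This is a minimal sketch rather than the notebook's code: the imputation strategies (median for numbers, most frequent value for categories) and the column names in the toy frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols):
    """Impute and scale numeric columns; impute and one-hot encode categorical columns."""
    numeric = Pipeline(
        steps=[
            ("impute", SimpleImputer(strategy="median")),  # fill numeric gaps with the median
            ("scale", StandardScaler()),  # zero mean, unit variance
        ]
    )
    categorical = Pipeline(
        steps=[
            ("impute", SimpleImputer(strategy="most_frequent")),  # fill with the mode
            ("onehot", OneHotEncoder(handle_unknown="ignore")),  # unseen categories -> all zeros
        ]
    )
    return ColumnTransformer(
        transformers=[
            ("num", numeric, numeric_cols),
            ("cat", categorical, categorical_cols),
        ]
    )


# Hypothetical columns for illustration; the real names are in Data_Dictionary.xlsx.
example = pd.DataFrame(
    {
        "income": [4000.0, np.nan, 6000.0, 5000.0],
        "loan_amount": [120.0, 100.0, np.nan, 150.0],
        "education": ["Graduate", "Not Graduate", np.nan, "Graduate"],
    }
)
preprocessor = build_preprocessor(["income", "loan_amount"], ["education"])
X = preprocessor.fit_transform(example)
```

Fitting the transformer on the training split only, for example as the first step of a `Pipeline` that ends in the classifier, keeps test-set statistics such as the median and the scaling parameters out of training.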

Exploratory Data Analysis:

-Visualized the distribution of various features.

-Identified correlations and relationships between features.
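One way to surface the correlations mentioned above is to scan the correlation matrix of the numeric columns for strongly related pairs. `strong_correlations` is a hypothetical helper and the 0.5 cut-off is arbitrary; this is not code from the notebook.

```python
from itertools import combinations

import pandas as pd


def strong_correlations(df, threshold=0.5):
    """Return pairs of numeric columns whose |Pearson r| exceeds `threshold`, strongest first."""
    corr = df.select_dtypes(include="number").corr()
    rows = [
        (a, b, corr.loc[a, b])
        for a, b in combinations(corr.columns, 2)  # each unordered pair once, no diagonal
        if abs(corr.loc[a, b]) > threshold  # NaN (e.g. constant columns) compares False
    ]
    rows.sort(key=lambda row: abs(row[2]), reverse=True)
    return pd.DataFrame(rows, columns=["feature_a", "feature_b", "correlation"])
```

For the visual side, `df.hist()` plots each numeric feature's distribution and `seaborn.heatmap(df.select_dtypes("number").corr(), annot=True)` draws the full matrix.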

Feature Engineering:

-Created new features to improve model performance.

-Selected the most relevant features based on exploratory analysis.
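As an illustration of these two feature-engineering steps, the sketch below derives ratio and log features and then keeps the features most correlated with a 0/1 target. The column names (`applicant_income`, `coapplicant_income`, `loan_amount`) are hypothetical, and the correlation filter is one simple selection rule, not necessarily the one used in the notebook.

```python
import numpy as np
import pandas as pd


def add_ratio_features(df):
    """Add illustrative engineered features; the input column names are hypothetical."""
    out = df.copy()
    out["total_income"] = out["applicant_income"] + out["coapplicant_income"]
    # Loan size relative to income; a zero income becomes NaN to avoid division by zero.
    out["loan_to_income"] = out["loan_amount"] / out["total_income"].replace(0, np.nan)
    # log1p compresses the long right tail typical of income data.
    out["log_total_income"] = np.log1p(out["total_income"])
    return out


def top_k_by_target_correlation(df, target, k):
    """Return the k numeric features with the largest absolute correlation to `target`."""
    numeric = df.select_dtypes(include="number")
    scores = numeric.drop(columns=[target]).corrwith(numeric[target]).abs()
    return scores.sort_values(ascending=False).head(k).index.tolist()
```

A filter like this only captures linear, one-feature-at-a-time relationships, so it works best as a first pass alongside the visual checks from the exploratory analysis.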

Model Building:

-Experimented with several algorithms, including RandomForestClassifier.

-Performed hyperparameter tuning using GridSearchCV.

-Used cross-validation to check that model performance is stable across folds.
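A minimal sketch of the tuning step, assuming a binary 0/1 target and an already preprocessed feature matrix. `tune_random_forest` is a hypothetical helper, and the parameter grid, the F1 scoring choice, and the synthetic `make_classification` data are placeholders, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split


def tune_random_forest(X, y, random_state=42):
    """Grid-search a small RandomForest grid using stratified 5-fold cross-validation."""
    # Placeholder grid; a real search would be sized to the dataset and compute budget.
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 5],
        "min_samples_leaf": [1, 5],
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid=param_grid,
        scoring="f1",  # assumes a 0/1 target with 1 = approved
        cv=cv,
    )
    return search.fit(X, y)  # fit() returns the fitted search object


# Synthetic stand-in for the preprocessed loan features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
search = tune_random_forest(X_train, y_train)

# The spread of the best candidate's fold scores is a simple stability check.
best = search.best_index_
print("best params:", search.best_params_)
print(
    "CV F1: %.3f +/- %.3f"
    % (search.cv_results_["mean_test_score"][best], search.cv_results_["std_test_score"][best])
)
# best_estimator_ is refit on all training data; .score() on a classifier is accuracy.
print("held-out accuracy: %.3f" % search.best_estimator_.score(X_test, y_test))
```

Because `xgboost.XGBClassifier` follows the scikit-learn estimator interface, the same `GridSearchCV` pattern would apply to it with an XGBoost-specific grid.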

Model Evaluation:

-Evaluated models using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC.

-Analyzed the confusion matrix and ROC curves to assess performance.
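The evaluation metrics listed above map directly onto functions in `sklearn.metrics`. The sketch below uses a hypothetical helper `evaluate_binary` and toy labels; it assumes a binary 0/1 target and that the positive-class probabilities come from something like `model.predict_proba(X_test)[:, 1]`.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate_binary(y_true, y_pred, y_score):
    """Compute the metrics listed above for a binary classifier.

    y_pred holds hard 0/1 predictions; y_score holds the predicted probability
    (or decision score) of the positive class, which ROC-AUC requires.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.8, 0.7, 0.4, 0.2]
metrics = evaluate_binary(y_true, y_pred, y_score)
```

For these toy labels, accuracy, precision, recall and F1 all equal 2/3 and ROC-AUC is 8/9. `sklearn.metrics.roc_curve(y_true, y_score)` returns the false- and true-positive rates needed to plot the ROC curve.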

Results

The final model achieved an accuracy of 92% on the test dataset. For more detailed results and insights, please refer to the Jupyter notebook.

