Loan Dataset Cleaning

This repository contains code for cleaning and preprocessing a loan dataset using Python and pandas. The dataset is loaded into a pandas DataFrame, and various data cleaning operations are performed to ensure the dataset is suitable for analysis.

Introduction

The loan dataset is loaded into a pandas DataFrame and cleaned in seven steps that address duplicates, inconsistent values, incorrect records, missing values, data types, outliers, and irrelevant columns.

Dataset Overview

The loan dataset consists of 13 columns, including 'UID', 'Marital_status', 'Dependents', 'Is_graduate', 'Income', 'Loan_amount', 'Term_months', 'Credit_score', 'approval_status', 'Age', 'Sex', 'Purpose', and 'Hobby'.

UID: Unique identifier for each loan application.
Marital_status: Marital status of the loan applicant (e.g., married, single, divorced).
Dependents: Number of dependents the applicant has.
Is_graduate: Indicates whether the applicant is a graduate (e.g., yes or no).
Income: The income of the loan applicant.
Loan_amount: The amount of the loan requested by the applicant.
Term_months: The term or duration of the loan in months.
Credit_score: The credit score of the applicant.
approval_status: Indicates whether the loan was approved or not.
Age: Age of the loan applicant.
Sex: Gender of the loan applicant.
Purpose: Purpose of the loan (e.g., home purchase, education, business).
Hobby: Hobby of the loan applicant.

Data Cleaning Operations

1. Dropping Duplicates

Duplicate rows are identified and removed from the dataset, both based on the entire row and specifically on the 'UID' column.
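
A minimal sketch of this step, assuming a small hypothetical sample in place of the real loan data. Keeping the first occurrence of each 'UID' is an assumption; the notebook may use a different rule.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from the loan dataset.
df = pd.DataFrame({
    "UID": ["A1", "A1", "A2", "A2", "A3"],
    "Income": [5000, 5000, 4200, 4300, 6100],
})

# Remove rows that are identical in every column.
df = df.drop_duplicates()

# Remove rows that repeat a UID, keeping the first occurrence (assumption).
df = df.drop_duplicates(subset="UID", keep="first").reset_index(drop=True)
```

Running the full-row pass first means the 'UID' pass only has to resolve rows that share an identifier but differ in some other column.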

2. Data Standardization

String values in the 'Marital_status' and 'Sex' columns are standardized by converting them to uppercase. Additionally, 'M' and 'F' values in the 'Sex' column are replaced with 'Male' and 'Female', respectively.
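
The standardization described above could look roughly like the following. The sample values are hypothetical, and the sketch follows the description literally: uppercase first, then map the single-letter codes.

```python
import pandas as pd

# Hypothetical sample values with inconsistent casing.
df = pd.DataFrame({
    "Marital_status": ["married", "Single", "DIVORCED"],
    "Sex": ["m", "F", "M"],
})

# Convert both string columns to uppercase.
for col in ["Marital_status", "Sex"]:
    df[col] = df[col].str.upper()

# Replace the single-letter codes with full labels.
df["Sex"] = df["Sex"].replace({"M": "Male", "F": "Female"})
```

Note that `replace` with a dictionary matches whole values only, so an entry that is already spelled out (for example 'MALE') would be left unchanged by the second step.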

3. Handling Incorrect Records

Negative values in the 'Age' column are replaced with a minimum valid age of 20.
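
As an illustration, the replacement can be done with a boolean mask. The constant name `MIN_VALID_AGE` and the sample ages are assumptions made for this sketch.

```python
import pandas as pd

MIN_VALID_AGE = 20  # minimum valid age stated above; the name is hypothetical

# Hypothetical sample containing two invalid (negative) ages.
df = pd.DataFrame({"Age": [34, -5, 27, -41]})

# Overwrite only the rows where Age is negative.
df.loc[df["Age"] < 0, "Age"] = MIN_VALID_AGE
```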

4. Handling Missing Values

Missing values in the dataset are identified and addressed:

Missing values in the 'Loan_amount', 'Term_months', and 'Age' columns are filled with the mean of each column.
Missing values in the 'Is_graduate' column are filled with 'Graduate'.
Rows with any remaining missing values are dropped.
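
The three rules above can be sketched as follows, assuming a small hypothetical sample. The fill value 'Graduate' is taken from the description; the sample values and the use of 'Credit_score' to show the final drop are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one gap per column.
df = pd.DataFrame({
    "Loan_amount": [100.0, np.nan, 300.0, 200.0],
    "Term_months": [12.0, 24.0, np.nan, 36.0],
    "Age": [30.0, 40.0, 50.0, np.nan],
    "Is_graduate": ["Yes", None, "No", "Yes"],
    "Credit_score": [700.0, 650.0, 720.0, np.nan],
})

# Identify missing values: count them per column.
missing_counts = df.isna().sum()

# Fill the numeric columns with their own mean.
for col in ["Loan_amount", "Term_months", "Age"]:
    df[col] = df[col].fillna(df[col].mean())

# Fill the categorical column with a fixed label.
df["Is_graduate"] = df["Is_graduate"].fillna("Graduate")

# Drop any row that still has a missing value in another column.
df = df.dropna().reset_index(drop=True)
```

In this sample the last row is dropped because its 'Credit_score' is still missing after the fills.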

5. Converting Data Types

Categorical data types are assigned to the 'Marital_status', 'Sex', and 'Is_graduate' columns. The 'Income' column is converted to the float data type.
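
One way to apply these conversions in a single call is `DataFrame.astype` with a column-to-dtype mapping. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample; Income starts out as integers.
df = pd.DataFrame({
    "Marital_status": ["MARRIED", "SINGLE", "MARRIED"],
    "Sex": ["Male", "Female", "Male"],
    "Is_graduate": ["Yes", "No", "Yes"],
    "Income": [5000, 4200, 6100],
})

# Map each column to its target dtype and convert in one step.
df = df.astype({
    "Marital_status": "category",
    "Sex": "category",
    "Is_graduate": "category",
    "Income": "float64",
})
```

Categorical columns store each distinct label once, which typically reduces memory use for columns with few unique values.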

6. Outliers

Outliers in the 'Income' column are identified and capped at the 10th and 90th percentiles.
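
A minimal sketch of percentile capping, assuming `Series.quantile` and `Series.clip` are used; the notebook may implement the same idea differently. The sample incomes are hypothetical.

```python
import pandas as pd

# Hypothetical incomes with one extreme value at the top.
df = pd.DataFrame({
    "Income": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0,
               7000.0, 8000.0, 9000.0, 10000.0, 100000.0],
})

# Compute the capping bounds from the data itself.
lower = df["Income"].quantile(0.10)
upper = df["Income"].quantile(0.90)

# Values below the 10th percentile are raised to it,
# values above the 90th percentile are lowered to it.
df["Income"] = df["Income"].clip(lower=lower, upper=upper)
```

Capping keeps every row in the dataset and only limits how far the extreme incomes can pull statistics such as the mean.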

7. Dropping Irrelevant Columns

The 'Hobby' column is dropped from the dataset.
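
For illustration, dropping the column takes a single call. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample with the column that is not needed for analysis.
df = pd.DataFrame({
    "UID": ["A1", "A2"],
    "Purpose": ["education", "business"],
    "Hobby": ["chess", "hiking"],
})

# Drop the 'Hobby' column and keep everything else.
df = df.drop(columns=["Hobby"])
```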

Requirements

pandas
NumPy

Conclusion

After these steps the dataset is free of duplicates, consistently formatted, and has no missing values, so it is ready for exploration and modeling.

Colab Link: You can use the following link to view and comment on the project:

Loan Data Processing & Cleaning

durgeshgowdac/loan_data_processing
