The primary goals of this project are:
- Data Cleaning and Preprocessing: Identify and address missing values, duplicate records, and irrelevant columns. Standardize data formats by removing unnecessary suffixes and ensuring uniformity across numerical fields.
- Exploratory Data Analysis (EDA): Understand the relationships between key financial variables, such as loan amounts, interest rates, and debt-to-income ratios. Visualize distributions and correlations using tools like correlation matrices and plots.
- Insights and Recommendations: Derive actionable insights from the analysis to improve loan processing, risk assessment, and portfolio management.

This project emphasizes clear and structured data handling, enabling more accurate analysis and decision-making in the financial domain.
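
The cleaning goals above can be illustrated with a minimal pandas sketch. It is not the project's actual notebook code: the sample rows are invented, and the `term` and `notes` columns, as well as the `%` and ` months` suffixes, are assumptions chosen to show the idea.

```python
import pandas as pd


def clean_loans(df: pd.DataFrame) -> pd.DataFrame:
    """Drop empty columns and duplicate rows, then strip text suffixes."""
    out = df.dropna(axis=1, how="all")   # remove columns that hold no data
    out = out.drop_duplicates().copy()   # remove exact duplicate records
    # Assumed format "10.65%": strip the suffix and convert to float
    out["interest_rate"] = out["interest_rate"].str.rstrip("%").astype(float)
    # Assumed format " 36 months": keep only the number of months
    out["term"] = out["term"].str.extract(r"(\d+)", expand=False).astype(int)
    return out.reset_index(drop=True)


# Invented sample data, for illustration only
raw = pd.DataFrame({
    "loan_amount": [5000, 5000, 12000],
    "interest_rate": ["10.65%", "10.65%", "15.27%"],
    "term": [" 36 months", " 36 months", " 60 months"],
    "notes": [None, None, None],
})

cleaned = clean_loans(raw)
print(cleaned)
```

After this step the empty `notes` column and the duplicated first row are gone, and `interest_rate` and `term` are numeric, so they can be used directly in aggregations and correlation analysis.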
- Loading and Understanding Data
- Data Cleaning and Transformation
- Analysis
- Conclusion
This project focuses on analyzing loan data to uncover meaningful insights that can aid in financial decision-making and risk assessment.
- Background: The project is based on a real-world dataset containing information about loans, borrowers, and repayments. The data is rich with details such as loan amounts, interest rates, borrower incomes, and more.
- Business Problem: The analysis aims to address challenges in loan management, such as identifying factors affecting defaults, optimizing loan approvals, and improving investor confidence by providing transparent insights into the loan portfolio.
- Dataset: The dataset includes various attributes, such as loan_amount, funded_amount, interest_rate, installments, annual_income, and debt_to_income. It also contains borrower-specific details and other metadata, some of which were cleaned or removed during preprocessing.
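
As a rough sketch of how the numeric attributes listed above could be compared, the snippet below builds a Pearson correlation matrix with pandas. Only the column names come from the dataset description; the five sample rows are invented and say nothing about the real data.

```python
import pandas as pd

# Invented sample rows; only the column names follow the dataset description
loans = pd.DataFrame({
    "loan_amount":    [5000, 12000, 20000, 8000, 15000],
    "funded_amount":  [5000, 12000, 18000, 8000, 15000],
    "interest_rate":  [7.9, 11.5, 16.3, 9.2, 13.1],
    "annual_income":  [42000, 65000, 58000, 51000, 90000],
    "debt_to_income": [8.5, 14.2, 22.7, 11.0, 17.4],
})

# Pairwise Pearson correlation between all numeric columns
corr = loans.corr()
print(corr.round(2))
```

The resulting matrix can be passed to `seaborn.heatmap(corr, annot=True)` to render it as an annotated heatmap.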
- Risk Factors: Loans with lower grades (E, F, G) tend to have higher loan amounts, interest rates, and DTI ratios, increasing the likelihood of default.
- Income Influence: Borrowers with higher incomes are more likely to fully repay loans.
- Current Loans: These loans often have higher loan amounts, funded amounts, and installments, particularly in lower grades, which suggests they remain at risk of default.
- Key Findings: Loans with shorter terms, higher grades, home ownership, unverified income, and a debt consolidation purpose are more likely to be fully paid.
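
A comparison like the ones summarized above could be produced with a per-grade aggregation along these lines. This is a sketch under assumptions: the `grade` and `loan_status` column names and the status labels are hypothetical, and the sample rows are invented, so the printed numbers do not reflect the project's results.

```python
import pandas as pd

# Invented sample rows; column names and status labels are assumptions
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "E", "E", "G", "G"],
    "loan_status": [
        "Fully Paid", "Fully Paid", "Charged Off",
        "Charged Off", "Fully Paid", "Charged Off", "Current",
    ],
    "interest_rate": [6.0, 8.0, 11.0, 18.0, 20.0, 23.0, 24.0],
})

# Current loans have no final outcome yet, so leave them out of repayment rates
closed = loans[loans["loan_status"] != "Current"]

summary = (
    closed.assign(fully_paid=closed["loan_status"].eq("Fully Paid"))
    .groupby("grade")
    .agg(
        loans=("loan_status", "size"),             # closed loans per grade
        mean_interest_rate=("interest_rate", "mean"),
        fully_paid_share=("fully_paid", "mean"),   # fraction fully repaid
    )
)
print(summary)
```

The same pattern works for other grouping columns, such as term, home ownership, or loan purpose.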
- NumPy version: 1.26.4
- Pandas version: 2.0.2
- Matplotlib version: 3.8.0
- Seaborn version: 0.12.2
- This project was created as a case study required for the Executive PG Programme in Machine Learning & AI at IIIT Bangalore.
Created by [@salimbasha513] - feel free to contact me!
# LendingClubCaseStudy