In this project, I used Exploratory Data Analysis (EDA) to understand the factors associated with loan default among a finance company's customers. I addressed missing data, identified outliers, analyzed class imbalance, examined customer and loan attributes, and identified the top correlations with loan default. These insights can inform better loan approval and risk assessment decisions.
The work covered a loan application dataset and focused on four tasks: ensuring data quality, identifying outliers, assessing data balance, and exploring relationships between variables.
A. Handling Missing Data 🧩
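The missing-data handling in this section was carried out in Excel. As a rough illustration of the same logic, here is a minimal Python sketch that counts blanks and imputes them with the median. The column name `income` and its values are hypothetical sample data, not taken from the project dataset.

```python
from statistics import median

# Hypothetical column; None stands in for a blank Excel cell.
income = [52000, None, 61000, 48000, None, 75000]

def count_missing(values):
    """Count blank entries, similar in spirit to tallying ISBLANK hits."""
    return sum(v is None for v in values)

def impute_median(values):
    """Replace blanks with the median of the observed (non-blank) values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

missing = count_missing(income)
filled = impute_median(income)
```

The median is used here because it is less sensitive to extreme values than the mean; swapping in `statistics.mean` would give the AVERAGE-style variant.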
Identified missing data using Excel functions like COUNT, ISBLANK, and IF. Used the share of missing values in each column to choose an appropriate treatment, including imputation with AVERAGE or MEDIAN.
B. Identifying Outliers 🚀
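The quartile-based outlier check in this section can be sketched as follows. This is an illustrative Python version, not the original Excel workbook: the 1.5 multiplier is the common Tukey convention and is an assumption here, and the `amounts` values are made up. The `inclusive` quantile method is chosen with the intent of mirroring Excel's QUARTILE behavior.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences based on the interquartile range.

    k=1.5 is an assumed, conventional multiplier; the source does not
    state which threshold was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical loan amounts with one obviously extreme value.
amounts = [10, 12, 13, 14, 15, 16, 18, 95]

low, high = iqr_bounds(amounts)
outliers = [v for v in amounts if v < low or v > high]
```

Values outside the fences are candidates for review rather than automatic removal, which matches the role conditional formatting plays in a spreadsheet.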
Detected outliers using Excel's QUARTILE function, computing the interquartile range (IQR) as the difference between the third and first quartiles, since Excel has no built-in IQR function. Applied conditional formatting with threshold rules to flag potential outliers for review.
C. Analyzing Data Imbalance ⚖️
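The imbalance ratio discussed in this section is a simple count comparison. The sketch below shows the idea in Python under the assumption that the target column is coded 1 for clients with payment difficulties and 0 otherwise; that encoding and the sample values are hypothetical.

```python
from collections import Counter

# Hypothetical target column: 1 = payment difficulties, 0 = no difficulties.
target = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

counts = Counter(target)                   # plays the role of COUNTIF per class
imbalance_ratio = counts[0] / counts[1]    # majority-to-minority ratio
default_share = counts[1] / len(target)    # minority class as a fraction of all rows
```

A ratio far from 1 signals that per-class rates, rather than raw counts, should be compared in later steps.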
Quantified class imbalance in the loan application dataset by calculating the ratio between target classes with Excel functions like COUNTIF and SUM. Visualized the distribution of the target variable with pie charts or bar charts to highlight the imbalance.
D. Exploring Data Attributes 📊
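The segmented analysis in this section amounts to grouping rows by an attribute and comparing the target rate per group, which is what a pivot table does. A minimal Python sketch of that idea follows; the segment labels and rows are hypothetical and only illustrate the shape of the computation.

```python
from collections import defaultdict

# Hypothetical (segment, target) rows; target 1 = payment difficulties.
rows = [
    ("Working", 0), ("Working", 1), ("Working", 0),
    ("Pensioner", 0), ("Pensioner", 0),
    ("Student", 1),
]

def default_rate_by_segment(rows):
    """Return the share of target == 1 within each segment."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for segment, target in rows:
        totals[segment] += 1
        defaults[segment] += target
    return {segment: defaults[segment] / totals[segment] for segment in totals}

rates = default_rate_by_segment(rows)
```

Segments with very few rows, like the single "Student" row here, produce unstable rates and are best read alongside their counts.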
Conducted univariate analysis to understand variable distributions using Excel functions like COUNT, AVERAGE, and MEDIAN. Used Excel's sorting, filtering, and pivot tables for segmented univariate and bivariate analysis, comparing variable distributions across segments and exploring their relationships with the target variable.
E. Identifying Top Correlations 🔍
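Ranking correlations within a segment, as described in this section, can be sketched in Python as below. The function computes the Pearson coefficient, which is what Excel's CORREL returns, for every pair of columns and keeps the strongest pairs by absolute value. The column names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant; a zero-variance column would
    cause a division by zero.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def top_correlations(columns, k=3):
    """Return the k column pairs with the largest absolute correlation."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:k]

# Hypothetical columns for one segment of clients.
segment = {
    "credit": [100, 200, 300, 400, 500],
    "annuity": [10, 20, 30, 40, 50],
    "age": [45, 30, 52, 28, 39],
}

top = top_correlations(segment, k=2)
```

Running the same function separately on each segment's rows gives one ranked list per segment, which can then be compared side by side.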
Segmented the dataset by scenario (e.g., clients with payment difficulties versus all other clients) and identified the top correlations within each segment using Excel functions like CORREL. Visualized the correlations between variables within each segment using correlation matrices.
This project aimed to provide insight into the factors influencing loan default while ensuring data quality by addressing missing data, outliers, and data imbalance. The results can guide decision-making and support the development of predictive models for loan applications. 📈