This repository contains code for a loan classification model that predicts whether a loan will be approved ('Y') or not ('N') based on various features. The code includes preprocessing steps, model training using k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Decision Tree Classifier, along with insights derived from the analysis.
- k-Nearest Neighbors (k-NN)
  - The k-NN algorithm predicts the classification of a data point by identifying the majority class among its k nearest neighbors in the feature space.
- Support Vector Machine (SVM)
  - SVM is a supervised machine learning algorithm used for classification tasks. It finds the hyperplane that best separates classes in a high-dimensional space.
- Decision Tree Classifier
  - Decision trees create models that predict the value of a target variable by learning simple decision rules inferred from the features.
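
To make the k-NN description above concrete, here is a minimal pure-Python sketch of majority voting among the k nearest neighbors. It is not the code used in this repository (which relies on library classifiers). The function name `knn_predict` and the toy points are hypothetical and only illustrate the idea.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up data: two numeric features per applicant, label 'Y' or 'N'.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["N", "N", "N", "Y", "Y", "Y"]

prediction = knn_predict(points, labels, query=(3.9, 4.1), k=3)
print(prediction)
```

With two classes, an odd `k` avoids tied votes. Because the vote depends on raw distances, features on very different scales can distort the result, which is one reason to standardize features before using k-NN.
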
- Gender Encoding: The 'Gender' feature was transformed into binary values (0 and 1) to facilitate classification.
- One-Hot Encoding: The 'Property_Area' feature underwent one-hot encoding to convert categorical data into a format suitable for analysis.
- Qualitative to Binary Conversion: Columns like 'Married', 'Education', 'Self_Employed', and 'Loan_Status' were converted to binary (0/1) values for model training.
- Train-Test Split: The dataset was split into training and testing sets (80/20 split) to train and evaluate the models.
- Model Performance: k-NN, SVM, and Decision Tree Classifier models were trained and evaluated using confusion matrices to assess their performance.
- Standardization: Features were standardized using StandardScaler so that features measured on larger scales did not dominate model training.
- Tree Visualization: A Decision Tree Classifier with a maximum depth of 3 was created and visualized to understand the decision-making process.
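
The steps listed above can be sketched end to end with pandas and scikit-learn. This is an illustrative sketch, not the repository's actual notebook. The tiny data frame is made up, and so are the value spellings (`'Male'`/`'Female'`, `'Yes'`/`'No'`, `'Graduate'`/`'Not Graduate'`), the `ApplicantIncome` column, `n_neighbors=3`, and the `random_state` values. Fitting the scaler on the training split only is a choice made here. The tree is printed with `export_text` as a text stand-in for a graphical visualization.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny made-up sample; the real project loads its dataset files instead.
df = pd.DataFrame({
    "Gender": ["Male", "Female"] * 10,
    "Married": ["Yes", "No", "Yes", "Yes"] * 5,
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate", "Not Graduate"] * 4,
    "Self_Employed": ["No", "No", "Yes", "No"] * 5,
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban", "Rural"] * 4,
    "ApplicantIncome": [2500 + 300 * i for i in range(20)],  # hypothetical numeric feature
    "Loan_Status": ["Y", "N", "Y", "Y", "N"] * 4,
})

# Binary (0/1) encoding of the two-valued columns; the value spellings are assumed.
binary_maps = {
    "Gender": {"Male": 1, "Female": 0},
    "Married": {"Yes": 1, "No": 0},
    "Education": {"Graduate": 1, "Not Graduate": 0},
    "Self_Employed": {"Yes": 1, "No": 0},
    "Loan_Status": {"Y": 1, "N": 0},
}
for column, mapping in binary_maps.items():
    df[column] = df[column].map(mapping)

# One-hot encode 'Property_Area' into one 0/1 column per category.
df = pd.get_dummies(df, columns=["Property_Area"], dtype=int)

X = df.drop(columns="Loan_Status")
y = df["Loan_Status"]

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize features using statistics learned from the training split.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Train each model and keep its confusion matrix on the test split.
matrices = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    predictions = model.predict(X_test_scaled)
    matrices[name] = confusion_matrix(y_test, predictions, labels=[0, 1])
    print(name, matrices[name], sep="\n")

# Text rendering of the depth-3 tree's decision rules.
print(export_text(models["Decision Tree"], feature_names=list(X.columns)))
```

With `labels=[0, 1]`, each confusion matrix has true classes in rows and predicted classes in columns, so entry `[1, 1]` counts approved loans that were predicted as approved.
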
To run the project, you have a few options:
- Google Colab via Gist: Open the project in Google Colab through the Gist website and add the training data from the GitHub project resources. Gist Link
- Local Environment:
  - DataSpell or PyCharm by JetBrains: Download the repository branch and open the project locally in DataSpell or PyCharm. Ensure the dataset files are inside your project directory.
  - Spyder IDE: Alternatively, open the project in the Spyder IDE. Keep the dataset files inside your project directory so the data loads correctly.
By following these steps, you can execute and explore the loan classification model in your preferred environment.
The analysis provided valuable insights into the loan approval process. Features like marital status, education, and employment status seem to have a significant influence on loan approvals. The decision tree visualization highlighted key factors considered in the loan approval process, providing a clear view of the decision-making pathway.
This model can serve as a starting point for further refinements and improvements in predicting loan approvals, aiding financial institutions in making informed decisions.
Based on the confusion matrix, the following metrics can be defined:
- sensitivity - the percentage of actual positives that were classified as positive
- precision - the percentage of positive predictions that were correct
- specificity - the percentage of actual negatives that were correctly classified as negative
- accuracy - the percentage of all classifications that were correct
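
The four definitions above can be computed directly from the counts in a binary confusion matrix. The sketch below assumes "positive" means an approved loan ('Y'). The function name `classification_metrics` and the counts are made up for illustration and are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute metrics from true/false positive and negative counts."""
    return {
        # Share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn),
        # Share of positive predictions that were actually positive.
        "precision": tp / (tp + fp),
        # Share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp),
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Illustrative counts only (not taken from the loan dataset).
metrics = classification_metrics(tp=6, fp=2, fn=4, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Each ratio is undefined when its denominator is zero, for example precision when the model never predicts the positive class. A fuller implementation would need to handle that case explicitly.
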