Content for the README.md file

This README provides an overview of a machine learning pipeline for credit default prediction. The pipeline includes data preprocessing, feature selection, model selection, and evaluation of various classifiers.

Introduction

The goal of this project is to predict credit default using a machine learning approach. The following steps are followed:

Data Loading: loading data from CSV files for client information, client bureau information, loan information, and default data.
Data Cleaning and Preprocessing: Performing data cleaning and preprocessing to handle missing values, convert data types, and create new features.
Data Merging: Merging the cleaned and processed data to create a comprehensive dataset for analysis.
Feature Selection: Selecting relevant features to improve model performance and reduce dimensionality.
Model Selection: Evaluating various machine learning classifiers, including logistic regression, support vector machine, random forest, gradient boosting, decision tree, and neural network.
Model Evaluation: Assessing the performance of each classifier using accuracy, precision, recall, and F1-score metrics.

Requirements

Before running the code, ensure you have the following libraries installed:

pandas
numpy
matplotlib
seaborn
scikit-learn

You can install these libraries using pip:

pip install -r requirements.txt

Getting Started

After installing requirements just run main.py to navigate through API

Clone this repository to your local machine.

Place the following CSV files in the same directory as your Jupyter Notebook or Python script:

Client_Bureau_Information.csv Client_Information.csv Default_Data.csv Loan_Information.csv Open the Jupyter Notebook or Python script containing the code.

Run the code step by step to perform data preprocessing, feature selection, model selection, and evaluation.

Data Preprocessing

Data is loaded from CSV files for client information, client bureau information, loan information, and default data. Data cleaning and preprocessing steps include handling missing values, converting data types, and creating new features. Data from different sources are merged to create a comprehensive dataset for analysis.

Feature Selection

Features are selected to improve model performance and reduce dimensionality. Low-variance features and highly correlated features are removed. Mutual information is used to identify and eliminate low-information features.

Model Selection

Various machine learning classifiers are evaluated, including logistic regression, support vector machine, random forest, gradient boosting, decision tree, and neural network. Grid search is used to optimize hyperparameters for each classifier. The best classifiers and their parameters are stored for evaluation.

Evaluation

Model performance is assessed using accuracy, precision, recall, and F1-score metrics. A bar plot is created to visualize the performance of each classifier.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
artifacts		artifacts
data		data
notebook		notebook
src		src
templates		templates
.gitignore		.gitignore
.gitlab-ci.yml		.gitlab-ci.yml
Dockerfile		Dockerfile
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Content for the README.md file

Table of Contents

Introduction

Requirements

Getting Started

Data Preprocessing

Feature Selection

Model Selection

Evaluation

About

Releases

Packages

Languages

kraiyani/EFS

Folders and files

Latest commit

History

Repository files navigation

Content for the README.md file

Table of Contents

Introduction

Requirements

Getting Started

Data Preprocessing

Feature Selection

Model Selection

Evaluation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages