Heart Disease Predictor: Project Overview

This project predicts heart disease early to support proactive healthcare management.

Utilizes the Heart Failure Prediction Dataset from Kaggle.
Performs Exploratory Data Analysis (EDA) for initial insights.
Builds a robust transformation pipeline for data preparation.
Trains and evaluates various machine learning models using cross-validation.
Deploys a user-friendly API using Flask for heart disease prediction.

Note: This project was made for educational purposes.

Code and Resources

Python Version: 3.8
Packages: numpy, pandas, matplotlib, seaborn, scikit-learn, xgboost, flask (plus json and pickle from the Python standard library)
Setting Up Environment:

conda create -p venv python=3.8 -y
conda activate ./venv
pip install -r requirements.txt

Dataset: https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction

Getting Data

The project uses the Heart Failure Prediction Dataset from Kaggle. The dataset combines five previously independent heart disease datasets over their 11 common features. After merging, duplicate observations were removed, leaving a final dataset of 918 observations.
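
To make the merge-and-deduplicate step concrete, here is a minimal sketch with pandas. The merging was done by the dataset's authors, not by this project, so this is only an illustration: the two tiny source tables, their rows, and the variable names are made up, and only three columns are shown.

```python
import pandas as pd

# Hypothetical stand-ins for the independent source datasets (made-up rows).
source_a = pd.DataFrame({"Age": [40, 49], "Sex": ["M", "F"], "HeartDisease": [0, 1]})
source_b = pd.DataFrame({"Age": [49, 37], "Sex": ["F", "M"], "HeartDisease": [1, 0]})

# Stack the sources on their shared columns, then drop exact duplicate rows.
combined = pd.concat([source_a, source_b], ignore_index=True)
deduplicated = combined.drop_duplicates().reset_index(drop=True)

print(len(combined), len(deduplicated))
```

The same `drop_duplicates()` call can be used on the downloaded data to confirm that no duplicate rows remain.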

EDA

The EDA examined data distributions and value counts for categorical variables. Key insights from pivot tables are visualized in the following figures:
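
As a rough illustration of the kind of summary described here, the sketch below computes value counts and a pivot table with pandas. It runs on a tiny made-up table rather than the real data, and the column names (`Sex`, `ChestPainType`, `HeartDisease`) are assumed to match the Kaggle dataset; the actual notebooks may slice the data differently.

```python
import pandas as pd

# Made-up sample rows; column names are assumed to follow the Kaggle dataset.
df = pd.DataFrame({
    "Sex": ["M", "F", "M", "M", "F", "M"],
    "ChestPainType": ["ASY", "ATA", "ASY", "NAP", "ASY", "ATA"],
    "HeartDisease": [1, 0, 1, 0, 1, 0],
})

# Value counts for one categorical variable.
chest_pain_counts = df["ChestPainType"].value_counts()

# Pivot table: share of positive cases within each chest pain category.
disease_rate = pd.pivot_table(
    df, index="ChestPainType", values="HeartDisease", aggfunc="mean"
)

print(chest_pain_counts)
print(disease_rate)
```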

Model Building

Split the data into train and test sets with a test size of 20%.
Constructed a data transformation pipeline that encodes categorical features and standardizes numerical features.
Evaluated multiple models with cross-validation, prioritizing both accuracy and training efficiency. The Random Forest model was selected for its superior performance.
Fine-tuned the Random Forest model to improve its performance.
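
The steps above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the data is synthetic, only four feature columns are used, logistic regression stands in for the other candidate models, and the hyperparameter grid and `random_state` values are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption); the real project uses the Kaggle dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(30, 80, size=n),
    "MaxHR": rng.integers(80, 200, size=n),
    "Sex": rng.choice(["M", "F"], size=n),
    "ChestPainType": rng.choice(["ASY", "ATA", "NAP", "TA"], size=n),
})
df["HeartDisease"] = (df["Age"] + rng.normal(0, 10, size=n) > 55).astype(int)

X = df.drop(columns="HeartDisease")
y = df["HeartDisease"]

# 1. Hold out 20% of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Standardize numerical features and one-hot encode categorical ones.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["Age", "MaxHR"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Sex", "ChestPainType"]),
])

# 3. Compare candidate models with 5-fold cross-validation on the training set.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
}
cv_scores = {}
for name, estimator in candidates.items():
    model = Pipeline([("preprocess", preprocessor), ("classifier", estimator)])
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()

# 4. Fine-tune the Random Forest with a small, illustrative grid search.
search = GridSearchCV(
    Pipeline([
        ("preprocess", preprocessor),
        ("classifier", RandomForestClassifier(random_state=42)),
    ]),
    param_grid={
        "classifier__n_estimators": [50, 100],
        "classifier__max_depth": [None, 5],
    },
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
test_accuracy = search.score(X_test, y_test)

print(cv_scores)
print(search.best_params_, test_accuracy)
```

Keeping the preprocessing inside the `Pipeline` means the scaler and encoder are refit on each training fold, so no information from the validation fold leaks into the transformation.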

The results of cross-validation for the models are as follows:

Productionization

Deployed a user-friendly API using Flask. The API endpoint accepts user requests and returns the predicted heart disease classification.
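
A prediction endpoint of this kind might look like the sketch below. The route name `/predict`, the JSON request and response shapes, the `create_app` factory, and the `AgeThresholdModel` stand-in are all assumptions made for illustration; the project's `application.py` may be organized differently.

```python
from flask import Flask, jsonify, request


def create_app(model):
    """Build a Flask app around any object exposing predict(rows) -> labels."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            return jsonify({"error": "expected a JSON object of feature values"}), 400
        prediction = model.predict([payload])[0]
        return jsonify({"heart_disease": int(prediction)})

    return app


class AgeThresholdModel:
    """Stand-in for a trained model, used only to make the sketch runnable."""

    def predict(self, rows):
        return [1 if row.get("Age", 0) > 55 else 0 for row in rows]


app = create_app(AgeThresholdModel())

# Exercise the endpoint without starting a server.
with app.test_client() as client:
    response = client.post("/predict", json={"Age": 60, "Sex": "M"})
    print(response.get_json())
```

With a real scikit-learn pipeline, the stand-in would be replaced by the unpickled model, and the request payload would typically be wrapped in a pandas DataFrame before calling `predict`.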

polaternez/predicting-heart-disease
