Cardiovascular Risk Prediction Classification Project

An analysis of cardiovascular risk prediction using machine learning techniques.

Project Overview

This project focuses on predicting the 10-year risk of cardiovascular disease using demographic, clinical, and laboratory data. Various machine learning algorithms are applied and evaluated for their performance in predicting cardiovascular risk.

Languages and libraries: Python, Pandas, Matplotlib, Seaborn, Scikit-learn

Tools: Jupyter Notebook, Google Colab, GitHub

Models: Logistic Regression, Random Forest Classifier, XGBoost, KNN, SVC, NBClassifier
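
To illustrate how several classifiers can be trained and compared on one train/test split, here is a minimal sketch using scikit-learn. It is not the project's notebook code: the data is synthetic (generated with `make_classification` as a stand-in for the real dataset), the hyperparameters are library defaults, and the scaling pipelines are an assumption. XGBoost is left out so the sketch depends only on scikit-learn; `xgboost.XGBClassifier` could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic binary-classification data stands in for the real dataset.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Assumption: default hyperparameters; scale-sensitive models get a StandardScaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Random Forest Classifier": RandomForestClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(),
                         SVC(probability=True, random_state=0)),
    "NBClassifier": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)                # hard class labels
    y_score = model.predict_proba(X_test)[:, 1]   # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

The numbers printed by this sketch describe the synthetic data only and are not comparable to the results reported for this project.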

Key Findings

  • Age and Gender: Age and gender are significant risk factors for cardiovascular disease, with men more likely than women to develop coronary heart disease (CHD).
  • Smoking: Smoking is a risk factor for CHD, and smoking intensity plays a role in determining the risk.
  • Clinical Variables: High blood pressure, stroke, and diabetes are associated with a higher risk of CHD.
  • Laboratory Variables: Patients with high cholesterol levels may be at a slightly higher risk for CHD.
  • Model Performance: Random Forest Classifier and XGBoost performed best, each reaching about 0.90 test accuracy and ROC AUC, with test precision and recall of roughly 0.88 to 0.93.
  • Accuracy Rate: The Random Forest Classifier achieved the highest test accuracy of all models, 90.36%.

Tools and Skills

  • Python: Used for data analysis, manipulation, and visualization.
  • Pandas: Employed for data manipulation and analysis.
  • Matplotlib and Seaborn: Utilized for data visualization to create insightful plots and graphs.
  • Scikit-learn: Used to implement and evaluate machine learning algorithms for predictive modeling.

Model Performance Metrics

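| Model                    | Test Accuracy | Test Precision | Test Recall | Test ROC AUC |
|--------------------------|---------------|----------------|-------------|--------------|
| Logistic Regression      | 0.6571        | 0.6273         | 0.6945      | 0.6587       |
| Random Forest Classifier | 0.9036        | 0.8791         | 0.9255      | 0.9046       |
| XGBoost                  | 0.9019        | 0.8951         | 0.9000      | 0.9018       |
| KNN                      | 0.8194        | 0.7317         | 0.9818      | 0.8265       |
| SVC                      | 0.7899        | 0.7369         | 0.8709      | 0.7934       |
| NBClassifier             | 0.5694        | 0.6985         | 0.1727      | 0.5523       |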
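
For readers less familiar with the four metrics reported above, the following sketch spells out their definitions in plain Python on a tiny example. The labels, scores, and the 0.5 threshold are made up for illustration and are not taken from this project; in practice the same quantities are available from scikit-learn as `accuracy_score`, `precision_score`, `recall_score`, and `roc_auc_score`.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    # Share of all predictions that are correct.
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    # Share of predicted positives that are truly positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Share of true positives that the model finds.
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    # Probability that a random positive is scored above a random negative
    # (ties count as half), which equals the area under the ROC curve.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example data: 1 = at risk, 0 = not at risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("accuracy :", accuracy(tp, tn, fp, fn))  # 0.75
print("precision:", precision(tp, fp))         # 0.75
print("recall   :", recall(tp, fn))            # 0.75
print("roc_auc  :", roc_auc(y_true, scores))   # 0.9375
```

Read this way, the NBClassifier row shows a model that is right about 70% of the time when it predicts a positive case (precision 0.6985) but finds only about 17% of the actual positive cases (recall 0.1727).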

Takeaways

  • Improved Risk Assessment: Machine learning models may provide more accurate predictions of cardiovascular risk than traditional risk assessment methods.
  • Early Intervention: Early identification of individuals at high risk of cardiovascular disease allows for timely intervention and preventive measures.
  • Personalized Medicine: Machine learning models can help tailor interventions and treatments based on individual risk profiles.
  • Healthcare Resource Allocation: Predictive models can assist healthcare providers in allocating resources more efficiently by targeting high-risk individuals.

Acknowledgments

Special thanks to the Framingham Heart Study for providing the dataset used in this project.

This project was completed as part of the Data Science Trainee program at AlmaBetter.
