An Analytical Model For Early Intervention Of Heart Disease, implemented in 2 stages
- data-cleaning-preprocessing.ipynb
- exploratory-data-analysis_1.ipynb
- exploratory-data-analysis_2.ipynb
- stage1-modelling.ipynb
- stage2-modelling.ipynb
This report applies data analytics to a business problem faced by the National Heart Centre Singapore (NHCS). With the incidence of reported cardiovascular disease (CVD) cases rising in Singapore, NHCS handles more than 120,000 outpatient consultations each year. Heart disease can strike suddenly, and it is severe and expensive to treat. NHCS can therefore shift its focus from post-diagnosis treatment to early prevention.
To increase the involvement of individuals and the primary care sector in the prevention of heart disease, our team proposes a 2-stage solution, HeartDetect.
- The first stage raises individuals' awareness and helps them manage their heart health regularly.
- The second stage enables heart disease risk prediction in the primary care sector so that prevention is timely.
Open your terminal and run:
git clone https://github.com/xJQx/bc2406-project.git
Data Cleaning and Pre-processing
a) data-cleaning-preprocessing.ipynb
Stage 1:
b) exploratory-data-analysis_1.ipynb
c) stage1-modelling.ipynb
Stage 2:
d) exploratory-data-analysis_2.ipynb
e) stage2-modelling.ipynb
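The a) to e) labels above suggest a run order. As a minimal sketch, assuming that order is the intended one and that Jupyter with nbconvert is installed, the notebooks could be executed in sequence from the repository root. The helper names `build_commands` and `run_notebooks` are hypothetical and are not part of the repository.

```python
import subprocess

# Notebook order taken from the a) to e) labels in this README.
NOTEBOOKS = [
    "data-cleaning-preprocessing.ipynb",
    "exploratory-data-analysis_1.ipynb",
    "stage1-modelling.ipynb",
    "exploratory-data-analysis_2.ipynb",
    "stage2-modelling.ipynb",
]


def build_commands(notebooks=NOTEBOOKS):
    """Return one `jupyter nbconvert` command per notebook, in run order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_notebooks(execute=False):
    """Print each command; run it only when execute=True."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if execute:
            # check=True stops at the first notebook that fails
            subprocess.run(cmd, check=True)


run_notebooks(execute=False)  # dry run: only prints the commands
```

Calling `run_notebooks(execute=True)` would actually execute the notebooks in place; the dry run is the default so that nothing is overwritten by accident.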
View the Data Dictionary here.
Dataset created from the data-cleaning-preprocessing.ipynb notebook:
.
├── heart_pki_2020_original.csv             # original dataset
│   ├── heart_pki_2020_cleaned.csv          # for EDA and visualization
│   ├── heart_pki_2020_correlation.csv      # for EDA correlation (IntegerEncoding done)
│   └── heart_pki_2020_encoded.csv          # for analytical models (OneHotEncoding done)
│
├── o2Saturation_original.csv               # original dataset
└── heart_attack_original.csv               # original dataset
    ├── heart_attack_cleaned.csv            # for EDA and analytical model (default integer encoding)
    └── heart_attack_cleaned_text.csv       # for EDA and visualization (meaningful values)
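The comments in the tree mention two encodings: integer encoding for the correlation file and one-hot encoding for the modelling file. The sketch below illustrates the difference with pandas on a tiny made-up frame. The column names and values are hypothetical, and the notebook may implement the encodings differently.

```python
import pandas as pd

# Hypothetical toy data; not taken from the project's datasets.
df = pd.DataFrame({
    "Smoking": ["Yes", "No", "No", "Yes"],
    "GenHealth": ["Good", "Poor", "Fair", "Good"],
})

# Integer encoding: each category becomes one integer code (one column per feature).
integer_encoded = df.apply(lambda col: col.astype("category").cat.codes)

# One-hot encoding: each category becomes its own 0/1 indicator column.
one_hot_encoded = pd.get_dummies(df, dtype=int)

print(integer_encoded)
print(one_hot_encoded)
```

Integer encoding keeps one column per feature, which is convenient for a correlation matrix, while one-hot encoding avoids implying an order between categories when the data is fed to a model.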
The models directory contains all the trained models from stages 1 and 2. They can be imported and applied to any dataset whose features match those the model was trained on.
An example of importing and using an analytical model is shown below:
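# Libraries
import joblib
from sklearn.model_selection import cross_val_score
# Load the model from disk
loaded_random_forest_m3 = joblib.load('models/stage2_random_forest_m3.sav')
# Use the analytical model (X_test and y_test must already be defined).
# cross_val_score clones the estimator and refits it on each of the 5 folds,
# so this reports the mean cross-validated ROC AUC on (X_test, y_test).
result = cross_val_score(loaded_random_forest_m3, X_test, y_test, cv=5, scoring="roc_auc").mean()
print(result)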
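Because `cross_val_score` clones the estimator and refits it on each fold, it does not evaluate the saved fit itself. As a complementary sketch, the example below scores a loaded model directly with `predict_proba` and `roc_auc_score`, after checking that the data has the number of features the model was fitted on. It uses synthetic data and a temporary file as stand-ins for the project's datasets and the files in the models directory, so every name in it is an assumption for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the project's real features are not used here.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train and save a small model to a temporary file (stand-in for models/*.sav).
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
model_path = os.path.join(tempfile.mkdtemp(), "demo_random_forest.sav")
joblib.dump(model, model_path)

# Load it back and check that the data has the feature count the model was fitted on.
loaded_model = joblib.load(model_path)
assert X_test.shape[1] == loaded_model.n_features_in_, "feature count mismatch"

# Score the saved fit directly: probability of the positive class vs. true labels.
y_prob = loaded_model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_prob)
print(f"ROC AUC of the loaded model: {auc:.3f}")
```

The `n_features_in_` check only compares column counts; column order and preprocessing still need to match what was used at training time.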