Breast cancer is one of the most common cancers affecting women worldwide, with significant implications for public health. Early detection and accurate prognosis are critical for improving survival rates and quality of life for patients. Understanding the factors that influence prognosis can aid healthcare providers in making informed treatment decisions and developing personalized care plans.
This repository contains the code for the MSB1015 project, which builds a model to predict the prognosis of breast cancer patients, using Random Forest as the classification method. Data exploration and tests of various variable combinations showed that the categorical variables in the breast cancer dataset yield the best-performing prognosis model.
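As a rough illustration of the approach described above, the following minimal sketch trains a Random Forest classifier on one-hot encoded categorical variables. It is not the project's actual code: it assumes Python with scikit-learn, and the column values (`stage`, `er_status`), the synthetic labels, and the hyperparameters are hypothetical stand-ins rather than the real dataset or the settings used in this repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 400

# Hypothetical categorical columns (synthetic stand-ins, not the real dataset).
stage = rng.choice(["I", "II", "III"], size=n)
er_status = rng.choice(["Positive", "Negative"], size=n)
X = np.column_stack([stage, er_status])

# Synthetic binary outcome loosely tied to the stand-in features,
# so the classifier has some signal to learn.
risk = 0.1 + 0.6 * (stage == "III") + 0.2 * (er_status == "Negative")
y = (rng.random(n) < risk).astype(int)

# Hold out a stratified test set for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One-hot encode the categories, then fit the Random Forest.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the encoder and the classifier in a single pipeline keeps the encoding fitted on the training split only, which avoids leaking information from the held-out data.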
Please keep the functions and datasets in the same folder as the code. The imputed data is provided in the data folder because the imputation process takes a significant amount of time to run.
More information on the data is available at: https://www.kaggle.com/datasets/raghadalharbi/breast-cancer-gene-expression-profiles-metabric/data