This repository contains the source code for a data exploration project that applied machine learning techniques to predict whether a breast tumour is malignant or benign.
The data for this project can be found here: http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29
This analysis relied heavily on R machine learning packages and compared decision trees built with the CART and C4.5 algorithms. The influence of a balanced class design was also explored for each algorithm.
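
As background for the comparison, CART typically chooses splits for classification by the decrease in Gini impurity, while C4.5 uses the information gain ratio. The sketch below illustrates both criteria on toy labels. It is written in Python purely for illustration and is not the R code used in this project; the function names and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_decrease(parent, children):
    """Impurity decrease of a split, the criterion CART-style trees maximise."""
    n = len(parent)
    weighted = sum(len(ch) / n * gini(ch) for ch in children if ch)
    return gini(parent) - weighted


def gain_ratio(parent, children):
    """Information gain divided by split information, as in C4.5."""
    n = len(parent)
    info_gain = entropy(parent) - sum(
        len(ch) / n * entropy(ch) for ch in children if ch
    )
    split_info = -sum(
        (len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch
    )
    return info_gain / split_info if split_info > 0 else 0.0


# Toy labels only (M = malignant, B = benign); not taken from the dataset.
parent = ["M"] * 4 + ["B"] * 6
left = ["M", "M", "M", "B"]
right = ["M", "B", "B", "B", "B", "B"]

print("Gini decrease:", gini_decrease(parent, [left, right]))
print("Gain ratio:   ", gain_ratio(parent, [left, right]))
```

Both criteria reward splits that produce purer child nodes. The gain ratio additionally penalises splits that break the data into many small partitions.
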
The complete analysis for this project can be found here: http://cbobbie.wixsite.com/colleenbobbie/predictingbreastcancer
- This tutorial explored two decision tree algorithms, CART and C4.5, to help predict whether a breast mass was benign or malignant.
- Ultimately, the C4.5 decision tree trained with a balanced class design produced the highest accuracy and the lowest false negative rate.
- The C4.5 balanced tree highlighted the largest concave point of the cell nuclei as the most influential predictor for tumour outcomes in this dataset.
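
The two metrics used to rank the trees above, accuracy and false negative rate, can be computed from a confusion matrix. The following is a minimal sketch in Python, assuming malignant ("M") is treated as the positive class; the helper names and the toy predictions are hypothetical and are not results from this analysis.

```python
def confusion_counts(actual, predicted, positive="M"):
    """Return (tp, fn, fp, tn), treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1  # malignant tumour predicted as benign
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Share of all cases that were classified correctly."""
    total = tp + fn + fp + tn
    return (tp + tn) / total if total else 0.0


def false_negative_rate(tp, fn):
    """Share of truly positive (malignant) cases predicted as negative."""
    positives = tp + fn
    return fn / positives if positives else 0.0


# Toy labels for illustration only; not output from the project's models.
actual = ["M", "M", "M", "B", "B", "B", "B", "B"]
predicted = ["M", "M", "B", "B", "B", "B", "M", "B"]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print("accuracy:", accuracy(tp, fn, fp, tn))
print("false negative rate:", false_negative_rate(tp, fn))
```

In a screening setting, a false negative means a malignant tumour is labelled benign, which is why the false negative rate is reported alongside accuracy.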