
CART and C4.5 decision trees, the Synthetic Minority Over-sampling Technique (SMOTE), and visualizations in R.

ColleenBobbie/Cancer-Prediction


Predicting breast cancer using machine learning

This repository contains the source code for a data exploration project that applied machine learning techniques to classify breast tumours as malignant or benign.

Dataset

The data for this project come from the Breast Cancer Wisconsin (Diagnostic) dataset in the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29

Skills

The analysis relies heavily on R packages for machine learning and compares decision trees built with the CART and C4.5 algorithms. For each algorithm, it also examines the effect of a balanced class design, in which the benign and malignant classes are equally represented in the training data.
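One core difference between the two algorithms is the criterion used to score a candidate split. CART uses Gini impurity by default, while C4.5 uses the gain ratio (information gain normalised by split information). The sketch below is a minimal pure-Python illustration of those two criteria for a binary threshold split on one numeric feature. It is not the project's R code. The helper names (`gini_decrease`, `gain_ratio`, `best_threshold`) and the toy data are hypothetical. Real implementations also add stopping rules, pruning and, in C4.5, further heuristics for choosing among candidate splits.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: CART's default measure of node impurity."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: the basis of C4.5's information gain."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split(xs, ys, threshold):
    """Partition class labels by a numeric threshold (x <= threshold goes left)."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return left, right


def gini_decrease(xs, ys, threshold):
    """CART-style score: parent Gini minus size-weighted child Gini."""
    left, right = split(xs, ys, threshold)
    n = len(ys)
    children = sum(len(part) / n * gini(part) for part in (left, right))
    return gini(ys) - children


def gain_ratio(xs, ys, threshold):
    """C4.5-style score: information gain divided by split information."""
    left, right = split(xs, ys, threshold)
    n = len(ys)
    children = sum(len(part) / n * entropy(part) for part in (left, right))
    info_gain = entropy(ys) - children
    # Split information is the entropy of the partition sizes themselves;
    # dividing by it penalises splits that fragment the data.
    split_info = entropy(["left"] * len(left) + ["right"] * len(right))
    return info_gain / split_info if split_info > 0 else 0.0


def best_threshold(xs, ys, score):
    """Try midpoints between sorted distinct values and keep the best-scoring one."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: score(xs, ys, t))


# Hypothetical toy data: one numeric feature, labels B (benign) / M (malignant).
xs = [0.02, 0.03, 0.04, 0.10, 0.12, 0.15]
ys = ["B", "B", "B", "M", "M", "M"]
print(best_threshold(xs, ys, gini_decrease), best_threshold(xs, ys, gain_ratio))
```

On this toy data, both criteria choose the threshold midway between 0.04 and 0.10, because that split separates the classes perfectly. On real data, the two criteria can rank splits differently, which is one reason CART and C4.5 trees grown on the same data can differ.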

Complete Analysis

The complete analysis for this project is available at: http://cbobbie.wixsite.com/colleenbobbie/predictingbreastcancer

Summary

  • This project compared two decision tree algorithms, CART and C4.5, for predicting whether a breast mass is benign or malignant.
  • The C4.5 tree trained with a balanced class design achieved the highest accuracy and the lowest false negative rate.
  • The balanced C4.5 tree identified the largest ("worst") concave-points measurement of the cell nuclei as the most influential predictor of tumour outcome in this dataset.
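The two metrics used to rank the trees can be computed from a confusion matrix. In this setting, a false negative is a malignant tumour predicted to be benign, which is usually the costlier error. The minimal Python sketch below treats "M" (malignant) as the positive class. The function names and the ten example labels are hypothetical and are not taken from the project's results.

```python
def confusion_counts(actual, predicted, positive="M"):
    """Count TP, FP, TN, FN, treating `positive` (malignant) as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1  # malignant tumour predicted benign
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(actual, predicted):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def false_negative_rate(actual, predicted, positive="M"):
    """FN / (FN + TP): the share of truly malignant cases that were missed."""
    tp, _, _, fn = confusion_counts(actual, predicted, positive)
    return fn / (fn + tp) if fn + tp else 0.0


# Hypothetical labels for ten test cases.
actual = ["M", "M", "M", "M", "B", "B", "B", "B", "B", "B"]
predicted = ["M", "M", "M", "B", "B", "B", "B", "B", "M", "B"]
print(accuracy(actual, predicted), false_negative_rate(actual, predicted))  # 0.8 0.25
```

Here one malignant case out of four is missed, so the false negative rate is 0.25 even though overall accuracy is 0.8. This is why both metrics are worth reporting when comparing trees.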
