This is a breast cancer classification project using the Wisconsin Breast Cancer Dataset (Diagnostic). The dataset contains 569 samples of malignant and benign tumors, each described by 30 features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. The features describe characteristics of the cell nuclei present in the image. The goal is to predict whether a tumor is benign or malignant based on these features. The dataset can be found here.
- Note: This isn't the original dataset. I chose this version because I wanted to get a feel for a dataset with many features and use it to experiment with parameters, hyperparameters, etc.
This dataset contains the following 10 attributes, each reported with 3 "sub-attributes" (summary statistics), which gives the 30 features in total:
Sub-Attributes: mean, standard error (se), and worst (the mean of the three largest values)
- a) radius (mean of distances from center to points on the perimeter)
- b) texture (standard deviation of gray-scale values)
- c) perimeter
- d) area
- e) smoothness (local variation in radius lengths)
- f) compactness (perimeter^2 / area - 1.0)
- g) concavity (severity of concave portions of the contour)
- h) concave points (number of concave portions of the contour)
- i) symmetry
- j) fractal dimension ("coastline approximation" - 1)
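
To make the "10 attributes x 3 sub-attributes = 30 features" layout concrete, here is a minimal sketch that builds the full list of feature names. The `<attribute>_<statistic>` naming pattern (for example `radius_mean`) and the helper `feature_names` are assumptions for illustration; the column names in your copy of the dataset may be spelled differently.

```python
# Ten base attributes, in the order listed above.
ATTRIBUTES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]

# Three summary statistics ("sub-attributes") reported for each attribute.
STATISTICS = ["mean", "se", "worst"]


def feature_names(attributes=ATTRIBUTES, statistics=STATISTICS):
    """Return one name per (statistic, attribute) pair, grouped by statistic.

    The '<attribute>_<statistic>' pattern is an assumed naming convention,
    not something guaranteed by the dataset itself.
    """
    return [f"{attr}_{stat}" for stat in statistics for attr in attributes]


names = feature_names()
print(len(names))  # 30
print(names[:3])
```

Grouping by statistic (all means first, then all standard errors, then all worst values) is just one possible ordering. Swapping the two `for` clauses would group the three statistics of each attribute together instead.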