This dataset contains 3383 mammogram images focused on breast tumors, with labels encoded in the folder structure (70% negative, 30% positive). The dataset was exported from Roboflow, a platform for computer vision projects.
The dataset was augmented with horizontal and vertical flips, rotations, and random black rectangles drawn over the images.
Four mammogram images after augmentation, with their labels (1 = positive, 0 = negative)
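The augmentations listed above can be sketched with NumPy alone. This is only an illustration, not the Roboflow pipeline that produced the dataset: the function name `augment`, the `max_rect_frac` parameter, and the restriction to rotations by multiples of 90 degrees are all assumptions made to keep the example small.

```python
import numpy as np


def augment(image, rng, max_rect_frac=0.25):
    """Return a randomly flipped and rotated copy of `image` with one black rectangle.

    `image` is an H x W or H x W x C array. Hypothetical helper for illustration.
    """
    out = image
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)  # horizontal flip
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)  # vertical flip
    # Rotate by 0, 90, 180 or 270 degrees (assumption: right-angle rotations only).
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    out = out.copy()  # flips and rot90 return views; copy before writing pixels

    # Draw one black rectangle of random size at a random position.
    h, w = out.shape[:2]
    rect_h = int(rng.integers(1, max(2, int(h * max_rect_frac) + 1)))
    rect_w = int(rng.integers(1, max(2, int(w * max_rect_frac) + 1)))
    top = int(rng.integers(0, h - rect_h + 1))
    left = int(rng.integers(0, w - rect_w + 1))
    out[top:top + rect_h, left:left + rect_w] = 0
    return out


rng = np.random.default_rng(0)
image = np.full((64, 48), 255, dtype=np.uint8)  # synthetic all-white "image"
augmented = augment(image, rng)
```

The input array is left untouched, so the same source image can be augmented several times with different random draws.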
I used transfer learning on the following CNN architectures: ResNet50, ResNet101, EfficientNetV2 and ConvNeXt, fine-tuning the classification layers and some of the last feature layers.
Binary classification model using CNN (C-H-W)
My final model achieved an accuracy of 71% on the test dataset, with a recall of 66%, a precision of 61% and an F1 score of 64% for the positive class. These results are promising, considering the limited amount of data and the class imbalance in the dataset.
Some predictions on the test dataset (83% accuracy on this sample)
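For reference, the positive-class metrics quoted above follow from the confusion-matrix counts in the usual way. The sketch below uses hypothetical helper names, and the counts passed to `classification_metrics` are made up for illustration because the actual confusion matrix is not given.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy plus positive-class precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# F1 recomputed from the rounded precision and recall reported above.
reported_f1 = f1_score(0.61, 0.66)  # about 0.634

# Made-up counts, only to show the call.
example = classification_metrics(tp=2, fp=1, fn=1, tn=6)
```

Recomputing from the rounded 61% and 66% gives roughly 63%; the small gap from the reported 64% is plausibly due to rounding of the inputs.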
Despite these promising results, there is still room for improvement.
One approach is to balance the data by adding more positive-class images. However, the quality and type of the added images matter: a tumor can be benign or malignant, and its appearance will likely differ depending on the type. The results suggest that the classifier struggles with benign tumors: false negatives may occur when no abnormal mass of cells is present, whereas false positives can arise when the model detects an abnormal mass that is not cancerous. Assessing the quality of the data is difficult without extensive medical knowledge in this field.
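Collecting more positive images is the approach suggested here. A different and complementary technique, not described as used in this project, is to weight the loss by inverse class frequency so that the minority class counts more during training. The sketch below is a minimal illustration: the helper name is hypothetical, and the per-class counts are approximations derived from the 70/30 split of 3383 images rather than exact figures.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by total / (n_classes * count), so rarer classes weigh more."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: total / (n_classes * count)
        for label, count in class_counts.items()
    }


# Approximate counts (assumption): 70% and 30% of 3383 images.
counts = {0: 2368, 1: 1015}
weights = inverse_frequency_weights(counts)

# Ratio of negatives to positives, the form some binary losses expect
# (for example the `pos_weight` argument of PyTorch's BCEWithLogitsLoss).
pos_weight = counts[0] / counts[1]
```

With these weights each class contributes equally to the total loss in expectation, which tends to trade some precision for recall on the positive class.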
With a better GPU and more RAM, it would be possible to train deeper feature layers and run more extensive experiments on the neural networks. However, this would also require more data.