Classification of IDC in breast cancer histology images
IDC breast cancer data is available in several repositories. We use the Invasive Ductal Carcinoma (IDC) breast cancer dataset on Kaggle, which was first curated and presented in [5], [6]. The dataset consists of 277,524 image patches of 50x50 pixels extracted from 162 whole slide images. The ratio of IDC-negative to IDC-positive patches is roughly 2:1.
The data must be organized to match the directory structure that Keras expects (i.e. data/train/class, data/val/class). The make_directory.py script does this.
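As an illustration of the layout described above, here is a minimal sketch of how patches could be copied into data/train/class and data/val/class. It is not the contents of make_directory.py. The function name `organize_patches`, the 80/20 split, and the assumption that each file name ends in `class0.png` or `class1.png` are all hypothetical and should be checked against the actual download.

```python
import random
import shutil
from pathlib import Path


def organize_patches(src_dir, dest_dir, val_fraction=0.2, seed=0):
    """Copy PNG patches into dest_dir/{train,val}/{0,1}.

    Returns a dict mapping (split, label) to the number of files copied.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)

    # Collect the file list up front so newly copied files are never revisited.
    files = sorted(src_dir.rglob("*.png"))
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_fraction)

    counts = {}
    for i, path in enumerate(files):
        split = "val" if i < n_val else "train"
        # Assumption: the label is the last character of the file stem,
        # e.g. "..._class0.png" -> "0", "..._class1.png" -> "1".
        label = path.stem[-1]
        target = dest_dir / split / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / path.name)
        counts[(split, label)] = counts.get((split, label), 0) + 1
    return counts
```

This sketch splits at the patch level for simplicity. Splitting by source slide instead would keep patches from the same slide out of both train and val.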
The learning rate is arguably the most important hyperparameter to tune in a deep learning system. In this exercise we look at the use of cyclical learning rates to train faster. This follows the work of Leslie Smith [1], [2], which has been popularized by its inclusion in the fastai deep learning course and library [3].
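For reference, the triangular policy from [1] can be written as a small function of the iteration number. This is a sketch only: the default values for `base_lr`, `max_lr` and `step_size` are placeholders chosen for illustration, not the settings used in these experiments.

```python
import math


def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Learning rate at a given iteration under the triangular policy.

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, falls back to base_lr over the next step_size
    iterations, and then repeats.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

With these placeholder defaults, the rate starts at `base_lr`, reaches `max_lr` at iteration 2000, and returns to `base_lr` at iteration 4000.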
We report on three learning rate experiments: (i) Cyclical Learning Rate, (ii) 1cycle, (iii) Fixed. The learning rate range test is a method for finding a good learning rate for a model: the learning rate is increased over a short training run while the loss is recorded, and the resulting loss curve indicates a suitable range [1]. It is an important first step for the cycling schedules, and we use it as the starting point for all three experiments. The LR range test and LR schedules have been implemented in Keras via callbacks, building on starter code available on GitHub [4].
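The two schedules mentioned above can be sketched as plain functions of the iteration number, independent of Keras. Several choices here are assumptions made for illustration rather than the repository's implementation: the range test increases the rate exponentially (a linear increase is also common), and the 1cycle sketch uses hypothetical parameters `div_factor`, `final_fraction` and `final_div` to set the starting rate and the final annealing phase.

```python
def range_test_lr(iteration, num_iterations, min_lr=1e-5, max_lr=1.0):
    """LR range test: grow the rate exponentially from min_lr to max_lr."""
    progress = iteration / max(1, num_iterations - 1)
    return min_lr * (max_lr / min_lr) ** progress


def one_cycle_lr(iteration, total_iterations, max_lr,
                 div_factor=10.0, final_fraction=0.1, final_div=100.0):
    """1cycle sketch: ramp up, ramp down, then anneal below the start rate."""
    base_lr = max_lr / div_factor
    # The single cycle covers all but the last final_fraction of training.
    cycle_len = round(total_iterations * (1.0 - final_fraction))
    half = cycle_len / 2
    if iteration <= half:
        # Linear warm-up from base_lr to max_lr.
        return base_lr + (max_lr - base_lr) * iteration / half
    if iteration <= cycle_len:
        # Linear cool-down from max_lr back to base_lr.
        return max_lr - (max_lr - base_lr) * (iteration - half) / half
    # Final phase: anneal linearly from base_lr to base_lr / final_div.
    tail = total_iterations - cycle_len
    frac = (iteration - cycle_len) / tail
    return base_lr - (base_lr - base_lr / final_div) * frac
```

In a Keras callback, a function like these would typically be evaluated at the end of each batch and the result written to the optimizer's learning rate. For the range test, the loss recorded at each step is then plotted against the learning rate.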
[1] Leslie N. Smith. (2016). Cyclical Learning Rates for Training Neural Networks.
[2] Leslie N. Smith. (2018). A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. CoRR abs/1803.09820.
[3] Jeremy Howard and others. (2018). fastai. GitHub.
[4] Brad Kenstler. (2018). CLR. GitHub.
[5] Janowczyk A, Madabhushi A. (2016). Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. J Pathol Inform.
[6] Cruz-Roa A. et al. (2014). Proc. SPIE 9041, Medical Imaging 2014: Digital Pathology, 904103.