Learning domain-agnostic visual representation for computational pathology using medically-irrelevant style transfer augmentation
This repository contains the code for learning robust and generalizable visual representations using unrealistic style transfer augmentation in digital pathology. We focus on one task: classifying colorectal cancer into distinct genetic subtypes, defined by microsatellite status, using H&E-stained FFPE histopathology images.
This code was developed and tested in the following settings.
- Ubuntu 18.04
- Nvidia GeForce RTX 2080 Ti
- captum (0.2.0)
- h5py (2.9.0)
- histomicstk (1.0.3.dev56)
- matplotlib (3.1.0)
- numpy (1.18.1)
- pandas (0.25.3)
- pillow (7.0.0)
- pytables (3.5.1)
- python (3.6.10)
- pytorch (1.4.0)
- scikit-learn (0.21.3)
- scipy (1.3.2)
- seaborn (0.11.0)
- torchvision (0.5.0)
- tqdm (4.41.1)
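To confirm that an environment matches the versions listed above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: `REQUIRED` and `find_mismatches` are hypothetical names, the import names (for example `PIL` for pillow, `tables` for pytables, `torch` for pytorch, `sklearn` for scikit-learn) are the usual ones for these packages, and the check assumes each package exposes a `__version__` attribute.

```python
import importlib

# Conda package name -> (Python import name, version listed above).
# Import names are assumed to be the usual ones for these packages.
REQUIRED = {
    "captum": ("captum", "0.2.0"),
    "h5py": ("h5py", "2.9.0"),
    "histomicstk": ("histomicstk", "1.0.3.dev56"),
    "matplotlib": ("matplotlib", "3.1.0"),
    "numpy": ("numpy", "1.18.1"),
    "pandas": ("pandas", "0.25.3"),
    "pillow": ("PIL", "7.0.0"),
    "pytables": ("tables", "3.5.1"),
    "pytorch": ("torch", "1.4.0"),
    "scikit-learn": ("sklearn", "0.21.3"),
    "scipy": ("scipy", "1.3.2"),
    "seaborn": ("seaborn", "0.11.0"),
    "torchvision": ("torchvision", "0.5.0"),
    "tqdm": ("tqdm", "4.41.1"),
}


def find_mismatches(required):
    """Return {package: found_version} for packages that are missing or differ."""
    mismatches = {}
    for package, (module_name, expected) in required.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            mismatches[package] = "not installed"
            continue
        # Fall back to "unknown" if the package has no __version__ attribute.
        found = str(getattr(module, "__version__", "unknown"))
        # Prefix match tolerates local build suffixes such as "1.4.0+cu100".
        if not found.startswith(expected):
            mismatches[package] = found
    return mismatches
```

Calling `find_mismatches(REQUIRED)` inside the activated environment should return an empty dictionary when every package matches; any entry it does return names a package to inspect.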
- Install Miniconda on your machine (download the distribution that comes with python3).
- Create a conda environment with environment.yml:
conda env create -f environment.yml
- Activate the environment:
conda activate strap
- Prepare your own dataset following this repository.
- Download CRC-DX-TRAIN and CRC-DX-TEST datasets from here.
- Download the train.zip file of the Kaggle’s Painter by Numbers dataset from here.
- Download the miniImageNet dataset from here.
python create_stylized_dataset.py --content-path /path/to/content_images.hdf5 \
--style-dir /path/to/style_images --out-path /path/to/save/stylized_dataset.hdf5 \
--alpha 1.0 --content-size 1024 --style-size 256 --save-size 256
python train.py --path2hdf5 /path/to/development-dataset.hdf5 \
--save-dir /path/to/save/state-dicts --experiment 'style_transfer'
python train.py --path2hdf5 /path/to/development-dataset.hdf5 \
--save-dir /path/to/save/state-dicts --experiment 'stain_augmentation'
python train.py --path2hdf5 /path/to/development-dataset.hdf5 \
--save-dir /path/to/save/state-dicts --experiment 'stain_normalization'
python eval.py --data-dir /path/to/CRC-DX-TEST-dataset \
--state-dict-dir /path/to/state-dicts --experiment 'style_transfer'
python eval.py --data-dir /path/to/CRC-DX-TEST-dataset \
--state-dict-dir /path/to/state-dicts --experiment 'stain_augmentation'
python eval.py --data-dir /path/to/CRC-DX-TEST-dataset \
--state-dict-dir /path/to/state-dicts --experiment 'stain_normalization'
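The training and evaluation commands above differ only in the `--experiment` value, so they can be driven from one loop. The following is a rough sketch under that assumption: `build_commands` and `run_all` are hypothetical helpers, not part of the repository, and only the script names and flags shown above are used. Whether the scripts keep checkpoints from different experiments apart inside one directory is not confirmed here, so pass a separate directory per experiment if in doubt.

```python
import subprocess
import sys

# Experiment names taken from the commands above.
EXPERIMENTS = ["style_transfer", "stain_augmentation", "stain_normalization"]


def build_commands(path2hdf5, state_dict_dir, test_dir, experiment):
    """Build the train and eval argument lists for one experiment."""
    train = [
        sys.executable, "train.py",
        "--path2hdf5", path2hdf5,
        "--save-dir", state_dict_dir,
        "--experiment", experiment,
    ]
    evaluate = [
        sys.executable, "eval.py",
        "--data-dir", test_dir,
        "--state-dict-dir", state_dict_dir,
        "--experiment", experiment,
    ]
    return train, evaluate


def run_all(path2hdf5, state_dict_dir, test_dir, dry_run=True):
    """Train and then evaluate each experiment; only print when dry_run is True."""
    for experiment in EXPERIMENTS:
        for cmd in build_commands(path2hdf5, state_dict_dir, test_dir, experiment):
            if dry_run:
                print(" ".join(cmd))
            else:
                # check=True stops the loop as soon as one script fails.
                subprocess.run(cmd, check=True)
```

For example, `run_all("/path/to/development-dataset.hdf5", "/path/to/state-dicts", "/path/to/CRC-DX-TEST-dataset")` prints the six commands without running them; pass `dry_run=False` from the repository root to execute them.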
python decompose_frequency.py --data-dir /path/to/CRC-DX-TEST-dataset \
--save-dir /path/to/save/low-frequency-datasets
python eval_on_low_freq.py --data-dir /path/to/low-freq-CRC-DX-TEST-dataset \
--state-dict-dir /path/to/state-dicts --out-dir /path/to/save/low-frequency-results
python plot_low_freq_results.py --csv-path /path/to/low-frequency-results.csv \
--out-dir /path/to/save/low-frequency-plots
python integrated_gradients.py --data-dir /path/to/CRC-DX-TEST-dataset \
--low-data-dir /path/to/low-freq-CRC-DX-TEST-dataset --state-dict-dir /path/to/state-dicts \
--out-dir /path/to/save/integrated-gradient-plots \
--idx 25600 --kfold 1 --radius 70 --reference 'uniform'
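For readers unfamiliar with integrated gradients, the toy example below illustrates the general technique: each input feature is attributed the average gradient along a straight path from a baseline to the input, scaled by the input's distance from the baseline. It is only a sketch on a two-variable function with a hand-written gradient. It does not reproduce integrated_gradients.py, and the names `integrated_gradients`, `toy_model`, and `toy_grad` are illustrative assumptions.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        # Midpoint of the k-th sub-interval of [0, 1].
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_model(v):
    """Toy scalar function standing in for a model output."""
    return v[0] ** 2 + 3.0 * v[1]


def toy_grad(v):
    """Analytic gradient of toy_model."""
    return [2.0 * v[0], 3.0]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(toy_grad, x, baseline)

# Completeness: attributions sum to the change in output from the baseline.
total_change = toy_model(x) - toy_model(baseline)
print(attributions, sum(attributions), total_change)
```

In practice the gradient comes from automatic differentiation of a trained network rather than a hand-written function, and the choice of baseline affects the resulting attributions.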
Note: replace the placeholder paths in the commands above with your own.
@article{yamashita2021learning,
title={Learning domain-agnostic visual representation for computational pathology using medically-irrelevant style transfer augmentation},
author={Rikiya Yamashita and Jin Long and Snikitha Banda and Jeanne Shen and Daniel L. Rubin},
journal={arXiv preprint arXiv:2102.01678},
year={2021}
}