PyTorch implementation of Self-Rule to Adapt (SRA):

Self-Rule to Adapt: Generalized Multi-source Feature Learning Using Unsupervised Domain Adaptation for Colorectal Cancer Tissue Detection

Supervised learning is constrained by the availability of labeled data, which are especially expensive to acquire in the field of digital pathology. Making use of open-source data for pre-training or using domain adaptation can be a way to overcome this issue. However, pre-trained networks often fail to generalize to new test domains that are not distributed identically due to variations in tissue stainings, types, and textures. Additionally, current domain adaptation methods mainly rely on fully-labeled source datasets.

In this work, we propose SRA, which takes advantage of self-supervised learning to perform domain adaptation and removes the need for a fully-labeled source dataset. SRA can effectively transfer the discriminative knowledge obtained from a sparsely labeled source domain to a new target domain without requiring additional tissue annotations. Our method harnesses both domains' structures by capturing visual similarity with intra-domain and cross-domain self-supervision. Moreover, we present a generalized formulation of our approach that allows the architecture to learn from multiple source domains. We show that our proposed method outperforms baselines for domain adaptation of colorectal tissue type classification and further validate our approach on our in-house clinical cohort. The code and models are available open-source.


Usage & requirements

In this section, we show how to use SRA to train your own architecture. First, clone the repo and install the dependencies.

# To clone the repo
git clone
cd SRA

# Create environment and activate it
conda create --name sra python=3.8 -y
conda activate sra

# Install pytorch 
conda install -y pytorch==1.6.0 torchvision==0.7.0 -c pytorch

# Install other packages
conda install -y matplotlib shapely tqdm tensorboard==2.3.0
pip install albumentations openslide-python
pip install git+
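
After installation, it can be useful to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes the conda packages expose standard Python package metadata under the names `torch`, `torchvision`, and `tensorboard`.

```python
from importlib import metadata

# Versions pinned in the install commands above; other packages are unpinned.
PINNED = {"torch": "1.6.0", "torchvision": "0.7.0", "tensorboard": "2.3.0"}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not found in the environment.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Some builds append a local tag to the version, so compare the prefix only.
        if installed is None or not installed.startswith(expected):
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means that every pinned package was found with the expected version prefix.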

Pretrained models

You can download some of the models used for the publication. The pretrained version contains the two branches of the architecture without the linear classifier. The classification version contains only one branch together with the classification (source) layer. For each model, we indicate the source and target datasets used for training.

| Arch | Source | Target | n classes | Download |
| --- | --- | --- | --- | --- |
| sra | Kather19 | In-house | 9 | pretrained, classification |
| srma | Kather19 | In-house | 9 | pretrained, classification |
| sra | Kather19 + CRCTP | In-house | 10 | pretrained, classification |
| srma | Kather19 + CRCTP | In-house | 10 | pretrained, classification |


Step 1: Download publicly available data (source)

Here is a non-exhaustive list of publicly available datasets of colorectal tissue:

| Name | #Samples | #Classes | Links |
| --- | --- | --- | --- |
| Kather16 | 5,000 | 8 | download, paper |
| Kather19 | 100,000 | 9 | download, paper |
| CRCTP | 196,000 | 7 | download, paper |

[Dec 2023] !!! The CRCTP is not publicly available anymore !!!

You can download the datasets listed above using the following commands:

# Create data folder
mkdir data

# Download Kather16 training/test data
wget -O
unzip && rm
mv Kather_texture_2016_image_tiles_5000 data

# Download Kather19 training data
wget -O
unzip && rm
mv NCT-CRC-HE-100K data

# Download CRCTP training/test data (Before Dec 2023)
wget -O
7z x && rm
mv Fold2 data/CRCTP
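
Once the archives are extracted, a quick count of tiles per class helps confirm that nothing was lost during download. The sketch below is not part of the repository. The helper name `count_tiles_per_class` and the extension list are assumptions, and it presumes that the dataset folder contains one subfolder per tissue class; a dataset split into training and testing folders would need to be pointed at the relevant split.

```python
from pathlib import Path

# Assumed image extensions; adjust to the files actually present.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def count_tiles_per_class(root):
    """Count image files in each immediate subfolder of `root`."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        # Recurse so that nested folders inside a class are counted too.
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

For example, `count_tiles_per_class("data/NCT-CRC-HE-100K")` would return a dictionary mapping each class folder name to its number of tiles, which can be compared with the sample counts listed in the table.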

Step 2: Create your Dataset (target)

To perform domain alignment, we need a target set. You can either use your own dataset or generate one with the following script. The data_query argument should be a query matching the target whole slide images (*.mrxs, *.svs, ...). From each whole slide image, the script extracts n_subset tiles picked at random from the foreground and saves them under the export path.

python --data_query "/path/to/data/*.mrxs" --export data/GENERATED_TARGETS --n_subset 200
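
The random tile selection described above can be illustrated as follows. This is a simplified sketch and not the script's actual implementation: it assumes the foreground is already available as a 2D grid with one truthy or falsy entry per tile, and the function name `sample_foreground_tiles` and its `seed` parameter are hypothetical. How the repository detects the foreground is not covered here.

```python
import random


def sample_foreground_tiles(mask, n_subset, seed=None):
    """Pick up to `n_subset` (row, col) tile positions where `mask` is truthy.

    `mask` is a 2D grid (list of rows) with one entry per tile; truthy = tissue.
    """
    foreground = [
        (r, c)
        for r, row in enumerate(mask)
        for c, is_tissue in enumerate(row)
        if is_tissue
    ]
    rng = random.Random(seed)
    # Sample without replacement; a slide with little tissue yields fewer tiles.
    return rng.sample(foreground, min(n_subset, len(foreground)))
```

Sampling without replacement avoids exporting the same tile twice, and capping the request at the number of foreground tiles keeps the call from failing on slides with very little tissue.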

Step 3: Train the model

To train the model with single-source domain:

# Define variables
# Train unsupervised architecture
python --root "${DATASET_SRC}:${DATASET_TAR}" --exp_name sra_k19
# Train linear classifier on top
# Note: You can use the model provided on the google drive (checkpoint_sra_k19_inhouse.pth)
python --name="kather19" --root "${DATASET_SRC}" --loadpath=best_model_sra_k19.pth

To train the model with multi-source domain:

# Define variables
# Train unsupervised architecture
python --root="${DATASET_SRC1}:${DATASET_SRC2}:${DATASET_TAR}"  --exp_name sra_crctp_k19
# Train linear classifier on top
# Note: You can use the model provided on the google drive (checkpoint_sra_crctp_k19_inhouse.pth)
python --name="crctp-cstr+kather19" --root "${DATASET_SRC1}:${DATASET_SRC2}" --loadpath=/path/to/pretrained/model.pth
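
In the unsupervised training commands, the --root argument joins several dataset paths with colons. The sketch below shows one way such a value could be split. It is an illustration only: the function name `split_root` is hypothetical, and the convention that the last entry is the target while all earlier entries are sources is an assumption read off the variable names in the commands, not confirmed behaviour of the training script.

```python
def split_root(root):
    """Split a colon-separated `--root` value into (sources, target)."""
    paths = [p for p in root.split(":") if p]
    if len(paths) < 2:
        raise ValueError("expected at least one source and one target path")
    # Assumption: the last entry is the target, everything before it is a source.
    *sources, target = paths
    return sources, target
```

With this convention, `split_root("data/NCT-CRC-HE-100K:data/GENERATED_TARGETS")` gives one source and one target, and a three-part value gives two sources. Note that a colon-separated format does not work with paths that themselves contain a colon, such as Windows drive letters.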

Step 4: WSIs Classification

The pre-trained models (with and without the linear classifier) are available in the Pretrained models section. Here we show how to classify a slide taken from the TCGA cohort. The slides are available for download.

# Infer WSI using K19 label
python \
  --wsi_path TCGA-CK-6747-01Z-00-DX1.7824596c-84db-4bee-b149-cd8f617c285f.svs \
  --model_path best_model_srma_cls_k19.pth \
  --config conf_wsi_classification_k19.yaml

# Infer WSI using K19+CRCTP label
python \
  --wsi_path TCGA-CK-6747-01Z-00-DX1.7824596c-84db-4bee-b149-cd8f617c285f.svs \
  --model_path best_model_srma_cls_k19crctp.pth \
  --config conf_wsi_classification_k19crctp.yaml

To run the prediction on multiple slides, you can use Unix-style wildcard queries. Make sure to put quotes around the wsi_path argument, as shown below; otherwise the shell expands the pattern before the script receives it.

python \
  --wsi_path "/PATH/TO/DATA/*.svs" \
  --model_path best_model_srma_cls_k19.pth \
  --config conf_wsi_classification_k19.yaml

You can find the predictions under the outputs folder.
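
The reason for quoting the pattern is that the script then receives it as a single string and can expand it itself. A minimal sketch of such an expansion is given below. It assumes the script resolves the pattern with something equivalent to Python's standard `glob` module, and the helper name `resolve_wsi_paths` is hypothetical.

```python
import glob


def resolve_wsi_paths(wsi_path):
    """Expand a wildcard pattern into a sorted list of matching slide paths."""
    # A plain path without wildcards simply matches itself if the file exists.
    return sorted(glob.glob(wsi_path))
```

Calling `resolve_wsi_paths("/PATH/TO/DATA/*.svs")` returns every matching slide in a stable order, and an empty list when nothing matches, which is a convenient way to check a pattern before starting a long inference run.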

Step 5: QuPath Visualization

You can visualize the predictions using QuPath. To do so, follow the steps:

  1. Open QuPath
  2. Open the WSI image (*.mrxs, *.svs, ...)
  3. Select Automate->Show script editor
  4. Copy and paste the script located under SRA/script_qupath/annotation_loader.groovy
  5. Run the script (Run->Run or CTRL+R).
  6. Select the json file containing the detection output. This file is generated by the script mentioned above.
  7. Enjoy

The expected output is displayed below in the results section. If the detections do not show up, make sure that filled detections are enabled (press F or select View->Fill detections).


WSI Classification

The expected classification result using the SRMA model on the selected slide from the TCGA cohort.

Original WSI




Tumor Detection Heatmap



We present the t-SNE projection of the domain adaptation results from Kather19 to our in-house dataset.

[Figure: t-SNE projection, Kather19 to in-house]

We also present the multi-source case.

[Figure: t-SNE projection, CRCTP and Kather19 to in-house]

Crop Segmentation

To validate our approach on a real case scenario, we perform domain adaptation using our proposed model from Kather19 to whole slide image sections from our in-house dataset. The results are presented here, alongside the original H&E image, their corresponding labels annotated by an expert pathologist, as well as comparative results of previous approaches smoothed using conditional random fields as in L. Chan (2018). The sections were selected such that, overall, they represent all tissue types equally.

Segmentation result

Segmentation cstr result


If you use this work, please cite the following publications :).

# Single-source domain adaptation
    title={Self-Rule to Adapt: Learning Generalized Features from Sparsely-Labeled Data Using Unsupervised Domain Adaptation for Colorectal Cancer Tissue Phenotyping},
    author={Christian Abbet and Linda Studer and Andreas Fischer and Heather Dawson and Inti Zlobec and Behzad Bozorgtabar and Jean-Philippe Thiran},
    booktitle={Medical Imaging with Deep Learning},

# Multi-source domain adaptation (a generalization of previous work to multi-source domains)
    title = {Self-Rule to Multi-Adapt: Generalized Multi-source Feature Learning Using Unsupervised Domain Adaptation for Colorectal Cancer Tissue Detection},
    journal = {Medical Image Analysis},
    pages = {102473},
    year = {2022},
    issn = {1361-8415},
    doi = {},
    author = {Christian Abbet and Linda Studer and Andreas Fischer and Heather Dawson and Inti Zlobec and Behzad Bozorgtabar and Jean-Philippe Thiran},

# Application to Tumor-Stroma ratio quantification
   title={Toward Automatic Tumor-Stroma Ratio Assessment for Survival Analysis in Colorectal Cancer},
   author={Christian Abbet and Linda Studer and Inti Zlobec and Jean-Philippe Thiran},
   booktitle={Medical Imaging with Deep Learning},

