MADBAL [BMVC 2023 Oral]
This is an official implementation of the paper "Maturity-Aware Active Learning for Semantic Segmentation with Hierarchically-Adaptive Sample Assessment."
Our code is based on Python 3.9.12 and uses the following packages:
pytorch=1.11.0
numpy=1.21.2
matplotlib=3.5.1
opencv=4.5.5
scikit-learn=1.0.2
scikit-image=0.19.2
tqdm=4.63.1
scipy=1.8.0
pillow=9.0.1
imageio=2.9.0
torchvision=0.12.0
tensorboard=2.8.0
To get started, clone the repository:
git clone https://github.com/yazdaniamir38/MADBAL.git
cd MADBAL
We conducted our experiments on the following datasets:
- Cityscapes: visit the [Cityscapes] website and log in to download the dataset, then unzip it. Note that we trained and validated on quarter-resolution samples (256x512).
- PASCAL VOC 2012: downloaded automatically via torchvision.datasets.VOCSegmentation, as in the sketch below.
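A minimal loading sketch under the above assumptions (the Cityscapes file paths are illustrative, and the resize interpolation choices are ours, not necessarily the repo's exact preprocessing):

```python
import cv2
from torchvision.datasets import VOCSegmentation

# PASCAL VOC 2012 downloads itself on first use:
voc_train = VOCSegmentation(root="data/VOC", year="2012", image_set="train", download=True)

# Cityscapes at quarter resolution: bilinear for images, nearest-neighbour for
# label maps so class ids are never interpolated. Paths are illustrative.
img = cv2.imread("leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png")
img_q = cv2.resize(img, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_LINEAR)
gt = cv2.imread("gtFine/train/aachen/aachen_000000_000019_gtFine_labelIds.png",
                cv2.IMREAD_GRAYSCALE)
gt_q = cv2.resize(gt, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_NEAREST)
```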
The following scripts preprocess the data for active learning (AL):
“extract_superpixels.py”: we extract superpixels via the SEEDS algorithm and store their details in dictionaries under keys such as “valid indices” and “labels”.
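A hedged sketch of SEEDS extraction through OpenCV's ximgproc module (requires the contrib build of OpenCV; the parameter values below are illustrative, not the repo's settings):

```python
import cv2  # cv2.ximgproc needs opencv-contrib

img = cv2.imread("sample_image.png")
h, w, c = img.shape
# Arguments: image_width, image_height, image_channels, num_superpixels, num_levels
seeds = cv2.ximgproc.createSuperpixelSEEDS(w, h, c, 512, 4)
seeds.iterate(img, 10)                 # refinement iterations
sp_map = seeds.getLabels()             # (h, w) array of superpixel ids
n_sp = seeds.getNumberOfSuperpixels()  # actual number of superpixels produced
```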
“clustering.py”: we fit a K-means clustering model on the superpixels. Each superpixel is first padded into a 16x16 patch (see the class “super_pixel_loader_vgg_padding” for more details) and then fed to a pretrained VGG16 feature extractor; the output is a 512-dimensional vector, which K-means then clusters. A combined sketch follows the next item.
“assign_clusters.py”: once the clustering model is trained, each superpixel is assigned to a cluster; this information is added to its dictionary under the key “cluster”.
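A combined sketch of the two steps above; the VGG16 truncation point, the number of clusters, and the dictionary layout are illustrative stand-ins, not the repo's exact choices:

```python
import torch
from torchvision import models
from sklearn.cluster import KMeans

# Truncate VGG16 before its 5th max-pool so a 16x16 patch reduces to a
# 1x1x512 feature map (four pools: 16 -> 8 -> 4 -> 2 -> 1).
vgg = models.vgg16(pretrained=True).features[:30].eval()

patches = torch.randn(256, 3, 16, 16)  # stand-in for padded superpixel patches
with torch.no_grad():
    feats = vgg(patches).flatten(1).numpy()  # (256, 512) feature vectors

kmeans = KMeans(n_clusters=10, random_state=0).fit(feats)

# assign_clusters.py step: record each superpixel's cluster id under "cluster".
superpixel_dicts = [{"valid indices": [], "labels": []} for _ in feats]
for sp, cid in zip(superpixel_dicts, kmeans.predict(feats)):
    sp["cluster"] = int(cid)
```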
“train.py”: the main script to call when initiating MADBAL. Based on the parameters given in the “config.json” file, it starts the process and handles switching between the stages: training phase I, training phase II, and active sample assessment.
“trainer.py”: the actual forward and backward propagation for both phase I and phase II, as well as validation, happens here.
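For orientation, the round-based flow that “train.py” and “trainer.py” implement might be pictured roughly as follows; the function names, the config key, and the exact stage ordering here are hypothetical stand-ins, governed in reality by “config.json”:

```python
import json

def train_phase(state, phase, cfg):
    """Hypothetical stand-in for the phase-I / phase-II training in trainer.py."""
    return state

def assess_and_expand_pool(state, cfg):
    """Hypothetical stand-in for the sample assessment in step1.py / step2.py."""

with open("config.json") as f:
    cfg = json.load(f)

state = None
for al_round in range(cfg.get("num_al_rounds", 5)):  # key name is hypothetical
    state = train_phase(state, phase=1, cfg=cfg)
    state = train_phase(state, phase=2, cfg=cfg)
    assess_and_expand_pool(state, cfg)  # enlarge the labelled pool, then repeat
```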
“models”: includes the scripts with the classes defining the architecture of our model with different backbones.
The following scripts perform the active sample assessment step:
“step1.py”: we feed all the samples to the trained model and store the uncertainty scores of the pixels, superpixels and clusters.
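One common way to realize such scoring is softmax entropy averaged over each superpixel. The sketch below illustrates that idea only; it is not necessarily the paper's exact uncertainty measure:

```python
import torch
import torch.nn.functional as F

def uncertainty_maps(logits, sp_map, n_sp):
    """logits: (C, H, W) model output; sp_map: (H, W) long tensor of superpixel ids."""
    probs = F.softmax(logits, dim=0)
    pixel_unc = -(probs * probs.clamp_min(1e-12).log()).sum(0)  # per-pixel entropy
    flat_ids = sp_map.flatten()
    sums = torch.zeros(n_sp).scatter_add_(0, flat_ids, pixel_unc.flatten())
    counts = torch.bincount(flat_ids, minlength=n_sp).clamp_min(1)
    sp_unc = sums / counts  # mean entropy per superpixel
    return pixel_unc, sp_unc  # cluster scores can be pooled from sp_unc the same way
```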
“step2.py”: based on the calculated scores and the budgets assigned to the clusters, for each image we select the superpixels with the highest uncertainty scores; within the selected superpixels, we select the most uncertain pixels, label them, and add them to the labelled pool.
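A hedged sketch of that selection rule, with the per-cluster budgets omitted for brevity; pixel_unc, sp_map, and sp_unc follow the step-1 sketch above:

```python
import torch

def select_pixels(pixel_unc, sp_map, sp_unc, top_sp=5, pixels_per_sp=4):
    """Pick the most uncertain pixels inside the most uncertain superpixels."""
    selected = []
    for sp_id in torch.topk(sp_unc, k=top_sp).indices:
        mask = sp_map == sp_id
        unc = pixel_unc[mask]
        k = min(pixels_per_sp, unc.numel())
        top = torch.topk(unc, k=k).indices
        selected.append(mask.nonzero()[top])  # (k, 2) pixel coordinates to label
    return torch.cat(selected)
```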
“inference_cityscapes.py” / “inference_VOC.py”: the scripts to test a trained model on a dataset using different prediction modes, such as sliding-window or multiscale prediction (see the scripts for more details).
Please download our trained model weights from here. Once downloaded, store the weights in "checkpoints/" and run:
python inference_VOC.py --model checkpoints/weights_name.pth
python inference_cityscapes.py --model checkpoints/weights_name.pth
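For reference, multiscale prediction is commonly implemented by averaging softmax outputs over several rescaled inputs. A minimal sketch of that idea, not necessarily the exact variant in the inference scripts:

```python
import torch
import torch.nn.functional as F

def multiscale_predict(model, image, scales=(0.75, 1.0, 1.25)):
    """image: (1, 3, H, W); returns class probabilities averaged over scales."""
    _, _, h, w = image.shape
    probs = 0.0
    for s in scales:
        x = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        with torch.no_grad():
            out = model(x)
        out = F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False)
        probs = probs + F.softmax(out, dim=1)
    return probs / len(scales)
```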
We report the mean ± one standard deviation of mean IoU over 3 runs for both datasets.

Cityscapes:

model | backbone (encoder) | # labelled pixels per image (% of pixels) | mean IoU (%) |
---|---|---|---|
MADBAL | MobileNetv2 | 20 (0.015) | 47.5 ± 0.5 |
MADBAL | MobileNetv2 | 40 (0.031) | 59.0 ± 0.3 |
MADBAL | MobileNetv2 | 60 (0.046) | 61.5 ± 0.4 |
MADBAL | MobileNetv2 | 80 (0.061) | 62.7 ± 0.2 |
MADBAL | MobileNetv2 | 100 (0.076) | 63.6 ± 0.2 |
Fully-supervised | MobileNetv2 | 256x512 (100) | 66.5 ± 0.6 |
MADBAL | MobileNetv3 | 20 (0.015) | 49.1 ± 0.4 |
MADBAL | MobileNetv3 | 40 (0.031) | 57.6 ± 0.2 |
MADBAL | MobileNetv3 | 60 (0.046) | 59.3 ± 0.3 |
MADBAL | MobileNetv3 | 80 (0.061) | 62.3 ± 0.2 |
MADBAL | MobileNetv3 | 100 (0.076) | 62.8 ± 0.1 |
Fully-supervised | MobileNetv3 | 256x512 (100) | 68.5 ± 0.4 |
MADBAL | ResNet50 | 20 (0.015) | 51.5 ± 0.5 |
MADBAL | ResNet50 | 40 (0.031) | 63.3 ± 0.2 |
MADBAL | ResNet50 | 60 (0.046) | 66.7 ± 0.3 |
MADBAL | ResNet50 | 80 (0.061) | 67.2 ± 0.1 |
MADBAL | ResNet50 | 100 (0.076) | 68.4 ± 0.3 |
Fully-supervised | ResNet50 | 256x512 (100) | 72.0 ± 0.3 |

PASCAL VOC 2012:

model | backbone (encoder) | # labelled pixels per image (% of pixels) | mean IoU (%) |
---|---|---|---|
MADBAL | MobileNetv3 | 10 (0.009) | 36.0 ± 0.6 |
MADBAL | MobileNetv3 | 20 (0.017) | 60.3 ± 0.5 |
MADBAL | MobileNetv3 | 30 (0.026) | 63.0 ± 0.4 |
MADBAL | MobileNetv3 | 40 (0.034) | 63.6 ± 0.3 |
Fully-supervised | MobileNetv3 | N/A (100) | 65.1 ± 0.5 |
MADBAL | ResNet50 | 10 (0.009) | 67.0 ± 0.7 |
MADBAL | ResNet50 | 20 (0.017) | 72.4 ± 0.4 |
MADBAL | ResNet50 | 30 (0.026) | 73.3 ± 0.5 |
MADBAL | ResNet50 | 40 (0.034) | 74.3 ± 0.1 |
Fully-supervised | ResNet50 | N/A (100) | 76.1 ± 0.4 |
@inproceedings{Yazdani_2023_BMVC,
author = {Yazdani, Amirsaeed and Li, Xuelu and Monga, Vishal},
title = {Maturity-Aware Active Learning for Semantic Segmentation with Hierarchically-Adaptive Sample Assessment},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {{BMVA} Press},
year = {2023}
}
We borrowed code heavily from https://github.com/yassouali/pytorch-segmentation and partially from https://github.com/NoelShin/PixelPick and https://github.com/cailile/Revisiting-Superpixels-for-Active-Learning.
If you need further details, feel free to reach me at yazdaniamirsaeed@gmail.com.