This project provides modular implementations of state-of-the-art semantic segmentation models based on the MXNet framework and the GluonCV toolkit. See MindSeg for a mirror implemented with HUAWEI MindSpore.
- An easy-to-use and extensible pipeline for the semantic segmentation task, covering data pre-processing, model definition, network training, and evaluation.
- Parallel training on multiple GPUs.
- Multiple supported models, listed below (a minimal inference sketch follows the list):
- Fully Convolutional Networks for Semantic Segmentation [FCN, CVPR2015, paper]
- Attention to Scale: Scale-Aware Semantic Image Segmentation [Att2Scale, CVPR2016, paper]
- Rethinking Atrous Convolution for Semantic Image Segmentation [DeepLabv3, arXiv2017, paper]
- Ladder-Style DenseNets for Semantic Segmentation of Large Natural Images [LadderDensenet, ICCVW2017, paper]
- Pyramid Scene Parsing Network [PSPNet, CVPR2017, paper]
- BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation [BiSeNet, ECCV2018, paper]
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [DeepLabv3+, ECCV2018, paper]
- DenseASPP for Semantic Segmentation in Street Scenes [DenseASPP, CVPR2018, paper]
- Towards Bridging Semantic Gap to Improve Semantic Segmentation [SeENet, ICCV2019, paper]
- ACFNet: Attentional Class Feature Network for Semantic Segmentation [ACFNet, ICCV2019, paper]
- Dual Attention Network for Scene Segmentation [DANet, CVPR2019, paper]
- In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images [SwiftNet, CVPR2019, paper]
- Panoptic Feature Pyramid Networks [SemanticFPN, CVPR2019, paper]
- Gated Fully Fusion for Semantic Segmentation [GFFNet, AAAI2020, paper]
- Attention-guided Chained Context Aggregation for Semantic Segmentation [CANetv1, IMAVIS2021, paper]
- EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation [EPRNet, TITS2021, paper]
- AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing [AttaNet, AAAI2021, paper]
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [ViT, ICLR2021, paper]
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR, CVPR2021, paper]
- FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [FaPN, ICCV2021, paper]
- AlignSeg: Feature-Aligned Segmentation Networks [AlignSeg, TPAMI2021, paper]
- Compensating for Local Ambiguity with Encoder-Decoder in Urban Scene Segmentation [CANetv2, TITS2022, paper]
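As a quick taste of running one of these models, here is a minimal inference sketch. It uses GluonCV's public model zoo (on which this project builds), so the model name `fcn_resnet101_voc` and the `predict` call are GluonCV's API, not this repo's own classes:

```python
import mxnet as mx
from gluoncv import model_zoo

# Pick a GPU if one is available, otherwise fall back to CPU.
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()

# Load a pretrained FCN from GluonCV's model zoo (illustrative choice).
net = model_zoo.get_model('fcn_resnet101_voc', pretrained=True, ctx=ctx)

x = mx.nd.random.uniform(shape=(1, 3, 480, 480), ctx=ctx)  # dummy NCHW image batch
output = net.predict(x)              # (1, num_classes, H, W) per-class score maps
pred = mx.nd.argmax(output, axis=1)  # per-pixel class indices
print(pred.shape)
```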
We note that:
- OS is the output stride of the backbone network.
- * denotes multi-scale and flipping testing; otherwise inputs are single-scale.
- No bells and whistles, e.g. OHEM or multi-grid, are adopted.
Results on Cityscapes:

Model | Backbone | OS | #Params | TrainSet | EvalSet | mIoU | *mIoU |
---|---|---|---|---|---|---|---|
BiSeNet | ResNet18 | 32 | 13.2M | train_fine | val | 71.6 | 74.7 |
BiSeNet | ResNet18 | 32 | 13.2M | trainval_fine | test | - | 74.8 |
FCN | ResNet18 | 32 | 12.4M | train_fine | val | 64.9 | 68.1 |
FCN | ResNet18 | 8 | 12.4M | train_fine | val | 68.3 | 69.9 |
FCN | ResNet50 | 8 | 28.4M | train_fine | val | 71.7 | - |
FCN | ResNet101 | 8 | 47.5M | train_fine | val | 74.5 | - |
PSPNet | ResNet101 | 8 | 56.4M | train_fine | val | 78.2 | 79.5 |
DeepLabv3 | ResNet101 | 8 | 58.9M | train_fine | val | 79.3 | 80.0 |
DenseASPP | ResNet101 | 8 | 69.4M | train_fine | val | 78.7 | 79.8 |
DANet | ResNet101 | 8 | 66.7M | train_fine | val | 79.7 | 80.9 |
Results on ADE20K:

Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
---|---|---|---|---|---|---|---|---|
PSPNet | ResNet101 | 8 | train | val | 80.1 | 42.9 | 80.9 | 43.7 |
Results on Pascal VOC 2012:

Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
---|---|---|---|---|---|---|---|---|
FCN | ResNet101 | 8 | train_aug | val | 94.4 | 74.6 | 94.5 | 75.0 |
Att2Scale | ResNet101 | 8 | train_aug | val | 94.8 | 77.1 | - | - |
PSPNet | ResNet101 | 8 | train_aug | val | 95.1 | 78.1 | 95.3 | 78.5 |
DeepLabv3 | ResNet101 | 8 | train_aug | val | 95.5 | 80.1 | 95.6 | 80.4 |
DeepLabv3+ | ResNet101 | 8 | train_aug | val | 95.5 | 79.9 | 95.6 | 80.1 |
Results on Pascal Context:

Model | Backbone | OS | TrainSet | EvalSet | PA | mIoU | *PA | *mIoU |
---|---|---|---|---|---|---|---|---|
FCN | ResNet101 | 8 | train | val | 69.2 | 39.7 | 70.2 | 41.0 |
PSPNet | ResNet101 | 8 | train | val | 71.3 | 43.0 | 71.9 | 43.6 |
DeepLabv3+ | ResNet101 | 8 | train | val | 73.5 | 46.0 | 74.3 | 47.2 |
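The *mIoU and *PA columns above come from multi-scale and flipping testing. Below is a minimal sketch of that protocol, assuming a GluonCV-style `predict` that returns per-class score maps; it is not this repo's exact implementation:

```python
import mxnet as mx

def multi_scale_flip_predict(net, img, scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Sum class scores over several input scales and a horizontal flip,
    then take the per-pixel argmax. `img` is an NCHW batch."""
    n, c, h, w = img.shape
    score = None
    for s in scales:
        sh, sw = int(h * s + 0.5), int(w * s + 0.5)
        x = mx.nd.contrib.BilinearResize2D(img, height=sh, width=sw)
        out = net.predict(x)
        # Add the scores from the horizontally flipped input, un-flipped back.
        out = out + net.predict(x.flip(axis=3)).flip(axis=3)
        # Resize scores back to the original resolution before accumulating.
        score_s = mx.nd.contrib.BilinearResize2D(out, height=h, width=w)
        score = score_s if score is None else score + score_s
    return mx.nd.argmax(score, axis=1)
```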
We adopt Python 3.6.2 and CUDA 10.1 in this project.
- Prerequisites:
  pip install -r requirements.txt
  Note that we employ wandb for logging and visualization; refer to here for a QuickStart (a minimal logging sketch follows this list).
- The Detail API for the Pascal Context dataset.
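For reference, this is what the wandb logging pattern looks like; the project name, config keys, and metric names here are placeholders, not the ones this repo uses:

```python
import wandb

# Start a run and record hyper-parameters (placeholder values).
run = wandb.init(project='wandb-demo', config={'lr': 0.01, 'epochs': 2})

for epoch in range(run.config['epochs']):
    # In practice these values come from the training and validation loops.
    wandb.log({'train/loss': 1.0 / (epoch + 1), 'val/mIoU': 0.5 + 0.01 * epoch})

run.finish()
```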
- Configure hyper-parameters in ./mxnetseg/config.yml.
- Run the ./mxnetseg/train.py script:
  python train.py --ctx 0 1 2 3 --wandb wandb-demo
- During training, the program automatically creates a sub-folder ./weights/{model_name} to save model checkpoints/parameters.
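A saved checkpoint can later be restored with Gluon's standard parameter loading. This is a sketch: the checkpoint file name is a placeholder, and a GluonCV zoo model stands in for this repo's model classes:

```python
import mxnet as mx
from gluoncv import model_zoo

ctx = mx.gpu(0)
# Build the network architecture without pretrained weights (illustrative model).
net = model_zoo.get_model('fcn_resnet101_voc', pretrained=False, ctx=ctx)
# Load parameters saved under ./weights/{model_name} (placeholder file name).
net.load_parameters('./weights/FCNResNet/fcn_demo_best.params', ctx=ctx)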
Simply run ./mxnetseg/eval.py with the required arguments specified:

python eval.py --model FCNResNet --backbone resnet18 --checkpoint fcn_resnet18_Cityscapes_20191900_310600_best.params --ctx 0 --data Cityscapes --crop 768 --base 2048 --mode val --ms

About the mode:
- val: computes mIoU and PA metrics on the validation set.
- test: produces colored predictions on the test set.
- testval: produces colored predictions on the validation set.
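For intuition, this is how colored predictions can be produced in the spirit of the test/testval modes, using GluonCV's palette helper; the model, palette, and file names are illustrative, not this repo's exact code:

```python
import mxnet as mx
from gluoncv import model_zoo
from gluoncv.utils.viz import get_color_pallete

ctx = mx.cpu()
net = model_zoo.get_model('fcn_resnet101_voc', pretrained=True, ctx=ctx)

x = mx.nd.random.uniform(shape=(1, 3, 480, 480), ctx=ctx)  # stand-in for a real image
pred = mx.nd.squeeze(mx.nd.argmax(net.predict(x), axis=1)).asnumpy()

# Map class indices to a color-indexed PIL image and save it as a PNG.
mask = get_color_pallete(pred, dataset='pascal_voc')
mask.save('demo_pred.png')
```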
Please cite our papers if you find this code helpful in your research.
@article{tang2021attention,
title={Attention-guided chained context aggregation for semantic segmentation},
author={Tang, Quan and Liu, Fagui and Zhang, Tong and Jiang, Jun and Zhang, Yu},
journal={Image and Vision Computing},
pages={104309},
year={2021},
publisher={Elsevier}
}
@article{tang2021eprnet,
title={EPRNet: Efficient Pyramid Representation Network for Real-Time Street Scene Segmentation},
author={Tang, Quan and Liu, Fagui and Jiang, Jun and Zhang, Yu},
journal={IEEE Transactions on Intelligent Transportation Systems},
year={2021},
doi={10.1109/TITS.2021.3066401},
publisher={IEEE}
}
@article{tang2022compe,
title={Compensating for Local Ambiguity With Encoder-Decoder in Urban Scene Segmentation},
author={Tang, Quan and Liu, Fagui and Zhang, Tong and Jiang, Jun and Zhang, Yu and Zhu, Boyuan and Tang, Xuhao},
journal={IEEE Transactions on Intelligent Transportation Systems},
year={2022},
doi={10.1109/TITS.2022.3157128},
publisher={IEEE}
}