diff --git a/README.md b/README.md index bf2c5302fa..9e08c8611e 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,8 @@ English | [简体中文](README_CN.md) ![python version](https://img.shields.io/badge/python-3.6+-orange.svg) ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg) + *[2020-12-18] PaddleSeg has released the v2.0.0-rc version, which supports the dynamic graph by default. The static-graph codes have been moved to [legacy](./legacy). See detailed [release notes](./docs/release_notes.md).* + ![demo](./docs/images/cityscapes.gif) Welcome to PaddleSeg! PaddleSeg is an end-to-end image segmentation development kit developed based on [PaddlePaddle](https://www.paddlepaddle.org.cn), which covers a large number of high-quality segmentation models in different directions such as *high-performance* and *lightweight*. With the help of modular design, we provide two application methods: *Configuration Drive* and *API Calling*. So one can conveniently complete the entire image segmentation application from training to deployment through configuration calls or API calls. @@ -41,14 +43,16 @@ Welcome to PaddleSeg! PaddleSeg is an end-to-end image segmentation development |[Att U-Net](./configs/attention_unet)|-|-|-|-| |[U-Net++](./configs/unet_plusplus)|-|-|-|-| |[DecoupledSegNet](./configs/decoupled_segnet)|✔|✔||| - +|[EMANet](./configs/emanet)|✔|✔|-|-| +|[ISANet](./configs/isanet)|✔|✔|-|-| +|[DNLNet](./configs/dnlnet)|✔|✔|-|-| ## Dataset - [x] Cityscapes - [x] Pascal VOC - [x] ADE20K -- [ ] Pascal Context -- [ ] COCO stuff +- [x] Pascal Context +- [x] COCO stuff ## Installation @@ -102,3 +106,24 @@ python train.py --config configs/quick_start/bisenet_optic_disc_512x512_1k.yml * Thanks [jm12138](https://github.com/jm12138) for contributing U2-Net. * Thanks [zjhellofss](https://github.com/zjhellofss) (Fu Shenshen) for contributing Attention U-Net, and Dice Loss. * Thanks [liuguoyu666](https://github.com/liguoyu666) for contributing U-Net++. 
+ +## Citation +If you find our project useful in your research, please consider citing: + +```latex +@misc{liu2021paddleseg, + title={PaddleSeg: A High-Efficient Development Toolkit for Image Segmentation}, + author={Yi Liu and Lutao Chu and Guowei Chen and Zewu Wu and Zeyu Chen and Baohua Lai and Yuying Hao}, + year={2021}, + eprint={2101.06175}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} + +@misc{paddleseg2019, + title={PaddleSeg, End-to-end image segmentation kit based on PaddlePaddle}, + author={PaddlePaddle Authors}, + howpublished = {\url{https://github.com/PaddlePaddle/PaddleSeg}}, + year={2019} +} +``` diff --git a/README_CN.md b/README_CN.md index fc8c142da7..0b6bdeefff 100644 --- a/README_CN.md +++ b/README_CN.md @@ -8,6 +8,8 @@ ![python version](https://img.shields.io/badge/python-3.6+-orange.svg) ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg) + *[2020-12-18] PaddleSeg发布2.0.0rc版,动态图正式成为主目录。静态图已经被移至[legacy](./legacy)子目录下。更多信息请查看详细[更新日志](./docs/release_notes_cn.md)。* + ![demo](./docs/images/cityscapes.gif) PaddleSeg是基于飞桨[PaddlePaddle](https://www.paddlepaddle.org.cn)开发的端到端图像分割开发套件,涵盖了**高精度**和**轻量级**等不同方向的大量高质量分割模型。通过模块化的设计,提供了**配置化驱动**和**API调用**两种应用方式,帮助开发者更便捷地完成从训练到部署的全流程图像分割应用。 @@ -41,14 +43,17 @@ PaddleSeg是基于飞桨[PaddlePaddle](https://www.paddlepaddle.org.cn)开发的 |[Att U-Net](./configs/attention_unet)|-|-|-|-| |[U-Net++](./configs/unet_plusplus)|-|-|-|-| |[DecoupledSegNet](./configs/decoupled_segnet)|✔|✔||| +|[EMANet](./configs/emanet)|✔|✔|-|-| +|[ISANet](./configs/isanet)|✔|✔|-|-| +|[DNLNet](./configs/dnlnet)|✔|✔|-|-| ## 数据集 - [x] Cityscapes - [x] Pascal VOC - [x] ADE20K -- [ ] Pascal Context -- [ ] COCO stuff +- [x] Pascal Context +- [x] COCO stuff ## 安装 @@ -94,8 +99,34 @@ python train.py --config configs/quick_start/bisenet_optic_disc_512x512_1k.yml * [API参考](./docs/apis) * [添加新组件](./docs/add_new_model.md) +## 联系我们 +* 如果你发现任何PaddleSeg存在的问题或者是建议, 欢迎通过[GitHub Issues](https://github.com/PaddlePaddle/PaddleSeg/issues)给我们提issues。 +* 同时欢迎加入PaddleSeg技术交流群:850378321(QQ群1)或者793114768(QQ群2)。 + ## 代码贡献 * 非常感谢[jm12138](https://github.com/jm12138)贡献U2-Net模型。 * 非常感谢[zjhellofss](https://github.com/zjhellofss)(傅莘莘)贡献Attention U-Net模型,和Dice loss损失函数。 * 非常感谢[liuguoyu666](https://github.com/liguoyu666)贡献U-Net++模型。 + +## 学术引用 + +如果我们的项目在学术上帮助到你,请考虑以下引用: + +```latex +@misc{liu2021paddleseg, + title={PaddleSeg: A High-Efficient Development Toolkit for Image Segmentation}, + author={Yi Liu and Lutao Chu and Guowei Chen and Zewu Wu and Zeyu Chen and Baohua Lai and Yuying Hao}, + year={2021}, + eprint={2101.06175}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} + +@misc{paddleseg2019, + title={PaddleSeg, End-to-end image segmentation kit based on PaddlePaddle}, + author={PaddlePaddle Authors}, + howpublished = {\url{https://github.com/PaddlePaddle/PaddleSeg}}, + year={2019} +} +``` diff --git a/configs/README.md b/configs/README.md index 2ae9a8f2f1..144caebbbf 100644 --- a/configs/README.md +++ b/configs/README.md @@ -47,7 +47,7 @@ > 损失函数 > * 参数 > * types : 损失函数列表 -> * type : 损失函数类型,目前只支持CrossEntropyLoss +> * type : 损失函数类型,所支持值请参考损失函数库 > * coef : 对应损失函数列表的系数列表 ---- diff --git a/configs/_base_/coco_stuff.yml b/configs/_base_/coco_stuff.yml new file mode 100644 index 0000000000..d57fbbf22e --- /dev/null +++ b/configs/_base_/coco_stuff.yml @@ -0,0 +1,45 @@ +batch_size: 4 +iters: 80000 + +train_dataset: + type: CocoStuff + dataset_root: data/cocostuff/ + transforms: + - type: ResizeStepScaling + min_scale_factor: 0.5 + max_scale_factor: 2.0 + scale_step_size: 
0.25 + - type: RandomPaddingCrop + crop_size: [520, 520] + - type: RandomHorizontalFlip + - type: RandomDistort + brightness_range: 0.4 + contrast_range: 0.4 + saturation_range: 0.4 + - type: Normalize + mode: train + +val_dataset: + type: CocoStuff + dataset_root: data/cocostuff/ + transforms: + - type: Normalize + mode: val + + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 4.0e-5 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + end_lr: 0.0 + +loss: + types: + - type: CrossEntropyLoss + coef: [1] diff --git a/configs/_base_/pascal_context.yml b/configs/_base_/pascal_context.yml new file mode 100644 index 0000000000..85f70387b3 --- /dev/null +++ b/configs/_base_/pascal_context.yml @@ -0,0 +1,50 @@ +batch_size: 4 +iters: 40000 + +train_dataset: + type: PascalContext + dataset_root: data/VOC2010/ + transforms: + - type: ResizeStepScaling + min_scale_factor: 0.5 + max_scale_factor: 2.0 + scale_step_size: 0.25 + - type: RandomPaddingCrop + crop_size: [520, 520] + - type: RandomHorizontalFlip + - type: RandomDistort + brightness_range: 0.4 + contrast_range: 0.4 + saturation_range: 0.4 + - type: Normalize + mode: train + +val_dataset: + type: PascalContext + dataset_root: data/VOC2010/ + transforms: + - type: Padding + target_size: [520, 520] + - type: Normalize + mode: val + + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 4.0e-5 + +learning_rate: + value: 0.001 + decay: + type: poly + power: 0.9 + end_lr: 0.0 + +loss: + types: + - type: CrossEntropyLoss + coef: [1] + + + diff --git a/configs/bisenet/README.md b/configs/bisenet/README.md index 55b7eeac72..e5640308d6 100644 --- a/configs/bisenet/README.md +++ b/configs/bisenet/README.md @@ -2,7 +2,7 @@ ## Reference -> Yu C, Gao C, Wang J, et al. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation[J]. arXiv preprint arXiv:2004.02147, 2020. +> Yu, Changqian, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, and Nong Sang. "BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation." arXiv preprint arXiv:2004.02147 (2020). ## Performance @@ -10,4 +10,4 @@ | Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | |-|-|-|-|-|-|-|-| -|BiSeNetv2|-|1024x1024|160000|73.19%|74.19%|74.43%|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/bisenet_cityscapes_1024x1024_160k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/bisenet_cityscapes_1024x1024_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3ccfaff613de769eadb76f8379afffa5)| +|BiSeNetv2|-|1024x1024|160000|73.19%|74.19%|74.43%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/bisenet_cityscapes_1024x1024_160k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/bisenet_cityscapes_1024x1024_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3ccfaff613de769eadb76f8379afffa5)| diff --git a/configs/danet/README.md b/configs/danet/README.md index 47b1a4d83e..8472d40217 100644 --- a/configs/danet/README.md +++ b/configs/danet/README.md @@ -2,7 +2,7 @@ ## Reference -> Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 3146-3154. +> Fu, Jun, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, and Hanqing Lu. "Dual attention network for scene segmentation." 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146-3154. 2019. ## Performance @@ -10,10 +10,10 @@ | Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | |-|-|-|-|-|-|-|-| -|DANet|ResNet50_OS8|1024x512|80000|80.27%|-|-|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/danet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/danet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=6caecf1222a0cc9124a376284a402cbe)| +|DANet|ResNet50_OS8|1024x512|80000|80.27%|80.53%|-|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/danet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/danet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=6caecf1222a0cc9124a376284a402cbe)| ### Pascal VOC 2012 + Aug | Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | |-|-|-|-|-|-|-|-| -|DANet|ResNet50_OS8|1024x512|40000|78.55%|-|-|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/danet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/danet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=51a403a54302bc81dd5ec0310a6d50ba)| +|DANet|ResNet50_OS8|512x512|40000|78.55%|78.93%|79.68%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/danet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/danet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=51a403a54302bc81dd5ec0310a6d50ba)| diff --git a/configs/danet/danet_resnet50_os8_voc12aug_512x512_40k.yml b/configs/danet/danet_resnet50_os8_voc12aug_512x512_40k.yml new file mode 100644 index 0000000000..05a119dc86 --- /dev/null +++ b/configs/danet/danet_resnet50_os8_voc12aug_512x512_40k.yml @@ -0,0 +1,17 @@ +_base_: '../_base_/pascal_voc12aug.yml' + +model: + type: DANet + backbone: + type: ResNet50_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz + backbone_indices: [2, 3] + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 1, 1, 0.4] diff --git a/configs/dnlnet/README.md b/configs/dnlnet/README.md new file mode 100644 index 0000000000..162e4dfc08 --- /dev/null +++ b/configs/dnlnet/README.md @@ -0,0 +1,23 @@ +# Disentangled Non-Local Neural Networks + +## Reference + +> Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu: +Disentangled Non-local Neural Networks. ECCV (15) 2020: 191-207. 
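*(Editorial illustration, not part of the patch.)* The new model families added here (EMANet, ISANet, DNLNet) ship as config files under `configs/`, so they plug straight into PaddleSeg's configuration-driven workflow. Below is a minimal sketch of loading one of the new configs programmatically; it assumes the `paddleseg.cvlibs.Config` helper and its `model` / `val_dataset` properties from the v2.0.0-rc code base, so verify those names against the installed version.

```python
# Hypothetical sketch: build the DNLNet model and its validation dataset from
# one of the config files added in this patch. Assumes paddleseg v2.0.0-rc
# exposes paddleseg.cvlibs.Config with .model / .val_dataset properties.
from paddleseg.cvlibs import Config

cfg = Config('configs/dnlnet/dnlnet_resnet50_os8_cityscapes_1024x512_80k.yml')

model = cfg.model              # DNLNet with a ResNet50_vd (OS8) backbone
val_dataset = cfg.val_dataset  # Cityscapes val split with the configured transforms

print(type(model).__name__, len(val_dataset))
```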
+ +## Performance + +### Cityscapes + +| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) |Links | +|-|-|-|-|-|-|-|-| +|DNLNet|ResNet50_OS8|1024x512|80000|79.95%|80.43%|-|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=922cf0682c5e684507ab54a14ef12847)| +|DNLNet|ResNet101_OS8|1024x512|80000|81.03%|81.38%|-|[model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/dnlnet_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3e0d13c4d9dbf4115bbba2abdc88122c)| + +### Pascal VOC 2012 + Aug + +| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | +|-|-|-|-|-|-|-|-| +|DNLNet|ResNet50_OS8|512x512|40000|80.89%|81.31%|81.56%|[model](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=8877c77bef8b227af22c5eb3017138ce)| +|DNLNet|ResNet101_OS8|512x512|40000|80.49%|80.83%| 81.33%|[model](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/pascal_voc12/dnlnet_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=1d42c22da1c465d9a38e4204bebeeb54)| + diff --git a/configs/dnlnet/dnlnet_resnet101_os8_cityscapes_1024x512_80k.yml b/configs/dnlnet/dnlnet_resnet101_os8_cityscapes_1024x512_80k.yml new file mode 100644 index 0000000000..b6fe983785 --- /dev/null +++ b/configs/dnlnet/dnlnet_resnet101_os8_cityscapes_1024x512_80k.yml @@ -0,0 +1,30 @@ +_base_: '../_base_/cityscapes.yml' + +batch_size: 2 +iters: 80000 + +model: + type: DNLNet + backbone: + type: ResNet101_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz + num_classes: 19 + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.00004 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] diff --git a/configs/dnlnet/dnlnet_resnet101_os8_voc12aug_512x512_40k.yml b/configs/dnlnet/dnlnet_resnet101_os8_voc12aug_512x512_40k.yml new file mode 100644 index 0000000000..b0b11b7260 --- /dev/null +++ b/configs/dnlnet/dnlnet_resnet101_os8_voc12aug_512x512_40k.yml @@ -0,0 +1,25 @@ +_base_: '../_base_/pascal_voc12aug.yml' + +model: + type: DNLNet + backbone: + type: ResNet101_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 4.0e-05 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] diff --git a/configs/dnlnet/dnlnet_resnet50_os8_cityscapes_1024x512_80k.yml b/configs/dnlnet/dnlnet_resnet50_os8_cityscapes_1024x512_80k.yml new file mode 100644 index 0000000000..ae6bd0f4b3 --- /dev/null +++ 
b/configs/dnlnet/dnlnet_resnet50_os8_cityscapes_1024x512_80k.yml @@ -0,0 +1,30 @@ +_base_: '../_base_/cityscapes.yml' + +batch_size: 2 +iters: 80000 + +model: + type: DNLNet + backbone: + type: ResNet50_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz + num_classes: 19 + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.00004 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] diff --git a/configs/dnlnet/dnlnet_resnet50_os8_voc12aug_512x512_40k.yml b/configs/dnlnet/dnlnet_resnet50_os8_voc12aug_512x512_40k.yml new file mode 100644 index 0000000000..ee8e802d17 --- /dev/null +++ b/configs/dnlnet/dnlnet_resnet50_os8_voc12aug_512x512_40k.yml @@ -0,0 +1,25 @@ +_base_: '../_base_/pascal_voc12aug.yml' + +model: + type: DNLNet + backbone: + type: ResNet50_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 4.0e-05 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] diff --git a/configs/emanet/README.md b/configs/emanet/README.md new file mode 100644 index 0000000000..47c469a94a --- /dev/null +++ b/configs/emanet/README.md @@ -0,0 +1,22 @@ +# Expectation-Maximization Attention Networks for Semantic Segmentation + +## Reference + +> Xia Li, Zhisheng Zhong, Jianlong Wu, Yibo Yang, Zhouchen Lin, Hong Liu: +Expectation-Maximization Attention Networks for Semantic Segmentation. ICCV 2019: 9166-9175. + +## Performance + +### Cityscapes + +| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) |Links | +|-|-|-|-|-|-|-|-| +|EMANet|ResNet50_OS8|1024x512|80000|77.64%|77.98%|78.23%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet50_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet50_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3e053a214d60822d6e65445b8614d052)| +|EMANet|ResNet101_OS8|1024x512|80000|79.41%|79.83%|80.33%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet101_os8_cityscapes_1024x512_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/emanet_resnet101_os8_cityscapes_1024x512_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=87be6389cdada711f5c6ada21d9ef6cd)| + +### Pascal VOC 2012 + Aug + +| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | +|-|-|-|-|-|-|-|-| +|EMANet|ResNet50_OS8|512x512|40000|78.60%|78.90%|79.17%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=3e60b80b984a71f3d2b83b8a746a819c)| +|EMANet|ResNet101_OS8|512x512|40000|79.47%|79.97%| 80.67%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet101_os8_voc12aug_512x512_40k/train.log) \| 
[vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=f33479772409766dbc40b5f031cbdb1a)| diff --git a/configs/emanet/emanet_resnet101_os8_cityscapes_1024x512_80k.yml b/configs/emanet/emanet_resnet101_os8_cityscapes_1024x512_80k.yml new file mode 100644 index 0000000000..c357b6dbcd --- /dev/null +++ b/configs/emanet/emanet_resnet101_os8_cityscapes_1024x512_80k.yml @@ -0,0 +1,31 @@ +_base_: '../_base_/cityscapes.yml' + +batch_size: 2 +iters: 80000 + +model: + type: EMANet + backbone: + type: ResNet101_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz + num_classes: 19 + ema_channels: 512 + gc_channels: 256 + num_bases: 64 + stage_num: 3 + momentum: 0.1 + concat_input: True + enable_auxiliary_loss: True + align_corners: False + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.0005 + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] \ No newline at end of file diff --git a/configs/emanet/emanet_resnet101_os8_voc12aug_512x512_40k.yml b/configs/emanet/emanet_resnet101_os8_voc12aug_512x512_40k.yml new file mode 100644 index 0000000000..a14f63962d --- /dev/null +++ b/configs/emanet/emanet_resnet101_os8_voc12aug_512x512_40k.yml @@ -0,0 +1,28 @@ +_base_: '../_base_/pascal_voc12aug.yml' + +model: + type: EMANet + backbone: + type: ResNet101_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz + ema_channels: 512 + gc_channels: 256 + num_bases: 64 + stage_num: 3 + momentum: 0.1 + concat_input: True + enable_auxiliary_loss: True + align_corners: True + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.0005 + + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] \ No newline at end of file diff --git a/configs/emanet/emanet_resnet50_os8_cityscapes_1024x512_80k.yml b/configs/emanet/emanet_resnet50_os8_cityscapes_1024x512_80k.yml new file mode 100644 index 0000000000..0230ab44f1 --- /dev/null +++ b/configs/emanet/emanet_resnet50_os8_cityscapes_1024x512_80k.yml @@ -0,0 +1,32 @@ +_base_: '../_base_/cityscapes.yml' + +batch_size: 2 +iters: 80000 + +model: + type: EMANet + backbone: + type: ResNet50_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz + num_classes: 19 + ema_channels: 512 + gc_channels: 256 + num_bases: 64 + stage_num: 3 + momentum: 0.1 + concat_input: True + enable_auxiliary_loss: True + align_corners: False + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.0005 + + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] \ No newline at end of file diff --git a/configs/emanet/emanet_resnet50_os8_voc12aug_512x512_40k.yml b/configs/emanet/emanet_resnet50_os8_voc12aug_512x512_40k.yml new file mode 100644 index 0000000000..9644881dcd --- /dev/null +++ b/configs/emanet/emanet_resnet50_os8_voc12aug_512x512_40k.yml @@ -0,0 +1,30 @@ +_base_: '../_base_/pascal_voc12aug.yml' + + +model: + type: EMANet + backbone: + type: ResNet50_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz + ema_channels: 512 + gc_channels: 256 + num_bases: 64 + stage_num: 3 + momentum: 0.1 + concat_input: True + enable_auxiliary_loss: True + align_corners: True + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.0005 + + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] + \ No newline at end of file diff --git a/configs/fcn/README.md 
b/configs/fcn/README.md index 8bb9786c53..87fe6e4ee0 100644 --- a/configs/fcn/README.md +++ b/configs/fcn/README.md @@ -1,7 +1,7 @@ # Deep High-Resolution Representation Learning for Visual Recognition ## Reference -> Wang J, Sun K, Cheng T, et al. Deep high-resolution representation learning for visual recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2020. +> Wang, Jingdong, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu et al. "Deep high-resolution representation learning for visual recognition." IEEE transactions on pattern analysis and machine intelligence (2020). ## Performance diff --git a/configs/gscnn/README.md b/configs/gscnn/README.md index 81e99c7c96..bb144b95b6 100644 --- a/configs/gscnn/README.md +++ b/configs/gscnn/README.md @@ -2,7 +2,7 @@ ## Reference -> Takikawa T, Acuna D, Jampani V, et al. Gated-scnn: Gated shape cnns for semantic segmentation[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 5229-5238. +> Takikawa, Towaki, David Acuna, Varun Jampani, and Sanja Fidler. "Gated-scnn: Gated shape cnns for semantic segmentation." In Proceedings of the IEEE International Conference on Computer Vision, pp. 5229-5238. 2019. ## Performance diff --git a/configs/isanet/README.md b/configs/isanet/README.md new file mode 100644 index 0000000000..3e72f66d64 --- /dev/null +++ b/configs/isanet/README.md @@ -0,0 +1,21 @@ +# Interlaced Sparse Self-Attention for Semantic Segmentation + +## Reference + +> Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang: Interlaced Sparse Self-Attention for Semantic Segmentation. CoRR abs/1907.12273 (2019). + +## Performance + +### Cityscapes + +| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | +|-|-|-|-|-|-|-|-| +|ISANet|ResNet50_OS8|769x769|80000|79.03%|79.43%|79.52%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/isanet_resnet50_os8_cityscapes_769x769_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/isanet_resnet50_os8_cityscapes_769x769_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=ab7cc0627fdbf1e210557c33d94d2e8c)| +|ISANet|ResNet101_OS8|769x769|80000|80.10%|80.30%|80.26%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/isanet_resnet101_os8_cityscapes_769x769_80k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/isanet_resnet101_os8_cityscapes_769x769_80k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=76366b80293c3ac2374d981b4573eb52)| + +### Pascal VOC 2012 + Aug + +| Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) |Links | +|-|-|-|-|-|-|-|-| +|ISANet|ResNet50_OS8|512x512|40000|79.69%|79.93%|80.53%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/isanet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/isanet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=84af8df983e48f1a0c89154a26f55032)| +|ISANet|ResNet101_OS8|512x512|40000|79.57%|79.69%|80.01%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/emanet_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=6874531f0adbfc72f22fb816bb231a46)| 
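*(Editorial illustration, not part of the patch.)* The same models can also be built through the API-calling route. The sketch below instantiates ISANet with the constructor signature documented in `docs/apis/models.md` later in this patch; the backbone import path (`paddleseg.models.backbones.ResNet50_vd`) and the returned list of logits are assumptions based on the v2.0.0-rc code base rather than something this diff states.

```python
# Hypothetical sketch of the API-calling route for ISANet, mirroring the
# signature documented in docs/apis/models.md in this patch. Backbone factory
# name is assumed from the accompanying configs; check it before relying on it.
import paddle
from paddleseg.models import ISANet
from paddleseg.models.backbones import ResNet50_vd

backbone = ResNet50_vd(output_stride=8)
model = ISANet(
    num_classes=19,              # Cityscapes
    backbone=backbone,
    backbone_indices=(2, 3),
    isa_channels=256,            # matches the new isanet_*_cityscapes configs
    down_factor=(8, 8),          # feature map is split into 8x8 groups
    enable_auxiliary_loss=True,
    align_corners=True)          # 769x769 inputs are odd-sized

x = paddle.randn([1, 3, 769, 769])
logits = model(x)                # assumed: [main logits, auxiliary logits]
print([l.shape for l in logits])
```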
diff --git a/configs/isanet/isanet_resnet101_os8_cityscapes_769x769_80k.yml b/configs/isanet/isanet_resnet101_os8_cityscapes_769x769_80k.yml new file mode 100644 index 0000000000..0c135845cb --- /dev/null +++ b/configs/isanet/isanet_resnet101_os8_cityscapes_769x769_80k.yml @@ -0,0 +1,30 @@ +_base_: '../_base_/cityscapes_769x769.yml' + +batch_size: 2 +iters: 80000 + +model: + type: ISANet + isa_channels: 256 + backbone: + type: ResNet101_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz + num_classes: 19 + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.00001 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] \ No newline at end of file diff --git a/configs/isanet/isanet_resnet101_os8_voc12aug_512x512_40k.yml b/configs/isanet/isanet_resnet101_os8_voc12aug_512x512_40k.yml new file mode 100644 index 0000000000..35b6ca8def --- /dev/null +++ b/configs/isanet/isanet_resnet101_os8_voc12aug_512x512_40k.yml @@ -0,0 +1,27 @@ +_base_: '../_base_/pascal_voc12aug.yml' + +model: + type: ISANet + isa_channels: 256 + backbone: + type: ResNet101_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet101_vd_ssld.tar.gz + align_corners: True + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 4.0e-05 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] diff --git a/configs/isanet/isanet_resnet50_os8_cityscapes_769x769_80k.yml b/configs/isanet/isanet_resnet50_os8_cityscapes_769x769_80k.yml new file mode 100644 index 0000000000..dbb0eba71e --- /dev/null +++ b/configs/isanet/isanet_resnet50_os8_cityscapes_769x769_80k.yml @@ -0,0 +1,31 @@ +_base_: '../_base_/cityscapes_769x769.yml' + +batch_size: 2 +iters: 80000 + +model: + type: ISANet + isa_channels: 256 + backbone: + type: ResNet50_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz + num_classes: 19 + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.00001 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] \ No newline at end of file diff --git a/configs/isanet/isanet_resnet50_os8_voc12aug_512x512_40k.yml b/configs/isanet/isanet_resnet50_os8_voc12aug_512x512_40k.yml new file mode 100644 index 0000000000..d0d0672ec4 --- /dev/null +++ b/configs/isanet/isanet_resnet50_os8_voc12aug_512x512_40k.yml @@ -0,0 +1,27 @@ +_base_: '../_base_/pascal_voc12aug.yml' + +model: + type: ISANet + isa_channels: 256 + backbone: + type: ResNet50_vd + output_stride: 8 + pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz + align_corners: True + +optimizer: + type: sgd + momentum: 0.9 + weight_decay: 0.00001 + +learning_rate: + value: 0.01 + decay: + type: poly + power: 0.9 + +loss: + types: + - type: CrossEntropyLoss + - type: CrossEntropyLoss + coef: [1, 0.4] diff --git a/configs/ocrnet/README.md b/configs/ocrnet/README.md index 9f8fd6b597..b82820c68b 100644 --- a/configs/ocrnet/README.md +++ b/configs/ocrnet/README.md @@ -2,7 +2,7 @@ ## Reference -> Yuan Y, Chen X, Wang J. Object-contextual representations for semantic segmentation[J]. arXiv preprint arXiv:1909.11065, 2019. +> Yuan, Yuhui, Xilin Chen, and Jingdong Wang. "Object-contextual representations for semantic segmentation." 
arXiv preprint arXiv:1909.11065 (2019). ## Performance @@ -10,12 +10,12 @@ | Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| -|OCRNet|HRNet_w18|1024x512|160000|80.67%|81.21%|81.30%|[model](https://paddleseg.bj.bcebos.com/dygraph/ocrnet_hrnetw18_cityscapes_1024x512_160k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/ocrnet_hrnetw18_cityscapes_1024x512_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=901a5d0a78b71ca56f06002f05547837)| -|OCRNet|HRNet_w48|1024x512|160000|82.15%|82.59%|82.85%|[model](https://paddleseg.bj.bcebos.com/dygraph/ocrnet_hrnetw48_cityscapes_1024x512_160k/model.pdparams) \| [log](https://paddleseg.bj.bcebos.com/dygraph/ocrnet_hrnetw48_cityscapes_1024x512_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=176bf6ca4d89957ffe62ac7c30fcd039) | +|OCRNet|HRNet_w18|1024x512|160000|80.67%|81.21%|81.30%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw18_cityscapes_1024x512_160k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw18_cityscapes_1024x512_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=901a5d0a78b71ca56f06002f05547837)| +|OCRNet|HRNet_w48|1024x512|160000|82.15%|82.59%|82.85%|[model](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw48_cityscapes_1024x512_160k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/cityscapes/ocrnet_hrnetw48_cityscapes_1024x512_160k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=176bf6ca4d89957ffe62ac7c30fcd039) | ### Pascal VOC 2012 + Aug | Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| -|OCRNet|HRNet_w18|1024x512|40000|75.76%|76.39%|77.95%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw18_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw18_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=74707b83bc14b7d236146ac4ceaf6c9c)| -|OCRNet|HRNet_w48|1024x512|40000|79.98%|80.47%|81.02%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw48_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw48_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=8f695743c799f8966a72973f3259fad4) | +|OCRNet|HRNet_w18|512x512|40000|75.76%|76.39%|77.95%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw18_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw18_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=74707b83bc14b7d236146ac4ceaf6c9c)| +|OCRNet|HRNet_w48|512x512|40000|79.98%|80.47%|81.02%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw48_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/ocrnet_hrnetw48_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=8f695743c799f8966a72973f3259fad4) | diff --git a/configs/ocrnet/ocrnet_hrnetw18_voc12aug_512x512_40k.yml b/configs/ocrnet/ocrnet_hrnetw18_voc12aug_512x512_40k.yml index f625bcc028..3e6739c839 
100644 --- a/configs/ocrnet/ocrnet_hrnetw18_voc12aug_512x512_40k.yml +++ b/configs/ocrnet/ocrnet_hrnetw18_voc12aug_512x512_40k.yml @@ -5,7 +5,6 @@ model: backbone: type: HRNet_W18 pretrained: https://bj.bcebos.com/paddleseg/dygraph/hrnet_w18_ssld.tar.gz - num_classes: 19 backbone_indices: [0] optimizer: diff --git a/configs/pspnet/README.md b/configs/pspnet/README.md index bdb54f298e..a48415eeee 100644 --- a/configs/pspnet/README.md +++ b/configs/pspnet/README.md @@ -18,4 +18,4 @@ | Model | Backbone | Resolution | Training Iters | mIoU | mIoU (flip) | mIoU (ms+flip) | Links | |:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| |PSPNet|ResNet50_OS8|512x512|40000|80.76%|80.92%|80.91%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pspnet_resnet50_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pspnet_resnet50_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=d94fca382566d823dd23a84d380fe0af)| -|PSPNet|ResNet101_OS8|512x512|40000|80.22%|80.48%|80.36%|[model](https://bj.bcebos.com/paddleseg/dygraph/voc12aug/pspnet_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/voc12aug/pspnet_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=5fe5012cf0bd58a3574c95e0fc79306b)| +|PSPNet|ResNet101_OS8|512x512|40000|80.22%|80.48%|80.36%|[model](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pspnet_resnet101_os8_voc12aug_512x512_40k/model.pdparams) \| [log](https://bj.bcebos.com/paddleseg/dygraph/pascal_voc12/pspnet_resnet101_os8_voc12aug_512x512_40k/train.log) \| [vdl](https://paddlepaddle.org.cn/paddle/visualdl/service/app?id=5fe5012cf0bd58a3574c95e0fc79306b)| diff --git a/configs/quick_start/bisenet_optic_disc_512x512_1k.yml b/configs/quick_start/bisenet_optic_disc_512x512_1k.yml index d04b1056e1..181bdf0941 100644 --- a/configs/quick_start/bisenet_optic_disc_512x512_1k.yml +++ b/configs/quick_start/bisenet_optic_disc_512x512_1k.yml @@ -39,5 +39,4 @@ loss: model: type: BiSeNetV2 - num_classes: 2 pretrained: Null diff --git a/docs/apis/models.md b/docs/apis/models.md index 67e1eb1f0f..0776191736 100644 --- a/docs/apis/models.md +++ b/docs/apis/models.md @@ -19,6 +19,10 @@ The models subpackage contains the following model for image sementic segmentaio - [AttentionUNet](#AttentionUNet) - [UNet++](#UNet-1) - [DecoupledSegNet](#DecoupledSegNet) +- [ISANet](#ISANet) +- [EMANet](#EMANet) +- [DNLNet](#DNLNet) + ## [DeepLabV3+](../../paddleseg/models/deeplab.py) > CLASS paddleseg.models.DeepLabV3P(num_classes, backbone, backbone_indices=(0, 3), aspp_ratios=(1, 6, 12, 18), aspp_out_channels=256, align_corners=False, pretrained=None) @@ -432,3 +436,70 @@ The models subpackage contains the following model for image sementic segmentaio > > > - **align_corners** (bool, optional): An argument of F.interpolate. It should be set to False when the feature size is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. > > > - **pretrained** (str, optional): The path or url of pretrained model. Default: None. + +## [ISANet](../../paddleseg/models/isanet.py) +> CLASS paddleseg.models.ISANet(num_classes, backbone, backbone_indices=(2, 3), isa_channels=256, down_factor=(8, 8), enable_auxiliary_loss=True, align_corners=False, pretrained=None) + + The ISANet implementation based on PaddlePaddle. + + The original article refers to Lang Huang, et al. 
"Interlaced Sparse Self-Attention for Semantic Segmentation" + (https://arxiv.org/abs/1907.12273). + +> > Args +> > > - **num_classes** (int): The unique number of target classes. +> > > - **backbone** (Paddle.nn.Layer): A backbone network. +> > > - **backbone_indices** (tuple): The values in the tuple indicate the indices of output of backbone. +> > > - **isa_channels** (int): The channels of ISA Module. +> > > - **down_factor** (tuple): Divide the height and width dimension to (Ph, PW) groups. +> > > - **enable_auxiliary_loss** (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. +> > > - **align_corners** (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. +> > > - **pretrained** (str, optional): The path or url of pretrained model. Default: None. + +## [EMANet](../../paddleseg/models/emanet.py) +> CLASS paddleseg.models.EMANet(num_classes, backbone, backbone_indices=(2, 3), ema_channels=512, gc_channels=256, num_bases=64, stage_num=3, momentum=0.1, concat_input=True, enable_auxiliary_loss=True, align_corners=False, pretrained=None) + + The EMANet implementation based on PaddlePaddle. + + The original article refers to + Xia Li, et al. "Expectation-Maximization Attention Networks for Semantic Segmentation" + (https://arxiv.org/abs/1907.13426) + +> > Args +> > > - **num_classes** (int): The unique number of target classes. +> > > - **backbone** (Paddle.nn.Layer): A backbone network. +> > > - **backbone_indices** (tuple): The values in the tuple indicate the indices of output of backbone. +> > > - **ema_channels** (int): EMA module channels. +> > > - **gc_channels** (int): The input channels to Global Context Block. +> > > - **num_bases** (int): Number of bases. +> > > - **stage_num** (int): The iteration number for EM. +> > > - **momentum** (float): The parameter for updating bases. +> > > - **concat_input** (bool): Whether concat the input and output of convs before classification layer. Default: True +> > > - **enable_auxiliary_loss** (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. +> > > - **align_corners** (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. +> > > - **pretrained** (str, optional): The path or url of pretrained model. Default: None. + +## [DNLNet](../../paddleseg/models/dnlnet.py) +> CLASS paddleseg.models.DNLNet(num_classes, backbone, backbone_indices=(2, 3), reduction=2, use_scale=True, mode='embedded_gaussian', temperature=0.05, concat_input=True, enable_auxiliary_loss=True, align_corners=False, pretrained=None) + + The DNLNet implementation based on PaddlePaddle. + + The original article refers to + Minghao Yin, et al. "Disentangled Non-Local Neural Networks" + (https://arxiv.org/abs/2006.06668) + +> > Args +> > > - **num_classes** (int): The unique number of target classes. +> > > - **backbone** (Paddle.nn.Layer): A backbone network. +> > > - **backbone_indices** (tuple): The values in the tuple indicate the indices of output of backbone. +> > > - **reduction** (int): Reduction factor of projection transform. Default: 2. +> > > - **use_scale** (bool): Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: False. +> > > - **mode** (str): The nonlocal mode. Options are 'embedded_gaussian', + 'dot_product'. Default: 'embedded_gaussian'. 
+> > > - **temperature** (float): Temperature to adjust attention. Default: 0.05. +> > > - **concat_input** (bool): Whether concat the input and output of convs before classification layer. Default: True +> > > - **enable_auxiliary_loss** (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. +> > > - **align_corners** (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. +> > > - **pretrained** (str, optional): The path or url of pretrained model. Default: None. diff --git a/docs/data_prepare.md b/docs/data_prepare.md index cfba4e3481..b07a548194 100644 --- a/docs/data_prepare.md +++ b/docs/data_prepare.md @@ -1,100 +1,142 @@ -# 数据集准备 - -PaddleSeg目前支持CityScapes、ADE20K、Pascal VOC等数据集的加载,在加载数据集时,如若本地不存在对应数据,则会自动触发下载(除Cityscapes数据集). - -## 关于CityScapes数据集 -Cityscapes是关于城市街道场景的语义理解图片数据集。它主要包含来自50个不同城市的街道场景, -拥有5000张(2048 x 1024)城市驾驶场景的高质量像素级注释图像,包含19个类别。其中训练集2975张, 验证集500张和测试集1525张。 - -由于协议限制,请自行前往[CityScapes官网](https://www.cityscapes-dataset.com/)下载数据集, -我们建议您将数据集存放于`PaddleSeg/data`中,以便与我们配置文件完全兼容。数据集下载后请组织成如下结构: - - cityscapes - | - |--leftImg8bit - | |--train - | |--val - | |--test - | - |--gtFine - | |--train - | |--val - | |--test - -运行下列命令进行标签转换: -```shell -pip install cityscapesscripts -python tools/convert_cityscapes.py --cityscapes_path data/cityscapes --num_workers 8 -``` -其中`cityscapes_path`应根据实际数据集路径进行调整。 `num_workers`决定启动的进程数,可根据实际情况进行调整大小。 - -## 关于Pascal VOC 2012数据集 -[Pascal VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/)数据集以对象分割为主,包含20个类别和背景类,其中训练集1464张,验证集1449张。 -通常情况下会利用[SBD(Semantic Boundaries Dataset)](http://home.bharathh.info/pubs/codes/SBD/download.html)进行扩充,扩充后训练集10582张。 -运行下列命令进行SBD数据集下载并进行扩充: -```shell -python tools/voc_augment.py --voc_path data/VOCdevkit --num_workers 8 -``` -其中`voc_path`应根据实际数据集路径进行调整。 - -**注意** 运行前请确保在PaddleSeg目录下执行过下列命令: -```shell -export PYTHONPATH=`pwd` -# windows下请执行相面的命令 -# set PYTHONPATH=%cd% -``` - -## 关于ADE20K数据集 -[ADE20K](http://sceneparsing.csail.mit.edu/)由MIT发布的可用于场景感知、分割和多物体识别等多种任务的数据集。 -其涵盖了150个语义类别,包括训练集20210张,验证集2000张。 - -## 自定义数据集 - -如果您需要使用自定义数据集进行训练,请按照以下步骤准备数据. - -1.推荐整理成如下结构 - - custom_dataset - | - |--images - | |--image1.jpg - | |--image2.jpg - | |--... - | - |--labels - | |--label1.jpg - | |--label2.png - | |--... - | - |--train.txt - | - |--val.txt - | - |--test.txt - -其中train.txt和val.txt的内容如下所示: - - images/image1.jpg labels/label1.png - images/image2.jpg labels/label2.png - ... - -2.标注图像的标签从0,1依次取值,不可间隔。若有需要忽略的像素,则按255进行标注。 - -可按如下方式对自定义数据集进行配置: -```yaml -train_dataset: - type: Dataset - dataset_root: custom_dataset - train_path: custom_dataset/train.txt - num_classes: 2 - transforms: - - type: ResizeStepScaling - min_scale_factor: 0.5 - max_scale_factor: 2.0 - scale_step_size: 0.25 - - type: RandomPaddingCrop - crop_size: [512, 512] - - type: RandomHorizontalFlip - - type: Normalize - mode: train -``` +# 数据集准备 + +PaddleSeg目前支持CityScapes、ADE20K、Pascal VOC等数据集的加载,在加载数据集时,如若本地不存在对应数据,则会自动触发下载(除Cityscapes数据集). 
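*(Editorial illustration, not part of the patch.)* The data-preparation intro above notes that the built-in datasets (except Cityscapes) are downloaded automatically the first time they are loaded. A hedged sketch of what that looks like through the Python API follows; the class and transform names (`paddleseg.datasets.ADE20K`, `paddleseg.transforms`) are assumed from the v2.0.0-rc API and are not part of this diff.

```python
# Hypothetical sketch of the automatic download behaviour described above.
# Dataset and transform class names are assumed from paddleseg v2.0.0-rc.
import paddleseg.transforms as T
from paddleseg.datasets import ADE20K

transforms = [
    T.RandomPaddingCrop(crop_size=(512, 512)),
    T.RandomHorizontalFlip(),
    T.Normalize(),
]

# If the ADE20K data is not found locally, it is fetched and unpacked into
# PaddleSeg's default data cache before the training split is built.
train_dataset = ADE20K(transforms=transforms, mode='train')
print(len(train_dataset))  # the docs above state 20210 training images
```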
+ +## 关于CityScapes数据集 +Cityscapes是关于城市街道场景的语义理解图片数据集。它主要包含来自50个不同城市的街道场景, +拥有5000张(2048 x 1024)城市驾驶场景的高质量像素级注释图像,包含19个类别。其中训练集2975张, 验证集500张和测试集1525张。 + +由于协议限制,请自行前往[CityScapes官网](https://www.cityscapes-dataset.com/)下载数据集, +我们建议您将数据集存放于`PaddleSeg/data`中,以便与我们配置文件完全兼容。数据集下载后请组织成如下结构: + + cityscapes + | + |--leftImg8bit + | |--train + | |--val + | |--test + | + |--gtFine + | |--train + | |--val + | |--test + +运行下列命令进行标签转换: +```shell +pip install cityscapesscripts +python tools/convert_cityscapes.py --cityscapes_path data/cityscapes --num_workers 8 +``` +其中`cityscapes_path`应根据实际数据集路径进行调整。 `num_workers`决定启动的进程数,可根据实际情况进行调整大小。 + +## 关于Pascal VOC 2012数据集 +[Pascal VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/)数据集以对象分割为主,包含20个类别和背景类,其中训练集1464张,验证集1449张。 +通常情况下会利用[SBD(Semantic Boundaries Dataset)](http://home.bharathh.info/pubs/codes/SBD/download.html)进行扩充,扩充后训练集10582张。 +运行下列命令进行SBD数据集下载并进行扩充: +```shell +python tools/voc_augment.py --voc_path data/VOCdevkit --num_workers 8 +``` +其中`voc_path`应根据实际数据集路径进行调整。 + +**注意** 运行前请确保在PaddleSeg目录下执行过下列命令: +```shell +export PYTHONPATH=`pwd` +# windows下请执行相面的命令 +# set PYTHONPATH=%cd% +``` + +## 关于ADE20K数据集 +[ADE20K](http://sceneparsing.csail.mit.edu/)由MIT发布的可用于场景感知、分割和多物体识别等多种任务的数据集。 +其涵盖了150个语义类别,包括训练集20210张,验证集2000张。 + +## 关于Coco Stuff数据集 +Coco Stuff是基于Coco数据集的像素级别语义分割数据集。它主要覆盖172个类别,包含80个'thing',91个'stuff'和1个'unlabeled', +其中训练集118k, 验证集5k. + +在使用Coco Stuff数据集前, 请自行前往[COCO-Stuff主页](https://github.com/nightrome/cocostuff)下载数据集,或者下载[coco2017训练集原图](http://images.cocodataset.org/zips/train2017.zip), [coco2017验证集原图](http://images.cocodataset.org/zips/val2017.zip)及[标注图](http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip) +我们建议您将数据集存放于`PaddleSeg/data`中,以便与我们配置文件完全兼容。数据集下载后请组织成如下结构: + + cocostuff + | + |--images + | |--train2017 + | |--val2017 + | + |--annotations + | |--train2017 + | |--val2017 + +其中,标注图像的标签从0,1依次取值,不可间隔。若有需要忽略的像素,则按255进行标注。 + +## 关于Pascal Context数据集 +Pascal Context是基于PASCAL VOC 2010数据集额外标注的像素级别的语义分割数据集。我们提供的转换脚本支持59个类别,其中训练集4996, 验证集5104张. + + +在使用Pascal Context数据集前, 请先下载[VOC2010](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar),随后自行前往[Pascal-Context主页](https://www.cs.stanford.edu/~roozbeh/pascal-context/)下载数据集及[标注](https://codalabuser.blob.core.windows.net/public/trainval_merged.json) +我们建议您将数据集存放于`PaddleSeg/data`中,以便与我们配置文件完全兼容。数据集下载后请组织成如下结构: + + VOC2010 + | + |--Annotations + | + |--ImageSets + | + |--SegmentationClass + | + |--JPEGImages + | + |--SegmentationObject + | + |--trainval_merged.json + +其中,标注图像的标签从1,2依次取值,不可间隔。若有需要忽略的像素,则按0进行标注。在使用Pascal Context数据集时,需要安装[Detail](https://github.com/zhanghang1989/detail-api). + +## 自定义数据集 + +如果您需要使用自定义数据集进行训练,请按照以下步骤准备数据. + +1.推荐整理成如下结构 + + custom_dataset + | + |--images + | |--image1.jpg + | |--image2.jpg + | |--... + | + |--labels + | |--label1.jpg + | |--label2.png + | |--... + | + |--train.txt + | + |--val.txt + | + |--test.txt + +其中train.txt和val.txt的内容如下所示: + + images/image1.jpg labels/label1.png + images/image2.jpg labels/label2.png + ... 
+ +2.标注图像的标签从0,1依次取值,不可间隔。若有需要忽略的像素,则按255进行标注。 + +可按如下方式对自定义数据集进行配置: +```yaml +train_dataset: + type: Dataset + dataset_root: custom_dataset + train_path: custom_dataset/train.txt + num_classes: 2 + transforms: + - type: ResizeStepScaling + min_scale_factor: 0.5 + max_scale_factor: 2.0 + scale_step_size: 0.25 + - type: RandomPaddingCrop + crop_size: [512, 512] + - type: RandomHorizontalFlip + - type: Normalize + mode: train +``` diff --git a/docs/images/seg_news_icon.png b/docs/images/seg_news_icon.png new file mode 100644 index 0000000000..30ec26f5fc Binary files /dev/null and b/docs/images/seg_news_icon.png differ diff --git a/docs/release_notes.md b/docs/release_notes.md new file mode 100644 index 0000000000..dfb01a43f2 --- /dev/null +++ b/docs/release_notes.md @@ -0,0 +1,88 @@ +English | [简体中文](release_notes_cn.md) + +## Release Notes + +* 2020.12.18 + + **`v2.0.0-rc`** + * Newly release 2.0-rc version, fully upgraded to dynamic graph. It supports 15+ segmentation models, 4 backbone networks, 3 datasets, and 4 types of loss functions: + * Segmentation models: ANN, BiSeNetV2, DANet, DeeplabV3, DeeplabV3+, FCN, FastSCNN, Gated-scnn, GCNet, OCRNet, PSPNet, UNet, and U2-Net, Attention UNet. + * Backbone networks: ResNet, HRNet, MobileNetV3, and Xception. + * Datasets: Cityscapes, ADE20K, and Pascal VOC. + * Loss: CrossEntropy Loss, BootstrappedCrossEntropy Loss, Dice Loss, BCE Loss. + * Provide 40+ high quality pre-trained models based on Cityscapes and Pascal Voc datasets. + * Support multi-card GPU parallel evaluation. This provides the efficient index calculation function. Support multiple evaluation methods such as multi-scale evaluation/flip evaluation/sliding window evaluation. + +* 2020.12.02 + + **`v0.8.0`** + * Add multi-scale/flipping/sliding-window inference. + * Add the fast multi-GPUs evaluation, and high-efficient metric calculation. + * Add Pascal VOC 2012 dataset. + * Add high-accuracy pre-trained models on Pascal VOC 2012, see [detailed models](../dygraph/configs/). + * Support visualizing pseudo-color images in PNG format while predicting. 
+ +* 2020.10.28 + + **`v0.7.0`** + * 全面支持Paddle2.0-rc动态图模式,推出PaddleSeg[动态图体验版](../dygraph/) + * 发布大量动态图模型,支持11个分割模型,4个骨干网络,3个数据集: + * 分割模型:ANN, BiSeNetV2, DANet, DeeplabV3, DeeplabV3+, FCN, FastSCNN, GCNet, OCRNet, PSPNet, UNet + * 骨干网络:ResNet, HRNet, MobileNetV3, Xception + * 数据集:Cityscapes, ADE20K, Pascal VOC + + * 提供高精度骨干网络预训练模型以及基于Cityscapes数据集的语义分割[预训练模型](../dygraph/configs/)。Cityscapes精度超过**82%**。 + + +* 2020.08.31 + + **`v0.6.0`** + * 丰富Deeplabv3p网络结构,新增ResNet-vd、MobileNetv3两种backbone,满足高性能与高精度场景,并提供基于Cityscapes和ImageNet的[预训练模型](./model_zoo.md)4个。 + * 新增高精度分割模型OCRNet,支持以HRNet作为backbone,提供基于Cityscapes的[预训练模型](https://github.com/PaddlePaddle/PaddleSeg/blob/develop/docs/model_zoo.md#cityscapes%E9%A2%84%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B),mIoU超过80%。 + * 新增proposal free的实例分割模型[Spatial Embedding](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib/SpatialEmbeddings),性能与精度均超越MaskRCNN。提供了基于kitti的预训练模型。 + +* 2020.05.12 + + **`v0.5.0`** + * 全面升级[HumanSeg人像分割模型](../contrib/HumanSeg),新增超轻量级人像分割模型HumanSeg-lite支持移动端实时人像分割处理,并提供基于光流的视频分割后处理提升分割流畅性。 + * 新增[气象遥感分割方案](../contrib/RemoteSensing),支持积雪识别、云检测等气象遥感场景。 + * 新增[Lovasz Loss](lovasz_loss.md),解决数据类别不均衡问题。 + * 使用VisualDL 2.0作为训练可视化工具 + +* 2020.02.25 + + **`v0.4.0`** + * 新增适用于实时场景且不需要预训练模型的分割网络Fast-SCNN,提供基于Cityscapes的[预训练模型](./model_zoo.md)1个 + * 新增LaneNet车道线检测网络,提供[预训练模型](https://github.com/PaddlePaddle/PaddleSeg/tree/release/v0.4.0/contrib/LaneNet#%E4%B8%83-%E5%8F%AF%E8%A7%86%E5%8C%96)一个 + * 新增基于PaddleSlim的分割库压缩策略([量化](../slim/quantization/README.md), [蒸馏](../slim/distillation/README.md), [剪枝](../slim/prune/README.md), [搜索](../slim/nas/README.md)) + + +* 2019.12.15 + + **`v0.3.0`** + * 新增HRNet分割网络,提供基于cityscapes和ImageNet的[预训练模型](./model_zoo.md)8个 + * 支持使用[伪彩色标签](./data_prepare.md#%E7%81%B0%E5%BA%A6%E6%A0%87%E6%B3%A8vs%E4%BC%AA%E5%BD%A9%E8%89%B2%E6%A0%87%E6%B3%A8)进行训练/评估/预测,提升训练体验,并提供将灰度标注图转为伪彩色标注图的脚本 + * 新增[学习率warmup](./configs/solver_group.md#lr_warmup)功能,支持与不同的学习率Decay策略配合使用 + * 新增图像归一化操作的GPU化实现,进一步提升预测速度。 + * 新增Python部署方案,更低成本完成工业级部署。 + * 新增Paddle-Lite移动端部署方案,支持人像分割模型的移动端部署。 + * 新增不同分割模型的预测[性能数据Benchmark](../deploy/python/docs/PaddleSeg_Infer_Benchmark.md), 便于开发者提供模型选型性能参考。 + + +* 2019.11.04 + + **`v0.2.0`** + * 新增PSPNet分割网络,提供基于COCO和cityscapes数据集的[预训练模型](./model_zoo.md)4个。 + * 新增Dice Loss、BCE Loss以及组合Loss配置,支持样本不均衡场景下的[模型优化](./loss_select.md)。 + * 支持[FP16混合精度训练](./multiple_gpus_train_and_mixed_precision_train.md)以及动态Loss Scaling,在不损耗精度的情况下,训练速度提升30%+。 + * 支持[PaddlePaddle多卡多进程训练](./multiple_gpus_train_and_mixed_precision_train.md),多卡训练时训练速度提升15%+。 + * 发布基于UNet的[工业标记表盘分割模型](../contrib#%E5%B7%A5%E4%B8%9A%E7%94%A8%E8%A1%A8%E5%88%86%E5%89%B2)。 + +* 2019.09.10 + + **`v0.1.0`** + * PaddleSeg分割库初始版本发布,包含DeepLabv3+, U-Net, ICNet三类分割模型, 其中DeepLabv3+支持Xception, MobileNet v2两种可调节的骨干网络。 + * CVPR19 LIP人体部件分割比赛冠军预测模型发布[ACE2P](../contrib/ACE2P)。 + * 预置基于DeepLabv3+网络的[人像分割](../contrib/HumanSeg/)和[车道线分割](../contrib/RoadLine)预测模型发布。 + +
diff --git a/docs/release_notes_cn.md b/docs/release_notes_cn.md new file mode 100644 index 0000000000..f23e38d085 --- /dev/null +++ b/docs/release_notes_cn.md @@ -0,0 +1,88 @@ +简体中文 | [English](release_notes.md) + +## Release Notes + +* 2020.12.18 + + **`v2.0.0-rc`** + * 全新发布2.0-rc版本,全面升级至动态图,支持15+分割模型,4个骨干网络,3个数据集,4种Loss: + * 分割模型:ANN, BiSeNetV2, DANet, DeeplabV3, DeeplabV3+, FCN, FastSCNN, Gated-scnn, GCNet, HarDNet, OCRNet, PSPNet, UNet, UNet++, U2-Net, Attention UNet + * 骨干网络:ResNet, HRNet, MobileNetV3, Xception + * 数据集:Cityscapes, ADE20K, Pascal VOC + * Loss:CrossEntropy Loss、BootstrappedCrossEntropy Loss、Dice Loss、BCE Loss + * 提供基于Cityscapes和Pascal Voc数据集的高质量预训练模型 40+。 + * 支持多卡GPU并行评估,提供了高效的指标计算功能。支持多尺度评估/翻转评估/滑动窗口评估等多种评估方式。 + +* 2020.12.02 + + **`v0.8.0`** + * 增加多尺度评估/翻转评估/滑动窗口评估等功能。 + * 支持多卡GPU并行评估,提供了高效的指标计算功能。 + * 增加Pascal VOC 2012数据集。 + * 新增在Pascal VOC 2012数据集上的高精度预训练模型,详见[模型库](../configs/)。 + * 支持对PNG格式的伪彩色图片进行预测可视化。 + +* 2020.10.28 + + **`v0.7.0`** + * 全面支持Paddle2.0-rc动态图模式,推出PaddleSeg[动态图体验版](../dygraph/) + * 发布大量动态图模型,支持11个分割模型,4个骨干网络,3个数据集: + * 分割模型:ANN, BiSeNetV2, DANet, DeeplabV3, DeeplabV3+, FCN, FastSCNN, GCNet, OCRNet, PSPNet, UNet + * 骨干网络:ResNet, HRNet, MobileNetV3, Xception + * 数据集:Cityscapes, ADE20K, Pascal VOC + + * 提供高精度骨干网络预训练模型以及基于Cityscapes数据集的语义分割[预训练模型](../dygraph/configs/)。Cityscapes精度超过**82%**。 + + +* 2020.08.31 + + **`v0.6.0`** + * 丰富Deeplabv3p网络结构,新增ResNet-vd、MobileNetv3两种backbone,满足高性能与高精度场景,并提供基于Cityscapes和ImageNet的[预训练模型](./model_zoo.md)4个。 + * 新增高精度分割模型OCRNet,支持以HRNet作为backbone,提供基于Cityscapes的[预训练模型](https://github.com/PaddlePaddle/PaddleSeg/blob/develop/docs/model_zoo.md#cityscapes%E9%A2%84%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B),mIoU超过80%。 + * 新增proposal free的实例分割模型[Spatial Embedding](https://github.com/PaddlePaddle/PaddleSeg/tree/develop/contrib/SpatialEmbeddings),性能与精度均超越MaskRCNN。提供了基于kitti的预训练模型。 + +* 2020.05.12 + + **`v0.5.0`** + * 全面升级[HumanSeg人像分割模型](../contrib/HumanSeg),新增超轻量级人像分割模型HumanSeg-lite支持移动端实时人像分割处理,并提供基于光流的视频分割后处理提升分割流畅性。 + * 新增[气象遥感分割方案](../contrib/RemoteSensing),支持积雪识别、云检测等气象遥感场景。 + * 新增[Lovasz Loss](lovasz_loss.md),解决数据类别不均衡问题。 + * 使用VisualDL 2.0作为训练可视化工具 + +* 2020.02.25 + + **`v0.4.0`** + * 新增适用于实时场景且不需要预训练模型的分割网络Fast-SCNN,提供基于Cityscapes的[预训练模型](./model_zoo.md)1个 + * 新增LaneNet车道线检测网络,提供[预训练模型](https://github.com/PaddlePaddle/PaddleSeg/tree/release/v0.4.0/contrib/LaneNet#%E4%B8%83-%E5%8F%AF%E8%A7%86%E5%8C%96)一个 + * 新增基于PaddleSlim的分割库压缩策略([量化](../slim/quantization/README.md), [蒸馏](../slim/distillation/README.md), [剪枝](../slim/prune/README.md), [搜索](../slim/nas/README.md)) + + +* 2019.12.15 + + **`v0.3.0`** + * 新增HRNet分割网络,提供基于cityscapes和ImageNet的[预训练模型](./model_zoo.md)8个 + * 支持使用[伪彩色标签](./data_prepare.md#%E7%81%B0%E5%BA%A6%E6%A0%87%E6%B3%A8vs%E4%BC%AA%E5%BD%A9%E8%89%B2%E6%A0%87%E6%B3%A8)进行训练/评估/预测,提升训练体验,并提供将灰度标注图转为伪彩色标注图的脚本 + * 新增[学习率warmup](./configs/solver_group.md#lr_warmup)功能,支持与不同的学习率Decay策略配合使用 + * 新增图像归一化操作的GPU化实现,进一步提升预测速度。 + * 新增Python部署方案,更低成本完成工业级部署。 + * 新增Paddle-Lite移动端部署方案,支持人像分割模型的移动端部署。 + * 新增不同分割模型的预测[性能数据Benchmark](../deploy/python/docs/PaddleSeg_Infer_Benchmark.md), 便于开发者提供模型选型性能参考。 + + +* 2019.11.04 + + **`v0.2.0`** + * 新增PSPNet分割网络,提供基于COCO和cityscapes数据集的[预训练模型](./model_zoo.md)4个。 + * 新增Dice Loss、BCE Loss以及组合Loss配置,支持样本不均衡场景下的[模型优化](./loss_select.md)。 + * 支持[FP16混合精度训练](./multiple_gpus_train_and_mixed_precision_train.md)以及动态Loss Scaling,在不损耗精度的情况下,训练速度提升30%+。 + * 支持[PaddlePaddle多卡多进程训练](./multiple_gpus_train_and_mixed_precision_train.md),多卡训练时训练速度提升15%+。 + * 
发布基于UNet的[工业标记表盘分割模型](../contrib#%E5%B7%A5%E4%B8%9A%E7%94%A8%E8%A1%A8%E5%88%86%E5%89%B2)。 + +* 2019.09.10 + + **`v0.1.0`** + * PaddleSeg分割库初始版本发布,包含DeepLabv3+, U-Net, ICNet三类分割模型, 其中DeepLabv3+支持Xception, MobileNet v2两种可调节的骨干网络。 + * CVPR19 LIP人体部件分割比赛冠军预测模型发布[ACE2P](../contrib/ACE2P)。 + * 预置基于DeepLabv3+网络的[人像分割](../contrib/HumanSeg/)和[车道线分割](../contrib/RoadLine)预测模型发布。 + +
diff --git a/legacy/configs/fcn.yaml b/legacy/configs/fcn.yaml new file mode 100644 index 0000000000..726350b734 --- /dev/null +++ b/legacy/configs/fcn.yaml @@ -0,0 +1,39 @@ +# 数据集配置 +DATASET: + DATA_DIR: "./dataset/optic_disc_seg/" + NUM_CLASSES: 2 + TEST_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt" + TRAIN_FILE_LIST: "./dataset/optic_disc_seg/train_list.txt" + VAL_FILE_LIST: "./dataset/optic_disc_seg/val_list.txt" + VIS_FILE_LIST: "./dataset/optic_disc_seg/test_list.txt" + +# 预训练模型配置 +MODEL: + MODEL_NAME: "hrnet" + DEFAULT_NORM_TYPE: "bn" + HRNET: + STAGE2: + NUM_CHANNELS: [18, 36] + STAGE3: + NUM_CHANNELS: [18, 36, 72] + STAGE4: + NUM_CHANNELS: [18, 36, 72, 144] + +# 其他配置 +TRAIN_CROP_SIZE: (512, 512) +EVAL_CROP_SIZE: (512, 512) +AUG: + AUG_METHOD: "unpadding" + FIX_RESIZE_SIZE: (512, 512) +BATCH_SIZE: 1 +TRAIN: + PRETRAINED_MODEL_DIR: "./pretrained_model/hrnet_w18_bn_cityscapes/" + MODEL_SAVE_DIR: "./saved_model/hrnet_optic/" + SNAPSHOT_EPOCH: 1 +TEST: + TEST_MODEL: "./saved_model/hrnet_optic/final" +SOLVER: + NUM_EPOCHS: 10 + LR: 0.001 + LR_POLICY: "poly" + OPTIMIZER: "adam" diff --git a/legacy/docs/train_on_xpu.md b/legacy/docs/train_on_xpu.md index d7b830b73f..310b7fe814 100644 --- a/legacy/docs/train_on_xpu.md +++ b/legacy/docs/train_on_xpu.md @@ -10,16 +10,15 @@ * 数据准备(在legacy目录下): ```shell -python pretrained_model/download_model.py deeplabv3p_xception65_bn_coco +python dataset/download_optic.py ``` * 预训练模型准备(在legacy目录下): ```shell -python dataset/download_optic.py +python pretrained_model/download_model.py deeplabv3p_xception65_bn_coco ``` - * 执行训练(在legacy目录下): ```shell @@ -30,16 +29,15 @@ python pdseg/train.py --cfg configs/deeplabv3p_xception65_optic_kunlun.yaml --us * 数据准备(在legacy目录下): ```shell -python pretrained_model/download_model.py unet_bn_coco +python dataset/download_optic.py ``` * 预训练模型准备(在legacy目录下): ```shell -python dataset/download_optic.py +python pretrained_model/download_model.py unet_bn_coco ``` - * 执行训练(在legacy目录下): 因为昆仑1的内存不够,在用昆仑1训练的时候,需要把./configs/unet_optic.yaml 里面的 BATCH_SIZE @@ -54,3 +52,31 @@ export XPUSIM_DEVICE_MODEL=KUNLUN1 python pdseg/train.py --use_xpu --cfg configs/unet_optic.yaml --use_mpio --log_steps 1 --do_eval ``` +### FCN +* 数据准备(在legacy目录下): + +```shell +python dataset/download_optic.py +``` + +* 预训练模型准备(在legacy目录下): + +```shell +python pretrained_model/download_model.py hrnet_w18_bn_cityscapes +``` + +* 执行训练(在legacy目录下): + +因为昆仑1的内存不够,在用昆仑1训练的时候,需要把./configs/fcn.yaml 里面的 BATCH_SIZE +修改为 1 + +```shell +# 指定xpu的卡号 (以0号卡为例) +export FLAGS_selected_xpus=0 +# 执行xpu产品名称 这里指定昆仑1 +export XPUSIM_DEVICE_MODEL=KUNLUN1 +# 训练 +export PYTHONPATH=`pwd` +python3 pdseg/train.py --cfg configs/fcn.yaml --use_mpio --log_steps 1 --do_eval +``` + diff --git a/legacy/pdseg/models/modeling/icnet.py b/legacy/pdseg/models/modeling/icnet.py index aee0461459..2f4a393e4d 100644 --- a/legacy/pdseg/models/modeling/icnet.py +++ b/legacy/pdseg/models/modeling/icnet.py @@ -116,9 +116,18 @@ def resnet(input): scale = cfg.MODEL.ICNET.DEPTH_MULTIPLIER layers = cfg.MODEL.ICNET.LAYERS model = resnet_backbone(scale=scale, layers=layers, stem='icnet') - end_points = 49 - decode_point = 13 - resize_point = 13 + if layers >= 50: + end_points = layers - 1 + decode_point = 13 + resize_point = 13 + elif layers == 18: + end_points = 13 + decode_point = 9 + resize_point = 9 + elif layers == 34: + end_points = 27 + decode_point = 15 + resize_point = 15 dilation_dict = {2: 2, 3: 4} data, decode_shortcuts = model.net( input, diff --git a/legacy/pdseg/tools/create_dataset_list.py 
b/legacy/pdseg/tools/create_dataset_list.py index 8dd4c7e9a3..a33bfaad51 100644 --- a/legacy/pdseg/tools/create_dataset_list.py +++ b/legacy/pdseg/tools/create_dataset_list.py @@ -128,12 +128,12 @@ def generate_list(args): file_list = os.path.join(dataset_root, dataset_split + '.txt') with open(file_list, "w") as f: for item in range(num_images): - left = image_files[item].replace(dataset_root, '') + left = image_files[item].replace(dataset_root, '', 1) if left[0] == os.path.sep: left = left.lstrip(os.path.sep) try: - right = label_files[item].replace(dataset_root, '') + right = label_files[item].replace(dataset_root, '', 1) if right[0] == os.path.sep: right = right.lstrip(os.path.sep) line = left + separator + right + '\n' diff --git a/paddleseg/core/infer.py b/paddleseg/core/infer.py index bfb3888909..9d6df78b8a 100644 --- a/paddleseg/core/infer.py +++ b/paddleseg/core/infer.py @@ -42,6 +42,23 @@ def get_reverse_list(ori_shape, transforms): if op.__class__.__name__ in ['Padding']: reverse_list.append(('padding', (h, w))) w, h = op.target_size[0], op.target_size[1] + if op.__class__.__name__ in ['LimitLong']: + long_edge = max(h, w) + short_edge = min(h, w) + if ((op.max_long is not None) and (long_edge > op.max_long)): + reverse_list.append(('resize', (h, w))) + long_edge = op.max_long + short_edge = int(round(short_edge * op.max_long / long_edge)) + elif ((op.min_long is not None) and (long_edge < op.min_long)): + reverse_list.append(('resize', (h, w))) + long_edge = op.min_long + short_edge = int(round(short_edge * op.min_long / long_edge)) + if h > w: + h = long_edge + w = short_edge + else: + w = long_edge + h = short_edge return reverse_list diff --git a/paddleseg/core/train.py b/paddleseg/core/train.py index edcdc8afc8..942e4aa970 100644 --- a/paddleseg/core/train.py +++ b/paddleseg/core/train.py @@ -20,7 +20,7 @@ import paddle import paddle.nn.functional as F -from paddleseg.utils import Timer, calculate_eta, resume, logger +from paddleseg.utils import TimeAverager, calculate_eta, resume, logger from paddleseg.core.val import evaluate @@ -112,16 +112,15 @@ def train(model, from visualdl import LogWriter log_writer = LogWriter(save_dir) - timer = Timer() avg_loss = 0.0 avg_loss_list = [] iters_per_epoch = len(batch_sampler) best_mean_iou = -1.0 best_model_iter = -1 - train_reader_cost = 0.0 - train_batch_cost = 0.0 + reader_cost_averager = TimeAverager() + batch_cost_averager = TimeAverager() save_models = deque() - timer.start() + batch_start = time.time() iter = start_iter while iter < iters: @@ -129,7 +128,7 @@ def train(model, iter += 1 if iter > iters: break - train_reader_cost += timer.elapsed_time() + reader_cost_averager.record(time.time() - batch_start) images = data[0] labels = data[1].astype('int64') edges = None @@ -160,24 +159,24 @@ def train(model, else: for i in range(len(loss_list)): avg_loss_list[i] += loss_list[i] - train_batch_cost += timer.elapsed_time() + batch_cost_averager.record( + time.time() - batch_start, num_samples=batch_size) if (iter) % log_iters == 0 and local_rank == 0: avg_loss /= log_iters avg_loss_list = [ l.numpy()[0] / log_iters for l in avg_loss_list ] - avg_train_reader_cost = train_reader_cost / log_iters - avg_train_batch_cost = train_batch_cost / log_iters - train_reader_cost = 0.0 - train_batch_cost = 0.0 remain_iters = iters - iter + avg_train_batch_cost = batch_cost_averager.get_average() + avg_train_reader_cost = reader_cost_averager.get_average() eta = calculate_eta(remain_iters, avg_train_batch_cost) logger.info( - "[TRAIN] epoch={}, 
iter={}/{}, loss={:.4f}, lr={:.6f}, batch_cost={:.4f}, reader_cost={:.4f} | ETA {}" + "[TRAIN] epoch={}, iter={}/{}, loss={:.4f}, lr={:.6f}, batch_cost={:.4f}, reader_cost={:.5f}, ips={:.4f} samples/sec | ETA {}" .format((iter - 1) // iters_per_epoch + 1, iter, iters, avg_loss, lr, avg_train_batch_cost, - avg_train_reader_cost, eta)) + avg_train_reader_cost, + batch_cost_averager.get_ips_average(), eta)) if use_vdl: log_writer.add_scalar('Train/loss', avg_loss, iter) # Record all losses if there are more than 2 losses. @@ -196,6 +195,8 @@ def train(model, avg_train_reader_cost, iter) avg_loss = 0.0 avg_loss_list = [] + reader_cost_averager.reset() + batch_cost_averager.reset() if (iter % save_interval == 0 or iter == iters) and (val_dataset is not None): @@ -233,7 +234,7 @@ def train(model, if use_vdl: log_writer.add_scalar('Evaluate/mIoU', mean_iou, iter) log_writer.add_scalar('Evaluate/Acc', acc, iter) - timer.restart() + batch_start = time.time() # Calculate flops. if local_rank == 0: @@ -247,7 +248,6 @@ def count_syncbn(m, x, y): flops = paddle.flops( model, [1, c, h, w], custom_ops={paddle.nn.SyncBatchNorm: count_syncbn}) - logger.info(flops) # Sleep for half a second to let dataloader release resources. time.sleep(0.5) diff --git a/paddleseg/core/val.py b/paddleseg/core/val.py index cdf0a348b9..3516de9364 100644 --- a/paddleseg/core/val.py +++ b/paddleseg/core/val.py @@ -15,10 +15,11 @@ import os import numpy as np +import time import paddle import paddle.nn.functional as F -from paddleseg.utils import metrics, Timer, calculate_eta, logger, progbar +from paddleseg.utils import metrics, TimeAverager, calculate_eta, logger, progbar from paddleseg.core import infer np.set_printoptions(suppress=True) @@ -80,10 +81,12 @@ def evaluate(model, logger.info("Start evaluating (total_samples={}, total_iters={})...".format( len(eval_dataset), total_iters)) progbar_val = progbar.Progbar(target=total_iters, verbose=1) - timer = Timer() + reader_cost_averager = TimeAverager() + batch_cost_averager = TimeAverager() + batch_start = time.time() with paddle.no_grad(): for iter, (im, label) in enumerate(loader): - reader_cost = timer.elapsed_time() + reader_cost_averager.record(time.time() - batch_start) label = label.astype('int64') ori_shape = label.shape[-2:] @@ -120,7 +123,8 @@ def evaluate(model, intersect_area_list = [] pred_area_list = [] label_area_list = [] - paddle.distributed.all_gather(intersect_area_list, intersect_area) + paddle.distributed.all_gather(intersect_area_list, + intersect_area) paddle.distributed.all_gather(pred_area_list, pred_area) paddle.distributed.all_gather(label_area_list, label_area) @@ -132,19 +136,25 @@ def evaluate(model, label_area_list = label_area_list[:valid] for i in range(len(intersect_area_list)): - intersect_area_all = intersect_area_all + intersect_area_list[i] + intersect_area_all = intersect_area_all + intersect_area_list[ + i] pred_area_all = pred_area_all + pred_area_list[i] label_area_all = label_area_all + label_area_list[i] else: intersect_area_all = intersect_area_all + intersect_area pred_area_all = pred_area_all + pred_area label_area_all = label_area_all + label_area - batch_cost = timer.elapsed_time() - timer.restart() + batch_cost_averager.record( + time.time() - batch_start, num_samples=len(label)) + batch_cost = batch_cost_averager.get_average() + reader_cost = reader_cost_averager.get_average() if local_rank == 0: progbar_val.update(iter + 1, [('batch_cost', batch_cost), ('reader cost', reader_cost)]) + reader_cost_averager.reset() + 
batch_cost_averager.reset() + batch_start = time.time() class_iou, miou = metrics.mean_iou(intersect_area_all, pred_area_all, label_area_all) diff --git a/paddleseg/cvlibs/config.py b/paddleseg/cvlibs/config.py index b56d6a5ac0..407a6204d6 100644 --- a/paddleseg/cvlibs/config.py +++ b/paddleseg/cvlibs/config.py @@ -227,24 +227,33 @@ def loss(self) -> dict: @property def model(self) -> paddle.nn.Layer: model_cfg = self.dic.get('model').copy() - model_cfg['num_classes'] = self.train_dataset.num_classes - if not model_cfg: raise RuntimeError('No model specified in the configuration file.') + if not 'num_classes' in model_cfg: + if self.train_dataset and hasattr(self.train_dataset, + 'num_classes'): + model_cfg['num_classes'] = self.train_dataset.num_classes + elif self.val_dataset and hasattr(self.val_dataset, 'num_classes'): + model_cfg['num_classes'] = self.val_dataset.num_classes + else: + raise ValueError( + '`num_classes` is not found. Please set it in model, train_dataset or val_dataset' + ) + if not self._model: self._model = self._load_object(model_cfg) return self._model @property def train_dataset(self) -> paddle.io.Dataset: - _train_dataset = self.dic.get('train_dataset').copy() + _train_dataset = self.dic.get('train_dataset', {}).copy() if not _train_dataset: return None return self._load_object(_train_dataset) @property def val_dataset(self) -> paddle.io.Dataset: - _val_dataset = self.dic.get('val_dataset').copy() + _val_dataset = self.dic.get('val_dataset', {}).copy() if not _val_dataset: return None return self._load_object(_val_dataset) diff --git a/paddleseg/datasets/__init__.py b/paddleseg/datasets/__init__.py index 8c328897b9..047fd22a6c 100644 --- a/paddleseg/datasets/__init__.py +++ b/paddleseg/datasets/__init__.py @@ -17,4 +17,5 @@ from .voc import PascalVOC from .ade import ADE20K from .optic_disc_seg import OpticDiscSeg +from .pascal_context import PascalContext from .mini_deep_globe_road_extraction import MiniDeepGlobeRoadExtraction diff --git a/paddleseg/datasets/cocostuff.py b/paddleseg/datasets/cocostuff.py new file mode 100644 index 0000000000..88a8c8a903 --- /dev/null +++ b/paddleseg/datasets/cocostuff.py @@ -0,0 +1,82 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import glob + +from paddleseg.datasets import Dataset +from paddleseg.cvlibs import manager +from paddleseg.transforms import Compose + +@manager.DATASETS.add_component +class CocoStuff(Dataset): + """ + COCO-Stuff dataset `https://github.com/nightrome/cocostuff`. + The folder structure is as follow: + + cocostuff + | + |--images + | |--train2017 + | |--val2017 + | + |--annotations + | |--train2017 + | |--val2017 + + + Args: + transforms (list): Transforms for image. + dataset_root (str): Cityscapes dataset directory. + mode (str): Which part of dataset to use. it is one of ('train', 'val'). Default: 'train'. 
+ """ + + def __init__(self, transforms, dataset_root, mode='train'): + self.dataset_root = dataset_root + self.transforms = Compose(transforms) + self.file_list = list() + mode = mode.lower() + self.mode = mode + self.num_classes = 172 + self.ignore_index = 255 + + if mode not in ['train', 'val']: + raise ValueError( + "mode should be 'train', 'val', but got {}.".format(mode)) + + if self.transforms is None: + raise ValueError("`transforms` is necessary, but it is None.") + + img_dir = os.path.join(self.dataset_root, 'images') + label_dir = os.path.join(self.dataset_root, 'annotations') + if self.dataset_root is None or not os.path.isdir( + self.dataset_root) or not os.path.isdir( + img_dir) or not os.path.isdir(label_dir): + raise ValueError( + "The dataset is not Found or the folder structure is nonconfoumance." + ) + + label_files = sorted( + glob.glob( + os.path.join(label_dir, mode+'2017', '*.png'))) + + img_files = sorted( + glob.glob(os.path.join(img_dir, mode+'2017', '*.jpg'))) + + self.file_list = [[ + img_path, label_path + ] for img_path, label_path in zip(img_files, label_files)] + + + diff --git a/paddleseg/datasets/mini_deep_globe_road_extraction.py b/paddleseg/datasets/mini_deep_globe_road_extraction.py index 705d4e8f7a..0f6851b603 100644 --- a/paddleseg/datasets/mini_deep_globe_road_extraction.py +++ b/paddleseg/datasets/mini_deep_globe_road_extraction.py @@ -26,13 +26,14 @@ @manager.DATASETS.add_component class MiniDeepGlobeRoadExtraction(Dataset): """ - OpticDiscSeg dataset is extraced from iChallenge-AMD - (https://ai.baidu.com/broad/subordinate?dataset=amd). + MiniDeepGlobeRoadExtraction dataset is extraced from DeepGlobe CVPR2018 challenge (http://deepglobe.org/) + + There are 800 images in the training set and 200 images in the validation set. Args: - transforms (list): Transforms for image. - dataset_root (str): The dataset directory. Default: None - mode (str, optional): Which part of dataset to use. it is one of ('train', 'val', 'test'). Default: 'train'. + dataset_root (str, optional): The dataset directory. Default: None + transforms (list, optional): Transforms for image. Default: None + mode (str, optional): Which part of dataset to use. it is one of ('train', 'val'). Default: 'train'. edge (bool, optional): Whether to compute edge while training. Default: False """ diff --git a/paddleseg/datasets/pascal_context.py b/paddleseg/datasets/pascal_context.py new file mode 100644 index 0000000000..2361507b0b --- /dev/null +++ b/paddleseg/datasets/pascal_context.py @@ -0,0 +1,77 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os + +from PIL import Image +from paddleseg.datasets import Dataset +from paddleseg.cvlibs import manager +from paddleseg.transforms import Compose + + +@manager.DATASETS.add_component +class PascalContext(Dataset): + """ + PascalVOC2010 dataset `http://host.robots.ox.ac.uk/pascal/VOC/`. 
+ If you want to use pascal context dataset, please run the convert_voc2010.py in tools firstly. + + Args: + transforms (list): Transforms for image. + dataset_root (str): The dataset directory. Default: None + mode (str): Which part of dataset to use. it is one of ('train', 'trainval', 'context', 'val'). + If you want to set mode to 'context', please make sure the dataset have been augmented. Default: 'train'. + """ + + def __init__(self, transforms=None, dataset_root=None, mode='train'): + self.dataset_root = dataset_root + self.transforms = Compose(transforms) + mode = mode.lower() + self.mode = mode + self.file_list = list() + self.num_classes = 59 + self.ignore_index = 255 + + if mode not in ['train', 'trainval', 'val']: + raise ValueError( + "`mode` should be one of ('train', 'trainval', 'val') in PascalContext dataset, but got {}." + .format(mode)) + + if self.transforms is None: + raise ValueError("`transforms` is necessary, but it is None.") + if self.dataset_root is None: + raise ValueError( + "The dataset is not Found or the folder structure is nonconfoumance.") + + image_set_dir = os.path.join(self.dataset_root, 'ImageSets','Segmentation') + + if mode == 'train': + file_path = os.path.join(image_set_dir, 'train_context.txt') + elif mode == 'val': + file_path = os.path.join(image_set_dir, 'val_context.txt') + elif mode == 'trainval': + file_path = os.path.join(image_set_dir, 'trainval_context.txt') + if not os.path.exists(file_path): + raise RuntimeError( + "PASCAL-Context annotations are not ready, " + "Please make sure voc_context.py has been properly run.") + + img_dir = os.path.join(self.dataset_root, 'JPEGImages') + label_dir = os.path.join(self.dataset_root, 'Context') + + with open(file_path, 'r') as f: + for line in f: + line = line.strip() + image_path = os.path.join(img_dir, ''.join([line, '.jpg'])) + label_path = os.path.join(label_dir, ''.join([line, '.png'])) + self.file_list.append([image_path, label_path]) diff --git a/paddleseg/models/__init__.py b/paddleseg/models/__init__.py index 20037459fc..0d546fd3c2 100644 --- a/paddleseg/models/__init__.py +++ b/paddleseg/models/__init__.py @@ -31,3 +31,6 @@ from .attention_unet import AttentionUNet from .unet_plusplus import UNetPlusPlus from .decoupled_segnet import DecoupledSegNet +from .emanet import * +from .isanet import * +from .dnlnet import * diff --git a/paddleseg/models/dnlnet.py b/paddleseg/models/dnlnet.py new file mode 100644 index 0000000000..4b0913b6a5 --- /dev/null +++ b/paddleseg/models/dnlnet.py @@ -0,0 +1,226 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from paddleseg.models import layers +from paddleseg.cvlibs import manager +from paddleseg.utils import utils + + +@manager.MODELS.add_component +class DNLNet(nn.Layer): + """Disentangled Non-Local Neural Networks. + + The original article refers to + Minghao Yin, et al. 
"Disentangled Non-Local Neural Networks" + (https://arxiv.org/abs/2006.06668) + Args: + num_classes (int): The unique number of target classes. + backbone (Paddle.nn.Layer): A backbone network. + backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone. + reduction (int): Reduction factor of projection transform. Default: 2. + use_scale (bool): Whether to scale pairwise_weight by + sqrt(1/inter_channels). Default: False. + mode (str): The nonlocal mode. Options are 'embedded_gaussian', + 'dot_product'. Default: 'embedded_gaussian'. + temperature (float): Temperature to adjust attention. Default: 0.05. + concat_input (bool): Whether concat the input and output of convs before classification layer. Default: True + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes, + backbone, + backbone_indices=(2, 3), + reduction=2, + use_scale=True, + mode='embedded_gaussian', + temperature=0.05, + concat_input=True, + enable_auxiliary_loss=True, + align_corners=False, + pretrained=None): + super().__init__() + self.backbone = backbone + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = DNLHead(num_classes, in_channels, reduction, use_scale, + mode, temperature, concat_input, + enable_auxiliary_loss) + self.align_corners = align_corners + self.pretrained = pretrained + self.init_weight() + + def forward(self, x): + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + logit_list = [ + F.interpolate( + logit, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners, + align_mode=1) for logit in logit_list + ] + return logit_list + + def init_weight(self): + if self.pretrained is not None: + utils.load_entire_model(self, self.pretrained) + + +class DNLHead(nn.Layer): + """ + The DNLNet head. + + Args: + num_classes (int): The unique number of target classes. + in_channels (tuple): The number of input channels. + reduction (int): Reduction factor of projection transform. Default: 2. + use_scale (bool): Whether to scale pairwise_weight by + sqrt(1/inter_channels). Default: False. + mode (str): The nonlocal mode. Options are 'embedded_gaussian', + 'dot_product'. Default: 'embedded_gaussian.'. + temperature (float): Temperature to adjust attention. Default: 0.05 + concat_input (bool): Whether concat the input and output of convs before classification layer. Default: True + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. 
+ """ + + def __init__(self, + num_classes, + in_channels, + reduction, + use_scale, + mode, + temperature, + concat_input=True, + enable_auxiliary_loss=True, + **kwargs): + super(DNLHead, self).__init__() + self.in_channels = in_channels[-1] + self.concat_input = concat_input + self.enable_auxiliary_loss = enable_auxiliary_loss + inter_channels = self.in_channels // 4 + + self.dnl_block = DisentangledNonLocal2D( + in_channels=inter_channels, + reduction=reduction, + use_scale=use_scale, + temperature=temperature, + mode=mode) + self.conv0 = layers.ConvBNReLU( + in_channels=self.in_channels, + out_channels=inter_channels, + kernel_size=3, + bias_attr=False) + self.conv1 = layers.ConvBNReLU( + in_channels=inter_channels, + out_channels=inter_channels, + kernel_size=3, + bias_attr=False) + self.cls = nn.Sequential( + nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1)) + self.aux = nn.Sequential( + layers.ConvBNReLU( + in_channels=1024, + out_channels=256, + kernel_size=3, + bias_attr=False), nn.Dropout2D(p=0.1), + nn.Conv2D(256, num_classes, 1)) + if self.concat_input: + self.conv_cat = layers.ConvBNReLU( + self.in_channels + inter_channels, + inter_channels, + kernel_size=3, + bias_attr=False) + + def forward(self, feat_list): + C3, C4 = feat_list + output = self.conv0(C4) + output = self.dnl_block(output) + output = self.conv1(output) + if self.concat_input: + output = self.conv_cat(paddle.concat([C4, output], axis=1)) + output = self.cls(output) + if self.enable_auxiliary_loss: + auxout = self.aux(C3) + return [output, auxout] + else: + return [output] + + +class DisentangledNonLocal2D(layers.NonLocal2D): + """Disentangled Non-Local Blocks. + + Args: + temperature (float): Temperature to adjust attention. + """ + + def __init__(self, temperature, *arg, **kwargs): + super().__init__(*arg, **kwargs) + self.temperature = temperature + self.conv_mask = nn.Conv2D(self.in_channels, 1, kernel_size=1) + + def embedded_gaussian(self, theta_x, phi_x): + pairwise_weight = paddle.matmul(theta_x, phi_x) + if self.use_scale: + pairwise_weight /= theta_x.shape[-1]**0.5 + pairwise_weight /= self.temperature + pairwise_weight = F.softmax(pairwise_weight, -1) + return pairwise_weight + + def forward(self, x): + n, c, h, w = x.shape + g_x = self.g(x).reshape([n, self.inter_channels, + -1]).transpose([0, 2, 1]) + + if self.mode == "gaussian": + theta_x = paddle.transpose( + x.reshape([n, self.in_channels, -1]), [0, 2, 1]) + if self.sub_sample: + phi_x = paddle.transpose(self.phi(x), [n, self.in_channels, -1]) + else: + phi_x = paddle.transpose(x, [n, self.in_channels, -1]) + + elif self.mode == "concatenation": + theta_x = paddle.reshape( + self.theta(x), [n, self.inter_channels, -1, 1]) + phi_x = paddle.reshape(self.phi(x), [n, self.inter_channels, 1, -1]) + + else: + theta_x = self.theta(x).reshape([n, self.inter_channels, + -1]).transpose([0, 2, 1]) + phi_x = paddle.reshape(self.phi(x), [n, self.inter_channels, -1]) + + theta_x -= paddle.mean(theta_x, axis=-2, keepdim=True) + phi_x -= paddle.mean(phi_x, axis=-1, keepdim=True) + + pairwise_func = getattr(self, self.mode) + pairwise_weight = pairwise_func(theta_x, phi_x) + + y = paddle.matmul(pairwise_weight, g_x).transpose([0, 2, 1]).reshape( + [n, self.inter_channels, h, w]) + unary_mask = F.softmax( + paddle.reshape(self.conv_mask(x), [n, 1, -1]), -1) + unary_x = paddle.matmul(unary_mask, g_x).transpose([0, 2, 1]).reshape( + [n, self.inter_channels, 1, 1]) + output = x + self.conv_out(y + unary_x) + return output diff --git 
a/paddleseg/models/emanet.py b/paddleseg/models/emanet.py new file mode 100644 index 0000000000..a567e433a8 --- /dev/null +++ b/paddleseg/models/emanet.py @@ -0,0 +1,199 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from paddleseg.models import layers +from paddleseg.cvlibs import manager +from paddleseg.utils import utils + + +@manager.MODELS.add_component +class EMANet(nn.Layer): + """ + Expectation Maximization Attention Networks for Semantic Segmentation based on PaddlePaddle. + + The original article refers to + Xia Li, et al. "Expectation-Maximization Attention Networks for Semantic Segmentation" + (https://arxiv.org/abs/1907.13426) + + Args: + num_classes (int): The unique number of target classes. + backbone (Paddle.nn.Layer): A backbone network. + backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone. + ema_channels (int): EMA module channels. + gc_channels (int): The input channels to Global Context Block. + num_bases (int): Number of bases. + stage_num (int): The iteration number for EM. + momentum (float): The parameter for updating bases. + concat_input (bool): Whether concat the input and output of convs before classification layer. Default: True + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. + """ + + def __init__(self, + num_classes, + backbone, + backbone_indices=(2, 3), + ema_channels=512, + gc_channels=256, + num_bases=64, + stage_num=3, + momentum=0.1, + concat_input=True, + enable_auxiliary_loss=True, + align_corners=False, + pretrained=None): + super().__init__() + + self.backbone = backbone + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = EMAHead(num_classes, in_channels, ema_channels, gc_channels, + num_bases, stage_num, momentum, concat_input, enable_auxiliary_loss) + self.align_corners = align_corners + self.pretrained = pretrained + self.init_weight() + + def forward(self, x): + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + logit_list = [F.interpolate( + logit, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners) for logit in logit_list] + + return logit_list + + def init_weight(self): + if self.pretrained is not None: + utils.load_entire_model(self, self.pretrained) + + +class EMAHead(nn.Layer): + """ + The EMANet head. + + Args: + num_classes (int): The unique number of target classes. + in_channels (tuple): The number of input channels. + ema_channels (int): EMA module channels. 
+ gc_channels (int): The input channels to Global Context Block. + num_bases (int): Number of bases. + stage_num (int): The iteration number for EM. + momentum (float): The parameter for updating bases. + concat_input (bool): Whether concat the input and output of convs before classification layer. Default: True + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + """ + + def __init__(self, + num_classes, + in_channels, + ema_channels, + gc_channels, + num_bases, + stage_num, + momentum, + concat_input=True, + enable_auxiliary_loss=True): + super(EMAHead, self).__init__() + + self.in_channels = in_channels[-1] + self.concat_input = concat_input + self.enable_auxiliary_loss = enable_auxiliary_loss + + self.emau = EMAU(ema_channels, num_bases, stage_num, momentum=momentum) + self.ema_in_conv = layers.ConvBNReLU(in_channels=self.in_channels, out_channels=ema_channels, kernel_size=3) + self.ema_mid_conv = nn.Conv2D(ema_channels, ema_channels, kernel_size=1) + for param in self.ema_mid_conv.parameters(): + param.stop_gradient = True + self.ema_out_conv = layers.ConvBNReLU(in_channels=ema_channels, out_channels=ema_channels, kernel_size=1) + self.bottleneck = layers.ConvBNReLU(in_channels=ema_channels, out_channels=gc_channels, kernel_size=3) + self.cls = nn.Sequential(nn.Dropout2D(p=0.1),nn.Conv2D(gc_channels, num_classes, 1)) + self.aux = nn.Sequential(layers.ConvBNReLU(in_channels=1024, out_channels=256, kernel_size=3), + nn.Dropout2D(p=0.1), + nn.Conv2D(256, num_classes, 1)) + if self.concat_input: + self.conv_cat = layers.ConvBNReLU(self.in_channels+gc_channels, gc_channels, kernel_size=3) + + def forward(self, feat_list): + C3, C4 = feat_list + feats = self.ema_in_conv(C4) + identity = feats + feats = self.ema_mid_conv(feats) + recon = self.emau(feats) + recon = F.relu(recon) + recon = self.ema_out_conv(recon) + output = F.relu(identity + recon) + output = self.bottleneck(output) + if self.concat_input: + output = self.conv_cat(paddle.concat([C4, output], axis=1)) + output = self.cls(output) + if self.enable_auxiliary_loss: + auxout = self.aux(C3) + return [output, auxout] + else: + return [output] + + +class EMAU(nn.Layer): + '''The Expectation-Maximization Attention Unit (EMAU). + + Arguments: + c (int): The input and output channel number. + k (int): The number of the bases. + stage_num (int): The iteration number for EM. + momentum (float): The parameter for updating bases. 
+ ''' + def __init__(self, c, k, stage_num=3, momentum=0.1): + super(EMAU, self).__init__() + assert stage_num >= 1 + self.stage_num = stage_num + self.momentum = momentum + + tmp_mu = self.create_parameter(shape=[1, c, k], default_initializer=paddle.nn.initializer.KaimingNormal(k)) + self.mu = F.normalize(paddle.to_tensor(tmp_mu), axis=1, p=2) + self.register_buffer('bases', self.mu) + + def forward(self, x): + b, c, h, w = x.shape + x = paddle.reshape(x, [b, c, h*w]) + mu = paddle.tile(self.mu, [b, 1, 1]) + + with paddle.no_grad(): + for i in range(self.stage_num): + x_t = paddle.transpose(x, [0, 2, 1]) + z = paddle.bmm(x_t, mu) + z = F.softmax(z, axis=2) + z_ = F.normalize(z, axis=1, p=1) + mu = paddle.bmm(x, z_) + mu = F.normalize(mu, axis=1, p=2) + + z_t = paddle.transpose(z, [0, 2, 1]) + x = paddle.matmul(mu, z_t) + x = paddle.reshape(x, [b, c, h, w]) + + if self.training: + mu = paddle.mean(mu, 0, keepdim=True) + if paddle.distributed.get_world_size() >1: + paddle.distributed.reduce(mu/paddle.distributed.get_world_size(), 0) + mu = F.normalize(mu, axis=1, p=2) + self.mu = self.mu * (1 - self.momentum) + mu * self.momentum + return x diff --git a/paddleseg/models/isanet.py b/paddleseg/models/isanet.py new file mode 100644 index 0000000000..4a083d9bd1 --- /dev/null +++ b/paddleseg/models/isanet.py @@ -0,0 +1,178 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from paddleseg.models import layers +from paddleseg.cvlibs import manager +from paddleseg.utils import utils + + +@manager.MODELS.add_component +class ISANet(nn.Layer): + """Interlaced Sparse Self-Attention for Semantic Segmentation. + + The original article refers to Lang Huang, et al. "Interlaced Sparse Self-Attention for Semantic Segmentation" + (https://arxiv.org/abs/1907.12273). + + Args: + num_classes (int): The unique number of target classes. + backbone (Paddle.nn.Layer): A backbone network. + backbone_indices (tuple): The values in the tuple indicate the indices of output of backbone. + isa_channels (int): The channels of ISA Module. + down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + align_corners (bool): An argument of F.interpolate. It should be set to False when the output size of feature + is even, e.g. 1024x512, otherwise it is True, e.g. 769x769. Default: False. + pretrained (str, optional): The path or url of pretrained model. Default: None. 
+ + """ + + def __init__(self, + num_classes, + backbone, + backbone_indices=(2, 3), + isa_channels=256, + down_factor=(8, 8), + enable_auxiliary_loss=True, + align_corners=False, + pretrained=None): + super().__init__() + + self.backbone = backbone + self.backbone_indices = backbone_indices + in_channels = [self.backbone.feat_channels[i] for i in backbone_indices] + self.head = ISAHead(num_classes, in_channels, isa_channels, down_factor, enable_auxiliary_loss) + self.align_corners = align_corners + self.pretrained = pretrained + self.init_weight() + + def forward(self, x): + feats = self.backbone(x) + feats = [feats[i] for i in self.backbone_indices] + logit_list = self.head(feats) + logit_list = [F.interpolate( + logit, + x.shape[2:], + mode='bilinear', + align_corners=self.align_corners, + align_mode=1) for logit in logit_list] + + return logit_list + + def init_weight(self): + if self.pretrained is not None: + utils.load_entire_model(self, self.pretrained) + + +class ISAHead(nn.Layer): + """ + The ISAHead. + + Args: + num_classes (int): The unique number of target classes. + in_channels (tuple): The number of input channels. + isa_channels (int): The channels of ISA Module. + down_factor (tuple): Divide the height and width dimension to (Ph, PW) groups. + enable_auxiliary_loss (bool, optional): A bool value indicates whether adding auxiliary loss. Default: True. + """ + def __init__(self, num_classes, in_channels, isa_channels, down_factor, enable_auxiliary_loss): + super(ISAHead, self).__init__() + self.in_channels = in_channels[-1] + inter_channels = self.in_channels // 4 + self.down_factor = down_factor + self.enable_auxiliary_loss = enable_auxiliary_loss + self.in_conv = layers.ConvBNReLU(self.in_channels, inter_channels, 3, bias_attr=False) + self.global_relation = SelfAttentionBlock(inter_channels, isa_channels) + self.local_relation = SelfAttentionBlock(inter_channels, isa_channels) + self.out_conv = layers.ConvBNReLU(inter_channels * 2, inter_channels, 1, bias_attr=False) + self.cls = nn.Sequential(nn.Dropout2D(p=0.1), nn.Conv2D(inter_channels, num_classes, 1)) + self.aux = nn.Sequential( + layers.ConvBNReLU(in_channels=1024, out_channels=256, kernel_size=3, bias_attr=False), + nn.Dropout2D(p=0.1), + nn.Conv2D(256, num_classes, 1)) + + def forward(self, feat_list): + C3, C4 = feat_list + x = self.in_conv(C4) + n, c, h, w = x.shape + P_h, P_w = self.down_factor + Q_h, Q_w = math.ceil(h / P_h), math.ceil(w / P_w) + pad_h, pad_w = Q_h * P_h - h, Q_w * P_w - w + if pad_h > 0 or pad_w > 0: + padding = [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2] + feat = F.pad(x, padding) + else: + feat = x + + feat = feat.reshape([n, c, Q_h, P_h, Q_w, P_w]) + feat = feat.transpose([0, 3, 5, 1, 2, 4]).reshape([-1, c, Q_h, Q_w]) + feat = self.global_relation(feat) + + feat = feat.reshape([n, P_h, P_w, c, Q_h, Q_w]) + feat = feat.transpose([0, 4, 5, 3, 1, 2]).reshape([-1, c, P_h, P_w]) + feat = self.local_relation(feat) + + feat = feat.reshape([n, Q_h, Q_w, c, P_h, P_w]) + feat = feat.transpose([0, 3, 1, 4, 2, 5]).reshape([n, c, P_h * Q_h, P_w * Q_w]) + if pad_h > 0 or pad_w > 0: + feat = feat[:, :, pad_h // 2:pad_h // 2 + h, pad_w // 2:pad_w // 2 + w] + + feat = self.out_conv(paddle.concat([feat, x], axis=1)) + output = self.cls(feat) + + if self.enable_auxiliary_loss: + auxout = self.aux(C3) + return [output, auxout] + else: + return [output] + + +class SelfAttentionBlock(layers.AttentionBlock): + """General self-attention block/non-local block. 
+ + Args: + in_channels (int): Input channels of key/query feature. + channels (int): Output channels of key/query transform. + """ + def __init__(self, in_channels, channels): + super(SelfAttentionBlock, self).__init__( + key_in_channels=in_channels, + query_in_channels=in_channels, + channels=channels, + out_channels=in_channels, + share_key_query=False, + query_downsample=None, + key_downsample=None, + key_query_num_convs=2, + key_query_norm=True, + value_out_num_convs=1, + value_out_norm=False, + matmul_norm=True, + with_out=False) + + self.output_project = self.build_project( + in_channels, + in_channels, + num_convs=1, + use_conv_module=True) + + def forward(self, x): + context = super(SelfAttentionBlock, self).forward(x, x) + return self.output_project(context) + diff --git a/paddleseg/models/layers/__init__.py b/paddleseg/models/layers/__init__.py index 27fbbba370..86ec36c08d 100644 --- a/paddleseg/models/layers/__init__.py +++ b/paddleseg/models/layers/__init__.py @@ -15,3 +15,5 @@ from .layer_libs import ConvBNReLU, ConvBN, SeparableConvBNReLU, DepthwiseConvBN, AuxLayer, SyncBatchNorm from .activation import Activation from .pyramid_pool import ASPPModule, PPModule +from .attention import AttentionBlock +from .nonlocal2d import NonLocal2D diff --git a/paddleseg/models/layers/attention.py b/paddleseg/models/layers/attention.py new file mode 100644 index 0000000000..dabcdd358c --- /dev/null +++ b/paddleseg/models/layers/attention.py @@ -0,0 +1,131 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from paddleseg.models import layers + + +class AttentionBlock(nn.Layer): + """General self-attention block/non-local block. + + The original article refers to refer to https://arxiv.org/abs/1706.03762. + Args: + key_in_channels (int): Input channels of key feature. + query_in_channels (int): Input channels of query feature. + channels (int): Output channels of key/query transform. + out_channels (int): Output channels. + share_key_query (bool): Whether share projection weight between key + and query projection. + query_downsample (nn.Module): Query downsample module. + key_downsample (nn.Module): Key downsample module. + key_query_num_convs (int): Number of convs for key/query projection. + value_out_num_convs (int): Number of convs for value projection. + key_query_norm (bool): Whether to use BN for key/query projection. + value_out_norm (bool): Whether to use BN for value projection. + matmul_norm (bool): Whether normalize attention map with sqrt of + channels + with_out (bool): Whether use out projection. 
+ """ + def __init__(self, key_in_channels, query_in_channels, channels, + out_channels, share_key_query, query_downsample, + key_downsample, key_query_num_convs, value_out_num_convs, + key_query_norm, value_out_norm, matmul_norm, with_out): + super(AttentionBlock, self).__init__() + if share_key_query: + assert key_in_channels == query_in_channels + self.key_in_channels = key_in_channels + self.query_in_channels = query_in_channels + self.out_channels = out_channels + self.channels = channels + self.share_key_query = share_key_query + self.key_project = self.build_project(key_in_channels, + channels, + num_convs=key_query_num_convs, + use_conv_module=key_query_norm) + if share_key_query: + self.query_project = self.key_project + else: + self.query_project = self.build_project( + query_in_channels, + channels, + num_convs=key_query_num_convs, + use_conv_module=key_query_norm) + + self.value_project = self.build_project( + key_in_channels, + channels if with_out else out_channels, + num_convs=value_out_num_convs, + use_conv_module=value_out_norm) + + if with_out: + self.out_project = self.build_project( + channels, + out_channels, + num_convs=value_out_num_convs, + use_conv_module=value_out_norm) + else: + self.out_project = None + + self.query_downsample = query_downsample + self.key_downsample = key_downsample + self.matmul_norm = matmul_norm + + def build_project(self, in_channels, channels, num_convs, use_conv_module): + if use_conv_module: + convs = [layers.ConvBNReLU(in_channels=in_channels, out_channels=channels, kernel_size=1, bias_attr=False)] + for _ in range(num_convs - 1): + convs.append( + layers.ConvBNReLU(in_channels=channels, out_channels=channels, kernel_size=1, bias_attr=False)) + else: + convs = [nn.Conv2D(in_channels, channels, 1)] + for _ in range(num_convs - 1): + convs.append(nn.Conv2D(channels, channels, 1)) + + if len(convs) > 1: + convs = nn.Sequential(*convs) + else: + convs = convs[0] + return convs + + def forward(self, query_feats, key_feats): + b, c, h, w = query_feats.shape + query = self.query_project(query_feats) + if self.query_downsample is not None: + query = self.query_downsample(query) + query = query.reshape([*query.shape[:2], -1]).transpose([0, 2, 1]) + + key = self.key_project(key_feats) + value = self.value_project(key_feats) + + if self.key_downsample is not None: + key = self.key_downsample(key) + value = self.key_downsample(value) + + key = key.reshape([*key.shape[:2], -1]) + value = value.reshape([*value.shape[:2], -1]).transpose([0, 2, 1]) + sim_map = paddle.matmul(query, key) + if self.matmul_norm: + sim_map = (self.channels ** -0.5) * sim_map + sim_map = F.softmax(sim_map, axis=-1) + + context = paddle.matmul(sim_map, value) + context = paddle.transpose(context, [0, 2, 1]) + context = paddle.reshape(context, [b, -1, *query_feats.shape[2:]]) + + if self.out_project is not None: + context = self.out_project(context) + return context \ No newline at end of file diff --git a/paddleseg/models/layers/nonlocal2d.py b/paddleseg/models/layers/nonlocal2d.py new file mode 100644 index 0000000000..bd577c1a16 --- /dev/null +++ b/paddleseg/models/layers/nonlocal2d.py @@ -0,0 +1,154 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import paddle +import paddle.nn as nn +import paddle.nn.functional as F + +from paddleseg.models import layers + + +class NonLocal2D(nn.Layer): + """Basic Non-local module. + This model is the implementation of "Non-local Neural Networks" + (https://arxiv.org/abs/1711.07971) + + Args: + in_channels (int): Channels of the input feature map. + reduction (int): Channel reduction ratio. Default: 2. + use_scale (bool): Whether to scale pairwise_weight by `1/sqrt(inter_channels)` when the mode is `embedded_gaussian`. Default: True. + sub_sample (bool): Whether to utilize max pooling after pairwise function. Default: False. + mode (str): Options are `gaussian`, `concatenation`, `embedded_gaussian` and `dot_product`. Default: embedded_gaussian. + """ + + def __init__(self, + in_channels, + reduction=2, + use_scale=True, + sub_sample=False, + mode='embedded_gaussian'): + super(NonLocal2D, self).__init__() + self.in_channels = in_channels + self.reduction = reduction + self.use_scale = use_scale + self.sub_sample = sub_sample + self.mode = mode + if mode not in [ + 'gaussian', 'embedded_gaussian', 'dot_product', 'concatenation' + ]: + raise ValueError( + "Mode should be in 'gaussian', 'concatenation','embedded_gaussian' or 'dot_product'." + ) + + self.inter_channels = max(in_channels // reduction, 1) + + self.g = nn.Conv2D( + in_channels=self.in_channels, + out_channels=self.inter_channels, + kernel_size=1) + self.conv_out = layers.ConvBNReLU( + in_channels=self.inter_channels, + out_channels=self.in_channels, + kernel_size=1, + bias_attr=False) + + if self.mode != "gaussian": + self.theta = nn.Conv2D( + in_channels=self.in_channels, + out_channels=self.inter_channels, + kernel_size=1) + self.phi = nn.Conv2D( + in_channels=self.in_channels, + out_channels=self.inter_channels, + kernel_size=1) + + if self.mode == "concatenation": + self.concat_project = layers.ConvBNReLU( + in_channels=self.inter_channels * 2, + out_channels=1, + kernel_size=1, + bias_attr=False) + + if self.sub_sample: + max_pool_layer = nn.MaxPool2D(kernel_size=(2, 2)) + self.g = nn.Sequential(self.g, max_pool_layer) + if self.mode != 'gaussian': + self.phi = nn.Sequential(self.phi, max_pool_layer) + else: + self.phi = max_pool_layer + + def gaussian(self, theta_x, phi_x): + pairwise_weight = paddle.matmul(theta_x, phi_x) + pairwise_weight = F.softmax(pairwise_weight, axis=-1) + return pairwise_weight + + def embedded_gaussian(self, theta_x, phi_x): + pairwise_weight = paddle.matmul(theta_x, phi_x) + if self.use_scale: + pairwise_weight /= theta_x.shape[-1]**0.5 + pairwise_weight = F.softmax(pairwise_weight, -1) + return pairwise_weight + + def dot_product(self, theta_x, phi_x): + pairwise_weight = paddle.matmul(theta_x, phi_x) + pairwise_weight /= pairwise_weight.shape[-1] + return pairwise_weight + + def concatenation(self, theta_x, phi_x): + h = theta_x.shape[2] + w = phi_x.shape[3] + theta_x = paddle.tile(theta_x, [1, 1, 1, w]) + phi_x = paddle.tile(phi_x, [1, 1, h, 1]) + + concat_feature = paddle.concat([theta_x, phi_x], axis=1) + pairwise_weight = self.concat_project(concat_feature) + n, _, h, w = 
pairwise_weight.shape + pairwise_weight = paddle.reshape(pairwise_weight, [n, h, w]) + pairwise_weight /= pairwise_weight.shape[-1] + return pairwise_weight + + def forward(self, x): + n, c, h, w = x.shape + g_x = paddle.reshape(self.g(x), [n, self.inter_channels, -1]) + g_x = paddle.transpose(g_x, [0, 2, 1]) + + if self.mode == 'gaussian': + theta_x = paddle.reshape(x, [n, self.inter_channels, -1]) + theta_x = paddle.transpose(theta_x, [0, 2, 1]) + if self.sub_sample: + phi_x = paddle.reshape( + self.phi(x), [n, self.inter_channels, -1]) + else: + phi_x = paddle.reshape(x, [n, self.in_channels, -1]) + + elif self.mode == 'concatenation': + theta_x = paddle.reshape( + self.theta(x), [n, self.inter_channels, -1, 1]) + phi_x = self.phi(x).view(n, self.inter_channels, 1, -1) + + else: + theta_x = paddle.reshape( + self.theta(x), [n, self.inter_channels, -1, 1]) + theta_x = paddle.transpose(theta_x, [0, 2, 1]) + phi_x = paddle.reshape(self.phi(x), [n, self.inter_channels, -1]) + + pairwise_func = getattr(self, self.mode) + pairwise_weight = pairwise_func(theta_x, phi_x) + y = paddle.matmul(pairwise_weight, g_x) + y = paddle.transpose(y, [0, 2, 1]) + y = paddle.reshape(y, [n, self.inter_channels, h, w]) + + output = x + self.conv_out(y) + + return output diff --git a/paddleseg/models/losses/bootstrapped_cross_entropy.py b/paddleseg/models/losses/bootstrapped_cross_entropy.py index 5ca95feb69..6443ccffec 100644 --- a/paddleseg/models/losses/bootstrapped_cross_entropy.py +++ b/paddleseg/models/losses/bootstrapped_cross_entropy.py @@ -38,9 +38,10 @@ def __init__(self, min_K, loss_th, weight=None, ignore_index=255): self.ignore_index = ignore_index self.K = min_K self.threshold = loss_th + if weight is not None: + weight = paddle.to_tensor(weight, dtype='float32') self.weight = weight - self.ignore_index = ignore_index - + def forward(self, logit, label): n, c, h, w = logit.shape @@ -55,7 +56,6 @@ def forward(self, logit, label): y = paddle.transpose(y, (0, 2, 3, 1)) x = paddle.reshape(x, shape=(-1, c)) y = paddle.reshape(y, shape=(-1, )) - loss = F.cross_entropy( x, y, diff --git a/paddleseg/models/u2net.py b/paddleseg/models/u2net.py index eaa956f022..2511e5b656 100644 --- a/paddleseg/models/u2net.py +++ b/paddleseg/models/u2net.py @@ -1,11 +1,264 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import paddle import paddle.nn as nn import paddle.nn.functional as F -from paddleseg import utils + from paddleseg.cvlibs import manager +from paddleseg.models import layers +from paddleseg.utils import utils __all__ = ['U2Net', 'U2Netp'] +@manager.MODELS.add_component +class U2Net(nn.Layer): + """ + The U^2-Net implementation based on PaddlePaddle. + + The original article refers to + Xuebin Qin, et, al. "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection" + (https://arxiv.org/abs/2005.09007). + + Args: + num_classes (int): The unique number of target classes. + in_ch (int, optional): Input channels. Default: 3. 
+ pretrained (str, optional): The path or url of pretrained model for fine tuning. Default: None. + + """ + + def __init__(self, num_classes, in_ch=3, pretrained=None): + super(U2Net, self).__init__() + + self.stage1 = RSU7(in_ch, 32, 64) + self.pool12 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage2 = RSU6(64, 32, 128) + self.pool23 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage3 = RSU5(128, 64, 256) + self.pool34 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage4 = RSU4(256, 128, 512) + self.pool45 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage5 = RSU4F(512, 256, 512) + self.pool56 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage6 = RSU4F(512, 256, 512) + + # decoder + self.stage5d = RSU4F(1024, 256, 512) + self.stage4d = RSU4(1024, 128, 256) + self.stage3d = RSU5(512, 64, 128) + self.stage2d = RSU6(256, 32, 64) + self.stage1d = RSU7(128, 16, 64) + + self.side1 = nn.Conv2D(64, num_classes, 3, padding=1) + self.side2 = nn.Conv2D(64, num_classes, 3, padding=1) + self.side3 = nn.Conv2D(128, num_classes, 3, padding=1) + self.side4 = nn.Conv2D(256, num_classes, 3, padding=1) + self.side5 = nn.Conv2D(512, num_classes, 3, padding=1) + self.side6 = nn.Conv2D(512, num_classes, 3, padding=1) + + self.outconv = nn.Conv2D(6 * num_classes, num_classes, 1) + + self.pretrained = pretrained + self.init_weight() + + def forward(self, x): + + hx = x + + #stage 1 + hx1 = self.stage1(hx) + hx = self.pool12(hx1) + + #stage 2 + hx2 = self.stage2(hx) + hx = self.pool23(hx2) + + #stage 3 + hx3 = self.stage3(hx) + hx = self.pool34(hx3) + + #stage 4 + hx4 = self.stage4(hx) + hx = self.pool45(hx4) + + #stage 5 + hx5 = self.stage5(hx) + hx = self.pool56(hx5) + + #stage 6 + hx6 = self.stage6(hx) + hx6up = _upsample_like(hx6, hx5) + + #-------------------- decoder -------------------- + hx5d = self.stage5d(paddle.concat((hx6up, hx5), 1)) + hx5dup = _upsample_like(hx5d, hx4) + + hx4d = self.stage4d(paddle.concat((hx5dup, hx4), 1)) + hx4dup = _upsample_like(hx4d, hx3) + + hx3d = self.stage3d(paddle.concat((hx4dup, hx3), 1)) + hx3dup = _upsample_like(hx3d, hx2) + + hx2d = self.stage2d(paddle.concat((hx3dup, hx2), 1)) + hx2dup = _upsample_like(hx2d, hx1) + + hx1d = self.stage1d(paddle.concat((hx2dup, hx1), 1)) + + #side output + d1 = self.side1(hx1d) + + d2 = self.side2(hx2d) + d2 = _upsample_like(d2, d1) + + d3 = self.side3(hx3d) + d3 = _upsample_like(d3, d1) + + d4 = self.side4(hx4d) + d4 = _upsample_like(d4, d1) + + d5 = self.side5(hx5d) + d5 = _upsample_like(d5, d1) + + d6 = self.side6(hx6) + d6 = _upsample_like(d6, d1) + + d0 = self.outconv(paddle.concat((d1, d2, d3, d4, d5, d6), 1)) + + return [d0, d1, d2, d3, d4, d5, d6] + + def init_weight(self): + if self.pretrained is not None: + utils.load_entire_model(self, self.pretrained) + + +### U^2-Net small ### +@manager.MODELS.add_component +class U2Netp(nn.Layer): + """Please Refer to U2Net above.""" + + def __init__(self, num_classes, in_ch=3, pretrained=None): + super(U2Netp, self).__init__() + + self.stage1 = RSU7(in_ch, 16, 64) + self.pool12 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage2 = RSU6(64, 16, 64) + self.pool23 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage3 = RSU5(64, 16, 64) + self.pool34 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage4 = RSU4(64, 16, 64) + self.pool45 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage5 = RSU4F(64, 16, 64) + self.pool56 = nn.MaxPool2D(2, stride=2, ceil_mode=True) + + self.stage6 = RSU4F(64, 16, 64) + + # decoder + 
self.stage5d = RSU4F(128, 16, 64) + self.stage4d = RSU4(128, 16, 64) + self.stage3d = RSU5(128, 16, 64) + self.stage2d = RSU6(128, 16, 64) + self.stage1d = RSU7(128, 16, 64) + + self.side1 = nn.Conv2D(64, num_classes, 3, padding=1) + self.side2 = nn.Conv2D(64, num_classes, 3, padding=1) + self.side3 = nn.Conv2D(64, num_classes, 3, padding=1) + self.side4 = nn.Conv2D(64, num_classes, 3, padding=1) + self.side5 = nn.Conv2D(64, num_classes, 3, padding=1) + self.side6 = nn.Conv2D(64, num_classes, 3, padding=1) + + self.outconv = nn.Conv2D(6 * num_classes, num_classes, 1) + + self.pretrained = pretrained + self.init_weight() + + def forward(self, x): + + hx = x + + #stage 1 + hx1 = self.stage1(hx) + hx = self.pool12(hx1) + + #stage 2 + hx2 = self.stage2(hx) + hx = self.pool23(hx2) + + #stage 3 + hx3 = self.stage3(hx) + hx = self.pool34(hx3) + + #stage 4 + hx4 = self.stage4(hx) + hx = self.pool45(hx4) + + #stage 5 + hx5 = self.stage5(hx) + hx = self.pool56(hx5) + + #stage 6 + hx6 = self.stage6(hx) + hx6up = _upsample_like(hx6, hx5) + + #decoder + hx5d = self.stage5d(paddle.concat((hx6up, hx5), 1)) + hx5dup = _upsample_like(hx5d, hx4) + + hx4d = self.stage4d(paddle.concat((hx5dup, hx4), 1)) + hx4dup = _upsample_like(hx4d, hx3) + + hx3d = self.stage3d(paddle.concat((hx4dup, hx3), 1)) + hx3dup = _upsample_like(hx3d, hx2) + + hx2d = self.stage2d(paddle.concat((hx3dup, hx2), 1)) + hx2dup = _upsample_like(hx2d, hx1) + + hx1d = self.stage1d(paddle.concat((hx2dup, hx1), 1)) + + #side output + d1 = self.side1(hx1d) + + d2 = self.side2(hx2d) + d2 = _upsample_like(d2, d1) + + d3 = self.side3(hx3d) + d3 = _upsample_like(d3, d1) + + d4 = self.side4(hx4d) + d4 = _upsample_like(d4, d1) + + d5 = self.side5(hx5d) + d5 = _upsample_like(d5, d1) + + d6 = self.side6(hx6) + d6 = _upsample_like(d6, d1) + + d0 = self.outconv(paddle.concat((d1, d2, d3, d4, d5, d6), 1)) + + return [d0, d1, d2, d3, d4, d5, d6] + + def init_weight(self): + if self.pretrained is not None: + utils.load_entire_model(self, self.pretrained) class REBNCONV(nn.Layer): def __init__(self, in_ch=3, out_ch=3, dirate=1): @@ -317,243 +570,3 @@ def forward(self, x): hx1d = self.rebnconv1d(paddle.concat((hx2d, hx1), 1)) return hx1d + hxin - - -##### U^2-Net #### -@manager.MODELS.add_component -class U2Net(nn.Layer): - """ - The U^2-Net implementation based on PaddlePaddle. - - The original article refers to - Xuebin Qin, et, al. "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection" - (https://arxiv.org/abs/2005.09007). - - Args: - num_classes (int): The unique number of target classes. - in_ch (int, optional): Input channels. Default: 3. - pretrained (str, optional): The path or url of pretrained model for fine tuning. Default: None. 
- - """ - - def __init__(self, num_classes, in_ch=3, pretrained=None): - super(U2Net, self).__init__() - - self.stage1 = RSU7(in_ch, 32, 64) - self.pool12 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage2 = RSU6(64, 32, 128) - self.pool23 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage3 = RSU5(128, 64, 256) - self.pool34 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage4 = RSU4(256, 128, 512) - self.pool45 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage5 = RSU4F(512, 256, 512) - self.pool56 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage6 = RSU4F(512, 256, 512) - - # decoder - self.stage5d = RSU4F(1024, 256, 512) - self.stage4d = RSU4(1024, 128, 256) - self.stage3d = RSU5(512, 64, 128) - self.stage2d = RSU6(256, 32, 64) - self.stage1d = RSU7(128, 16, 64) - - self.side1 = nn.Conv2D(64, num_classes, 3, padding=1) - self.side2 = nn.Conv2D(64, num_classes, 3, padding=1) - self.side3 = nn.Conv2D(128, num_classes, 3, padding=1) - self.side4 = nn.Conv2D(256, num_classes, 3, padding=1) - self.side5 = nn.Conv2D(512, num_classes, 3, padding=1) - self.side6 = nn.Conv2D(512, num_classes, 3, padding=1) - - self.outconv = nn.Conv2D(6 * num_classes, num_classes, 1) - - self.pretrained = pretrained - self.init_weight() - - def forward(self, x): - - hx = x - - #stage 1 - hx1 = self.stage1(hx) - hx = self.pool12(hx1) - - #stage 2 - hx2 = self.stage2(hx) - hx = self.pool23(hx2) - - #stage 3 - hx3 = self.stage3(hx) - hx = self.pool34(hx3) - - #stage 4 - hx4 = self.stage4(hx) - hx = self.pool45(hx4) - - #stage 5 - hx5 = self.stage5(hx) - hx = self.pool56(hx5) - - #stage 6 - hx6 = self.stage6(hx) - hx6up = _upsample_like(hx6, hx5) - - #-------------------- decoder -------------------- - hx5d = self.stage5d(paddle.concat((hx6up, hx5), 1)) - hx5dup = _upsample_like(hx5d, hx4) - - hx4d = self.stage4d(paddle.concat((hx5dup, hx4), 1)) - hx4dup = _upsample_like(hx4d, hx3) - - hx3d = self.stage3d(paddle.concat((hx4dup, hx3), 1)) - hx3dup = _upsample_like(hx3d, hx2) - - hx2d = self.stage2d(paddle.concat((hx3dup, hx2), 1)) - hx2dup = _upsample_like(hx2d, hx1) - - hx1d = self.stage1d(paddle.concat((hx2dup, hx1), 1)) - - #side output - d1 = self.side1(hx1d) - - d2 = self.side2(hx2d) - d2 = _upsample_like(d2, d1) - - d3 = self.side3(hx3d) - d3 = _upsample_like(d3, d1) - - d4 = self.side4(hx4d) - d4 = _upsample_like(d4, d1) - - d5 = self.side5(hx5d) - d5 = _upsample_like(d5, d1) - - d6 = self.side6(hx6) - d6 = _upsample_like(d6, d1) - - d0 = self.outconv(paddle.concat((d1, d2, d3, d4, d5, d6), 1)) - - return [d0, d1, d2, d3, d4, d5, d6] - - def init_weight(self): - if self.pretrained is not None: - utils.load_entire_model(self, self.pretrained) - - -### U^2-Net small ### -@manager.MODELS.add_component -class U2Netp(nn.Layer): - """Please Refer to U2Net above.""" - - def __init__(self, num_classes, in_ch=3, pretrained=None): - super(U2Netp, self).__init__() - - self.stage1 = RSU7(in_ch, 16, 64) - self.pool12 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage2 = RSU6(64, 16, 64) - self.pool23 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage3 = RSU5(64, 16, 64) - self.pool34 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage4 = RSU4(64, 16, 64) - self.pool45 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage5 = RSU4F(64, 16, 64) - self.pool56 = nn.MaxPool2D(2, stride=2, ceil_mode=True) - - self.stage6 = RSU4F(64, 16, 64) - - # decoder - self.stage5d = RSU4F(128, 16, 64) - self.stage4d = RSU4(128, 16, 64) - self.stage3d = RSU5(128, 16, 64) - 
self.stage2d = RSU6(128, 16, 64) - self.stage1d = RSU7(128, 16, 64) - - self.side1 = nn.Conv2D(64, num_classes, 3, padding=1) - self.side2 = nn.Conv2D(64, num_classes, 3, padding=1) - self.side3 = nn.Conv2D(64, num_classes, 3, padding=1) - self.side4 = nn.Conv2D(64, num_classes, 3, padding=1) - self.side5 = nn.Conv2D(64, num_classes, 3, padding=1) - self.side6 = nn.Conv2D(64, num_classes, 3, padding=1) - - self.outconv = nn.Conv2D(6 * num_classes, num_classes, 1) - - self.pretrained = pretrained - self.init_weight() - - def forward(self, x): - - hx = x - - #stage 1 - hx1 = self.stage1(hx) - hx = self.pool12(hx1) - - #stage 2 - hx2 = self.stage2(hx) - hx = self.pool23(hx2) - - #stage 3 - hx3 = self.stage3(hx) - hx = self.pool34(hx3) - - #stage 4 - hx4 = self.stage4(hx) - hx = self.pool45(hx4) - - #stage 5 - hx5 = self.stage5(hx) - hx = self.pool56(hx5) - - #stage 6 - hx6 = self.stage6(hx) - hx6up = _upsample_like(hx6, hx5) - - #decoder - hx5d = self.stage5d(paddle.concat((hx6up, hx5), 1)) - hx5dup = _upsample_like(hx5d, hx4) - - hx4d = self.stage4d(paddle.concat((hx5dup, hx4), 1)) - hx4dup = _upsample_like(hx4d, hx3) - - hx3d = self.stage3d(paddle.concat((hx4dup, hx3), 1)) - hx3dup = _upsample_like(hx3d, hx2) - - hx2d = self.stage2d(paddle.concat((hx3dup, hx2), 1)) - hx2dup = _upsample_like(hx2d, hx1) - - hx1d = self.stage1d(paddle.concat((hx2dup, hx1), 1)) - - #side output - d1 = self.side1(hx1d) - - d2 = self.side2(hx2d) - d2 = _upsample_like(d2, d1) - - d3 = self.side3(hx3d) - d3 = _upsample_like(d3, d1) - - d4 = self.side4(hx4d) - d4 = _upsample_like(d4, d1) - - d5 = self.side5(hx5d) - d5 = _upsample_like(d5, d1) - - d6 = self.side6(hx6) - d6 = _upsample_like(d6, d1) - - d0 = self.outconv(paddle.concat((d1, d2, d3, d4, d5, d6), 1)) - - return [d0, d1, d2, d3, d4, d5, d6] - - def init_weight(self): - if self.pretrained is not None: - utils.load_entire_model(self, self.pretrained) diff --git a/paddleseg/transforms/transforms.py b/paddleseg/transforms/transforms.py index 7f285ed340..52ba7a29f7 100644 --- a/paddleseg/transforms/transforms.py +++ b/paddleseg/transforms/transforms.py @@ -228,6 +228,71 @@ def __call__(self, im, label=None): return (im, label) +@manager.TRANSFORMS.add_component +class LimitLong: + """ + Limit the long edge of image. + + If the long edge is larger than max_long, resize the long edge + to max_long, while scale the short edge proportionally. + + If the long edge is smaller than min_long, resize the long edge + to min_long, while scale the short edge proportionally. + + Args: + max_long (int, optional): If the long edge of image is larger than max_long, + it will be resize to max_long. Default: None. + min_long (int, optional): If the long edge of image is smaller than min_long, + it will be resize to min_long. Default: None. + """ + + def __init__(self, max_long=None, min_long=None): + if max_long is not None: + if not isinstance(max_long, int): + raise TypeError( + "Type of `max_long` is invalid. It should be int, but it is {}" + .format(type(max_long))) + if min_long is not None: + if not isinstance(min_long, int): + raise TypeError( + "Type of `min_long` is invalid. 
It should be int, but it is {}" + .format(type(min_long))) + if (max_long is not None) and (min_long is not None): + if min_long > max_long: + raise ValueError( + '`max_long should not smaller than min_long, but they are {} and {}' + .format(max_long, min_long)) + self.max_long = max_long + self.min_long = min_long + + def __call__(self, im, label=None): + """ + Args: + im (np.ndarray): The Image data. + label (np.ndarray, optional): The label data. Default: None. + + Returns: + (tuple). When label is None, it returns (im, ), otherwise it returns (im, label). + """ + h, w = im.shape[0], im.shape[1] + long_edge = max(h, w) + target = long_edge + if (self.max_long is not None) and (long_edge > self.max_long): + target = self.max_long + elif (self.min_long is not None) and (long_edge < self.min_long): + target = self.min_long + + if target != long_edge: + im = functional.resize_long(im, target) + if label is not None: + label = functional.resize_long(label, target, cv2.INTER_NEAREST) + + if label is None: + return (im, ) + else: + return (im, label) + + @manager.TRANSFORMS.add_component class ResizeRangeScaling: """ diff --git a/paddleseg/utils/__init__.py b/paddleseg/utils/__init__.py index 4c5dc6d806..b11c17d4d8 100644 --- a/paddleseg/utils/__init__.py +++ b/paddleseg/utils/__init__.py @@ -17,5 +17,6 @@ from . import metrics from .env import seg_env, get_sys_env from .utils import * -from .timer import Timer, calculate_eta +from .timer import TimeAverager, calculate_eta from . import visualize +from .config_check import config_check diff --git a/paddleseg/utils/config_check.py b/paddleseg/utils/config_check.py new file mode 100644 index 0000000000..47a7049823 --- /dev/null +++ b/paddleseg/utils/config_check.py @@ -0,0 +1,59 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import numpy as np + + +def config_check(cfg, train_dataset=None, val_dataset=None): + """ + To check config。 + + Args: + cfg (paddleseg.cvlibs.Config): An object of paddleseg.cvlibs.Config. + train_dataset (paddle.io.Dataset): Used to read and process training datasets. + val_dataset (paddle.io.Dataset, optional): Used to read and process validation datasets. + """ + + num_classes_check(cfg, train_dataset, val_dataset) + + +def num_classes_check(cfg, train_dataset, val_dataset): + """" + Check that the num_classes in model, train_dataset and val_dataset is consistent. + """ + num_classes_set = set() + if train_dataset and hasattr(train_dataset, 'num_classes'): + num_classes_set.add(train_dataset.num_classes) + if val_dataset and hasattr(val_dataset, 'num_classes'): + num_classes_set.add(val_dataset.num_classes) + if cfg.dic.get('model', None) and cfg.dic['model'].get('num_classes', None): + num_classes_set.add(cfg.dic['model'].get('num_classes')) + if (not cfg.train_dataset) and (not cfg.val_dataset): + raise ValueError( + 'One of `train_dataset` or `val_dataset should be given, but there are none.' 
+ ) + if len(num_classes_set) == 0: + raise ValueError( + '`num_classes` is not found. Please set it in model, train_dataset or val_dataset' + ) + elif len(num_classes_set) > 1: + raise ValueError( + '`num_classes` is not consistent: {}. Please set it consistently in model or train_dataset or val_dataset' + .format(num_classes_set)) + else: + num_classes = num_classes_set.pop() + if train_dataset: + train_dataset.num_classes = num_classes + if val_dataset: + val_dataset.num_classes = num_classes diff --git a/paddleseg/utils/timer.py b/paddleseg/utils/timer.py index 4478af62c9..f1fcbfa96b 100644 --- a/paddleseg/utils/timer.py +++ b/paddleseg/utils/timer.py @@ -15,37 +15,30 @@ import time -class Timer(object): - """ Simple timer class for measuring time consuming """ - +class TimeAverager(object): def __init__(self): - self._start_time = 0.0 - self._end_time = 0.0 - self._elapsed_time = 0.0 - self._is_running = False - - def start(self): - self._is_running = True - self._start_time = time.time() - - def restart(self): - self.start() - - def stop(self): - self._is_running = False - self._end_time = time.time() - - def elapsed_time(self): - self._end_time = time.time() - self._elapsed_time = self._end_time - self._start_time - if not self.is_running: - return 0.0 - - return self._elapsed_time - - @property - def is_running(self): - return self._is_running + self.reset() + + def reset(self): + self._cnt = 0 + self._total_time = 0 + self._total_samples = 0 + + def record(self, usetime, num_samples=None): + self._cnt += 1 + self._total_time += usetime + if num_samples: + self._total_samples += num_samples + + def get_average(self): + if self._cnt == 0: + return 0 + return self._total_time / float(self._cnt) + + def get_ips_average(self): + if not self._total_samples or self._cnt == 0: + return 0 + return float(self._total_samples) / self._total_time def calculate_eta(remaining_step, speed): diff --git a/predict.py b/predict.py index d262f04ad2..8ac2bb3b6d 100644 --- a/predict.py +++ b/predict.py @@ -18,7 +18,7 @@ import paddle from paddleseg.cvlibs import manager, Config -from paddleseg.utils import get_sys_env, logger +from paddleseg.utils import get_sys_env, logger, config_check from paddleseg.core import predict @@ -150,6 +150,8 @@ def main(args): transforms = val_dataset.transforms image_list, image_dir = get_image_list(args.image_path) + config_check(cfg, val_dataset=val_dataset) + predict( model, model_path=args.model_path, diff --git a/requirements.txt b/requirements.txt index 37707980fb..3abfb8979e 100644 --- a/requirements.txt +++ b/requirements.txt @@ -7,3 +7,4 @@ opencv-python tqdm filelock scipy +prettytable diff --git a/tools/convert_voc2010.py b/tools/convert_voc2010.py new file mode 100644 index 0000000000..779536f03c --- /dev/null +++ b/tools/convert_voc2010.py @@ -0,0 +1,135 @@ +# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +""" +File: convert_voc2010.py +This file is based on https://www.cs.stanford.edu/~roozbeh/pascal-context/ to generate PASCAL-Context Dataset. +Before running, you should download the PASCAL VOC2010 from http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar, PASCAL-Context Dataset from https://www.cs.stanford.edu/~roozbeh/pascal-context/ and annotation file from https://codalabuser.blob.core.windows.net/public/trainval_merged.json. Then, make the folder +structure as follow: +VOC2010 +| +|--Annotations +| +|--ImageSets +| +|--SegmentationClass +| +|--JPEGImages +| +|--SegmentationObject +| +|--trainval_merged.json +""" + +import os + +import argparse +import tqdm +import numpy as np +from detail import Detail +from PIL import Image + + +def parse_args(): + parser = argparse.ArgumentParser( + description= + 'Generate PASCAL-Context dataset' + ) + parser.add_argument( + '--voc_path', + dest='voc_path', + help='pascal voc path', + type=str) + parser.add_argument( + '--annotation_path', + dest='annotation_path', + help='pascal context annotation path', + type=str) + + return parser.parse_args() + + +class PascalContextGenerator(object): + def __init__(self, voc_path, annotation_path): + self.voc_path = voc_path + self.annotation_path = annotation_path + self.label_dir = os.path.join(self.voc_path, 'Context') + self._image_dir = os.path.join(self.voc_path, 'JPEGImages') + self.annFile = os.path.join(self.annotation_path, 'trainval_merged.json') + + if not os.path.exists(self.annFile): + _download_file(url=JSON_URL, savepath=self.annotation_path, print_progress=True) + + self._mapping = np.sort(np.array([ + 0, 2, 259, 260, 415, 324, 9, 258, 144, 18, 19, 22, + 23, 397, 25, 284, 158, 159, 416, 33, 162, 420, 454, 295, 296, + 427, 44, 45, 46, 308, 59, 440, 445, 31, 232, 65, 354, 424, + 68, 326, 72, 458, 34, 207, 80, 355, 85, 347, 220, 349, 360, + 98, 187, 104, 105, 366, 189, 368, 113, 115])) + self._key = np.array(range(len(self._mapping))).astype('uint8') - 1 + + self.train_detail = Detail(self.annFile, self._image_dir, 'train') + self.train_ids = self.train_detail.getImgs() + self.val_detail = Detail(self.annFile, self._image_dir, 'val') + self.val_ids = self.val_detail.getImgs() + + if not os.path.exists(self.label_dir): + os.makedirs(self.label_dir) + + def _class_to_index(self, mask, _mapping, _key): + # assert the values + values = np.unique(mask) + for i in range(len(values)): + assert (values[i] in _mapping) + index = np.digitize(mask.ravel(), _mapping, right=True) + return _key[index].reshape(mask.shape) + + def save_mask(self, img_id, mode): + if mode == 'train': + mask = Image.fromarray(self._class_to_index(self.train_detail.getMask(img_id), _mapping=self._mapping, _key=self._key)) + elif mode == 'val': + mask = Image.fromarray(self._class_to_index(self.val_detail.getMask(img_id), _mapping=self._mapping, _key=self._key)) + filename = img_id['file_name'] + basename, _ = os.path.splitext(filename) + if filename.endswith(".jpg"): + mask_png_name = basename + '.png' + mask.save(os.path.join(self.label_dir, mask_png_name)) + return basename + + def generate_label(self): + + with open(os.path.join(self.voc_path, 'ImageSets/Segmentation/train_context.txt'), 'w') as f: + for img_id in tqdm.tqdm(self.train_ids, desc='train'): + basename = self.save_mask(img_id, 'train') + f.writelines(''.join([basename, '\n'])) + + with open(os.path.join(self.voc_path, 'ImageSets/Segmentation/val_context.txt'), 'w') as f: + for img_id in tqdm.tqdm(self.val_ids, desc='val'): + basename = 
self.save_mask(img_id, 'val') + f.writelines(''.join([basename, '\n'])) + + with open(os.path.join(self.voc_path, 'ImageSets/Segmentation/trainval_context.txt'), 'w') as f: + for img in tqdm.tqdm(os.listdir(self.label_dir), desc='trainval'): + if img.endswith('.png'): + basename = img.split('.', 1)[0] + f.writelines(''.join([basename, '\n'])) + + +def main(): + args = parse_args() + generator = PascalContextGenerator(voc_path=args.voc_path, annotation_path=args.annotation_path) + generator.generate_label() + +if __name__ == '__main__': + main() diff --git a/train.py b/train.py index f9f4465d96..76be634c7c 100644 --- a/train.py +++ b/train.py @@ -17,7 +17,7 @@ import paddle from paddleseg.cvlibs import manager, Config -from paddleseg.utils import get_sys_env, logger +from paddleseg.utils import get_sys_env, logger, config_check from paddleseg.core import train @@ -115,9 +115,13 @@ def main(args): batch_size=args.batch_size) train_dataset = cfg.train_dataset - if not train_dataset: + if train_dataset is None: raise RuntimeError( 'The training dataset is not specified in the configuration file.') + elif len(train_dataset) == 0: + raise ValueError( + 'The length of train_dataset is 0. Please check if your dataset is valid' + ) val_dataset = cfg.val_dataset if args.do_eval else None losses = cfg.loss @@ -126,6 +130,8 @@ def main(args): msg += '------------------------------------------------' logger.info(msg) + config_check(cfg, train_dataset=train_dataset, val_dataset=val_dataset) + train( cfg.model, train_dataset, diff --git a/val.py b/val.py index cbc49d63cb..8a3f9c328b 100644 --- a/val.py +++ b/val.py @@ -19,7 +19,7 @@ from paddleseg.cvlibs import manager, Config from paddleseg.core import evaluate -from paddleseg.utils import get_sys_env, logger +from paddleseg.utils import get_sys_env, logger, config_check def parse_args(): @@ -102,10 +102,14 @@ def main(args): cfg = Config(args.cfg) val_dataset = cfg.val_dataset - if not val_dataset: + if val_dataset is None: raise RuntimeError( 'The verification dataset is not specified in the configuration file.' ) + elif len(val_dataset) == 0: + raise ValueError( + 'The length of val_dataset is 0. Please check if your dataset is valid' + ) msg = '\n---------------Config Information---------------\n' msg += str(cfg) @@ -118,6 +122,8 @@ def main(args): model.set_dict(para_state_dict) logger.info('Loaded trained params of model successfully') + config_check(cfg, val_dataset=val_dataset) + evaluate( model, val_dataset,
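The `U2Net` and `U2Netp` classes moved above are both registered through `manager.MODELS`; `U2Netp` is the lightweight variant in which every RSU stage uses 16 intermediate and 64 output channels. Both return a list of seven logit maps `[d0, d1, ..., d6]`: the fused prediction `d0` from `outconv` plus the six side outputs, each upsampled to the input resolution with `num_classes` channels. A minimal forward-pass sketch, assuming the classes are exposed from `paddleseg.models` (the import path is an assumption, not part of this diff):

```python
import paddle
from paddleseg.models import U2Netp  # assumed import path; the class itself is registered via manager.MODELS

model = U2Netp(num_classes=2)        # e.g. salient object vs. background
model.eval()

x = paddle.randn([1, 3, 320, 320])   # NCHW input; in_ch defaults to 3
outputs = model(x)                   # list of 7 tensors: fused d0 followed by side outputs d1..d6
for i, out in enumerate(outputs):
    print(i, out.shape)              # each is [1, 2, 320, 320]
```

The list return value is what allows each of the seven outputs to be supervised separately during training.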
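The new `LimitLong` transform only rescales when the long edge falls outside the `[min_long, max_long]` range; otherwise the image is returned untouched, and labels are resized with `cv2.INTER_NEAREST` so class indices are preserved. A small behavioural sketch (the `paddleseg.transforms` import path is an assumption):

```python
import numpy as np
from paddleseg.transforms import LimitLong  # assumed import path; registered via manager.TRANSFORMS

transform = LimitLong(max_long=1024, min_long=512)

big = np.zeros((600, 2048, 3), dtype=np.uint8)   # long edge 2048 > max_long
out, = transform(big)                            # an (im,) tuple is returned when no label is passed
print(out.shape)                                 # (300, 1024, 3): long edge clipped to 1024, short edge scaled by the same factor

ok = np.zeros((600, 800, 3), dtype=np.uint8)     # long edge 800 already inside [512, 1024]
out, = transform(ok)
print(out.shape)                                 # (600, 800, 3): unchanged
```

Passing `min_long > max_long` raises a `ValueError`, and non-integer limits raise a `TypeError`, matching the constructor checks above.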
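`TimeAverager` replaces the old `Timer`: rather than a single stopwatch, it accumulates per-iteration costs through `record()` and reports the running mean (`get_average()`) and throughput (`get_ips_average()`), which pairs with `calculate_eta()` for progress logging. A minimal sketch of the intended use in a training loop (the loop body and counts are placeholders):

```python
import time
from paddleseg.utils import TimeAverager, calculate_eta

total_iters = 1000
batch_size = 4
batch_cost_averager = TimeAverager()

batch_start = time.time()
for it in range(1, total_iters + 1):
    # ... run one training iteration here ...
    batch_cost_averager.record(time.time() - batch_start, num_samples=batch_size)

    if it % 10 == 0:
        avg_cost = batch_cost_averager.get_average()      # mean seconds per iteration since the last reset
        ips = batch_cost_averager.get_ips_average()       # samples processed per second
        eta = calculate_eta(total_iters - it, avg_cost)   # estimated remaining time
        print(f'iter {it}: {avg_cost:.4f} s/iter, {ips:.2f} samples/s, ETA {eta}')
        batch_cost_averager.reset()

    batch_start = time.time()
```

Calling `reset()` after each log keeps the averages local to the logging window rather than the whole run.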
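`tools/convert_voc2010.py` uses the third-party Detail API (the `detail` import) to remap the raw PASCAL-Context annotations onto the common 59-class subset (the first `_mapping` entry maps to 255, typically treated as ignore) and writes the `train_context.txt`, `val_context.txt` and `trainval_context.txt` split lists. Note that the fallback `_download_file(url=JSON_URL, ...)` call in `__init__` relies on names that are not defined or imported in this file, so `trainval_merged.json` should be downloaded beforehand. With the folder laid out as in the module docstring, the generator is driven exactly as the script's own `main()` does; a sketch with placeholder paths:

```python
# Equivalent to: python tools/convert_voc2010.py --voc_path data/VOC2010 --annotation_path data/VOC2010
# Placeholder paths; requires the `detail` package and a pre-downloaded trainval_merged.json.
from convert_voc2010 import PascalContextGenerator  # when executed from within tools/

generator = PascalContextGenerator(
    voc_path='data/VOC2010',          # folder holding JPEGImages/, ImageSets/, trainval_merged.json, etc.
    annotation_path='data/VOC2010')   # folder holding trainval_merged.json
generator.generate_label()            # writes Context/*.png masks plus the *_context.txt split lists
```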