Pyramid Scene Parsing Network
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective in producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark and the Cityscapes benchmark. A single PSPNet yields a new record of 85.4% mIoU accuracy on PASCAL VOC 2012 and 80.2% accuracy on Cityscapes.
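To make the "different-region-based context aggregation" of the pyramid pooling module more concrete, here is a minimal pure-Python sketch on a single-channel feature map. It is an illustration under stated assumptions, not the MMSegmentation implementation: the function names are hypothetical, the bin sizes (1, 2, 3, 6) are those used in the PSPNet paper, the per-level 1x1 convolution is omitted, and nearest-neighbour upsampling stands in for the bilinear interpolation used in practice.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(feature: Grid, bins: int) -> Grid:
    """Average-pool a 2D map into a bins x bins grid of region means."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Region bounds: floor of the start, ceiling of the end,
        # so every region contains at least one cell.
        r0 = (i * h) // bins
        r1 = -((-(i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = -((-(j + 1) * w) // bins)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled: Grid, h: int, w: int) -> Grid:
    """Resize a bins x bins grid back to h x w (nearest neighbour, an assumption)."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(feature: Grid, bin_sizes: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return the input map followed by one context map per pyramid level.

    In the real module these maps are concatenated along the channel axis.
    """
    h, w = len(feature), len(feature[0])
    maps = [feature]
    for bins in bin_sizes:
        maps.append(upsample_nearest(adaptive_avg_pool(feature, bins), h, w))
    return maps


if __name__ == "__main__":
    feat = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
    for level in pyramid_pool(feat):
        print(level[0])  # first row of each level
```

The coarsest level (one bin) carries a single global average, while finer levels summarize progressively smaller sub-regions; stacking them with the original map gives each pixel access to context at several scales.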
@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
  booktitle={CVPR},
  year={2017}
}
@article{wightman2021resnet,
  title={Resnet strikes back: An improved training procedure in timm},
  author={Wightman, Ross and Touvron, Hugo and J{\'e}gou, Herv{\'e}},
  journal={arXiv preprint arXiv:2110.00476},
  year={2021}
}
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x1024 | 40000 | 6.1 | 4.07 | 77.85 | 79.18 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 40000 | 9.6 | 2.68 | 78.34 | 79.74 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 40000 | 6.9 | 1.76 | 78.26 | 79.88 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 40000 | 10.9 | 1.15 | 79.08 | 80.28 | config | model \| log |
| PSPNet | R-18-D8 | 512x1024 | 80000 | 1.7 | 15.71 | 74.87 | 76.04 | config | model \| log |
| PSPNet | R-50-D8 | 512x1024 | 80000 | - | - | 78.55 | 79.79 | config | model \| log |
| PSPNet | R-50b-D8 rsb | 512x1024 | 80000 | 6.2 | 3.82 | 78.47 | 79.45 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.76 | 81.01 | config | model \| log |
| PSPNet (FP16) | R-101-D8 | 512x1024 | 80000 | 5.34 | 8.77 | 79.46 | - | config | model \| log |
| PSPNet | R-18-D8 | 769x769 | 80000 | 1.9 | 6.20 | 75.90 | 77.86 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 80000 | - | - | 79.59 | 80.69 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 80000 | - | - | 79.77 | 81.06 | config | model \| log |
| PSPNet | R-18b-D8 | 512x1024 | 80000 | 1.5 | 16.28 | 74.23 | 75.79 | config | model \| log |
| PSPNet | R-50b-D8 | 512x1024 | 80000 | 6.0 | 4.30 | 78.22 | 79.46 | config | model \| log |
| PSPNet | R-101b-D8 | 512x1024 | 80000 | 9.5 | 2.76 | 79.69 | 80.79 | config | model \| log |
| PSPNet | R-18b-D8 | 769x769 | 80000 | 1.7 | 6.41 | 74.92 | 76.90 | config | model \| log |
| PSPNet | R-50b-D8 | 769x769 | 80000 | 6.8 | 1.88 | 78.50 | 79.96 | config | model \| log |
| PSPNet | R-101b-D8 | 769x769 | 80000 | 10.8 | 1.17 | 78.87 | 80.04 | config | model \| log |
| PSPNet | R-50-D32 | 512x1024 | 80000 | 3.0 | 15.21 | 73.88 | 76.85 | config | model \| log |
| PSPNet | R-50b-D32 rsb | 512x1024 | 80000 | 3.1 | 16.08 | 74.09 | 77.18 | config | model \| log |
| PSPNet | R-50b-D32 | 512x1024 | 80000 | 2.9 | 15.41 | 72.61 | 75.51 | config | model \| log |

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 8.5 | 23.53 | 41.13 | 41.94 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 12 | 15.30 | 43.57 | 44.35 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 42.48 | 43.44 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 44.39 | 45.35 | config | model \| log |

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 6.1 | 23.59 | 76.78 | 77.61 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 9.6 | 15.02 | 78.47 | 79.25 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 77.29 | 78.48 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 78.52 | 79.57 | config | model \| log |

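| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-101-D8 | 480x480 | 40000 | 8.8 | 9.68 | 46.60 | 47.78 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 46.03 | 47.15 | config | model \| log |
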
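| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-101-D8 | 480x480 | 40000 | - | - | 52.02 | 53.54 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 52.47 | 53.99 | config | model \| log |
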
Dark Zurich and Nighttime Driving
We report evaluation results on these two datasets using the models above, which were trained on the Cityscapes training set.
| Method | Backbone | Training Dataset | Test Dataset | mIoU | config | evaluation checkpoint |
| --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | Cityscapes Training set | Dark Zurich | 10.91 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Nighttime Driving | 23.02 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Cityscapes Validation set | 77.85 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Dark Zurich | 10.16 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Nighttime Driving | 20.25 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Cityscapes Validation set | 78.34 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Dark Zurich | 15.54 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Nighttime Driving | 22.25 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Cityscapes Validation set | 79.69 | config | model \| log |

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 9.6 | 20.5 | 35.69 | 36.62 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 13.2 | 11.1 | 37.26 | 38.52 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 36.33 | 37.24 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 37.76 | 38.86 | config | model \| log |

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 9.6 | 20.5 | 38.80 | 39.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 13.2 | 11.1 | 40.34 | 40.79 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 39.64 | 39.97 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 41.28 | 41.66 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 320000 | - | - | 40.53 | 40.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 320000 | - | - | 41.95 | 42.42 | config | model \| log |

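| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 26.87 | 48.62 | 47.57 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 6.60 | 50.46 | 50.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 4.58 | 51.86 | 51.34 | config | model \| log |
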
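| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.50 | 85.12 | 77.09 | 78.30 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.21 | 78.12 | 78.98 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.40 | 78.62 | 79.47 | config | model \| log |
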
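| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 85.06 | 71.46 | 73.36 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.29 | 72.36 | 73.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.97 | 72.61 | 74.18 | config | model \| log |
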
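| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PSPNet | R-18-D8 | 896x896 | 80000 | 4.52 | 26.91 | 60.22 | 61.25 | config | model \| log |
| PSPNet | R-50-D8 | 896x896 | 80000 | 16.58 | 8.88 | 65.36 | 66.48 | config | model \| log |
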
Note:
- FP16 means Mixed Precision (FP16) is adopted in training.
- 896x896 is the Crop Size used for the iSAID dataset, following the implementation of PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation.
- rsb is short for 'Resnet strikes back'.
- The b in R-50b means ResNetV1b, which is the standard ResNet backbone. In MMSegmentation, the default backbone is ResNetV1c, which usually performs better on semantic segmentation tasks.
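For readers unfamiliar with the mIoU columns in the tables, the following is a minimal sketch of how mean intersection-over-union is commonly computed from flattened label sequences. It is an illustration, not the MMSegmentation evaluation code: the function name `mean_iou` is hypothetical, the ignore label 255 is an assumption (a common convention for unlabelled pixels), and benchmark numbers are accumulated over a whole validation set and reported as percentages rather than computed per image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean of per-class IoU = |pred AND target| / |pred OR target|.

    Pixels whose ground-truth label equals ignore_index are skipped.
    Classes that never appear in pred or target are left out of the mean.
    Predictions are assumed to lie in range(num_classes).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    # Per-class IoU: class 0 = 1/2, class 1 = 2/3, class 2 = 1/1
    print(f"mIoU = {100 * mean_iou(pred, target, num_classes=3):.2f}%")
```

The mIoU(ms+flip) column uses the same metric, but the prediction being scored is obtained by combining outputs over several input scales and horizontally flipped copies of each image.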