PVTV2

Content

1. Overview
2. Accuracy, FLOPs and Parameters

1. Overview

PVTV2 is a Vision Transformer series model built on PVT (Pyramid Vision Transformer). PVT uses Transformer blocks to build a feature pyramid network. The main design changes in PVTV2 are: (1) overlapping patch embedding, (2) convolutional feed-forward networks, and (3) linear-complexity attention layers. For details, see the paper "PVT v2: Improved Baselines with Pyramid Vision Transformer".
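The effect of the first and third design points on tensor shapes can be illustrated with a little arithmetic. The sketch below is not the model implementation. It assumes the settings described in the PVT v2 paper: each stage embeds patches with a convolution whose kernel is `2 * stride - 1` and whose padding is `stride - 1` (7/4/3 for the first stage, 3/2/1 for the later ones), and the linear attention variant average-pools keys and values to a fixed 7x7 grid. The function names `conv_out_size`, `pyramid_sizes`, and `attention_pairs` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Tuple


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a standard convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) per stage. Assumed values: kernel = 2 * stride - 1
# and padding = stride - 1, so neighbouring patches overlap while the output
# resolution stays at input_size / stride.
STAGES = [(7, 4, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]


def pyramid_sizes(image_size: int = 224) -> List[int]:
    """Side length of the feature map produced by each stage."""
    sizes = []
    size = image_size
    for kernel, stride, padding in STAGES:
        size = conv_out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


def attention_pairs(feature_size: int, pooled_size: int = 7) -> Tuple[int, int]:
    """Query-key pairs for full attention vs. attention over pooled keys.

    Full attention compares every token with every token (quadratic in the
    token count). With keys pooled to a fixed grid (assumed 7x7), the count
    grows only linearly with the number of query tokens.
    """
    tokens = feature_size * feature_size
    full = tokens * tokens
    pooled = tokens * pooled_size * pooled_size
    return full, pooled


if __name__ == "__main__":
    for stage, size in enumerate(pyramid_sizes(224), start=1):
        full, pooled = attention_pairs(size)
        print(f"stage {stage}: {size}x{size} feature map, "
              f"full pairs={full}, pooled pairs={pooled}")
```

Under these assumptions, a 224x224 input gives feature maps with side lengths 56, 28, 14, and 7, which is the pyramid structure the overview refers to. The gap between the two pair counts is largest at the first stage, where the feature map is biggest, and disappears at the last stage, where the feature map is already 7x7.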

2. Accuracy, FLOPs and Parameters

Models	Top1	Top5	Reference top1	Reference top5	FLOPs (G)	Params (M)
PVT_V2_B0	0.705	0.902	0.705	-	0.53	3.7
PVT_V2_B1	0.787	0.945	0.787	-	2.0	14.0
PVT_V2_B2	0.821	0.960	0.820	-	3.9	25.4
PVT_V2_B3	0.831	0.965	0.831	-	6.7	45.2
PVT_V2_B4	0.836	0.967	0.836	-	9.8	62.6
PVT_V2_B5	0.837	0.966	0.838	-	11.4	82.0
PVT_V2_B2_Linear	0.821	0.961	0.821	-	3.8	22.6
