MobileViT

Catalogue

1. Overview
2. Accuracy, FLOPs and Parameters

1. Overview

MobileViT is a lightweight vision Transformer network that can serve as a general-purpose backbone in computer vision. It combines the advantages of CNNs and Transformers, so it handles both local and global features well and mitigates the lack of inductive bias in Transformer models. As a result, at a comparable parameter count, it achieves markedly better results than other SOTA models on image classification, object detection, and semantic segmentation. Paper
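To make the "local plus global" idea concrete, here is a toy sketch in plain Python. It is not the MobileViT block itself: it works on a 1-D list of scalars, uses no learned weights, and the names `local_conv`, `global_attention`, and `hybrid_block` are hypothetical. It only illustrates how a convolution restricts each output to a small neighbourhood, while self-attention lets every position see every other position.

```python
import math


def local_conv(x, kernel):
    """Slide an odd-length kernel over x with zero padding.

    Each output depends only on a small neighbourhood (local features).
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def global_attention(x):
    """Single-head self-attention over scalar tokens, without learned weights.

    Each output is a softmax-weighted average of ALL inputs (global features).
    """
    out = []
    for q in x:
        scores = [q * k for k in x]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, x)))
    return out


def hybrid_block(x, kernel=(0.25, 0.5, 0.25)):
    """Toy hybrid: local convolution, then global attention, then a residual add."""
    local = local_conv(x, kernel)
    mixed = global_attention(local)
    return [a + b for a, b in zip(x, mixed)]


if __name__ == "__main__":
    signal = [1.0, 0.0, 0.0, 0.0, 5.0]
    print(local_conv(signal, (0.25, 0.5, 0.25)))
    print(global_attention(signal))
    print(hybrid_block(signal))
```

In this sketch, changing the last element of `signal` leaves the first output of `local_conv` untouched but changes the first output of `global_attention`, which is the behaviour the two kinds of layer contribute in a hybrid design.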

2. Accuracy, FLOPs and Parameters

Models	Top1	Top5	Reference top1	Reference top5	FLOPs (M)	Params (M)
MobileViT_XXS	0.6867	0.8878	0.690	-	337.24	1.28
MobileViT_XS	0.7454	0.9227	0.747	-	930.75	2.33
MobileViT_S	0.7814	0.9413	0.783	-	1849.35	5.59
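As a quick check on how closely the reproduced Top1 values track the reference values, the gap can be computed directly from the two accuracy columns. The snippet below is a minimal sketch: the numbers are copied from the table, while the `results` dictionary and the `top1_gap` helper are hypothetical names used only for illustration.

```python
# (reproduced Top1, reference top1) pairs copied from the table above.
results = {
    "MobileViT_XXS": (0.6867, 0.690),
    "MobileViT_XS": (0.7454, 0.747),
    "MobileViT_S": (0.7814, 0.783),
}


def top1_gap(results):
    """Return reference minus reproduced Top1 for each model, rounded to 4 places."""
    return {name: round(ref - top1, 4) for name, (top1, ref) in results.items()}


gaps = top1_gap(results)
for name, gap in gaps.items():
    # Multiply by 100 to express the gap in percentage points.
    print(f"{name}: {gap * 100:.2f} points below reference")
```

By this calculation, each reproduced model lands within 0.4 percentage points of its reference Top1.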
