This repository is the official PyTorch implementation of TransNeXt for COCO object detection.
Our code is built on MMDetection: the Mask R-CNN method uses MMDetection 2.28.2, while the DINO method uses MMDetection 3.0.0. Since MMDetection 3.0.0 dropped compatibility with the earlier configuration file format, a separate environment must be built for each method. The requirements.txt for each method can be found in its respective folder.
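As a quick sanity check before training or evaluation, the snippet below (a minimal sketch; the expected versions follow from the requirements above) prints which method the active environment supports:

```python
# Check which method the active MMDetection environment supports.
# Expected: 2.28.2 for the Mask R-CNN method, 3.0.0 for the DINO method.
import mmdet

version = mmdet.__version__
if version.startswith('2.'):
    print(f'MMDetection {version}: use the Mask R-CNN configs.')
elif version.startswith('3.'):
    print(f'MMDetection {version}: use the DINO configs.')
else:
    raise RuntimeError(f'Unexpected MMDetection version: {version}')
```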
COCO object detection and instance segmentation results using the Mask R-CNN method:
Backbone | Pretrained Model | Lr Schd | box mAP | mask mAP | #Params | Download | Config | Log |
---|---|---|---|---|---|---|---|---|
TransNeXt-Tiny | ImageNet-1K | 1x | 49.9 | 44.6 | 47.9M | model | config | log |
TransNeXt-Small | ImageNet-1K | 1x | 51.1 | 45.5 | 69.3M | model | config | log |
TransNeXt-Base | ImageNet-1K | 1x | 51.7 | 45.9 | 109.2M | model | config | log |
- When checking the training logs, we found that the mask mAP and other detailed metrics of Mask R-CNN with the TransNeXt-Tiny backbone were even better than reported in versions V1 and V2 of the paper (likely a data-entry error). This has been corrected in version V3.
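For reference, inference with a downloaded Mask R-CNN checkpoint could look like the following minimal sketch using the standard MMDetection 2.x API; the config and checkpoint paths are placeholders for the files linked in the table above:

```python
# Minimal MMDetection 2.x inference sketch for the Mask R-CNN method.
# The config/checkpoint paths are placeholders, not the repo's actual filenames.
from mmdet.apis import init_detector, inference_detector

config = 'path/to/mask_rcnn_transnext_tiny_fpn_1x_coco.py'  # placeholder
checkpoint = 'path/to/mask_rcnn_transnext_tiny.pth'         # placeholder

model = init_detector(config, checkpoint, device='cuda:0')
result = inference_detector(model, 'demo.jpg')
model.show_result('demo.jpg', result, out_file='result.jpg')  # save visualization
```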
COCO object detection results using the DINO method:
Backbone | Pretrained Model | Scales | Epochs | box mAP | #Params | Download | Config | Log |
---|---|---|---|---|---|---|---|---|
TransNeXt-Tiny | ImageNet-1K | 4scale | 12 | 55.1 | 47.8M | model | config | log |
TransNeXt-Tiny | ImageNet-1K | 5scale | 12 | 55.7 | 48.1M | model | config | log |
TransNeXt-Small | ImageNet-1K | 5scale | 12 | 56.6 | 69.6M | model | config | log |
TransNeXt-Base | ImageNet-1K | 5scale | 12 | 57.1 | 110M | model | config | log |
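Similarly, inference with a DINO checkpoint in the MMDetection 3.x environment might use the DetInferencer API; paths are again placeholders:

```python
# Minimal MMDetection 3.x inference sketch for the DINO method.
# Config/checkpoint paths are placeholders, not the repo's actual filenames.
from mmdet.apis import DetInferencer

inferencer = DetInferencer(
    model='path/to/dino_transnext_tiny_4scale_12ep_coco.py',  # placeholder
    weights='path/to/dino_transnext_tiny_4scale.pth',         # placeholder
    device='cuda:0',
)
inferencer('demo.jpg', out_dir='outputs/')  # saves visualizations and predictions
```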
The code & tutorial for the Mask R-CNN method can be found >> here <<
The code & tutorial for the DINO method can be found >> here <<
The released object detection scripts for TransNeXt are built on the MMDetection and timm libraries.
This project is released under the Apache 2.0 license. Please see the LICENSE file for more information.
If you find our work helpful, please consider citing the following BibTeX entry. We would also greatly appreciate a star for this project.
@InProceedings{shi2023transnext,
    author    = {Dai Shi},
    title     = {TransNeXt: Robust Foveal Visual Perception for Vision Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17773-17783}
}