You can load any of our pretrained model in 2 lines of code:
from super_gradients.training import models
from super_gradients.common.object_names import Models
model = models.get(Models.YOLOX_S, pretrained_weights="coco")
All the available models are listed in the column Model name
.
Model | Model name | Dataset | Resolution | Top-1 | Top-5 | Latency (HW)*T4 | Latency (Production)**T4 | Latency (HW)*Jetson Xavier NX | Latency (Production)**Jetson Xavier NX | Latency Cascade Lake | Torch Compile Support |
---|---|---|---|---|---|---|---|---|---|---|---|
ViT base | vit_base | ImageNet21K | 224x224 | 84.15 | - | 4.46ms | 4.60ms | - * | - | 57.22ms | Not Supported |
ViT large | vit_large | ImageNet21K | 224x224 | 85.64 | - | 12.81ms | 13.19ms | - * | - | 187.22ms | Not Supported |
BEiT | beit_base_patch16_224 | ImageNet21K | 224x224 | - | - | -ms | -ms | - * | - | -ms | Supported |
EfficientNet B0 | efficientnet_b0 | ImageNet | 224x224 | 77.62 | 93.49 | 0.93ms | 1.38ms | - * | - | 3.44ms | Supported |
RegNet Y200 | regnetY200 | ImageNet | 224x224 | 70.88 | 89.35 | 0.63ms | 1.08ms | 2.16ms | 2.47ms | 2.06ms | Supported |
RegNet Y400 | regnetY400 | ImageNet | 224x224 | 74.74 | 91.46 | 0.80ms | 1.25ms | 2.62ms | 2.91ms | 2.87ms | Supported |
RegNet Y600 | regnetY600 | ImageNet | 224x224 | 76.18 | 92.34 | 0.77ms | 1.22ms | 2.64ms | 2.93ms | 2.39ms | Supported |
RegNet Y800 | regnetY800 | ImageNet | 224x224 | 77.07 | 93.26 | 0.74ms | 1.19ms | 2.77ms | 3.04ms | 2.81ms | Supported |
ResNet 18 | resnet18 | ImageNet | 224x224 | 70.6 | 89.64 | 0.52ms | 0.95ms | 2.01ms | 2.30ms | 4.56ms | Supported |
ResNet 34 | resnet34 | ImageNet | 224x224 | 74.13 | 91.7 | 0.92ms | 1.34ms | 3.57ms | 3.87ms | 7.64ms | Supported |
ResNet 50 | resnet50 | ImageNet | 224x224 | 81.91 | 93.0 | 1.03ms | 1.44ms | 4.78ms | 5.10ms | 9.25ms | Supported |
MobileNet V3_large-300 epochs | mobilenet_v3_large | ImageNet | 224x224 | 74.52 | 91.92 | 0.67ms | 1.11ms | 2.42ms | 2.71ms | 1.76ms | Supported |
MobileNet V3_small | mobilenet_v3_small | ImageNet | 224x224 | 67.45 | 87.47 | 0.55ms | 0.96ms | 2.01ms * | 2.35ms | 1.06ms | Supported |
MobileNet V2_w1 | mobilenet_v2 | ImageNet | 224x224 | 73.08 | 91.1 | 0.46 ms | 0.89ms | 1.65ms * | 1.90ms | 1.56ms | Supported |
NOTE:
- Latency (HW)* - Hardware performance (not including IO)
- Latency (Production)** - Production Performance (including IO)
- Performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
- Performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
Model | Model Name | Dataset | Resolution | mAPval 0.5:0.95 |
Latency (HW)*T4 | Latency (Production)**T4 | Latency (HW)*Jetson Xavier NX | Latency (Production)**Jetson Xavier NX | Latency Cascade Lake | Torch Compile Support |
---|---|---|---|---|---|---|---|---|---|---|
YOLO-NAS S | yolo_nas_s | COCO | 640x640 | 47.5(FP16) 47.03(INT8) | 3.21(FP16) 2.36(INT8) | Supported | ||||
YOLO-NAS M | yolo_nas_m | COCO | 640x640 | 51.55(FP16) 51.0(INT8) | 5.85(FP16) 3.78(INT8) | Supported | ||||
YOLO-NAS L | yolo_nas_l | COCO | 640x640 | 52.22(FP16) 52.1(INT8) | 7.87(FP16) 4.78(INT8) | Supported | ||||
PP-YOLOE small | ppyoloe_s | COCO | 640x640 | 42.52 | 2.39ms | 4.3ms | 14.28ms | 14.99ms | - | Not Supported |
PP-YOLOE medium | ppyoloe_m | COCO | 640x640 | 47.11 | 5.16ms | 7.05ms | 32.71ms | 33.46ms | - | Not Supported |
PP-YOLOE large | ppyoloe_l | COCO | 640x640 | 49.48 | 7.65ms | 9.59ms | 51.13ms | 50.39ms | - | Not Supported |
PP-YOLOE x-large | ppyoloe_x | COCO | 640x640 | 51.15 | 14.04ms | 15.96ms | 94.92ms | 94.22ms | - | Not Supported |
YOLOX nano | yolox_n | COCO | 640x640 | 26.77 | 2.47ms | 4.09ms | 11.49ms | 12.97ms | - | Not Supported |
YOLOX tiny | yolox_t | COCO | 640x640 | 37.18 | 3.16ms | 4.61ms | 15.23ms | 19.24ms | - | Not Supported |
YOLOX small | yolox_s | COCO | 640x640 | 40.47 | 3.58ms | 4.94ms | 18.88ms | 22.48ms | - | Not Supported |
YOLOX medium | yolox_m | COCO | 640x640 | 46.4 | 6.40ms | 7.65ms | 39.22ms | 44.5ms | - | Not Supported |
YOLOX large | yolox_l | COCO | 640x640 | 49.25 | 10.07ms | 11.12ms | 68.73ms | 77.01ms | - | Not Supported |
SSD lite MobileNet v2 | ssd_lite_mobilenet_v2 | COCO | 320x320 | 21.5 | 0.77ms | 1.40ms | 5.28ms | 6.44ms | 4.13ms | Not Supported |
SSD lite MobileNet v1 | ssd_mobilenet_v1 | COCO | 320x320 | 24.3 | 1.55ms | 2.84ms | 8.07ms | 9.14ms | 22.76ms | Not Supported |
NOTE:
- Latency (HW)* - Hardware performance (not including IO)
- Latency (Production)** - Production Performance (including IO)
- Latency performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
- Latency performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1
Model | Model Name | Dataset | Resolution | mIoU | Latency b1T4 | Latency b1T4 including IO | Latency (Production)**Jetson Xavier NX | Torch Compile Support |
---|---|---|---|---|---|---|---|---|
PP-LiteSeg B50 | pp_lite_b_seg50 | Cityscapes | 512x1024 | 76.48 | 4.18ms | 31.22ms | 31.69ms | Supported |
PP-LiteSeg B75 | pp_lite_b_seg75 | Cityscapes | 768x1536 | 78.52 | 6.84ms | 33.69ms | 49.89ms | Supported |
PP-LiteSeg T50 | pp_lite_t_seg50 | Cityscapes | 512x1024 | 74.92 | 3.26ms | 30.33ms | 26.20ms | Supported |
PP-LiteSeg T75 | pp_lite_t_seg75 | Cityscapes | 768x1536 | 77.56 | 5.20ms | 32.28ms | 38.03ms | Supported |
DDRNet 23 slim | ddrnet_23_slim | Cityscapes | 1024x2048 | 79.41 | 5.74ms | 32.01ms | 45.18ms | Supported |
DDRNet 23 | ddrnet_23 | Cityscapes | 1024x2048 | 81.48 | 12.74ms | 39.01ms | 106.26ms | Supported |
DDRNet 39 | ddrnet_39 | Cityscapes | 1024x2048 | 81.32 | 23.57ms | 52.41ms | 145.79ms | Supported |
STDC 1-Seg50 | stdc1_seg50 | Cityscapes | 512x1024 | 75.11 | 3.34ms | 30.12ms | 27.54ms | Supported |
STDC 1-Seg75 | stdc1_seg75 | Cityscapes | 768x1536 | 77.8 | 5.53ms | 32.490ms | 43.88 | Supported |
STDC 2-Seg50 | stdc2_seg50 | Cityscapes | 512x1024 | 76.44 | 4.12ms | 30.94ms | 32.03ms | Supported |
STDC 2-Seg75 | stdc2_seg75 | Cityscapes | 768x1536 | 78.93 | 6.95ms | 33.89ms | 54.48ms | Supported |
RegSeg (exp48) | regseg48 | Cityscapes | 1024x2048 | 78.15 | 12.03ms | 38.91ms | 78.20ms | Supported |
NOTE:
- Performance measured on T4 GPU with TensorRT, using FP16 precision and batch size 1 (latency), and not including IO
- For resolutions below 1024x2048 we first resize the input to the inference resolution and then resize the predictions to 1024x2048. The time of resizing is included in the measurements so that the practical input-size is 1024x2048.
- DDRNet23 and DDRNet23_Slim results were achieved with channel wise knowledge distillation training recipe.
Model | Model Name | Dataset | Resolution | AP (No TTA / H-Flip TTA / H-Flip TTA+Rescoring) | Latency b1T4 | Latency b1T4 including IO | Latency (Production)**Jetson Xavier NX |
---|---|---|---|---|---|---|---|
DEKR_W32_NO_DC | dekr_w32_no_dc | COCO2017 PE | 640x640 | 63.08 / 64.96 / 67.32 | 13.29 ms | 15.31 ms | 75.99 ms |
YoloNAS POSE N | yolo_nas_pose_n | COCO2017 PE | 640x640 | 59.68 / N/A / N/A | N/A | 2.35 ms | 15.99 ms |
YoloNAS POSE S | yolo_nas_pose_s | COCO2017 PE | 640x640 | 64.15 / N/A / N/A | N/A | 3.29 ms | 21.01 ms |
YoloNAS POSE M | yolo_nas_pose_m | COCO2017 PE | 640x640 | 67.87 / N/A / N/A | N/A | 6.87 ms | 38.40 ms |
YoloNAS POSE L | yolo_nas_pose_l | COCO2017 PE | 640x640 | 68.24 / N/A / N/A | N/A | 8.86 ms | 49.34 ms |
- DensNet (Densely Connected Convolutional Networks) - Densely Connected Convolutional Networks https://arxiv.org/pdf/1608.06993.pdf
- DPN - Dual Path Networks https://arxiv.org/pdf/1707.01629
- EfficientNet - https://arxiv.org/abs/1905.11946
- GoogleNet - https://arxiv.org/pdf/1409.4842
- LeNet - https://yann.lecun.com/exdb/lenet/
- MobileNet - Efficient Convolutional Neural Networks for Mobile Vision Applications https://arxiv.org/pdf/1704.04861
- MobileNet v2 - https://arxiv.org/pdf/1801.04381
- MobileNet v3 - https://arxiv.org/pdf/1905.02244
- PNASNet - Progressive Neural Architecture Search Networks https://arxiv.org/pdf/1712.00559
- Pre-activation ResNet - https://arxiv.org/pdf/1603.05027
- RegNet - https://arxiv.org/pdf/2003.13678.pdf
- RepVGG - Making VGG-style ConvNets Great Again https://arxiv.org/pdf/2101.03697.pdf
- ResNet - Deep Residual Learning for Image Recognition https://arxiv.org/pdf/1512.03385
- ResNeXt - Aggregated Residual Transformations for Deep Neural Networks https://arxiv.org/pdf/1611.05431
- SENet - Squeeze-and-Excitation Networkshttps://arxiv.org/pdf/1709.01507
- ShuffleNet - https://arxiv.org/pdf/1707.01083
- ShuffleNet v2 - Efficient Convolutional Neural Network for Mobile Deviceshttps://arxiv.org/pdf/1807.11164
- VGG - Very Deep Convolutional Networks for Large-scale Image Recognition https://arxiv.org/pdf/1409.1556
- CSP DarkNet
- DarkNet-53
- SSD (Single Shot Detector) - https://arxiv.org/pdf/1512.02325
- YOLOX - https://arxiv.org/abs/2107.08430
- PP-YoloE - https://arxiv.org/abs/2203.16250
- PP-LiteSeg - https://arxiv.org/pdf/2204.02681v1.pdf
- DDRNet (Deep Dual-resolution Networks) - https://arxiv.org/pdf/2101.06085.pdf
- LadderNet - Multi-path networks based on U-Net for medical image segmentation https://arxiv.org/pdf/1810.07810
- RegSeg - Rethink Dilated Convolution for Real-time Semantic Segmentation https://arxiv.org/pdf/2111.09957
- ShelfNet - https://arxiv.org/pdf/1811.11254
- STDC - Rethinking BiSeNet For Real-time Semantic Segmentation https://arxiv.org/pdf/2104.13188
- HRNet DEKR - Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression https://arxiv.org/pdf/2104.02300.pdf
- YoloNAS Pose