Skip to content

Latest commit

 

History

History
150 lines (117 loc) · 23.3 KB

model_zoo.md

File metadata and controls

150 lines (117 loc) · 23.3 KB

Model Zoo

Computer Vision Models - Pretrained Checkpoints

You can load any of our pretrained model in 2 lines of code:

from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLOX_S, pretrained_weights="coco")

All the available models are listed in the column Model name.

Pretrained Classification PyTorch Checkpoints

Model Model name Dataset Resolution Top-1 Top-5 Latency (HW)*T4 Latency (Production)**T4 Latency (HW)*Jetson Xavier NX Latency (Production)**Jetson Xavier NX Latency Cascade Lake Torch Compile Support
ViT base vit_base ImageNet21K 224x224 84.15 - 4.46ms 4.60ms - * - 57.22ms Not Supported
ViT large vit_large ImageNet21K 224x224 85.64 - 12.81ms 13.19ms - * - 187.22ms Not Supported
BEiT beit_base_patch16_224 ImageNet21K 224x224 - - -ms -ms - * - -ms Supported
EfficientNet B0 efficientnet_b0 ImageNet 224x224 77.62 93.49 0.93ms 1.38ms - * - 3.44ms Supported
RegNet Y200 regnetY200 ImageNet 224x224 70.88 89.35 0.63ms 1.08ms 2.16ms 2.47ms 2.06ms Supported
RegNet Y400 regnetY400 ImageNet 224x224 74.74 91.46 0.80ms 1.25ms 2.62ms 2.91ms 2.87ms Supported
RegNet Y600 regnetY600 ImageNet 224x224 76.18 92.34 0.77ms 1.22ms 2.64ms 2.93ms 2.39ms Supported
RegNet Y800 regnetY800 ImageNet 224x224 77.07 93.26 0.74ms 1.19ms 2.77ms 3.04ms 2.81ms Supported
ResNet 18 resnet18 ImageNet 224x224 70.6 89.64 0.52ms 0.95ms 2.01ms 2.30ms 4.56ms Supported
ResNet 34 resnet34 ImageNet 224x224 74.13 91.7 0.92ms 1.34ms 3.57ms 3.87ms 7.64ms Supported
ResNet 50 resnet50 ImageNet 224x224 81.91 93.0 1.03ms 1.44ms 4.78ms 5.10ms 9.25ms Supported
MobileNet V3_large-300 epochs mobilenet_v3_large ImageNet 224x224 74.52 91.92 0.67ms 1.11ms 2.42ms 2.71ms 1.76ms Supported
MobileNet V3_small mobilenet_v3_small ImageNet 224x224 67.45 87.47 0.55ms 0.96ms 2.01ms * 2.35ms 1.06ms Supported
MobileNet V2_w1 mobilenet_v2 ImageNet 224x224 73.08 91.1 0.46 ms 0.89ms 1.65ms * 1.90ms 1.56ms Supported

NOTE:

  • Latency (HW)* - Hardware performance (not including IO)
  • Latency (Production)** - Production Performance (including IO)
  • Performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
  • Performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1

Pretrained Object Detection PyTorch Checkpoints

Model Model Name Dataset Resolution mAPval
0.5:0.95
Latency (HW)*T4 Latency (Production)**T4 Latency (HW)*Jetson Xavier NX Latency (Production)**Jetson Xavier NX Latency Cascade Lake Torch Compile Support
YOLO-NAS S yolo_nas_s COCO 640x640 47.5(FP16) 47.03(INT8) 3.21(FP16) 2.36(INT8) Supported
YOLO-NAS M yolo_nas_m COCO 640x640 51.55(FP16) 51.0(INT8) 5.85(FP16) 3.78(INT8) Supported
YOLO-NAS L yolo_nas_l COCO 640x640 52.22(FP16) 52.1(INT8) 7.87(FP16) 4.78(INT8) Supported
PP-YOLOE small ppyoloe_s COCO 640x640 42.52 2.39ms 4.3ms 14.28ms 14.99ms - Not Supported
PP-YOLOE medium ppyoloe_m COCO 640x640 47.11 5.16ms 7.05ms 32.71ms 33.46ms - Not Supported
PP-YOLOE large ppyoloe_l COCO 640x640 49.48 7.65ms 9.59ms 51.13ms 50.39ms - Not Supported
PP-YOLOE x-large ppyoloe_x COCO 640x640 51.15 14.04ms 15.96ms 94.92ms 94.22ms - Not Supported
YOLOX nano yolox_n COCO 640x640 26.77 2.47ms 4.09ms 11.49ms 12.97ms - Not Supported
YOLOX tiny yolox_t COCO 640x640 37.18 3.16ms 4.61ms 15.23ms 19.24ms - Not Supported
YOLOX small yolox_s COCO 640x640 40.47 3.58ms 4.94ms 18.88ms 22.48ms - Not Supported
YOLOX medium yolox_m COCO 640x640 46.4 6.40ms 7.65ms 39.22ms 44.5ms - Not Supported
YOLOX large yolox_l COCO 640x640 49.25 10.07ms 11.12ms 68.73ms 77.01ms - Not Supported
SSD lite MobileNet v2 ssd_lite_mobilenet_v2 COCO 320x320 21.5 0.77ms 1.40ms 5.28ms 6.44ms 4.13ms Not Supported
SSD lite MobileNet v1 ssd_mobilenet_v1 COCO 320x320 24.3 1.55ms 2.84ms 8.07ms 9.14ms 22.76ms Not Supported

NOTE:

  • Latency (HW)* - Hardware performance (not including IO)
  • Latency (Production)** - Production Performance (including IO)
  • Latency performance measured for T4 and Jetson Xavier NX with TensorRT, using FP16 precision and batch size 1
  • Latency performance measured for Cascade Lake CPU with OpenVINO, using FP16 precision and batch size 1

Pretrained Semantic Segmentation PyTorch Checkpoints

Model Model Name Dataset Resolution mIoU Latency b1T4 Latency b1T4 including IO Latency (Production)**Jetson Xavier NX Torch Compile Support
PP-LiteSeg B50 pp_lite_b_seg50 Cityscapes 512x1024 76.48 4.18ms 31.22ms 31.69ms Supported
PP-LiteSeg B75 pp_lite_b_seg75 Cityscapes 768x1536 78.52 6.84ms 33.69ms 49.89ms Supported
PP-LiteSeg T50 pp_lite_t_seg50 Cityscapes 512x1024 74.92 3.26ms 30.33ms 26.20ms Supported
PP-LiteSeg T75 pp_lite_t_seg75 Cityscapes 768x1536 77.56 5.20ms 32.28ms 38.03ms Supported
DDRNet 23 slim ddrnet_23_slim Cityscapes 1024x2048 79.41 5.74ms 32.01ms 45.18ms Supported
DDRNet 23 ddrnet_23 Cityscapes 1024x2048 81.48 12.74ms 39.01ms 106.26ms Supported
DDRNet 39 ddrnet_39 Cityscapes 1024x2048 81.32 23.57ms 52.41ms 145.79ms Supported
STDC 1-Seg50 stdc1_seg50 Cityscapes 512x1024 75.11 3.34ms 30.12ms 27.54ms Supported
STDC 1-Seg75 stdc1_seg75 Cityscapes 768x1536 77.8 5.53ms 32.490ms 43.88 Supported
STDC 2-Seg50 stdc2_seg50 Cityscapes 512x1024 76.44 4.12ms 30.94ms 32.03ms Supported
STDC 2-Seg75 stdc2_seg75 Cityscapes 768x1536 78.93 6.95ms 33.89ms 54.48ms Supported
RegSeg (exp48) regseg48 Cityscapes 1024x2048 78.15 12.03ms 38.91ms 78.20ms Supported

NOTE:

  • Performance measured on T4 GPU with TensorRT, using FP16 precision and batch size 1 (latency), and not including IO
  • For resolutions below 1024x2048 we first resize the input to the inference resolution and then resize the predictions to 1024x2048. The time of resizing is included in the measurements so that the practical input-size is 1024x2048.
  • DDRNet23 and DDRNet23_Slim results were achieved with channel wise knowledge distillation training recipe.

Pretrained Pose Estimation PyTorch Checkpoints

Model Model Name Dataset Resolution AP (No TTA / H-Flip TTA / H-Flip TTA+Rescoring) Latency b1T4 Latency b1T4 including IO Latency (Production)**Jetson Xavier NX
DEKR_W32_NO_DC dekr_w32_no_dc COCO2017 PE 640x640 63.08 / 64.96 / 67.32 13.29 ms 15.31 ms 75.99 ms
YoloNAS POSE N yolo_nas_pose_n COCO2017 PE 640x640 59.68 / N/A / N/A N/A 2.35 ms 15.99 ms
YoloNAS POSE S yolo_nas_pose_s COCO2017 PE 640x640 64.15 / N/A / N/A N/A 3.29 ms 21.01 ms
YoloNAS POSE M yolo_nas_pose_m COCO2017 PE 640x640 67.87 / N/A / N/A N/A 6.87 ms 38.40 ms
YoloNAS POSE L yolo_nas_pose_l COCO2017 PE 640x640 68.24 / N/A / N/A N/A 8.86 ms 49.34 ms

Implemented Model Architectures

Image Classification

Object Detection

Semantic Segmentation

Pose Estimation