Skip to content

Releases: ultralytics/yolov5

v7.0 - YOLOv5 SOTA Realtime Instance Segmentation

22 Nov 15:29
915bbf2
Compare
Choose a tag to compare

Our new YOLOv5 v7.0 instance segmentation models are the fastest and most accurate in the world, beating all current SOTA benchmarks. We've made them super simple to train, validate and deploy. See full details in our Release Notes and visit our YOLOv5 Segmentation Colab Notebook for quickstart tutorials.


Our primary goal with this release is to introduce super simple YOLOv5 segmentation workflows just like our existing object detection models. The new v7.0 YOLOv5-seg models below are just a start, we will continue to improve these going forward together with our existing detection and classification models. We'd love your feedback and contributions on this effort!

This release incorporates 280 PRs from 41 contributors since our last release in August 2022.

Important Updates

  • Segmentation Models ⭐ NEW: SOTA YOLOv5-seg COCO-pretrained segmentation models are now available for the first time (#9052 by @glenn-jocher, @AyushExel and @Laughing-q)
  • Paddle Paddle Export: Export any YOLOv5 model (cls, seg, det) to Paddle format with python export.py --include paddle (#9459 by @glenn-jocher)
  • YOLOv5 AutoCache: Use python train.py --cache ram will now scan available memory and compare against predicted dataset RAM usage. This reduces risk in caching and should help improve adoption of the dataset caching feature, which can significantly speed up training. (#10027 by @glenn-jocher)
  • Comet Logging and Visualization Integration: Free forever, Comet lets you save YOLOv5 models, resume training, and interactively visualise and debug predictions. (#9232 by @DN6)

New Segmentation Checkpoints

We trained YOLOv5 segmentations models on COCO for 300 epochs at image size 640 using A100 GPUs. We exported all models to ONNX FP32 for CPU speed tests and to TensorRT FP16 for GPU speed tests. We ran all speed tests on Google Colab Pro notebooks for easy reproducibility.

Model size
(pixels)
mAPbox
50-95
mAPmask
50-95
Train time
300 epochs
A100 (hours)
Speed
ONNX CPU
(ms)
Speed
TRT A100
(ms)
params
(M)
FLOPs
@640 (B)
YOLOv5n-seg 640 27.6 23.4 80:17 62.7 1.2 2.0 7.1
YOLOv5s-seg 640 37.6 31.7 88:16 173.3 1.4 7.6 26.4
YOLOv5m-seg 640 45.0 37.1 108:36 427.0 2.2 22.0 70.8
YOLOv5l-seg 640 49.0 39.9 66:43 (2x) 857.4 2.9 47.9 147.7
YOLOv5x-seg 640 50.7 41.4 62:56 (3x) 1579.2 4.5 88.8 265.7
  • All checkpoints are trained to 300 epochs with SGD optimizer with lr0=0.01 and weight_decay=5e-5 at image size 640 and all default settings.
    Runs logged to https://wandb.ai/glenn-jocher/YOLOv5_v70_official
  • Accuracy values are for single-model single-scale on COCO dataset.
    Reproduce by python segment/val.py --data coco.yaml --weights yolov5s-seg.pt
  • Speed averaged over 100 inference images using a Colab Pro A100 High-RAM instance. Values indicate inference speed only (NMS adds about 1ms per image).
    Reproduce by python segment/val.py --data coco.yaml --weights yolov5s-seg.pt --batch 1
  • Export to ONNX at FP32 and TensorRT at FP16 done with export.py.
    Reproduce by python export.py --weights yolov5s-seg.pt --include engine --device 0 --half

New Segmentation Usage Examples

Train

YOLOv5 segmentation training supports auto-download COCO128-seg segmentation dataset with --data coco128-seg.yaml argument and manual download of COCO-segments dataset with bash data/scripts/get_coco.sh --train --val --segments and then python train.py --data coco.yaml.

# Single-GPU
python segment/train.py --model yolov5s-seg.pt --data coco128-seg.yaml --epochs 5 --img 640

# Multi-GPU DDP
python -m torch.distributed.run --nproc_per_node 4 --master_port 1 segment/train.py --model yolov5s-seg.pt --data coco128-seg.yaml --epochs 5 --img 640 --device 0,1,2,3

Val

Validate YOLOv5m-seg accuracy on ImageNet-1k dataset:

bash data/scripts/get_coco.sh --val --segments  # download COCO val segments split (780MB, 5000 images)
python segment/val.py --weights yolov5s-seg.pt --data coco.yaml --img 640  # validate

Predict

Use pretrained YOLOv5m-seg to predict bus.jpg:

python segment/predict.py --weights yolov5m-seg.pt --data data/images/bus.jpg
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5m-seg.pt')  # load from PyTorch Hub (WARNING: inference not yet supported)
zidane bus

Export

Export YOLOv5s-seg model to ONNX and TensorRT:

python export.py --weights yolov5s-seg.pt --include onnx engine --img 640 --device 0

Changelog

πŸ› οΈ New Features and Bug Fixes (280) * Improve classification comments by @glenn-jocher in https://github.com//pull/8997 * Update `attempt_download(release='v6.2')` by @glenn-jocher in https://github.com//pull/8998 * Update README_cn.md by @KieraMengru0907 in https://github.com//pull/9001 * Update dataset `names` from array to dictionary by @glenn-jocher in https://github.com//pull/9000 * [segment]: Allow inference on dirs and videos by @AyushExel in https://github.com//pull/9003 * DockerHub tag update Usage example by @glenn-jocher in https://github.com//pull/9005 * Add weight `decay` to argparser by @glenn-jocher in https://github.com//pull/9006 * Add glob quotes to detect.py usage example by @glenn-jocher in https://github.com//pull/9007 * Fix TorchScript JSON string key bug by @glenn-jocher in https://github.com//pull/9015 * EMA FP32 assert classification bug fix by @glenn-jocher in https://github.com//pull/9016 * Faster pre-processing for gray image input by @cher-liang in https://github.com//pull/9009 * Improved `Profile()` inference timing by @glenn-jocher in https://github.com//pull/9024 * `torch.empty()` for speed improvements by @glenn-jocher in https://github.com//pull/9025 * Remove unused `time_sync` import by @glenn-jocher in https://github.com//pull/9026 * Add PyTorch Hub classification CI checks by @glenn-jocher in https://github.com//pull/9027 *...
Read more

v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations

17 Aug 10:26
d3ea0df
Compare
Choose a tag to compare

This release incorporates 401 PRs from 41 contributors since our last release in February 2022. It adds Classification training, validation, prediction and export (to all 11 formats), and also provides ImageNet-pretrained YOLOv5m-cls, ResNet (18, 34, 50, 101) and EfficientNet (b0-b3) models.

My main goal with this release is to introduce super simple YOLOv5 classification workflows just like our existing object detection models. The new v6.2 YOLOv5-cls models below are just a start, we will continue to improve these going forward together with our existing detection models. We'd love your contributions on this effort!

Our next release, v6.3 is scheduled for September and will bring official instance segmentation support to YOLOv5, with a major v7.0 release later this year updating architectures across all 3 tasks - classification, detection and segmentation.

Important Updates

  • Classification Models ⭐ NEW: YOLOv5-cls ImageNet-pretrained classification models are now available for the first time (#8956 by @glenn-jocher)
  • ClearML logging ⭐ NEW: Integration with the open-source experiment tracker ClearML. Installing with pip install clearml will enable the integration and allow users to track every training run in ClearML. This in turn allows users to track and compare runs and even schedule runs remotely. (#8620 by @thepycoder)
  • Deci.ai optimization ⭐ NEW: Automatically compile and quantize YOLOv5 for better inference performance in one click at Deci (#8956 by @glenn-jocher).
  • GPU Export Benchmarks: Benchmark (mAP and speed) all YOLOv5 export formats with python utils/benchmarks.py --weights yolov5s.pt --device 0 for GPU benchmarks or --device cpu for CPU benchmarks (#6963 by @glenn-jocher).
  • Training Reproducibility: Single-GPU YOLOv5 training with torch>=1.12.0 is now fully reproducible, and a new --seed argument can be used (default seed=0) (#8213 by @AyushExel).
  • Apple Metal Performance Shader (MPS) Support: MPS support for Apple M1/M2 devices with --device mps (full functionality is pending torch updates in pytorch/pytorch#77764) (#7878 by @glenn-jocher)

New Classification Checkpoints

We trained YOLOv5-cls classification models on ImageNet for 90 epochs using a 4xA100 instance, and we trained ResNet and EfficientNet models alongside with the same default training settings to compare. We exported all models to ONNX FP32 for CPU speed tests and to TensorRT FP16 for GPU speed tests. We ran all speed tests on Google Colab Pro for easy reproducibility.

Model size
(pixels)
accuracy
top1
accuracy
top5
Train time
90 epochs
4x A100 (hours)
Speed
ONNX-CPU
(ms)
Speed
TensorRT-V100
(ms)
params
(M)
FLOPs
@224 (B)
YOLOv5n-cls 224 64.6 85.4 7:59 3.3 0.5 2.5 0.5
YOLOv5s-cls 224 71.5 90.2 8:09 6.6 0.6 5.4 1.4
YOLOv5m-cls 224 75.9 92.9 10:06 15.5 0.9 12.9 3.9
YOLOv5l-cls 224 78.0 94.0 11:56 26.9 1.4 26.5 8.5
YOLOv5x-cls 224 79.0 94.4 15:04 54.3 1.8 48.1 15.9
ResNet18 224 70.3 89.5 6:47 11.2 0.5 11.7 3.7
ResNet34 224 73.9 91.8 8:33 20.6 0.9 21.8 7.4
ResNet50 224 76.8 93.4 11:10 23.4 1.0 25.6 8.5
ResNet101 224 78.5 94.3 17:10 42.1 1.9 44.5 15.9
EfficientNet_b0 224 75.1 92.4 13:03 12.5 1.3 5.3 1.0
EfficientNet_b1 224 76.4 93.2 17:04 14.9 1.6 7.8 1.5
EfficientNet_b2 224 76.6 93.4 17:10 15.9 1.6 9.1 1.7
EfficientNet_b3 224 77.7 94.0 19:19 18.9 1.9 12.2 2.4
  • All checkpoints are trained to 90 epochs with SGD optimizer with lr0=0.001 and weight_decay=5e-5 at image size 224 and all default settings.
    Runs logged to https://wandb.ai/glenn-jocher/YOLOv5-Classifier-v6-2
  • Accuracy values are for single-model single-scale on ImageNet-1k dataset.
    Reproduce by python classify/val.py --data ../datasets/imagenet --img 224
  • Speed averaged over 100 inference images using a Colab Pro V100 High-RAM instance.
    Reproduce by python classify/val.py --data ../datasets/imagenet --img 224 --batch 1
  • Export to ONNX at FP32 and TensorRT at FP16 done with `export....
Read more

v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference

22 Feb 11:35
3752807
Compare
Choose a tag to compare

This release incorporates many new features and bug fixes (271 PRs from 48 contributors) since our last release in October 2021. It adds TensorRT, Edge TPU and OpenVINO support, and provides retrained models at --batch-size 128 with new default one-cycle linear LR scheduler. YOLOv5 now officially supports 11 different formats, not just for export but for inference (both detect.py and PyTorch Hub), and validation to profile mAP and speed results after export.

Format export.py --include Model
PyTorch - yolov5s.pt
TorchScript torchscript yolov5s.torchscript
ONNX onnx yolov5s.onnx
OpenVINO openvino yolov5s_openvino_model/
TensorRT engine yolov5s.engine
CoreML coreml yolov5s.mlmodel
TensorFlow SavedModel saved_model yolov5s_saved_model/
TensorFlow GraphDef pb yolov5s.pb
TensorFlow Lite tflite yolov5s.tflite
TensorFlow Edge TPU edgetpu yolov5s_edgetpu.tflite
TensorFlow.js tfjs yolov5s_web_model/

Usage examples (ONNX shown):

Export:          python export.py --weights yolov5s.pt --include onnx
Detect:          python detect.py --weights yolov5s.onnx
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.onnx')
Validate:        python val.py --weights yolov5s.onnx
Visualize:       https://netron.app

Important Updates

  • TensorRT support: TensorFlow, Keras, TFLite, TF.js model export now fully integrated using python export.py --include saved_model pb tflite tfjs (#5699 by @imyhxy)
  • Tensorflow Edge TPU support ⭐ NEW: New smaller YOLOv5n (1.9M params) model below YOLOv5s (7.5M params), exports to 2.1 MB INT8 size, ideal for ultralight mobile solutions. (#3630 by @zldrobit)
  • OpenVINO support: YOLOv5 ONNX models are now compatible with both OpenCV DNN and ONNX Runtime (#6057 by @glenn-jocher).
  • Export Benchmarks: Benchmark (mAP and speed) all YOLOv5 export formats with python utils/benchmarks.py --weights yolov5s.pt. Currently operates on CPU, future updates will implement GPU support. (#6613 by @glenn-jocher).
  • Architecture: no changes
  • Hyperparameters: minor change
  • Training: Default Learning Rate (LR) scheduler updated
    • One-cycle with cosine replace with one-cycle linear for improved results (#6729 by @glenn-jocher).

New Results

All model trainings logged to https://wandb.ai/glenn-jocher/YOLOv5_v61_official

YOLOv5-P5 640 Figure (click to expand)

Figure Notes (click to expand)
  • COCO AP val denotes mAP@0.5:0.95 metric measured on the 5000-image COCO val2017 dataset over various inference sizes from 256 to 1536.
  • GPU Speed measures average inference time per image on COCO val2017 dataset using a AWS p3.2xlarge V100 instance at batch-size 32.
  • EfficientDet data from google/automl at batch size 8.
  • Reproduce by python val.py --task study --data coco.yaml --iou 0.7 --weights yolov5n6.pt yolov5s6.pt yolov5m6.pt yolov5l6.pt yolov5x6.pt

Example YOLOv5l before and after metrics:

YOLOv5l
Large
size
(pixels)
mAPval
0.5:0.95
mAPval
0.5
Speed
CPU b1
(ms)
Speed
V100 b1
(ms)
Speed
V100 b32
(ms)
params
(M)
FLOPs
@640 (B)
v5.0 640 48.2 66.9 457.9 11.6 2.8 47.0 115.4
v6.0 (previous) 640 48.8 67.2 424.5 10.9 2.7 46.5 109.1
v6.1 (this release) 640 49.0 67.3 430.0 10.1 2.7 46.5 109.1

Pretrained Checkpoints

Model size
(pixels)
mAPval
0.5:0.95
mAPval
0.5
Speed
CPU b1
(ms)
Speed
V100 b1
(ms)
Speed
V100 b32
(ms)
params
(M)
FLOPs
@640 (B)
YOLOv5n 640 28.0 45.7 45 6.3 0.6 1.9 4.5
YOLOv5s 640 37.4 56.8 98 6.4 0.9 7.2 16.5
YOLOv5m 640 45.4 64.1 224 8.2 1.7 21.2 49.0
YOLOv5l 640 49.0 67.3 430 10.1 2.7 46.5 109.1
YOLOv5x 640 50.7 68.9 766 12.1 4.8 86.7 205.7
YOLOv5n6 1280 36.0 54.4 153 8.1 2.1 3.2 4.6
YOLOv5s6 1280 44.8 63.7 385 8.2 3.6 12.6 16.8
YOLOv5m6 1280 51.3 69.3 887 11.1 6.8 35.7 50.0
YOLOv5l6 1280 53.7 71.3 1784 15.8 10.5 76.8 111.4
YOLOv5x6
+ TTA
1280
1536
55.0
55.8
72.7
72.7
3136
-
26.2
-
19.4
-
140.7
-
209.8
-
Table Notes (click to expand)
  • All checkpoints are trained to 300 epochs with default settings. Nano and Small models use hyp.scratch-low.yaml hyps, all others use hyp.scratch-high.yaml.
  • mAPval values are for single-model single-scale on COCO val2017 dataset.
    Reproduce by python val.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65
  • Speed averaged over COCO val images using a AWS p3.2xlarge instance. NMS times (~1 ms/img) not included.
    Reproduce by python val.py --data coco.yaml --img 640 --task speed --batch 1
  • TTA Test Time Augmentation includes reflection and scale augmentations.
    Reproduce by python val.py --data coco.yaml --img 1536 --iou 0.7 --augment

Changelog

Changes between previous release and this release: v6.0...v6.1
Changes since this release: v6.1...HEAD

New Features and Bug Fixes (271)
Read more

v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support

12 Oct 07:08
956be8e
Compare
Choose a tag to compare

This release incorporates many new features and bug fixes (465 PRs from 73 contributors) since our last release v5.0 in April, brings architecture tweaks, and also introduces new P5 and P6 'Nano' models: YOLOv5n and YOLOv5n6. Nano models maintain the YOLOv5s depth multiple of 0.33 but reduce the YOLOv5s width multiple from 0.50 to 0.25, resulting in ~75% fewer parameters, from 7.5M to 1.9M, ideal for mobile and CPU solutions.

Example usage:

python detect.py --weights yolov5n.pt --img 640    # Nano P5 model trained at --img 640 (28.4 mAP@0.5:0.95)
python detect.py --weights yolov5n6.pt --img 1280  # Nano P6 model trained at --img 1280 (34.0 mAP0.5:0.95)

Important Updates

  • Roboflow Integration ⭐ NEW: Train YOLOv5 models directly on any Roboflow dataset with our new integration! (#4975 by @Jacobsolawetz)

  • YOLOv5n 'Nano' models ⭐ NEW: New smaller YOLOv5n (1.9M params) model below YOLOv5s (7.5M params), exports to 2.1 MB INT8 size, ideal for ultralight mobile solutions. (#5027 by @glenn-jocher)

  • TensorFlow and Keras Export: TensorFlow, Keras, TFLite, TF.js model export now fully integrated using python export.py --include saved_model pb tflite tfjs (#1127 by @zldrobit)

  • OpenCV DNN: YOLOv5 ONNX models are now compatible with both OpenCV DNN and ONNX Runtime (#4833 by @SamFC10).

  • Model Architecture: Updated backbones are slightly smaller, faster and more accurate.

    • Replacement of Focus() with an equivalent Conv(k=6, s=2, p=2) layer (#4825 by @thomasbi1) for improved exportability
    • New SPPF() replacement for SPP() layer for reduced ops (#4420 by @glenn-jocher)
    • Reduction in P3 backbone layer C3() repeats from 9 to 6 for improved speeds
    • Reorder places SPPF() at end of backbone
    • Reintroduction of shortcut in the last C3() backbone layer
    • Updated hyperparameters with increased mixup and copy-paste augmentation

New Results

YOLOv5-P5 640 Figure (click to expand)

Figure Notes (click to expand)
  • COCO AP val denotes mAP@0.5:0.95 metric measured on the 5000-image COCO val2017 dataset over various inference sizes from 256 to 1536.
  • GPU Speed measures average inference time per image on COCO val2017 dataset using a AWS p3.2xlarge V100 instance at batch-size 32.
  • EfficientDet data from google/automl at batch size 8.
  • Reproduce by python val.py --task study --data coco.yaml --iou 0.7 --weights yolov5n6.pt yolov5s6.pt yolov5m6.pt yolov5l6.pt yolov5x6.pt

mAP improves from +0.3% to +1.1% across all models, and ~5% FLOPs reduction produces slight speed improvements and a reduced CUDA memory footprint. Example YOLOv5l before and after metrics:

YOLOv5l
Large
size
(pixels)
mAPval
0.5:0.95
mAPval
0.5
Speed
CPU b1
(ms)
Speed
V100 b1
(ms)
Speed
V100 b32
(ms)
params
(M)
FLOPs
@640 (B)
v5.0 (previous) 640 48.2 66.9 457.9 11.6 2.8 47.0 115.4
v6.0 (this release) 640 48.8 67.2 424.5 10.9 2.7 46.5 109.1

Pretrained Checkpoints

Model size
(pixels)
mAPval
0.5:0.95
mAPval
0.5
Speed
CPU b1
(ms)
Speed
V100 b1
(ms)
Speed
V100 b32
(ms)
params
(M)
FLOPs
@640 (B)
YOLOv5n 640 28.4 46.0 45 6.3 0.6 1.9 4.5
YOLOv5s 640 37.2 56.0 98 6.4 0.9 7.2 16.5
YOLOv5m 640 45.2 63.9 224 8.2 1.7 21.2 49.0
YOLOv5l 640 48.8 67.2 430 10.1 2.7 46.5 109.1
YOLOv5x 640 50.7 68.9 766 12.1 4.8 86.7 205.7
YOLOv5n6 1280 34.0 50.7 153 8.1 2.1 3.2 4.6
YOLOv5s6 1280 44.5 63.0 385 8.2 3.6 16.8 12.6
YOLOv5m6 1280 51.0 69.0 887 11.1 6.8 35.7 50.0
YOLOv5l6 1280 53.6 71.6 1784 15.8 10.5 76.8 111.4
YOLOv5x6
+ TTA
1280
1536
54.7
55.4
72.4
72.3
3136
-
26.2
-
19.4
-
140.7
-
209.8
-
Table Notes (click to expand)
  • All checkpoints are trained to 300 epochs with default settings. Nano models use hyp.scratch-low.yaml hyperparameters, all others use hyp.scratch-high.yaml.
  • mAPval values are for single-model single-scale on COCO val2017 dataset.
    Reproduce by python val.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65
  • Speed averaged over COCO val images using a AWS p3.2xlarge instance. NMS times (~1 ms/img) not included.
    Reproduce by python val.py --data coco.yaml --img 640 --conf 0.25 --iou 0.45
  • TTA Test Time Augmentation includes reflection and scale augmentations.
    Reproduce by python val.py --data coco.yaml --img 1536 --iou 0.7 --augment

Changelog

Changes between previous release and this release: v5.0...v6.0
Changes since this release: v6.0...HEAD

New Features and Bug Fixes (465)
Read more

v5.0 - YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations

11 Apr 17:25
f5b8f7d
Compare
Choose a tag to compare

This release implements YOLOv5-P6 models and retrained YOLOv5-P5 models. All model sizes YOLOv5s/m/l/x are now available in both P5 and P6 architectures:

  • YOLOv5-P5 models (same architecture as v4.0 release): 3 output layers P3, P4, P5 at strides 8, 16, 32, trained at --img 640
python detect.py --weights yolov5s.pt  # P5 models
                           yolov5m.pt
                           yolov5l.pt
                           yolov5x.pt
  • YOLOv5-P6 models: 4 output layers P3, P4, P5, P6 at strides 8, 16, 32, 64 trained at --img 1280
python detect.py --weights yolov5s6.pt  # P6 models
                           yolov5m6.pt
                           yolov5l6.pt
                           yolov5x6.pt

Example usage:

# Command Line
python detect.py --weights yolov5m.pt --img 640  # P5 model at 640
python detect.py --weights yolov5m6.pt --img 640  # P6 model at 640
python detect.py --weights yolov5m6.pt --img 1280  # P6 model at 1280
# PyTorch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5m6')  #  P6 model
results = model(imgs, size=1280)  # inference at 1280

Notable Updates

  • YouTube Inference: Direct inference from YouTube videos, i.e. python detect.py --source 'https://youtu.be/NUsoVlDFqZg'. Live streaming videos and normal videos supported. (#2752)
  • AWS Integration: Amazon AWS integration and new AWS Quickstart Guide for simple EC2 instance YOLOv5 training and resuming of interrupted Spot instances. (#2185)
  • Supervise.ly Integration: New integration with the Supervisely Ecosystem for training and deploying YOLOv5 models with Supervise.ly (#2518)
  • Improved W&B Integration: Allows saving datasets and models directly to Weights & Biases. This allows for --resume directly from W&B (useful for temporary environments like Colab), as well as enhanced visualization tools. See this blog by @AyushExel for details. (#2125)

Updated Results

P6 models include an extra P6/64 output layer for detection of larger objects, and benefit the most from training at higher resolution. For this reason we trained all P5 models at 640, and all P6 models at 1280.

YOLOv5-P5 640 Figure (click to expand)

Figure Notes (click to expand)
  • GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS.
  • EfficientDet data from google/automl at batch size 8.
  • Reproduce by python test.py --task study --data coco.yaml --iou 0.7 --weights yolov5s6.pt yolov5m6.pt yolov5l6.pt yolov5x6.pt

Pretrained Checkpoints

Model size
(pixels)
mAPval
0.5:0.95
mAPtest
0.5:0.95
mAPval
0.5
Speed
V100 (ms)
params
(M)
FLOPS
640 (B)
YOLOv5s 640 36.7 36.7 55.4 2.0 7.3 17.0
YOLOv5m 640 44.5 44.5 63.1 2.7 21.4 51.3
YOLOv5l 640 48.2 48.2 66.9 3.8 47.0 115.4
YOLOv5x 640 50.4 50.4 68.8 6.1 87.7 218.8
YOLOv5s6 1280 43.3 43.3 61.9 4.3 12.7 17.4
YOLOv5m6 1280 50.5 50.5 68.7 8.4 35.9 52.4
YOLOv5l6 1280 53.4 53.4 71.1 12.3 77.2 117.7
YOLOv5x6 1280 54.4 54.4 72.0 22.4 141.8 222.9
YOLOv5x6 TTA 1280 55.0 55.0 72.0 70.8 - -
Table Notes (click to expand)
  • APtest denotes COCO test-dev2017 server results, all other AP results denote val2017 accuracy.
  • AP values are for single-model single-scale unless otherwise noted. Reproduce mAP by python test.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65
  • SpeedGPU averaged over 5000 COCO val2017 images using a GCP n1-standard-16 V100 instance, and includes FP16 inference, postprocessing and NMS. Reproduce speed by python test.py --data coco.yaml --img 640 --conf 0.25 --iou 0.45
  • All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
  • Test Time Augmentation (TTA) includes reflection and scale augmentation. Reproduce TTA by python test.py --data coco.yaml --img 1536 --iou 0.7 --augment

Changelog

Changes between previous release and this release: v4.0...v5.0
Changes since this release: v5.0...HEAD

Click a section below to expand details:

Implemented Enhancements (26)
  • Return predictions as json #2703
  • Single channel image training? #2609
  • Images in MPO Format are considered corrupted #2446
  • Improve Validation Visualization #2384
  • Add ASFF (three fuse feature layers) int the Head for V5(s,m,l,x) #2348
  • Dear author, can you provide a visualization scheme for YOLOV5 feature graphs during detect.py? Thank you! #2259
  • Dataloader #2201
  • Update Train Custom Data wiki page #2187
  • Multi-class NMS #2162
  • πŸ’‘Idea: Mosaic cropping using segmentation labels #2151
  • Improving Confusion Matrix Interpretability: FP and FN vectors should be switched to align with Predicted and True axis #2071
  • Interpreting model YoloV5 by Grad-cam #2065
  • Output optimal confidence threshold based on PR curve #2048
  • is it valuable that add --cache-images option to detect.py? #2004
  • I want to change the anchor box to anchor circles, where do you think the change to be made ? #1987
  • Support for imgaug #1954
  • Any plan for Knowledge Distillation? #1762
  • Is there a wasy to run detections on a video/webcam/rtrsp, etc EVERY x SECONDS? #1742
  • Can yolov5 support rotated target detection? #1728
  • Deploying yolov5 to TorchServe (GPU compatible) #1681
  • Why diffrent colors of bboxs? #1638
  • Yet another export yolov5 models to ONNX and inference with TensorRT #1597
  • Rerange the blocks of Focus Layer into row major to be compatible with tensorflow SpaceToDepth #413
  • YouTube Livestream Detection #2752 (ben-milanko)
  • Add TransformerLayer, TransformerBlock, C3TR modules #2333 (dingyiwei)
  • I...
Read more

v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration

05 Jan 04:26
69be8e7
Compare
Choose a tag to compare

This release implements two architecture changes to YOLOv5, as well as various bug fixes and performance improvements.

Breaking Changes

  • nn.SiLU() activations replace nn.LeakyReLU(0.1) and nn.Hardswish() activations used in previous versions. nn.SiLU() was introduced in PyTorch 1.7.0 (https://pytorch.org/docs/stable/generated/torch.nn.SiLU.html), and due to the recent timeframe certain export pipelines may be temporarily unavailable (CoreML possibly) without updates to the associated tools (i.e. coremltools).

Bug Fixes

  • Multi-GPU --resume #1810
  • leaf Variable inplace bug fix #1759
  • Various additional bug fixes contained in PRs #1235 through #1837

Added Functionality

  • Weights & Biases (W&B) Feature Addition #1235
  • Utils reorganization #1392
  • PyTorch Hub and autoShape update #1415
  • W&B artifacts feature addition #1712
  • Various additional feature additions contained in PRs #1235 through #1837

Updated Results

Latest models are all slightly smaller to due removal of one convolution within each bottleneck, which have been renamed as C3() modules now in light of the 3 I/O convolutions each one does vs the 4 in the standard CSP bottleneck. The previous manual concatenation and LeakyReLU(0.1) activations have both removed, simplifying the architecture, reducing parameter count, and better exploiting the .fuse() operation at inference time.

nn.SiLU() activations replace nn.LeakyReLU(0.1) and nn.Hardswish() activations throughout the model, simplifying the architecture as we now only have one single activation function used everywhere rather than the two types before.

In general the changes result in smaller models (89.0M params -> 87.7M YOLOv5x), faster inference times (6.9ms -> 6.0ms), and improved mAP (49.2 -> 50.1) for all models except YOLOv5s, which reduced mAP slightly (37.0 -> 36.8). In general the largest models benefit the most from this update. YOLOv5x in particular is now above 50.0 mAP at --img-size 640, which may be the first time this is possible at 640 resolution for any architecture I'm aware of (correct me if I'm wrong though).

** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.

Pretrained Checkpoints

Model size APval APtest AP50 SpeedV100 FPSV100 params GFLOPS
YOLOv5s 640 36.8 36.8 55.6 2.2ms 455 7.3M 17.0
YOLOv5m 640 44.5 44.5 63.1 2.9ms 345 21.4M 51.3
YOLOv5l 640 48.1 48.1 66.4 3.8ms 264 47.0M 115.4
YOLOv5x 640 50.1 50.1 68.7 6.0ms 167 87.7M 218.8
YOLOv5x + TTA 832 51.9 51.9 69.6 24.9ms 40 87.7M 1005.3

v3.1 - Bug Fixes and Performance Improvements

29 Oct 15:07
c8c5ef3
Compare
Choose a tag to compare

This release aggregates various minor bug fixes and performance improvements since the main v3.0 release and incorporates PyTorch 1.7.0 compatibility updates. v3.1 models share weights with v3.0 models but contain minor module updates (inplace fields for nn.Hardswish() activations) for native PyTorch 1.7.0 compatibility. For PyTorch 1.7.0 release updates see https://github.com/pytorch/pytorch/releases/tag/v1.7.0.

Breaking Changes

  • 'giou' hyperparameter has been renamed to 'box' to better reflect a criteria-agnostic regression loss term (#1120)

Bug Fixes

  • PyTorch 1.7 compatibility update. torch>=1.6.0 required, torch>=1.7.0 recommended (#1233)
  • GhostConv module bug fix (#1176)
  • Rectangular padding min stride bug fix from 64 to 32 (#1165)
  • Mosaic4 bug fix (#1021)
  • Logging directory runs/exp bug fix (#978)
  • Various additional

Added Functionality

  • PyTorch Hub functionality with YOLOv5 .autoshape() method added (#1210)
  • Autolabelling addition and standardization across detect.py and test.py (#1182)
  • Precision-Recall Curve automatic plotting when testing (#1107)
  • Self-host VOC dataset for more reliable access and faster downloading (#1077)
  • Adding option to output autolabel confidence with --save-conf in test.py and detect.py (#994)
  • Google App Engine deployment option (#964)
  • Infinite Dataloader for faster training (#876)
  • Various additional

v3.0

13 Aug 21:28
1ddf692
Compare
Choose a tag to compare

This releases includes nn.Hardswish() activation implementation on Conv() modules, which increases mAP for all models at the expense of about 10% in inference speed. Training speeds are not significantly affected, though CUDA memory requirements increase about 10%. Training from scratch as well as finetuning both benefit from this change. The smallest models benefit the most from the Hardswish() activations, with increases of +0.9/+0.8/+0.7/+0.2mAP@0.5:0.95 for YOLOv5s/m/l/x.

All mAP values in our README are now reported at --img-size 640 (v2.0 reported at 672, and v1.0 reported at 736), so we've succeeded in increasing mAP while reducing the required --img-size :)

We've also listed YOLOv5x Test Time Augmentation (TTA) mAP and speeds for v3.0 in our README table for the first time (and for v2.0 below). Best results are YOLOv5x with TTA at 50.8 mAP@0.5:0.95. We've also updated efficientdet results in our comparison plot to reflect recent improvements in the google/automl repo.

Breaking Changes

  • This release does not contain breaking changes.
  • This release is only backwards compatible with v2.0 models trained with torch>=1.6.

Bug Fixes

  • Hyperparameter evolution fixed, tutorial added (#607)

Added Functionality

  • PyTorch 1.6 compatible. torch>=1.6 required (43a616a)
  • PyTorch 1.6 native Automatic Mixed Precision (AMP) replaces NVIDIA Apex AMP (#573)
  • nn.Hardswish() activations replace nn.LeakyReLU(0.1) in base convolution module models.Conv()
  • Dataset Autodownload feature added (#685)
  • Model Autodownload improved (#711)
  • Layer freezing code added (#679)
  • TensorRT export tutorial added (#623)

** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from [google/automl](https://github.com/google/automl) at batch size 8.

  • August 13, 2020: v3.0 release: nn.Hardswish() activations, data autodownload, native AMP.
  • July 23, 2020: v2.0 release: improved model definition, training and mAP.
  • June 22, 2020: PANet updates: new heads, reduced parameters, improved speed and mAP 364fcfd.
  • June 19, 2020: FP16 as new default for smaller checkpoints and faster inference d4c6674.
  • June 9, 2020: CSP updates: improved speed, size, and accuracy (credit to @WongKinYiu for CSP).
  • May 27, 2020: Public release. YOLOv5 models are SOTA among all known YOLO implementations.
  • April 1, 2020: Start development of future compound-scaled YOLOv3/YOLOv4-based PyTorch models.

Pretrained Checkpoints

v3.0 with nn.Hardswish()

Model APval APtest AP50 SpeedGPU FPSGPU params FLOPS
YOLOv5s 37.0 37.0 56.2 2.4ms 476 7.5M 13.2B
YOLOv5m 44.3 44.3 63.2 3.4ms 333 21.8M 39.4B
YOLOv5l 47.7 47.7 66.5 4.4ms 256 47.8M 88.1B
YOLOv5x 49.2 49.2 67.7 6.9ms 164 89.0M 166.4B
YOLOv5x + TTA 50.8 50.8 68.9 25.5ms 39 89.0M 354.3B
YOLOv3-SPP 45.6 45.5 65.2 4.5ms 222 63.0M 118.0B

** APtest denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or test-time augmentation except for TTA. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.001. Test Time Augmentation (TTA) runs at 3 image sizes. Reproduce TTA results by python test.py --data coco.yaml --img 832 --augment
** SpeedGPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.1
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).

v2.0 with nn.LeakyReLU(0.1)

Model APval APtest AP50 SpeedGPU FPSGPU params FLOPS
YOLOv5s 36.1 36.1 55.3 2.2ms 476 7.5M 13.2B
YOLOv5m 43.5 43.5 62.5 3.2ms 333 21.8M 39.4B
YOLOv5l 47.0 47.1 65.6 4.1ms 256 47.8M 88.1B
YOLOv5x 49.0 49.0 67.4 6.4ms 164 89.0M 166.4B
YOLOv5x + TTA 50.4 50.4 68.5 23.4ms 43 89.0M 354.3B
YOLOv3-SPP 45.6 45.5 65.2 4.5ms 222 63.0M 118.0B

** APtest denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by python test.py --data coco.yaml --img 672 --conf 0.001
** SpeedGPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.1
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).

v2.0

23 Jul 22:48
2a149a4
Compare
Choose a tag to compare

Breaking Changes

IMPORTANT: v2.0 release contains breaking changes. Models trained with earlier versions will not operate correctly with v2.0. The last commit before v2.0 that operates correctly with all earlier pretrained models is:
https://github.com/ultralytics/yolov5/tree/5e970d45c44fff11d1eb29bfc21bed9553abf986

To clone last commit prior to v2.0:

git clone https://github.com/ultralytics/yolov5  # clone repo
cd yolov5
git reset --hard 5e970d4  # last commit before v2.0

Bug Fixes

  • Various

Added Functionality

  • Various

** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 8, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS.

  • July 23, 2020: v2.0 release: improved model definition, training and mAP.
  • June 22, 2020: PANet updates: new heads, reduced parameters, improved speed and mAP 364fcfd.
  • June 19, 2020: FP16 as new default for smaller checkpoints and faster inference d4c6674.
  • June 9, 2020: CSP updates: improved speed, size, and accuracy (credit to @WongKinYiu for CSP).
  • May 27, 2020: Public release. YOLOv5 models are SOTA among all known YOLO implementations.
  • April 1, 2020: Start development of future compound-scaled YOLOv3/YOLOv4-based PyTorch models.

Pretrained Checkpoints

Model APval APtest AP50 SpeedGPU FPSGPU params FLOPS
YOLOv5s 36.1 36.1 55.3 2.1ms 476 7.5M 13.2B
YOLOv5m 43.5 43.5 62.5 3.0ms 333 21.8M 39.4B
YOLOv5l 47.0 47.1 65.6 3.9ms 256 47.8M 88.1B
YOLOv5x 49.0 49.0 67.4 6.1ms 164 89.0M 166.4B
YOLOv3-SPP 45.6 45.5 65.2 4.5ms 222 63.0M 118.0B

** APtest denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by python test.py --data coco.yaml --img 672 --conf 0.001
** SpeedGPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.1
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).

v1.0

25 Jun 19:25
8f7e11d
Compare
Choose a tag to compare

YOLOv5 1.0 Release Notes

  • June 22, 2020: PANet updates: increased layers, reduced parameters, faster inference and improved mAP 364fcfd.
  • June 19, 2020: FP16 as new default for smaller checkpoints and faster inference d4c6674.
  • June 9, 2020: CSP updates: improved speed, size, and accuracy. Credit to @WongKinYiu for excellent CSP work.
  • May 27, 2020: Public release of repo. YOLOv5 models are SOTA among all known YOLO implementations.
  • April 1, 2020: Start development of future YOLOv3/YOLOv4-based PyTorch models in a range of compound-scaled sizes.

** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 8, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS.

Pretrained Checkpoints

Model APval APtest AP50 SpeedGPU FPSGPU params FLOPS
YOLOv5s 36.6 36.6 55.8 2.1ms 476 7.5M 13.2B
YOLOv5m 43.4 43.4 62.4 3.0ms 333 21.8M 39.4B
YOLOv5l 46.6 46.7 65.4 3.9ms 256 47.8M 88.1B
YOLOv5x 48.4 48.4 66.9 6.1ms 164 89.0M 166.4B
YOLOv3-SPP 45.6 45.5 65.2 4.5ms 222 63.0M 118.0B

** APtest denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
** All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by python test.py --img 736 --conf 0.001
** SpeedGPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --img 640 --conf 0.1
** All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).