Change broadcast Add/Mul to element-wise Add/Mul in Detect layer #4811

Closed
jebastin-nadar opened this issue Sep 15, 2021 · 17 comments · Fixed by #4833 or #5136
Labels
enhancement New feature or request

Comments

@jebastin-nadar
Contributor

🚀 Feature

Motivation

The ONNX model produced by export.py cannot be used for inference in OpenCV's DNN module, even when exported with --simplify, as reported in #4471 and opencv/opencv#20072.

The problematic nodes are the two broadcast nodes, one Add and one Mul, in the final Detect layer. OpenCV's DNN module currently cannot handle these broadcast operations, which leads to errors.

[image: Screenshot (115)_LI]

Pitch

The add node comes from the broadcast add of self.grid

xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy

and the mul node from the broadcast mul of self.anchor_grid

wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2) # wh

Both grid and anchor_grid are constant, so I suggest expanding these tensors to their respective input sizes using PyTorch's expand or repeat operation, so that an element-wise operation is used instead. I have tried modifying the Detect layer to expand these tensors, but additional nodes are then added to the final ONNX model.
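
To make the two expressions above concrete, here is a scalar sketch of the same arithmetic for a single anchor at a single grid cell. It is plain Python rather than the Detect layer's tensor code; `decode_cell` and its argument names are hypothetical, and the input values are made up for illustration.

```python
def decode_cell(y_xy, y_wh, grid_xy, anchor_wh, stride):
    """Decode one anchor at one grid cell, mirroring the xy and wh lines above.

    y_xy, y_wh : sigmoid outputs in (0, 1) for centre and size
    grid_xy    : cell coordinates (one entry of self.grid[i])
    anchor_wh  : anchor size in pixels (one entry of self.anchor_grid[i])
    stride     : stride of this detection scale (self.stride[i])
    """
    # xy = (y * 2 - 0.5 + grid) * stride
    xy = tuple((v * 2.0 - 0.5 + g) * stride for v, g in zip(y_xy, grid_xy))
    # wh = (y * 2) ** 2 * anchor
    wh = tuple((v * 2.0) ** 2 * a for v, a in zip(y_wh, anchor_wh))
    return xy, wh


# Illustrative values only: cell (3, 4), a 10x13 anchor, stride 8.
xy, wh = decode_cell((0.5, 0.5), (0.5, 0.5), grid_xy=(3, 4),
                     anchor_wh=(10.0, 13.0), stride=8)
print(xy, wh)  # (28.0, 36.0) (10.0, 13.0)
```

In the real layer the same grid offsets are shared by every anchor and the same anchor sizes by every cell, which is why both operands are stored in a smaller shape and broadcast.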

I request @glenn-jocher or another contributor to take a look at this so that the exported yolov5 onnx model can be used in opencv for faster inference.

@jebastin-nadar jebastin-nadar added the enhancement New feature or request label Sep 15, 2021
@github-actions
Contributor

github-actions bot commented Sep 15, 2021

👋 Hello @SamFC10, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt


@glenn-jocher
Member

glenn-jocher commented Sep 15, 2021

@SamFC10 thanks for the explanation! Expanding these may seem simple, but be advised input shapes are constantly changing, so if you expand you must check shapes and redefine for every batch, which will slow things down and is not required for pytorch inference or training.

The fastest and easiest way to incorporate your ideas into the official codebase is to submit a Pull Request (PR) implementing your idea, and if applicable providing before and after profiling/inference/training results to help us understand the improvement your feature provides. This allows us to directly see the changes in the code and to understand how they affect workflows and performance.

Please see our ✅ Contributing Guide to get started.

@glenn-jocher glenn-jocher added the TODO High priority items label Sep 15, 2021
@glenn-jocher
Member

@SamFC10 can you verify that expanding works with DNN?

self.grid[i] = self.grid[i].expand(bs, self.na, -1, -1, -1)
self.anchor_grid[i] = self.anchor_grid[i].expand(bs, -1, ny, nx, -1)

@jebastin-nadar
Contributor Author

self.anchor_grid[i] = self.anchor_grid[i].expand(bs, -1, ny, nx, -1)

This causes an error while creating the ONNX model:

File "/content/yolov5/models/yolo.py", line 62, in forward
    self.anchor_grid[i] = self.anchor_grid[i].expand(bs, -1, ny, nx, -1)
RuntimeError: The expanded size of the tensor (1) must match the existing size (80) at non-singleton dimension 3. 
              Target sizes: [1, 3, 1, 1, 2].  Tensor sizes: [3, 80, 80, 2]
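
The rule behind this RuntimeError can be sketched without PyTorch: shapes are aligned from the right, and an existing dimension can only be expanded if it has size 1 (or the target keeps it via -1 or an equal size). The helper below is a hypothetical illustration of that rule, not part of the torch API; the two shapes are copied from the traceback above.

```python
def check_expand(tensor_shape, target_shape):
    """Return None if expand(target_shape) is allowed, else the first
    offending (dim, target_size, existing_size), scanning from the right."""
    offset = len(target_shape) - len(tensor_shape)
    if offset < 0:
        raise ValueError("target must have at least as many dims as the tensor")
    for dim in range(len(target_shape) - 1, offset - 1, -1):
        existing = tensor_shape[dim - offset]
        target = target_shape[dim]
        # -1 keeps the size, equal sizes are fine, size-1 dims can grow
        if target == -1 or target == existing or existing == 1:
            continue
        return dim, target, existing
    return None


# Shapes from the traceback: an 80 cannot be "expanded" down to 1.
print(check_expand((3, 80, 80, 2), (1, 3, 1, 1, 2)))    # (3, 1, 80)
# A singleton anchor dimension can be expanded to na=3.
print(check_expand((1, 1, 20, 20, 2), (1, 3, 20, 20, 2)))  # None
```

The first result matches the message: target size 1 against existing size 80 at non-singleton dimension 3.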

@jebastin-nadar
Contributor Author

slow things down and is not required for pytorch inference or training

This was my concern as well. I have tried a few methods myself to expand these grids to the correct shape while minimizing any computational overhead.

  • For the broadcast "add" node, I expanded the grid in _make_grid() itself and modified the call for _make_grid()
- def _make_grid(nx=20, ny=20):
+ def _make_grid(nx=20, ny=20, na=3):
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
-        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
+        return torch.stack((xv, yv), 2).expand((1, na, ny, nx, 2)).float()

This changes the addition to elementwise without any significant overhead (I think!)

  • Expanding self.anchor_grid seems to be problematic: every solution I have tried adds extra nodes to the final ONNX model (with --simplify, these nodes are removed)
- wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
+ wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2).expand(1, self.na, ny, nx, 2)  # wh

Without --simplify (notice the extra nodes at the right)
Screenshot (117) (1)

With --simplify
Screenshot (118)


This new onnx model with --simplify runs perfectly in OpenCV DNN. The onnx model without --simplify option has some extra nodes, so I wonder if there is a better way of expanding anchor_grid or creating anchor_grid from anchors differently.

@glenn-jocher
Member

@SamFC10 got it! Please submit a PR with the fix that works for DNN and we will take a look at it there. There will probably always be some added ops, which must be balanced against improved exportability. Maybe we can introduce an --expand flag to Detect() that is true only for ONNX export.

@GioFic95

GioFic95 commented Oct 4, 2021

This new onnx model with --simplify runs perfectly in OpenCV DNN.

Hi @SamFC10, I tried your fix but I still have some issues with the integration of YOLOv5 into OpenCV DNN. Could you please share the code you use for inference? Thank you very much in advance.

@jebastin-nadar
Contributor Author

@GioFic95 Make sure you are using my fork of yolov5, as my fixes haven't been merged yet. Check out the export-dnn-simple branch for the fix in the comment above, or the export-dnn branch for the fix in the PR.

git clone --single-branch --branch export-dnn-simple https://github.com/SamFC10/yolov5.git
# git clone --single-branch --branch export-dnn https://github.com/SamFC10/yolov5.git
cd yolov5
python3 export.py --weights yolov5s.pt --include onnx --simplify

Code :

import numpy as np
import cv2

inp = np.random.rand(1, 3, 640, 640).astype(np.float32)
net = cv2.dnn.readNetFromONNX('yolov5s.onnx')
net.setInput(inp)
out = net.forward()
print(out.shape)

This prints (1, 25200, 85) with both branches.
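
As a sanity check on that shape, the 25200 can be reproduced by hand. The sketch below assumes the usual yolov5s configuration, which is not spelled out in this thread: three detection scales with strides 8, 16 and 32, three anchors per scale, and 80 classes. `num_predictions` is a hypothetical helper.

```python
def num_predictions(img_size=640, strides=(8, 16, 32), anchors_per_scale=3):
    """Total candidate boxes: anchors x grid cells, summed over all scales."""
    return sum(anchors_per_scale * (img_size // s) ** 2 for s in strides)


# 3 * (80*80 + 40*40 + 20*20) = 25200 for a 640x640 input
total = num_predictions()
# Each row: 4 box values + 1 objectness score + 80 class scores (assumed)
row_len = 4 + 1 + 80
print((1, total, row_len))  # (1, 25200, 85)
```

A different input size or class count would change the second and third dimensions accordingly.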

@GioFic95

GioFic95 commented Oct 5, 2021

@SamFC10 I checked that I get the same output shape as you, but the results obtained via OpenCV still aren't the same as those I obtain via PyTorch.

That is, applying this code to the output in OpenCV ONNX (in C++) I get these results:
[image: res_onnx]

While the original results, obtained directly with the trained model are the following:
[image: res]

Moreover, these are the results obtained using detect.py with the model exported using your repo (the same used in OpenCV ONNX):
[image: detect]

Do you have any advice on how to solve the issue, or a hypothesis about its reason?
Thank you very much again.

@jebastin-nadar
Contributor Author

@GioFic95
To verify if the new onnx model works using OpenCV DNN, I did the following:

git clone --single-branch --branch export-dnn-simple https://github.com/SamFC10/yolov5.git
cd yolov5
python3 export.py --weights yolov5s.pt --include onnx --simplify

1. Inference using ONNXRuntime

python3 detect.py --weights yolov5s.onnx

zidane

2. Inference using OpenCV DNN

To use opencv instead of onnxruntime in detect.py, make these changes
Line 89 :

- check_requirements(('onnx', 'onnxruntime'))
- import onnxruntime
- session = onnxruntime.InferenceSession(w, None)
+ net = cv2.dnn.readNetFromONNX(w)

Line 147

- pred = torch.tensor(session.run([session.get_outputs()[0].name], {session.get_inputs()[0].name: img}))
+ net.setInput(img)
+ pred = torch.tensor(net.forward())

Again using

python3 detect.py --weights yolov5s.onnx

zidane_new

No visible difference


hypothesis about its reason

I suspect there is something wrong in the post-processing in your C++ code (see this #708 (comment)). I'm not an expert in C++, so I can't point out exactly where the mistake is. To check this, maybe try the opposite of what I did: in your C++ code, use onnxruntime instead of OpenCV, with the ONNX model exported from the master repository. If the outputs are still wrong, then the post-processing steps have some bugs.
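
For reference when debugging such post-processing, here is a rough Python sketch of how one output row is typically turned into a box. It assumes each row is laid out as [cx, cy, w, h, objectness, class scores...] in input-image pixels, and it omits non-maximum suppression entirely; `decode_row` is a hypothetical helper, not code from detect.py, and the 0.25 threshold mirrors the conf_thres default.

```python
def decode_row(row, conf_thres=0.25):
    """Convert one prediction row to (x1, y1, x2, y2, conf, cls) or None."""
    cx, cy, w, h, obj = row[:5]
    class_scores = row[5:]
    # Best class and its combined confidence = objectness * class score
    cls = max(range(len(class_scores)), key=class_scores.__getitem__)
    conf = obj * class_scores[cls]
    if conf < conf_thres:
        return None
    # Centre/size format to corner format
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf, cls)


# Made-up row with two classes instead of 80, for illustration only.
det = decode_row([320.0, 240.0, 100.0, 50.0, 0.9, 0.1, 0.8])
print(det)
```

Common mistakes in a port are skipping the multiplication by objectness, treating the first four values as corner coordinates, or forgetting to rescale boxes from the network input size back to the original image.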

@glenn-jocher
Member

@SamFC10 might be nice to have a --use-dnn flag in detect.py to simplify this comparison.

@msly

msly commented Oct 9, 2021

@SamFC10 DNN (0.554s) is slower than onnxruntime (0.318s) on the same ONNX file.

@jebastin-nadar
Contributor Author

@msly Yes, OpenCV inference is slower than onnxruntime (a difference of around 50 ms to 100 ms on my device).

The goal of this issue and the related PR is not to improve inference speed, but to make the ONNX export of yolov5 compatible with various other backends rather than limiting it to onnxruntime.

@glenn-jocher glenn-jocher removed the TODO High priority items label Oct 11, 2021
@glenn-jocher
Member

Removed TODO after PR #4833 merged.

@glenn-jocher glenn-jocher linked a pull request Oct 11, 2021 that will close this issue
@glenn-jocher
Member

glenn-jocher commented Oct 11, 2021

@SamFC10 I've opened a new PR #5136 to add DNN inference to detect.py using your example here:

@GioFic95 Make sure you are using my fork of yolov5, as my fixes haven't been merged yet. Check out the export-dnn-simple branch for the fix in the comment above, or the export-dnn branch for the fix in the PR.

git clone --single-branch --branch export-dnn-simple https://github.com/SamFC10/yolov5.git
# git clone --single-branch --branch export-dnn https://github.com/SamFC10/yolov5.git
cd yolov5
python3 export.py --weights yolov5s.pt --include onnx --simplify

Code :

import numpy as np
import cv2

inp = np.random.rand(1, 3, 640, 640).astype(np.float32)
net = cv2.dnn.readNetFromONNX('yolov5s.onnx')
net.setInput(inp)
out = net.forward()
print(out.shape)

This prints (1, 25200, 85) with both branches.

But I am running into a bug on net = cv2.dnn.readNetFromONNX(w)

(venv) glennjocher@Glenns-iMac yolov5 % python detect.py --weights yolov5s.onnx --dnn
detect: weights=['yolov5s.onnx'], source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=True
YOLOv5 🚀 v5.0-509-g9d75e42 torch 1.9.1 CPU

[ERROR:0] global /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pip-req-build-vy_omupv/opencv/modules/dnn/src/onnx/onnx_importer.cpp (2127) handleNode DNN/ONNX: ERROR during processing node with 2 inputs and 1 outputs: [Unsqueeze]:(390)
Traceback (most recent call last):
  File "/Users/glennjocher/PycharmProjects/yolov5/detect.py", line 306, in <module>
    main(opt)
  File "/Users/glennjocher/PycharmProjects/yolov5/detect.py", line 301, in main
    run(**vars(opt))
  File "/Users/glennjocher/PycharmProjects/yolov5/venv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/Users/glennjocher/PycharmProjects/yolov5/detect.py", line 92, in run
    net = cv2.dnn.readNetFromONNX(w)
cv2.error: OpenCV(4.5.3) /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pip-req-build-vy_omupv/opencv/modules/dnn/src/onnx/onnx_importer.cpp:2146: error: (-2:Unspecified error) in function 'handleNode'
> Node [Unsqueeze]:(390) parse error: OpenCV(4.5.3) /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pip-req-build-vy_omupv/opencv/modules/dnn/src/onnx/onnx_importer.cpp:1551: error: (-215:Assertion failed) node_proto.input_size() == 1 in function 'handleNode'

I created the ONNX model simply with python export.py --weights yolov5s.pt --include onnx. Any idea what might be happening?

@jebastin-nadar
Contributor Author

OpenCV(4.5.3)

You need to use the latest version, i.e. 4.5.4, which was released just a few days ago. I have added a fix for this exact error in opencv/opencv#20713, which will be present in version 4.5.4.

opencv-python is still on 4.5.3, so we need to wait until the latest version, which contains the fix, is released. Once it is released,
pip install -U opencv-python should solve the issue.

Meanwhile, the following onnx models should also work with OpenCV 4.5.3:

python export.py --weights yolov5s.pt --include onnx --opset 11
or
python export.py --weights yolov5s.pt --include onnx --simplify

@glenn-jocher
Member

@SamFC10 thanks! I've added your comments to the PR, along with a new commented-out line that checks the >=4.5.4 requirement. I will uncomment the line once the version is released.

            # check_requirements(('opencv-python>=4.5.4',))
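
Until that check is enabled, the same gate can be approximated by comparing version strings directly. This is a minimal standard-library sketch, not how check_requirements works internally; `version_tuple` and `meets_minimum` are hypothetical helpers, and they only handle plain dotted integer versions such as "4.5.3" (a suffix like "-dev" would need extra parsing).

```python
def version_tuple(version):
    """'4.5.3' -> (4, 5, 3); extra components such as a build number are ignored."""
    return tuple(int(part) for part in version.split(".")[:3])


def meets_minimum(installed, minimum="4.5.4"):
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("4.5.3"))  # False
print(meets_minimum("4.5.4"))  # True
```

In practice the installed string would come from cv2.__version__. Comparing integer tuples rather than raw strings avoids ranking "4.10.0" below "4.5.4".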
