CoreML Export Error: export failure: 'torch._C.Node' object has no attribute 'ival' #2961

Closed
samygarg opened this issue Apr 28, 2021 · 33 comments · Fixed by #3055
Labels
bug Something isn't working

Comments

@samygarg

🐛 Bug

I am trying to export the default trained YOLOv5 model as given here to CoreML, but I get an error on both Colab and my laptop:

CoreML: export failure: 'torch._C.Node' object has no attribute 'ival'

[Image: Screenshot 2021-04-28 at 13 55 39]

To Reproduce (REQUIRED)

Follow the steps mentioned here.

Expected behavior

Export the CoreML model successfully.

Environment

Colab and Macbook Pro 13 inch 2019.

@samygarg samygarg added the bug Something isn't working label Apr 28, 2021
@github-actions
Contributor

github-actions bot commented Apr 28, 2021

👋 Hello @samygarg, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@meng1994412

I have the same issue.

The environment I have is:
OS: Ubuntu 18.04
Packages (installed with pip install -r requirements.txt):
torch==1.8.1
torchvision==0.9.1
coremltools==4.1
onnx==1.9.0
scikit-learn==0.19.2

@haynec

haynec commented Apr 29, 2021

I also get the same issue.

The environment I have is:
OS: Ubuntu 20.04
Packages (installed with pip install -r requirements.txt):
torch==1.8.1
torchvision==0.9.1
coremltools==4.1
onnx==1.9.0
scikit-learn==0.19.2

@JorgeCeja

Hey everyone, it appears that optimize_for_mobile from torch is what causes the incompatibility issue with coremltools.

The solution is to comment out the following line before export. Ideally it should be an arg option; pull request, anyone?

ts = optimize_for_mobile(ts) # https://pytorch.org/tutorials/recipes/script_optimized.html
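
The "arg option" suggested above could look roughly like the following. This is a minimal sketch, not the change that was eventually merged: the `--optimize` flag name and the `export_torchscript` helper are hypothetical, and the optimizer is passed in as a callable so the snippet runs without PyTorch installed. In a real export script the callable would presumably be `torch.utils.mobile_optimizer.optimize_for_mobile`.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='yolov5s.pt', help='weights path')
    # Hypothetical flag: mobile optimization becomes opt-in, so CoreML export can skip it.
    parser.add_argument('--optimize', action='store_true',
                        help='run optimize_for_mobile on the TorchScript model')
    return parser


def export_torchscript(ts, optimize, optimizer):
    """Return the traced model, passing it through `optimizer` only when requested."""
    return optimizer(ts) if optimize else ts


# Simulate `python export.py --optimize` with a stand-in optimizer (assumption for the demo).
opt = build_parser().parse_args(['--optimize'])
traced = export_torchscript('traced-model', opt.optimize, optimizer=lambda m: f'optimized({m})')
```

Leaving the flag off by default keeps the TorchScript graph untouched for the CoreML converter, while mobile users can still opt in.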

@haynec

haynec commented Apr 29, 2021

Thanks @JorgeCeja, that solution worked for me.

@samygarg
Author

samygarg commented May 4, 2021

@JorgeCeja That solved the original issue, but the export still doesn't work. I tried on Colab as well as on a MacBook Pro.

Here's what I am getting:

CoreML: starting export with coremltools 4.1...
Tuple detected at graph output. This will be flattened in the converted model.
Converting graph.
Adding op '1' of type const
Adding op '2' of type const
...
Converting op 728 : sub
Adding op '728' of type sub
Converting op 729 : add
Adding op '729' of type add
Converting op 730 : select
Converting Frontend ==> MIL Ops:  78% 545/695 [00:00<00:00, 933.85 ops/s] 
CoreML: export failure:
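
In the log above, the last `Converting op` line has no matching `Adding op` line, which suggests the converter stopped while handling that op. For long logs, a small helper can pull out that last line. This is only a sketch that assumes the log format shown in this thread; `last_converting_op` is a hypothetical helper, not part of coremltools.

```python
import re

# Matches lines such as "Converting op 730 : select" (format taken from the log above).
OP_LINE = re.compile(r"^Converting op (\S+) : (\S+)$")


def last_converting_op(log_text):
    """Return (op_name, op_type) of the last 'Converting op' line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = OP_LINE.match(line.strip())
        if match:
            last = (match.group(1), match.group(2))
    return last


log = """Converting op 728 : sub
Adding op '728' of type sub
Converting op 729 : add
Adding op '729' of type add
Converting op 730 : select
Converting Frontend ==> MIL Ops:  78% 545/695 [00:00<00:00, 933.85 ops/s]
CoreML: export failure:
"""
failing_op = last_converting_op(log)
```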

@meng1994412

Thanks @JorgeCeja, the solution you provided worked for me.

@glemarivero

glemarivero commented May 6, 2021

@JorgeCeja Can you share your environment? This change fixed the original issue, but I still face the same issue as @samygarg

@pocketpixels

It seems CoreML export is broken in multiple ways currently. The export did work with the above change until very recently. However only when not specifying --grid, which meant that the Detect module did not get exported. When trying to export with --grid you would get the same export failure at op 730.
Commit b292837 from issue #2982 (from May 3rd) changed the export implementation to export the Detect module by default.

@pocketpixels

After trying many different previous commits (and different PyTorch versions) today, my impression is that exporting the whole network including the Detect module to CoreML probably never worked? If anyone knows of a version/commit (and environment) where it did work, I would love to know.

@zhedahe

zhedahe commented May 6, 2021

I have the same problem. My environment:
OS: Ubuntu 16.04
Packages (installed with pip install -r requirements.txt):
torch==1.8.1
torchvision==0.9.1
coremltools==4.1
onnx==1.9.0
scikit-learn==0.19.2

error message:
……
Converting op 725 : constant
Adding op '725' of type const
Converting op 726 : mul
Adding op '726' of type mul
Converting op 727 : constant
Adding op '727' of type const
Converting op 728 : sub
Adding op '728' of type sub
Converting op 729 : add
Adding op '729' of type add
Converting op 730 : select
Converting Frontend ==> MIL Ops: 87%|███▍| 604/695 [00:00<00:00, 1149.09 ops/s]
CoreML: export failure:

Hope somebody can give advice, thanks!

@glenn-jocher glenn-jocher linked a pull request May 6, 2021 that will close this issue
@glenn-jocher
Member

glenn-jocher commented May 6, 2021

@meng1994412 @haynec @JorgeCeja @samygarg good news 😃! Your original issue may now be fixed ✅ in PR #3055. Note that this does not solve CoreML export completely, but it should resolve the original error message in this issue.

To receive this update you can:

  • git pull from within your yolov5/ directory
  • git clone https://github.com/ultralytics/yolov5 again
  • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • View our updated notebooks: Open In Colab Open In Kaggle

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@jedikim

jedikim commented May 6, 2021

@glenn-jocher Using the current (hotfixed) version, this issue still happens for me.

OS: Ubuntu 20.04
Packages (installed with pip install -r requirements.txt):
torch==1.8.1
torchvision==0.9.1
coremltools==4.1
onnx==1.9.0
scikit-learn==0.19.2

Adding op '724' of type slice_by_index
Adding op '724_begin_0' of type const
Adding op '724_end_0' of type const
Adding op '724_end_mask_0' of type const
Converting op 725 : constant
Adding op '725' of type const
Converting op 726 : mul
Adding op '726' of type mul
Converting op 727 : constant
Adding op '727' of type const
Converting op 728 : sub
Adding op '728' of type sub
Converting op 729 : add
Adding op '729' of type add
Converting op 730 : select
Converting Frontend ==> MIL Ops: 87%|████▎| 604/695 [00:00<00:00, 970.22 ops/s]
CoreML: export failure:

@pocketpixels

Still this issue is happen with me.

@jedikim He mentioned that this does not (yet?) fix CoreML export, it only fixes the particular issue reported in this bug report (the first post at the top).

@zhedahe

zhedahe commented May 7, 2021

I tried again after updating, but I get the same error as yesterday. So, as glenn-jocher mentioned above, this does not solve CoreML export completely.

@glemarivero

Here are my two cents on this:
You can check out a previous commit such as 33712d6
and comment out the line

ts = optimize_for_mobile(ts) # https://pytorch.org/tutorials/recipes/script_optimized.html

as they said above. However, as @pocketpixels said, this will not export the complete model. Instead the outputs will be the nl outputs given by:
x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

Which means you have to do the grid scaling operations on the CoreML side, and concatenate the nl results to obtain a [n_anchors x (nc+5)] matrix.
Then you will need to adapt this to the input format of the Non-Maximum Suppression layer:

  • split them into boxes_scores and boxes_coords arrays
  • you may also need to adapt them to the layer's input format, (xc, yc, w, h) in normalized coords (I'm not exactly sure of the format of the yolov5 output)

I'm pretty new to using the CoreML builder. So far I'm using this as my guideline: https://github.com/hollance/coreml-survival-guide/blob/master/MobileNetV2%2BSSDLite/ssdlite.py
If anyone knows how to do it and could post the complete solution it would be great. Otherwise, I'll be working on that, and once I finish (if I do) I'll post it here.
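
The post-processing steps described above can be sketched in NumPy. This is a rough illustration rather than the notebook shared later in the thread: it assumes each raw output has shape (1, na, ny, nx, nc+5) and is pre-sigmoid, it reuses the xy/wh formulas from models/yolo.py, and the function names, anchors, and image size are made up for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_level(raw, anchors, stride):
    """Decode one raw output of shape (1, na, ny, nx, no) into rows of (x, y, w, h, obj, cls...)."""
    _, na, ny, nx, no = raw.shape
    y = sigmoid(raw)
    yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing='ij')
    grid = np.stack((xv, yv), axis=-1).reshape(1, 1, ny, nx, 2)  # cell offsets as (x, y)
    xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride               # box centers in pixels
    wh = (y[..., 2:4] * 2.0) ** 2 * anchors.reshape(1, na, 1, 1, 2)  # box sizes in pixels
    return np.concatenate((xy, wh, y[..., 4:]), axis=-1).reshape(-1, no)


def decode(raw_outputs, anchors_per_level, strides, img_size):
    """Concatenate all levels, then split into normalized coords and per-class scores."""
    pred = np.concatenate(
        [decode_level(r, a, s) for r, a, s in zip(raw_outputs, anchors_per_level, strides)],
        axis=0)                                  # [n_anchors x (nc + 5)]
    boxes_coords = pred[:, :4] / img_size        # (xc, yc, w, h), normalized (square image assumed)
    boxes_scores = pred[:, 4:5] * pred[:, 5:]    # objectness * class probability
    return boxes_coords, boxes_scores


# Tiny example: one level, 3 anchors, 2x2 grid, 2 classes, 16x16 image (all illustrative).
raw = [np.zeros((1, 3, 2, 2, 7))]
anchors = [np.array([[10.0, 13.0], [16.0, 30.0], [33.0, 23.0]])]
boxes_coords, boxes_scores = decode(raw, anchors, strides=[8], img_size=16)
```

Whether the NMS layer expects exactly this coordinate convention should be checked against the coremltools documentation before relying on it.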

@glenn-jocher
Member

glenn-jocher commented May 7, 2021

@meng1994412 @haynec @JorgeCeja @samygarg @glemarivero good news 😃! Outstanding CoreML export issues may now be fixed ✅ in a second PR #3066. This adds a --train option suitable for CoreML model export which exports the model in .train() mode rather than .eval() mode, avoiding the grid construction code that causes CoreML export to fail:

python models/export.py --train

All batchnorm fusion ops have already occurred at the new model.train() point, so the only difference should be in the Detect layer.

To receive this update you can:

  • git pull from within your yolov5/ directory
  • git clone https://github.com/ultralytics/yolov5 again
  • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  • View our updated notebooks: Open In Colab Open In Kaggle

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@zhedahe

zhedahe commented May 7, 2021

Yeah!!! I ran the command python models/export.py --train --weights yolov5s.pt --img 640 --batch 1, and it works with no error!!!
Thanks a lot, 3x!

@glemarivero

glemarivero commented May 7, 2021

Thanks for adding the --train option. But we still can't use the CoreML model for inference, right?
Or am I missing something?

@glenn-jocher
Member

glenn-jocher commented May 7, 2021

@glemarivero yes the exported model can be used for any purpose.

@glemarivero

glemarivero commented May 7, 2021

I meant that we still need to do what I described earlier. Aren't the outputs of the model still 714, 727 and 740?
Thanks

@meng1994412

meng1994412 commented May 7, 2021

I agree with @pocketpixels and @glemarivero.
The CoreML model currently exported (with the latest update) does not contain any Detect module. Thus the CoreML model cannot be used directly for inference.

@meng1994412

meng1994412 commented May 7, 2021

Will the grid construction be included in CoreML export in the future update?

@pocketpixels

It definitely would be desirable to have the detect module included in the CoreML output. And if and when we can get that to work it might also be worthwhile to add a CoreML NMS layer to the generated CoreML model (as discussed by @glemarivero).
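
For readers unfamiliar with what such an NMS layer computes, here is a plain NumPy sketch of greedy non-maximum suppression on (xc, yc, w, h) boxes. It is not the CoreML layer itself; the function names and the threshold defaults are assumptions chosen for illustration.

```python
import numpy as np


def xywh_to_xyxy(boxes):
    """Convert (xc, yc, w, h) rows to (x1, y1, x2, y2)."""
    xc, yc, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return np.stack((xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2), axis=1)


def nms(boxes_xywh, confidences, iou_threshold=0.45, conf_threshold=0.25):
    """Greedy NMS: keep the most confident box, drop boxes overlapping it too much, repeat."""
    boxes = xywh_to_xyxy(boxes_xywh)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-confidences)                        # most confident first
    order = order[confidences[order] >= conf_threshold]     # discard low-confidence boxes
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]                  # survivors go to the next round
    return keep


boxes = np.array([[0.50, 0.5, 0.2, 0.2],
                  [0.51, 0.5, 0.2, 0.2],   # nearly identical to the first box
                  [0.20, 0.2, 0.1, 0.1]])
kept = nms(boxes, np.array([0.9, 0.8, 0.7]))
```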

@glenn-jocher Do you happen to know which part of the Detect implementation the CoreML converter chokes on? Maybe it could be possible to find a workaround by reformulating one of the PyTorch operations involved?

@pocketpixels

I looked into what is causing the export failure a bit.
What I found so far is that it is related to self.stride and self.anchor_grid in the box calculations here:

yolov5/models/yolo.py

Lines 55 to 61 in d2a1728

if self.inplace:
    y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
    xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
    wh = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i].view(1, self.na, 1, 1, 2)  # wh
    y = torch.cat((xy, wh, y[..., 4:]), -1)

If we comment out or remove those from the calculations then the CoreML conversion runs to completion (accessing and using self.grid in those calculations seems to be fine).

I have not yet figured out though why these are causing problems. With anchor_grid I initially suspected it could be that the tensor rank is higher than CoreML can handle. However stride is just a vector of 3 floats. It gets set outside of the module's init, maybe that could be causing the issue somehow?
I'll look into this more later, but thought I'd share what I found so far in case someone else (who is maybe more experienced with PyTorch & CoreML) has ideas and/or wants to investigate further.

@glemarivero

Hi, I was able to put everything together. Take a look at this notebook:
example_yolov5s_to_coreml.ipynb.zip
Please let me know if you find any errors. It is only for the yolov5 small version, but it shouldn't be difficult to adapt it to the others.
Hope it is useful for the rest 🙂

@pocketpixels

@glemarivero Fantastic work, thank you for sharing!

@pocketpixels

pocketpixels commented May 8, 2021

Continuing my investigation into the cause for the error during the CoreML export of the Detect module:
Just focusing on the --inplace branch in the code cited above, so these two lines:

yolov5/models/yolo.py

Lines 56 to 57 in d2a1728

y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i] # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh

With these modifications the CoreML conversion completes without errors:

s = self.stride[i].item()
ag = self.anchor_grid[i].numpy()
y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * s  # xy
y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * ag  # wh

That is, if we force PyTorch to treat stride and anchor_grid as constants and forget how they were computed (which I believe should be OK, because they are not input dependent?), then the CoreML converter has no issues.
(I have not tried running the resulting model on iOS yet).

Clearly the above change is not the solution (as I believe it would impact inference performance), but maybe it is a good hint at what a better solution might be (for someone like @glenn-jocher who understands the code base and PyTorch better than I do)?

Update: While the conversion completes, looking at the resulting graph in Netron I don't think it actually includes the box coordinate computations.

Update 2:
Converting without --inplace and making the equivalent changes to that branch of the code does result in a model that seems to include the box coordinate computations.

s = self.stride[i].item()
ag = self.anchor_grid[i].view(1, self.na, 1, 1, 2).numpy()
xy = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * s  # xy
wh = (y[..., 2:4] * 2) ** 2 * ag  # wh
y = torch.cat((xy, wh, y[..., 4:]), -1)

[Image: coreml_export]

@glenn-jocher
Member

glenn-jocher commented May 8, 2021

@meng1994412 @glemarivero @pocketpixels to clarify, all modules including the Detect() layer are exported by export.py, no modules are missing. The --train flag simply places the model in model.train() mode, which allows the Detect() layer to sidestep the grid and concatenation ops.

yolov5/models/export.py

Lines 57 to 58 in 251aeaf

if opt.train:
    model.train()  # training mode (no grid construction in Detect layer)
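
To make the effect of the mode switch concrete, here is a toy stand-in for a detection head, written in NumPy so it runs without PyTorch. It is not the YOLOv5 Detect module: the class name is made up, only the xy decoding is shown, and the shapes are illustrative. The point is that in training mode the head hands back the raw per-level tensors, while in eval mode it builds the grid, decodes, and concatenates.

```python
import numpy as np


class ToyDetect:
    """Toy detection head with a train/eval switch (illustration only)."""

    def __init__(self, strides):
        self.strides = strides
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, feature_maps):
        if self.training:
            return feature_maps  # raw outputs: no grid, no decoding, no concatenation
        decoded = []
        for x, stride in zip(feature_maps, self.strides):
            bs, na, ny, nx, no = x.shape
            y = 1.0 / (1.0 + np.exp(-x))  # sigmoid
            yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing='ij')
            grid = np.stack((xv, yv), axis=-1).reshape(1, 1, ny, nx, 2)
            y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # xy only, wh omitted
            decoded.append(y.reshape(bs, -1, no))
        return np.concatenate(decoded, axis=1)


maps = [np.zeros((1, 3, 4, 4, 7)), np.zeros((1, 3, 2, 2, 7))]
head = ToyDetect(strides=[8, 16])
raw_outputs = head.train()(maps)   # list of per-level tensors
predictions = head.eval()(maps)    # single array of decoded rows
```

With an export in training mode, the grid scaling and concatenation therefore have to happen outside the exported model.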

@glemarivero

glemarivero commented May 10, 2021

But if you do, how do you continue? How do you get the final bounding boxes?

@glemarivero

glemarivero commented May 11, 2021

In case anyone is interested, I put together a script that outputs a CoreML .mlmodel that can be opened with Xcode (the previous model couldn't be) and used to preview inference results inside it. Again, I only did it for yolov5s.

export.py.zip

python models/export.py --train

[Image]

@pocketpixels

Thanks for sharing @glemarivero.
I also wrote a similar (but different) CoreML export script that generates a CoreML model that can be previewed in Xcode and can easily be used with Apple's Vision framework and yields a VNRecognizedObjectObservation for each detected object.
I modified the code in the Detect module similar to what I discussed above (but there was still a missing step) so that it can be exported by the coremltools convert function.
It should work for all the differently sized variants of the Yolo v5 model.
To try it I recommend checking out the branch from my forked repo into a separate directory:
git clone -b better_coreml_export https://github.com/pocketpixels/yolov5.git yolov5_coreml_export
From within that directory use it with
python models/coreml_export.py --weights [model weights file]

@glemarivero

Nice work @pocketpixels! Thanks for sharing 🙂

9 participants