torchvision.ops.batched_nms() crashes with pytorch 1.9.0 and torchvision 0.10.0 #4071

immanuelweber · 2021-06-16T07:41:16Z

🐛 Bug

With the just-released pytorch 1.9.0 and torchvision 0.10.0, torchvision.ops.batched_nms() crashes on my machine with the following error:

RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

Since both are of the current version, I guess they should be compatible (they are not yet listed in the compatibility matrix).
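The version check the error message asks for can be sketched as below. This is a minimal illustration, not part of either library: `KNOWN_PAIRS` is a hand-copied subset of the compatibility matrix, and `base_version` and `versions_match` are hypothetical helper names. In the environment reported here the check passes (1.9.0 with 0.10.0), so it can only rule out a plain version mismatch.

```python
import re

# Hand-copied subset of the torchvision compatibility matrix (assumption:
# extend this table from the README for other releases).
KNOWN_PAIRS = {
    "1.9.0": "0.10.0",
    "1.8.1": "0.9.1",
    "1.8.0": "0.9.0",
}


def base_version(version):
    """Return the leading 'X.Y.Z' part, dropping tags such as '+cu102'."""
    match = re.match(r"\d+\.\d+\.\d+", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(0)


def versions_match(torch_version, torchvision_version):
    """True/False for a torch release in the table, None if it is not listed."""
    expected = KNOWN_PAIRS.get(base_version(torch_version))
    if expected is None:
        return None
    return expected == base_version(torchvision_version)
```

With both packages importable, the call would be `versions_match(torch.__version__, torchvision.__version__)`.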

To Reproduce

Steps to reproduce the behavior:

this example code shows the behavior on my machine:

import torch as th
import torchvision as tv

boxes = th.zeros(1000, 4)
scores = th.zeros(1000)
idxs = th.zeros(1000)

tv.ops.batched_nms(boxes, scores, idxs, 0.5)
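For anyone comparing environments, a slightly more defensive variant of the reproduction may help: it reports the failure mode as a string instead of raising. This is only a sketch; `probe_batched_nms` is a hypothetical helper name, and the set of exceptions it catches is an assumption based on the errors quoted in this thread.

```python
def probe_batched_nms():
    """Run a tiny batched_nms call and report what happened as a string."""
    try:
        import torch
        import torchvision
    except (ImportError, OSError, RuntimeError) as exc:
        return f"import failed: {exc}"

    boxes = torch.zeros(10, 4)
    scores = torch.zeros(10)
    idxs = torch.zeros(10)
    try:
        keep = torchvision.ops.batched_nms(boxes, scores, idxs, 0.5)
    except AttributeError as exc:
        # Assumption: raised when the installed torchvision has no ops module.
        return f"torchvision.ops unavailable: {exc}"
    except RuntimeError as exc:
        # For example "Couldn't load custom C++ ops".
        return f"custom ops failed: {exc}"
    return f"ok: kept {keep.numel()} of {boxes.shape[0]} boxes"


print(probe_batched_nms())
```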

Expected behavior

This should not result in an error.

Environment

Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.27

Python version: 3.9 (64-bit runtime)
Python platform: Linux-4.15.0-144-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB

Nvidia driver version: 460.32.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0a0+33b2469
[pip3] torchvision==0.10.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 h8f6ccaa_8 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.2.0 h726a3e6_389 conda-forge
[conda] mkl-service 2.4.0 py39h3811e60_0 conda-forge
[conda] mkl_fft 1.3.0 py39h42c9631_2
[conda] mkl_random 1.2.2 py39hde0f152_0 conda-forge
[conda] numpy 1.20.2 py39h2d18471_0
[conda] numpy-base 1.20.2 py39hfae3a4d_0
[conda] pytorch 1.9.0 py3.9_cuda10.2_cudnn7.6.5_0 pytorch
[conda] torchaudio 0.9.0 py39 pytorch
[conda] torchvision 0.10.0 py39_cu102 pytorch

Additional context

KonstantinKhabarlak · 2021-06-16T09:28:07Z

I can also report a regression.
A similar issue occurs for me when running torch.jit.script.
Code that worked with pytorch 1.8.0 and torchvision 0.9.1 fails after updating to pytorch 1.9.0 and torchvision 0.10.0 with:

RuntimeError: 
object has no attribute nms:
  File "C:\tools\Anaconda3\lib\site-packages\torchvision\ops\boxes.py", line 35
    """
    _assert_has_ops()
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
'nms' is being compiled since it was called from '_batched_nms_vanilla'
  File "C:\tools\Anaconda3\lib\site-packages\torchvision\ops\boxes.py", line 102
    for class_id in torch.unique(idxs):
        curr_indices = torch.where(idxs == class_id)[0]
        curr_keep_indices = nms(boxes[curr_indices], scores[curr_indices], iou_threshold)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        keep_mask[curr_indices[curr_keep_indices]] = True
    keep_indices = torch.where(keep_mask)[0]
'_batched_nms_vanilla' is being compiled since it was called from 'batched_nms'
  File "C:\tools\Anaconda3\lib\site-packages\torchvision\ops\boxes.py", line 66
    # Ideally for GPU we'd use a higher threshold
    if boxes.numel() > 4_000 and not torchvision._is_tracing():
        return _batched_nms_vanilla(boxes, scores, idxs, iou_threshold)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    else:
        return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
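For readers unfamiliar with what the functions in this traceback compute, here is a plain-Python sketch of per-class NMS in the spirit of the `_batched_nms_vanilla` loop shown above. It is an illustration only, not torchvision's code: boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the helper names are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep a box unless it overlaps a higher-scoring kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def batched_nms_vanilla(boxes, scores, idxs, iou_threshold):
    """Run NMS separately per class id, then order survivors by score."""
    keep = []
    for class_id in sorted(set(idxs)):
        members = [i for i, c in enumerate(idxs) if c == class_id]
        kept = nms([boxes[i] for i in members],
                   [scores[i] for i in members],
                   iou_threshold)
        keep.extend(members[k] for k in kept)
    return sorted(keep, key=lambda i: scores[i], reverse=True)
```

Boxes with different class ids never suppress each other, which is the point of the batched variant.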

fmassa · 2021-06-16T09:40:06Z

Thanks for the reports.

We are looking into this

NicolasHug · 2021-06-16T09:41:38Z

For reference, I'm unable to reproduce on OSX with conda create -n new pytorch torchvision -c pytorch; the tests pass just fine.

PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 11.3.1 (x86_64)
GCC version: Could not collect
Clang version: 12.0.0 (clang-1200.0.32.29)
CMake version: Could not collect
Libc version: N/A

Python version: 3.8 (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0
[conda] blas                      1.0                         mkl
[conda] ffmpeg                    4.3                  h0a44026_0    pytorch
[conda] mkl                       2021.2.0           hecd8cb5_269
[conda] mkl-service               2.3.0            py38h9ed2024_1
[conda] mkl_fft                   1.3.0            py38h4a7008c_2
[conda] mkl_random                1.2.1            py38hb2f4e1b_2
[conda] numpy                     1.20.2           py38h4b4dc7a_0
[conda] numpy-base                1.20.2           py38he0bd621_0
[conda] pytorch                   1.9.0                   py3.8_0    pytorch
[conda] torchvision               0.10.0                 py38_cpu    pytorch

fmassa · 2021-06-16T09:47:36Z

FYI I've also tried with pip by doing

conda create -n test python=3.9
pip install torch torchvision

on a GPU machine and it worked fine.

Collecting environment information...
PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-52-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 450.80.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.3
[pip3] torch==1.9.0
[pip3] torchvision==0.10.0
[conda] numpy                     1.20.3                    <pip>
[conda] torch                     1.9.0                     <pip>
[conda] torchvision               0.10.0                    <pip>

I'm now trying on conda with the same environment

dodobyte · 2021-06-16T10:16:10Z

Have the same issue, installed with conda, also a GPU machine.

NicolasHug · 2021-06-16T10:20:49Z

On a Linux GPU machine it looks like torchvision 0.2.2 gets installed. I tried both with cuda 10.2 and 11.1 and both fail with AttributeError: module 'torchvision' has no attribute 'ops'.

conda create -n new python=3.9 pytorch torchvision cudatoolkit=10.2 -c pytorch

Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.27

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-1041-aws-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB

Nvidia driver version: 450.80.02
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchvision==0.2.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              hfd86e86_1
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py39h27cfd23_1
[conda] mkl_fft                   1.3.0            py39h42c9631_2
[conda] mkl_random                1.2.1            py39ha9443f7_2
[conda] numpy                     1.20.2           py39h2d18471_0
[conda] numpy-base                1.20.2           py39hfae3a4d_0
[conda] pytorch                   1.9.0           py3.9_cuda10.2_cudnn7.6.5_0    pytorch
[conda] torchvision               0.2.2                      py_3    pytorch

conda create -n new python=3.9 pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia

PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.27

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-1041-aws-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.1.105
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB

Nvidia driver version: 450.80.02
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0
[pip3] torchvision==0.2.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.1.74              h6bb024c_0    nvidia
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py39h27cfd23_1
[conda] mkl_fft                   1.3.0            py39h42c9631_2
[conda] mkl_random                1.2.1            py39ha9443f7_2
[conda] numpy                     1.20.2           py39h2d18471_0
[conda] numpy-base                1.20.2           py39hfae3a4d_0
[conda] pytorch                   1.9.0           py3.9_cuda11.1_cudnn8.0.5_0    pytorch
[conda] torchvision               0.2.2                      py_3    pytorch
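Spotting the 0.2.2 install in reports like the two above can be automated. The sketch below parses the `[conda]` lines of such a report and flags a torchvision that is too old to have `torchvision.ops`. The helper names are hypothetical, and the 0.3.0 cutoff for the ops module is stated from memory, so treat it as an assumption.

```python
def parse_conda_lines(report):
    """Map package name -> (version, build) from '[conda] ...' report lines."""
    packages = {}
    for line in report.splitlines():
        fields = line.split()
        # Expected layout: [conda] <name> <version> <build> [<channel>]
        if len(fields) >= 4 and fields[0] == "[conda]":
            packages[fields[1]] = (fields[2], fields[3])
    return packages


def has_ops_module(torchvision_version):
    """Assumption: torchvision.ops first shipped in torchvision 0.3.0."""
    major, minor = (int(part) for part in torchvision_version.split(".")[:2])
    return (major, minor) >= (0, 3)
```

The comparison uses integer tuples on purpose: as plain strings, "0.10.0" sorts before "0.2.2".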

immanuelweber · 2021-06-16T11:10:51Z

@NicolasHug regarding 0.2.2: yesterday I also observed that conda sometimes found only this version when I uninstalled torchvision 0.10.0 and reinstalled it, but I am unable to reproduce this at the moment.
The installation line you posted results in 0.10.0 being installed on my machine.

fmassa · 2021-06-16T13:16:54Z

Looking at https://anaconda.org/pytorch/torchvision/files, it seems that the py39_cu102 and py39_cu111 are available, so I'm not sure why it's not being found.

@malfet @seemethere there are problems with torchvision CUDA binaries on Linux for Python 3.9 (details in #4071 (comment)).

And I've just tried with Python 3.8, and even though I'm able to install matching versions, I get the same issue as originally reported in #4071 (comment)

In https://anaconda.org/pytorch/torchvision/files, the torchvision binaries date from 14 days ago. Are we sure we copied the new ones that have been regenerated? The torchvision RCs in https://anaconda.org/pytorch-test/torchvision/files were generated yesterday, so maybe we copied the wrong files when promoting the binaries?

malfet · 2021-06-16T14:45:57Z

Hmm, sample code fails for me with

RuntimeError: boxes should be a 2d tensor, got 3D

This one works as expected:

$ python -c "import torch as th; import torchvision as tv; print(tv.ops.batched_nms(th.zeros(100, 4), th.zeros(100), th.zeros(100), 0.5))"
tensor([62, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 75, 61, 60, 59, 58,
        57, 56, 55, 54, 53, 52, 51, 87, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90,
        89, 88, 50, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 12, 24, 23, 22,
        21, 20, 19, 18, 17, 16, 15, 14, 13, 25, 11, 10,  9,  8,  7,  6,  5,  4,
         3,  2,  1, 37, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38,  0, 36,
        35, 34, 33, 32, 31, 30, 29, 28, 27, 26])

immanuelweber · 2021-06-16T14:51:03Z

@malfet you are right; I should have mentioned that I just created these tensors to satisfy the inputs without caring about actually correct input. It still fails with the above-mentioned error on my machine. I have updated the sample code above accordingly, however.

immanuelweber · 2021-06-16T15:06:07Z

Since @fmassa pointed to https://anaconda.org/pytorch-test/torchvision/files, I just installed from there and the sample works.

malfet · 2021-06-16T15:07:28Z

torchvision in https://anaconda.org/pytorch channel was built against 9d5561b whereas the one in https://anaconda.org/pytorch was built against ae9963f.
I guess promoting the package from one channel to another should resolve the issue.

fmassa · 2021-06-16T15:10:50Z

@malfet yes, I've tested by installing torchvision from the pytorch-test channel and it works, so promoting the packages should fix the issue

@malfet note that there are no functional differences in the torchvision code between 9d5561b and ae9963f; it is just that the PyTorch version has changed since the RC was cut.
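To tell which of the two builds is installed locally, one option is to compare the commit hash the package reports with the abbreviated hashes quoted above. The sketch below assumes the installed build exposes a `git_version` attribute in its `version` module, which official builds appear to do but which is not verified here for every release. Both helper names are hypothetical.

```python
import importlib


def installed_commit(module_name):
    """Return the commit hash a package reports, or None if unavailable."""
    try:
        version_module = importlib.import_module(f"{module_name}.version")
    except (ImportError, OSError, RuntimeError):
        return None
    # Assumption: the build wrote a git_version string into <package>/version.py.
    return getattr(version_module, "git_version", None)


def built_from(short_hash, full_hash):
    """True if full_hash starts with the abbreviated hash being checked."""
    return full_hash is not None and full_hash.startswith(short_hash)
```

For example, `built_from("ae9963f", installed_commit("torchvision"))` would indicate whether the local torchvision reports the second commit mentioned above.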

immanuelweber · 2021-06-20T14:54:15Z

I just checked the packages on PyTorch channel, and they are up-to-date now and the code is working. Am I allowed to close this issue then?

NicolasHug · 2021-06-21T12:48:44Z

@egonuel Could you please detail the command that you run and that's now working?

When I run conda create -n test python=3.9 pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia, I still get torchvision 0.2.2, so it seems that not everything is fixed yet

immanuelweber · 2021-06-21T14:25:51Z

@NicolasHug mmhh, this seems to be a different issue. When I run the line you posted on my machine, everything is fine and 0.10.0 is being installed

malfet · 2021-06-21T14:29:51Z

I think the difference can be explained by the presence or absence of conda-forge in one's .condarc. I got the repro after removing the conda-forge dependency, but then fixed it by enabling it in the install command as follows:

conda create -n test python=3.9 pytorch torchvision cudatoolkit=11.1 -c pytorch -c nvidia -c conda-forge

ChouCHou-y · 2021-08-07T06:15:02Z

with the just released pytorch 1.9.0 and torchvision 0.10.0 torchvision.ops.batched_nms() crashes on my machine with the following error:

RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

How can I solve this, please?

fmassa · 2021-08-09T12:45:39Z

@ChouCHou-y this issue should have been fixed in #4240 (comment). Can you try uninstalling torchvision and installing it again?

fmassa added bug high priority topic: binaries labels Jun 16, 2021

pytorch-probot bot added the triage review label Jun 16, 2021

NicolasHug mentioned this issue Jun 17, 2021

Cannot install any version of torchvision newer than 0.2.2 with opencv for python 3.9 and pytorch 1.9.0 #4076

Open

NicolasHug added the release-issue For release-related issues label Jun 21, 2021

NicolasHug mentioned this issue Aug 2, 2021

Installing from conda with official installation command latest PyTorch 1.9.0 installs torchvision 0.2.2 #4240

Closed

vadimkantorov mentioned this issue Aug 2, 2021

[feature request] Implement CI testing of installation commands pytorch/pytorch#62590

Closed

lzhornyak mentioned this issue Aug 9, 2021

Checksum mismatch for pytorch::torchvision-0.10.0-py39_cu111 #4261

Closed

NicolasHug mentioned this issue Aug 13, 2021

when install torchvision, conda can not install latest version, only version 0.2.2 can be installed #4273

Open

datumbox closed this as completed Feb 23, 2022
