Error when loading in RTMDet ONNX file in OpenCV [Bug] #2048

cozeybozey · 2023-05-03T09:21:58Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. I have read the FAQ documentation but cannot get the expected help.
3. The bug has not been fixed in the latest version.

Describe the bug

I get an "Inconsistent shape for ConcatLayer" error when I try to load the exported ONNX file in OpenCV. I tried opset versions 12 and 11, but both led to the same error. I also tried both the main branch and the dev-1.x branch of MMDeploy, but again both resulted in the same error. Finally, I tried simplifying the ONNX file via https://convertmodel.com/#input=onnx&output=onnx, but this did not help either. Thanks in advance for your help!

Reproduction

python tools/torch2onnx.py segmentation_onnxruntime_dynamic-100x100-300x300.py model_config.py model.pth img.bmp

The config file looks like:
_base_ = ["../mmdet/instance-seg/instance-seg_onnxruntime_dynamic.py"]

onnx_config = dict(
    opset_version=12,
    save_file='rtmdet.onnx'
)

Finally the script that loads in the model into OpenCV looks like:
import onnx
import cv2

# Check whether ONNX is correct

onnx_path = "./rtmdet.onnx"
onnx_model = onnx.load(onnx_path)
onnx.checker.check_model(onnx_model)

net = cv2.dnn.readNetFromONNX(onnx_path)
print("OpenCV model was successfully read. Layer IDs: \n", net.getLayerNames())

Environment

05/03 11:02:50 - mmengine - INFO - 

05/03 11:02:50 - mmengine - INFO - **********Environmental information**********
05/03 11:02:50 - mmengine - INFO - sys.platform: linux
05/03 11:02:50 - mmengine - INFO - Python: 3.8.16 (default, Mar  2 2023, 03:21:46) [GCC 11.2.0]
05/03 11:02:50 - mmengine - INFO - CUDA available: True
05/03 11:02:50 - mmengine - INFO - numpy_random_seed: 2147483648
05/03 11:02:50 - mmengine - INFO - GPU 0,1: NVIDIA GeForce GTX 1080
05/03 11:02:50 - mmengine - INFO - CUDA_HOME: /usr
05/03 11:02:50 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.8, V11.8.89
05/03 11:02:50 - mmengine - INFO - GCC: gcc (Debian 12.2.0-14) 12.2.0
05/03 11:02:50 - mmengine - INFO - PyTorch: 1.8.1
05/03 11:02:50 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

05/03 11:02:50 - mmengine - INFO - TorchVision: 0.9.1
05/03 11:02:50 - mmengine - INFO - OpenCV: 4.7.0
05/03 11:02:50 - mmengine - INFO - MMEngine: 0.7.3
05/03 11:02:50 - mmengine - INFO - MMCV: 2.0.0
05/03 11:02:50 - mmengine - INFO - MMCV Compiler: GCC 7.3
05/03 11:02:50 - mmengine - INFO - MMCV CUDA Compiler: 11.1
05/03 11:02:50 - mmengine - INFO - MMDeploy: 1.0.0+0196cd0
05/03 11:02:50 - mmengine - INFO - 

05/03 11:02:50 - mmengine - INFO - **********Backend information**********
05/03 11:02:50 - mmengine - INFO - tensorrt:	None
05/03 11:02:50 - mmengine - INFO - ONNXRuntime:	None
05/03 11:02:50 - mmengine - INFO - ONNXRuntime-gpu:	1.8.1
05/03 11:02:50 - mmengine - INFO - ONNXRuntime custom ops:	Available
05/03 11:02:50 - mmengine - INFO - pplnn:	None
05/03 11:02:50 - mmengine - INFO - ncnn:	None
05/03 11:02:50 - mmengine - INFO - snpe:	None
05/03 11:02:50 - mmengine - INFO - openvino:	None
05/03 11:02:50 - mmengine - INFO - torchscript:	1.8.1
05/03 11:02:50 - mmengine - INFO - torchscript custom ops:	NotAvailable
05/03 11:02:50 - mmengine - INFO - rknn-toolkit:	None
05/03 11:02:50 - mmengine - INFO - rknn-toolkit2:	None
05/03 11:02:50 - mmengine - INFO - ascend:	None
05/03 11:02:50 - mmengine - INFO - coreml:	None
05/03 11:02:50 - mmengine - INFO - tvm:	None
05/03 11:02:50 - mmengine - INFO - vacc:	None
05/03 11:02:50 - mmengine - INFO - 

05/03 11:02:50 - mmengine - INFO - **********Codebase information**********
05/03 11:02:50 - mmengine - INFO - mmdet:	3.0.0
05/03 11:02:50 - mmengine - INFO - mmseg:	None
05/03 11:02:50 - mmengine - INFO - mmcls:	None
05/03 11:02:50 - mmengine - INFO - mmocr:	None
05/03 11:02:50 - mmengine - INFO - mmedit:	None
05/03 11:02:50 - mmengine - INFO - mmdet3d:	None
05/03 11:02:50 - mmengine - INFO - mmpose:	None
05/03 11:02:50 - mmengine - INFO - mmrotate:	None
05/03 11:02:50 - mmengine - INFO - mmaction:	None
05/03 11:02:50 - mmengine - INFO - mmrazor:	None

Error traceback

[ERROR:0@0.113] global onnx_importer.cpp:1051 handleNode DNN/ONNX: ERROR during processing node with 2 inputs and 1 outputs: [Concat]:(onnx_node!Concat_151) from domain='ai.onnx'
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    net = cv2.dnn.readNetFromONNX(onnx_path)
cv2.error: OpenCV(4.7.0) /io/opencv/modules/dnn/src/onnx/onnx_importer.cpp:1073: error: (-2:Unspecified error) in function 'handleNode'
> Node [Concat@ai.onnx]:(onnx_node!Concat_151) parse error: OpenCV(4.7.0) /io/opencv/modules/dnn/src/layers/concat_layer.cpp:109: error: (-201:Incorrect size of input array) Inconsistent shape for ConcatLayer in function 'getMemoryShapes'
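
The message above is raised by the shape check in OpenCV's ConcatLayer. As a rough illustration of what such a check looks for, here is a minimal sketch, assuming the usual concatenation rule: all inputs have the same rank and agree on every dimension except the concatenation axis. `concat_mismatches` is a hypothetical helper, not part of OpenCV or ONNX, and the shapes of the two inputs of `Concat_151` would have to be read from the exported graph separately.

```python
from typing import List, Sequence, Tuple


def concat_mismatches(
    shapes: Sequence[Sequence[int]], axis: int
) -> List[Tuple[int, int]]:
    """Return (input_index, dim_index) pairs that break the concat rule.

    Assumed rule: every input has the same rank as the first input and
    matches it on all dimensions except `axis`. A rank mismatch is
    reported as (input_index, -1).
    """
    if not shapes:
        return []
    ref = shapes[0]
    rank = len(ref)
    # Normalise a negative axis (e.g. -1) to a non-negative index.
    axis = axis % rank if rank else 0
    problems: List[Tuple[int, int]] = []
    for i, shape in enumerate(shapes[1:], start=1):
        if len(shape) != rank:
            problems.append((i, -1))
            continue
        for d, (a, b) in enumerate(zip(ref, shape)):
            if d != axis and a != b:
                problems.append((i, d))
    return problems
```

For example, `concat_mismatches([(1, 100, 4), (1, 100, 1)], axis=-1)` returns an empty list, while inputs of different rank, such as `(1, 100, 4)` and `(1, 1)`, are flagged. With dynamic axes, an importer that cannot resolve a dimension may see exactly this kind of disagreement.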

RunningLeon · 2023-05-04T02:04:51Z

@cozeybozey Hi, sorry for the trouble. We have not tested the exported ONNX model in OpenCV. If you use a deploy config with the onnxruntime backend, you could first check the visualized output results to see whether they look OK.
BTW, it seems you are converting with a custom model config and a custom deploy config. Maybe you could try one of the officially supported models listed here.

cozeybozey · 2023-05-04T06:27:27Z

When running deploy.py with the same configs and model I get the following error:

05/04 08:16:34 - mmengine - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
05/04 08:16:34 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/04 08:16:34 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
Loads checkpoint by local backend from path: /home/user/Repositories/mmdetection/work_dirs/rtmdet_fruit_config/epoch_5000.pth
05/04 08:16:34 - mmengine - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
05/04 08:16:34 - mmengine - INFO - Export PyTorch model to ONNX: /home/user/Repositories/mmdeploy/rtmdet.onnx.
05/04 08:16:34 - mmengine - WARNING - Can not find torch._C._jit_pass_onnx_autograd_function_process, function rewrite will not be applied
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/core/optimizers/function_marker.py:160: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ys_shape = tuple(int(s) for s in ys.shape)
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/nn/functional.py:3454: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  warnings.warn(
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/models/dense_heads/rtmdet_ins_head.py:143: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  iou_threshold = torch.tensor([iou_threshold], dtype=torch.float32)
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/models/dense_heads/rtmdet_ins_head.py:144: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  score_threshold = torch.tensor([score_threshold], dtype=torch.float32)
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/pytorch/functions/topk.py:28: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  k = torch.tensor(k, device=input.device, dtype=torch.long)
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:44: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  score_threshold = float(score_threshold)
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/mmcv/ops/nms.py:45: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  iou_threshold = float(iou_threshold)
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmcv/ops/nms.py:123: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(1) == 4
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmcv/ops/nms.py:124: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert boxes.size(0) == scores.size(0)
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmcv/ops/nms.py:30: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if max_num > 0:
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/models/dense_heads/rtmdet_ins_head.py:264: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_inst < 1:
/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:2620: UserWarning: Exporting aten::index operator of advanced indexing in opset 12 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
  warnings.warn("Exporting aten::index operator of advanced indexing in opset " +
05/04 08:16:43 - mmengine - INFO - Execute onnx optimize passes.
05/04 08:16:43 - mmengine - WARNING - Can not optimize model, please build torchscipt extension.
More details: https://github.com/open-mmlab/mmdeploy/tree/1.x/docs/en/experimental/onnx_optimizer.md
05/04 08:16:43 - mmengine - INFO - Finish pipeline mmdeploy.apis.pytorch2onnx.torch2onnx
05/04 08:16:43 - mmengine - INFO - Start pipeline mmdeploy.apis.utils.utils.to_backend in main process
05/04 08:16:43 - mmengine - INFO - Finish pipeline mmdeploy.apis.utils.utils.to_backend
05/04 08:16:43 - mmengine - INFO - visualize onnxruntime model start.
05/04 08:16:44 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "Codebases" registry tree. As a workaround, the current "Codebases" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/04 08:16:44 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "mmdet_tasks" registry tree. As a workaround, the current "mmdet_tasks" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
05/04 08:16:44 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "backend_detectors" registry tree. As a workaround, the current "backend_detectors" registry in "mmdeploy" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
2023-05-04:08:16:44 - root - ERROR - class `End2EndModel` in mmdeploy/codebase/mmdet/deploy/object_detection_model.py: libcudnn.so.8: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 122, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 53, in __init__
    self._init_wrapper(
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 67, in _init_wrapper
    self.wrapper = BaseBackendModel._build_wrapper(
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/base/backend_model.py", line 65, in _build_wrapper
    return backend_mgr.build_wrapper(backend_files, device, input_names,
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/backend/onnxruntime/backend_manager.py", line 33, in build_wrapper
    from .wrapper import ORTWrapper
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/backend/onnxruntime/wrapper.py", line 5, in <module>
    import onnxruntime as ort
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/__init__.py", line 34, in <module>
    raise import_capi_exception
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import get_all_providers, get_available_providers, get_device, set_seed, \
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/capi/_pybind_state.py", line 11, in <module>
    from . import _ld_preload  # noqa: F401
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/onnxruntime/capi/_ld_preload.py", line 14, in <module>
    _libcudnn = CDLL("libcudnn.so.8", mode=RTLD_GLOBAL)
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudnn.so.8: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/utils/utils.py", line 41, in target_wrapper
    result = target(*args, **kwargs)
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/apis/visualize.py", line 65, in visualize_model
    model = task_processor.build_backend_model(
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection.py", line 157, in build_backend_model
    model = build_object_detection_model(
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmdeploy/codebase/mmdet/deploy/object_detection_model.py", line 933, in build_object_detection_model
    backend_detector = __BACKEND_MODEL.build(
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmengine/registry/registry.py", line 548, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/user/miniconda3/envs/mmdeploy/lib/python3.8/site-packages/mmengine/registry/build_functions.py", line 144, in build_from_cfg
    raise type(e)(
OSError: class `End2EndModel` in mmdeploy/codebase/mmdet/deploy/object_detection_model.py: libcudnn.so.8: cannot open shared object file: No such file or directory
05/04 08:16:44 - mmengine - ERROR - tools/deploy.py - create_process - 82 - visualize onnxruntime model failed.

Also, I need this to work for RTMDet specifically, and in the link you sent I don't see RTMDet in the list. However, according to this issue (#1662), it should be possible to export RTMDet models to ONNX files, right?

Finally I get a ton of warnings. I get the same warnings when running torch2onnx. Could you maybe give some insight into whether those warnings are problematic and if so, why they are occurring?

RunningLeon · 2023-05-04T07:04:44Z

@cozeybozey Hi

1. RTMDet is supported.
2. The many warnings from PyTorch can be ignored.
3. It seems you are using onnxruntime-gpu but did not install the CUDA libraries. Please follow this doc and download the GPU version, or simply pip uninstall onnxruntime-gpu and install the CPU version.
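
The traceback shows onnxruntime-gpu failing at `CDLL("libcudnn.so.8", mode=RTLD_GLOBAL)`. A minimal sketch for checking the same thing outside of MMDeploy, using only the standard library, could look like this. `can_load_shared_library` is a hypothetical helper name, and a successful load here does not guarantee that the rest of the CUDA setup is complete.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open `name`, else False."""
    try:
        ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        # Raised when the library is not on the loader's search path.
        return False
    return True


if __name__ == "__main__":
    lib = "libcudnn.so.8"  # library name taken from the traceback above
    if can_load_shared_library(lib):
        print(f"{lib} can be loaded")
    else:
        print(f"{lib} not found; install cuDNN or use the CPU onnxruntime")
```

If the library cannot be loaded, either cuDNN is missing or it is not on the loader's search path (for example `LD_LIBRARY_PATH` on Linux).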

cozeybozey · 2023-05-04T08:27:47Z

I deleted my entire conda environment to remake it with onnxruntime instead of onnxruntime-gpu. But I am following the get started guide and now it is saying that I don't have onnxruntime installed. Shouldn't the following code snippet in the get started section also contain "pip install onnxruntime"?

After running "pip install onnxruntime", the deploy script does run successfully. Unfortunately, I still get the same OpenCV error.
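
To see which of the two packages is actually present in an environment, a small standard-library sketch such as the following may help. The helper names are hypothetical, and the only assumption is that the pip distribution names are `onnxruntime` and `onnxruntime-gpu`, as used in this thread.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def onnxruntime_distributions() -> Dict[str, Optional[str]]:
    """Report the CPU and GPU onnxruntime distributions side by side."""
    names = ("onnxruntime", "onnxruntime-gpu")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in onnxruntime_distributions().items():
        print(f"{name}: {version or 'not installed'}")
```

Both distributions provide the same `onnxruntime` import name, so checking the distribution metadata is less ambiguous than checking whether `import onnxruntime` works.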

RunningLeon · 2023-05-04T09:00:28Z

@cozeybozey hi, as mentioned before, we only make sure the produced onnx models from mmdeploy can be used in onnxruntime, tensorrt and other backends. OpenCV is not included.

cozeybozey · 2023-05-04T09:16:46Z

I see, well thanks for the help anyway!

RunningLeon self-assigned this May 4, 2023

RunningLeon added the mmdet label May 4, 2023

cozeybozey closed this as completed May 4, 2023
