[Bug] SDK does not support multiple GPUs #2380

Closed
3 tasks
ChrisKong93 opened this issue Aug 28, 2023 · 3 comments

@ChrisKong93

Checklist

  • I have searched related issues but cannot get the expected help.
  • I have read the FAQ documentation but cannot get the expected help.
  • The bug has not been fixed in the latest version.

Describe the bug

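In a single Python script, I want to load the model onto two GPUs at the same time and have the two GPUs run inference in turn inside a loop. The first inference succeeds, but the second one fails with an error.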

Reproduction

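The main code is as follows:
`from mmdeploy_runtime import Classifier

# gpus_id: list of GPU ids (defined elsewhere in the script)
gpu_count = len(gpus_id)

try:
    for i in range(len(gpus_id)):
        gpu_id = int(gpus_id[i])
        print(gpu_id)
        # Each GPU loads its own model copy: ./resnet500, ./resnet501, ...
        model_path = "'./resnet50{}'".format(i)
        # Creates the variables classifier0, classifier1, ... (one per GPU)
        exec('classifier{} = Classifier(model_path= {} ,device_name= {}, device_id = {})'.format(
            i, model_path, "'cuda'", gpu_id))
except RuntimeError as e:
    # Fall back to a single CPU classifier if CUDA setup fails
    classifier = Classifier(model_path='./resnet50', device_name='cpu', device_id=0)`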
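
The reproduction creates `classifier0`, `classifier1`, ... through `exec`. A minimal sketch of the same setup that keeps the classifiers in a list is shown here. The function name `build_classifiers` and the `make_classifier` parameter are hypothetical; in the real script `make_classifier` would be the SDK's `Classifier`, called positionally as `(model_path, device_name, device_id)`. This only tidies the setup and is not expected to change the reported error.

```python
def build_classifiers(gpu_ids, make_classifier):
    """Return one classifier per GPU id, or a single CPU classifier on failure.

    make_classifier is a hypothetical factory taking
    (model_path, device_name, device_id), e.g. the SDK's Classifier.
    """
    try:
        # Same paths as the reproduction: ./resnet500, ./resnet501, ...
        return [
            make_classifier('./resnet50{}'.format(i), 'cuda', int(gpu_id))
            for i, gpu_id in enumerate(gpu_ids)
        ]
    except RuntimeError:
        # Same fallback as the reproduction: one classifier on the CPU
        return [make_classifier('./resnet50', 'cpu', 0)]
```

The classifier for the i-th GPU is then `classifiers[i]` rather than a generated variable name.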

Environment

08/28 09:30:11 - mmengine - INFO - 

08/28 09:30:11 - mmengine - INFO - **********Environmental information**********
08/28 09:30:11 - mmengine - INFO - sys.platform: linux
08/28 09:30:11 - mmengine - INFO - Python: 3.8.17 (default, Jul  5 2023, 21:04:15) [GCC 11.2.0]
08/28 09:30:11 - mmengine - INFO - CUDA available: True
08/28 09:30:11 - mmengine - INFO - GPU 0,1: NVIDIA GeForce RTX 3090
08/28 09:30:11 - mmengine - INFO - CUDA_HOME: /usr/local/cuda-11.3
08/28 09:30:11 - mmengine - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.109
08/28 09:30:11 - mmengine - INFO - GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
08/28 09:30:11 - mmengine - INFO - PyTorch: 1.12.1
08/28 09:30:11 - mmengine - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

08/28 09:30:11 - mmengine - INFO - TorchVision: 0.13.1
08/28 09:30:11 - mmengine - INFO - OpenCV: 4.8.0
08/28 09:30:11 - mmengine - INFO - MMCV: 1.5.1
08/28 09:30:11 - mmengine - INFO - MMCV Compiler: GCC 7.5
08/28 09:30:11 - mmengine - INFO - MMCV CUDA Compiler: 11.3
08/28 09:30:11 - mmengine - INFO - MMDeploy: 1.2.0+553f9b8
08/28 09:30:11 - mmengine - INFO - 

08/28 09:30:11 - mmengine - INFO - **********Backend information**********
08/28 09:30:11 - mmengine - INFO - tensorrt:    None
08/28 09:30:11 - mmengine - INFO - ONNXRuntime: None
08/28 09:30:11 - mmengine - INFO - ONNXRuntime-gpu:     1.8.1
08/28 09:30:11 - mmengine - INFO - ONNXRuntime custom ops:      Available
08/28 09:30:11 - mmengine - INFO - pplnn:       None
08/28 09:30:11 - mmengine - INFO - ncnn:        None
08/28 09:30:11 - mmengine - INFO - snpe:        None
08/28 09:30:11 - mmengine - INFO - openvino:    None
08/28 09:30:11 - mmengine - INFO - torchscript: 1.12.1
08/28 09:30:11 - mmengine - INFO - torchscript custom ops:      NotAvailable
08/28 09:30:11 - mmengine - INFO - rknn-toolkit:        None
08/28 09:30:11 - mmengine - INFO - rknn-toolkit2:       None
08/28 09:30:11 - mmengine - INFO - ascend:      None
08/28 09:30:11 - mmengine - INFO - coreml:      None
08/28 09:30:11 - mmengine - INFO - tvm: None
08/28 09:30:11 - mmengine - INFO - vacc:        None
08/28 09:30:11 - mmengine - INFO - 

08/28 09:30:11 - mmengine - INFO - **********Codebase information**********
08/28 09:30:11 - mmengine - INFO - mmdet:       None
08/28 09:30:11 - mmengine - INFO - mmseg:       None
08/28 09:30:11 - mmengine - INFO - mmpretrain:  None
08/28 09:30:11 - mmengine - INFO - mmocr:       None
08/28 09:30:11 - mmengine - INFO - mmagic:      None
08/28 09:30:11 - mmengine - INFO - mmdet3d:     None
08/28 09:30:11 - mmengine - INFO - mmpose:      None
08/28 09:30:11 - mmengine - INFO - mmrotate:    None
08/28 09:30:11 - mmengine - INFO - mmaction:    None
08/28 09:30:11 - mmengine - INFO - mmrazor:     None
08/28 09:30:11 - mmengine - INFO - mmyolo:      None

Error traceback

[ERROR][2023-08-28 09:35:33.983][resize.cu:1202] CUDA error: invalid resource handle
Aborted (core dumped)
@irexyc
Collaborator

irexyc commented Aug 29, 2023

@ChrisKong93

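There is a problem with device management, which we will fix in the next release. For now, you can use cudaSetDevice to bind the thread to the device.

If you use multiple threads and each thread never switches to a different device, binding once is enough.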

pip install cuda-python==11.5

from mmdeploy_runtime import Classifier
import cv2
from cuda import cudart

img = cv2.imread('/root/workspace/mmpretrain/demo/demo.JPEG')

# Load one copy of the model on each of the two GPUs
model = []
for i in range(2):
    model.append(Classifier('/root/workspace/mmdeploy/work-dir/ort', 'cuda', i))

while True:
    for i in range(2):
        # Bind the calling thread to device i before running model[i]
        cudart.cudaSetDevice(i)
        res = model[i](img)
        print(res)
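
The multi-threaded case mentioned above, where each thread stays on one device and binds only once, could be sketched as follows. This is an illustration under assumptions, not the SDK's own API: `worker`, `run_on_devices`, and the `set_device` hook are hypothetical names, and the models are passed in as plain callables so the sketch runs without a GPU.

```python
import threading
from queue import Queue


def worker(device_id, model, set_device, inputs, outputs):
    """Bind the current thread to one device, then serve inputs until stopped."""
    set_device(device_id)  # bound once; this thread never switches device
    while True:
        item = inputs.get()
        if item is None:  # sentinel: no more work for this thread
            break
        outputs.put((device_id, model(item)))


def run_on_devices(models, set_device, items):
    """Run every item through whichever per-device worker is free.

    models maps device id -> callable model already loaded on that device.
    set_device is a hypothetical hook for the binding call.
    """
    inputs, outputs = Queue(), Queue()
    threads = [
        threading.Thread(target=worker, args=(dev, model, set_device, inputs, outputs))
        for dev, model in models.items()
    ]
    for t in threads:
        t.start()
    for item in items:
        inputs.put(item)
    for _ in threads:
        inputs.put(None)  # one sentinel per thread
    for t in threads:
        t.join()
    results = []
    while not outputs.empty():  # safe: all workers have finished
        results.append(outputs.get())
    return results
```

With the SDK, `models` might be `{i: Classifier(path, 'cuda', i) for i in range(2)}` and `set_device` might be `lambda i: cudart.cudaSetDevice(i)`. Whether inference calls run in parallel across threads depends on the SDK releasing the GIL, which this thread does not state.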

@ChrisKong93
Author

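OK, thanks, I will try this method. I solved it this way and it achieves the desired result, but I am not sure whether it is the correct solution.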

@github-actions

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 19, 2023