
GPU version installed from source crashes when selecting an AI model #500

Closed
catxiaoyi opened this issue Jul 4, 2024 · 8 comments
Labels
question Further information is requested

Comments

@catxiaoyi

Hi, thank you very much for your contribution!
My installation steps were:
a) Create a folder named X-AnyLabeling
b) Open an Anaconda Prompt and create a virtual environment named anylabeling with Python 3.8:
conda create -n anylabeling python=3.8
c) Activate the environment: conda activate anylabeling
d) Enter the folder created in step a): cd C:\Users\Administrator\X-AnyLabeling
e) Clone the repository into that folder: git clone https://github.com/CVHub520/X-AnyLabeling.git
f) In the file ./anylabeling/app_info.py, change __preferred_device__ = "CPU" to "GPU"
g) Install the dependencies: pip install -r requirements-gpu.txt
h) Install CUDA, here by installing torch directly: from the PyTorch website, under the v2.2.2 install instructions, run conda install pytorch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 pytorch-cuda=11.8 -c pytorch -c nvidia
i) In the same environment, run python anylabeling/app.py to launch the application

After launching the application with the steps above, selecting the SAM-HQ AI model makes it crash immediately.
Error message:
(anylabeling) C:\Users\Administrator\X-AnyLabeling\X-AnyLabeling>python anylabeling/app.py
qt.qpa.fonts: Unable to enumerate family ' "Droid Sans Mono Dotted for Powerline" '
qt.qpa.fonts: Unable to enumerate family ' "Droid Sans Mono Slashed for Powerline" '
qt.qpa.fonts: Unable to enumerate family ' "Roboto Mono Medium for Powerline" '
qt.qpa.fonts: Unable to enumerate family ' "Ubuntu Mono derivative Powerline" '
Traceback (most recent call last):
File "C:\Users\Administrator\X-AnyLabeling\X-AnyLabeling\.\anylabeling\utils.py", line 15, in run
self.func(*self.args, **self.kwargs)
File "C:\Users\Administrator\X-AnyLabeling\X-AnyLabeling\.\anylabeling\services\auto_labeling\model_manager.py", line 965, in load_model
from .sam_hq import SAM_HQ
File "C:\Users\Administrator\X-AnyLabeling\X-AnyLabeling\.\anylabeling\services\auto_labeling\sam_hq.py", line 7, in <module>
import onnxruntime
File "D:\Users\Administrator\anaconda3\envs\anylabeling\lib\site-packages\onnxruntime\__init__.py", line 57, in <module>
raise import_capi_exception
File "D:\Users\Administrator\anaconda3\envs\anylabeling\lib\site-packages\onnxruntime\__init__.py", line 23, in <module>
from onnxruntime.capi._pybind_state import ExecutionMode  # noqa: F401
File "D:\Users\Administrator\anaconda3\envs\anylabeling\lib\site-packages\onnxruntime\capi\_pybind_state.py", line 32, in <module>
from .onnxruntime_pybind11_state import *  # noqa
ImportError: DLL load failed while importing onnxruntime_pybind11_state: The dynamic link library (DLL) initialization routine failed.

What could be causing this? Thank you.

@CVHub520
Owner

CVHub520 commented Jul 4, 2024

Hello! @catxiaoyi 👋

Thank you for reaching out and sharing the details of your installation process. I understand you're encountering an issue when trying to use the SAM-HQ model within X-AnyLabeling, leading to the application crashing immediately after launch. This error seems related to the ONNX Runtime library failing to initialize properly, which could be due to missing dependencies or compatibility issues.

Before we dive into potential solutions, I encourage you to check related issues, e.g., #480 and #496, ..., as they discuss similar problems and may already contain information that resolves your issue.

Alternatively, here are some troubleshooting steps you could try:

  1. Ensure that your CUDA and cuDNN versions are compatible with the ONNX Runtime (ORT) version you installed. The error message suggests there may be a mismatch or that a required component is not installed correctly. Incidentally, the PyTorch dependencies do not need to be installed at all.

  2. Reinstall ONNX Runtime. Sometimes, reinstalling a problematic package can resolve dependency issues: run pip uninstall onnxruntime, then pip install onnxruntime, following the official instructions.

  3. Verify that your system environment variables include the paths to the CUDA and cuDNN libraries. These should typically be added during the installation process.
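As a quick aid for the environment-variable check above, a small script like the following can list which PATH entries mention CUDA or cuDNN. This is a sketch, not part of X-AnyLabeling; directory names vary between installs:

```python
import os

def find_in_path(substring):
    """Return PATH entries whose name contains the given substring (case-insensitive)."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep)
            if substring.lower() in p.lower()]

if __name__ == "__main__":
    # Empty lists here usually mean the CUDA/cuDNN bin directories
    # were never added to PATH, which matches the DLL-load failure.
    print("CUDA-related PATH entries:", find_in_path("cuda") or "none found")
    print("cuDNN-related PATH entries:", find_in_path("cudnn") or "none found")
```

If both lines print "none found", adding the CUDA and cuDNN bin directories to PATH (and restarting the terminal) is the first thing to try.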

If you continue to experience issues after trying these steps, please let us know, and we'll be happy to assist further. Also, feel free to contribute to the discussion on open/closed issues, as your insights might help others facing similar challenges.

Thank you for your patience and for using X-AnyLabeling! 😃

Best regards,
CVHub

@CVHub520 CVHub520 added the question Further information is requested label Jul 4, 2024
@catxiaoyi
Author

Hi, following the method in #480 and downgrading onnxruntime-gpu to 1.16.0 did fix the crash. I also looked at #496; since my CUDA is not a 12.x version, a local installation should not be necessary.
However, while annotating data my computer's fan is very loud, and Task Manager shows very high CPU usage while the GPU is barely used at all.
(screenshot: Task Manager showing high CPU usage and an idle GPU)
That can't be right, can it?
I suspect the environment variables are the cause, but I don't know how to change them.
My question is: I already configured environment variables when I installed Anaconda. After creating a virtual environment with conda, does installing CUDA still require configuring environment variables? If so, how? Thank you.
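To make the environment-variable question concrete, the small check below prints the CUDA-related variables visible inside the active conda environment. The variable names (CUDA_PATH, CUDA_PATH_V11_8) are the defaults the official NVIDIA installer sets on Windows; this is an illustrative sketch, and your install may use different names:

```python
import os

# Variables the official CUDA installer on Windows typically sets
# (names assumed from the NVIDIA installer defaults; adjust to your install).
def report_cuda_env():
    for name in ("CUDA_PATH", "CUDA_PATH_V11_8"):
        print(f"{name} = {os.environ.get(name, '<not set>')}")

# The CUDA bin directory must also appear on PATH so its DLLs can be loaded.
def cuda_bin_on_path():
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any("cuda" in p.lower() and p.rstrip("\\/").lower().endswith("bin")
               for p in entries)

if __name__ == "__main__":
    report_cuda_env()
    print("CUDA bin directory on PATH:", cuda_bin_on_path())
```

Run this from the activated conda environment; if the variables are "<not set>" or the bin directory is missing from PATH inside that environment, that would explain ONNX Runtime falling back to CPU.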

@CVHub520
Owner

CVHub520 commented Jul 5, 2024

Below is a simple Python script for testing whether onnxruntime-gpu can be invoked correctly. Please make sure you have installed the GPU build of ONNX Runtime, and that __preferred_device__ is set to GPU in the app_info.py file.

import onnxruntime
import app_info  # run this from the directory containing app_info.py so it is importable

# Check whether the GPU build of onnxruntime is installed and usable
def check_onnxruntime_gpu():
    try:
        # Try to create a session with the CUDA execution provider
        session = onnxruntime.InferenceSession('model.onnx', providers=['CUDAExecutionProvider'])
        print("ONNX Runtime with GPU support is installed and working correctly.")
    except Exception as e:
        print(f"An error occurred: {e}")
        print("Please ensure that ONNX Runtime with GPU support is installed correctly.")
    # Check the device setting in app_info.py
    # (this assumes app_info exposes a 'device' dict, which may not hold in every version)
    try:
        if app_info.device['__preferred_device__'] == 'gpu':
            print("The preferred device in app_info.py is set to GPU.")
        else:
            print("Please set the '__preferred_device__' variable to 'gpu' in app_info.py.")
    except Exception as e:
        print(f"An error occurred while checking app_info.py: {e}")

# Run the check
if __name__ == "__main__":
    print("Running ONNX Runtime GPU test...")
    check_onnxruntime_gpu()

Before running this script, make sure you have an ONNX model file and replace model.onnx with the actual path to your model. The script will try to create an ONNX Runtime inference session with the CUDA execution provider and check the device setting in app_info.py.
Please run it to check whether your onnxruntime-gpu works correctly. If the output indicates everything is fine, your environment is ready for GPU-based ONNX model inference. If an error occurs, fix it according to the error message.
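A lighter-weight variant of this check needs no model file at all: ONNX Runtime can simply be asked which execution providers its build exposes. A GPU wheel should list CUDAExecutionProvider; the CPU-only wheel will not. This is a sketch alongside the script above, not a replacement for it:

```python
# Sanity check without a model file: list the execution providers
# this onnxruntime build exposes.
def gpu_provider_available():
    try:
        import onnxruntime
    except ImportError:
        print("onnxruntime is not installed in this environment.")
        return False
    providers = onnxruntime.get_available_providers()
    print("Available providers:", providers)
    return "CUDAExecutionProvider" in providers

if __name__ == "__main__":
    print("GPU provider available:", gpu_provider_available())
```

If this prints only CPUExecutionProvider, the CPU wheel (onnxruntime) is installed instead of onnxruntime-gpu, which would also explain the high CPU usage during annotation.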

@zoushucai

zoushucai commented Jul 10, 2024

Yes, I ran into the same problem. My environment was:

win11, conda environment

python==3.11, cuda 11.8, onnxruntime-gpu==1.18.0

The symptom was always that loading an AI model crashed the app, or the model loaded but could not predict and crashed at prediction time.

After switching to the environment below, everything worked:

python==3.8, cuda 11.8, onnxruntime-gpu==1.16.0

@zoushucai

I have roughly found the cause. Since I rarely use CUDA on Windows, my environment-variable configuration was wrong.
For cuda==11.8 and cudnn==8.9.x, you may also need to install Zlib and add it to the environment variables (I'm not sure whether Windows requires a reboot for this; I rebooted the system anyway).

After that, the AI models built and ran on GPU successfully in the following environment:

win11, cuda 11.8,
python=3.11, onnxruntime-gpu==1.18.0
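To verify the Zlib fix, a small check like the one below can confirm whether zlibwapi.dll (the DLL name cuDNN 8.x looks for on Windows, per NVIDIA's install notes) is actually reachable on PATH. This is an illustrative sketch, not part of X-AnyLabeling:

```python
import os

def find_dll(dll_name):
    """Return the PATH directories that contain the given DLL file."""
    hits = []
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, dll_name)):
            hits.append(directory)
    return hits

if __name__ == "__main__":
    # An empty result means Windows cannot load the DLL by name,
    # which produces exactly the kind of DLL-init failure seen above.
    print("zlibwapi.dll found in:", find_dll("zlibwapi.dll") or "nowhere on PATH")
```

If the DLL is found nowhere, copying it into a directory already on PATH (and reopening the terminal, or rebooting as above) should make it loadable.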

@CVHub520
Owner

Thank you for your response regarding the solution to the issue of X-AnyLabeling failing to run on GPU in a Windows system. I appreciate your troubleshooting efforts.

It appears you've identified the root cause, which was related to the environment-variable configuration after infrequent use of CUDA on Windows. Installing Zlib and adding it to the environment variables, along with the specific versions of CUDA (11.8) and cuDNN (8.9.x), seems to have done the trick. Restarting the system was also part of your solution, which I will keep in mind.

I'm glad to hear that you were able to successfully compile and run the AI model using GPU in the specified environment. Your assistance is greatly appreciated!

@Andy-HKU

I ran the script, and it reports:

Running ONNX Runtime GPU test...
ONNX Runtime with GPU support is installed and working correctly.
An error occurred while checking app_info.py: module 'app_info' has no attribute 'device'

How can I solve this issue?

@CVHub520
Owner

Hi @Andy-HKU, thank you for reporting the issue.

We understand that the script reports an error when checking app_info.py due to the absence of the device attribute.
Please note that the current version of app.py does not support the device parameter. Users can still run ONNX Runtime with GPU support, but they will need to follow the steps outlined in the tutorial to manually configure the environment for GPU execution.

Thank you for your understanding, and we appreciate your patience as we continue to develop and improve our services.

@CVHub520 CVHub520 closed this as completed Aug 8, 2024