Also getting an error when running 2(3) #11

Open
Mei118 opened this issue Sep 29, 2023 · 3 comments

Mei118 commented Sep 29, 2023

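I modified the CUDA source of the active rotating filter: I commented out #include <THC/THC.h>, replaced THCudaCheck with AT_CUDA_CHECK, and replaced THCCeilDiv(x,y) with (x+y-1)/y. Before making these changes, I ran python setup.py build_ext --inplace once more to check whether the failure was still the original THC problem, but the error had changed to IndexError: list index out of range. After making the THC changes, the same error appeared. I then ran python setup.py clean and built again, and it still fails. Part of the traceback:
File "/home/mjh/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1694, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range
Sorry to trouble you, and thank you very much.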
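
The THCCeilDiv(x,y) replacement above works because integer division in C++ truncates, so (x+y-1)/y rounds the quotient up for non-negative x and positive y. A minimal sketch of the same arithmetic in Python, using `//` for integer division; the helper name `ceil_div` and the block size of 512 are illustrative assumptions, not taken from the repository:

```python
import math


def ceil_div(x: int, y: int) -> int:
    """Integer ceiling division for x >= 0 and y > 0, mirroring (x + y - 1) / y in C++."""
    if x < 0 or y <= 0:
        raise ValueError("ceil_div expects x >= 0 and y > 0")
    return (x + y - 1) // y


# Hypothetical launch sizing: how many 512-thread blocks cover n elements.
for n in (0, 1, 511, 512, 513, 100_000):
    assert ceil_div(n, 512) == math.ceil(n / 512)
```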

Owner

chongkuiqi commented Sep 30, 2023

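I have not run into this problem myself. You could try the following two approaches to see whether either resolves it; if one works, please let me know:
(1) First make sure torch.cuda.is_available() returns True, then compile.
(2) Refer to this thread, https://github.com/pytorch/extension-cpp/issues/71, and make the changes it describes.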
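
The two suggestions can be combined into a quick check before building. The sketch below assumes, based on how `_get_cuda_arch_flags` appears to behave, that the architecture list is taken from the `TORCH_CUDA_ARCH_LIST` environment variable when it is set and otherwise from the visible GPUs, so an empty list (and the `IndexError`) would follow from having neither. The function name `preflight` and its messages are hypothetical:

```python
import os
from typing import Optional


def preflight(cuda_available: bool, device_count: int, arch_env: Optional[str]) -> str:
    """Return a short verdict on whether a CUDA extension build can pick a target architecture."""
    if arch_env:
        # An explicit list means the build does not need to query a GPU.
        return f"ok: TORCH_CUDA_ARCH_LIST={arch_env} is set explicitly"
    if cuda_available and device_count > 0:
        return "ok: architectures can be detected from the visible GPU(s)"
    return "problem: no visible GPU and TORCH_CUDA_ARCH_LIST is unset"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("problem: torch is not installed in this environment")
    else:
        print(preflight(torch.cuda.is_available(),
                        torch.cuda.device_count(),
                        os.environ.get("TORCH_CUDA_ARCH_LIST")))
```

If the verdict is a problem, either restore the driver so the GPU is visible again, or set the variable for the build, for example `TORCH_CUDA_ARCH_LIST="7.5" python setup.py build_ext --inplace`, where 7.5 is only a placeholder for the compute capability of your GPU.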

Author

Mei118 commented Sep 30, 2023

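This is now resolved. I followed the link below:
https://blog.csdn.net/iLOVEJohnny/article/details/123074279
At first torch.cuda.is_available() returned False, so I followed the steps to check my CUDA and cuDNN installation. Because I had forgotten my GPU details, I ran nvidia-smi in the terminal, which failed with: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I later learned that a kernel update applied when Ubuntu rebooted had left the new kernel mismatched with the existing GPU driver. After following the steps in https://blog.csdn.net/xiaojinger_123/article/details/121161446, torch.cuda.is_available() returned True, and compiling again produced no errors.
However, there is now a new problem. After running
cd S2ANet
python setup.py build_ext --inplace
the compiled files that the output reports as copied do not appear in the folder shown, so the extension package cannot be found when training the model.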
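
To confirm whether the in-place copy actually happened, it can help to list the compiled modules present in the source tree. This is a minimal sketch; the function name `find_built_extensions` is hypothetical, and it assumes compiled extensions end in `.so` (Linux) or `.pyd` (Windows) and that anything under `build/` is an intermediate copy:

```python
from pathlib import Path
from typing import List


def find_built_extensions(root: str) -> List[Path]:
    """List compiled extension modules under root, skipping the build/ directory."""
    root_path = Path(root)
    found = []
    for pattern in ("*.so", "*.pyd"):
        for path in root_path.rglob(pattern):
            # Ignore intermediate copies that setuptools leaves under build/.
            if "build" not in path.relative_to(root_path).parts:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for ext in find_built_extensions("."):
        print(ext)
```

Run from the directory containing setup.py, an empty listing would suggest the copy step did not place the modules where the import expects them.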

@chongkuiqi
Owner

Delete the entire build folder produced during compilation, then recompile.
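
A minimal sketch of that cleanup step, assuming the stale artifacts live in a directory named `build` directly under the project root; the helper name `remove_build_dir` is hypothetical:

```python
import shutil
from pathlib import Path


def remove_build_dir(project_root: str) -> bool:
    """Delete <project_root>/build if it exists; return True when something was removed."""
    build_dir = Path(project_root) / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        return True
    return False
```

After calling, for example, `remove_build_dir("S2ANet")`, rerun `python setup.py build_ext --inplace` inside that directory so every object file is regenerated.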
