"CUDA error: No kernel image" still exists after reinstalling torch-points-kernels #100

Open
maosuli opened this issue Mar 26, 2023 · 0 comments


maosuli commented Mar 26, 2023

Hi,

I have to compile the "torch-points-kernels" library on my workstation and then run the code on a remote server using the same conda environment.

The "CUDA error" occurred after I submitted the job to the remote server, although the code runs fine on my workstation.

Following your solution, I uninstalled the library, cleared the cache, and reinstalled it on my workstation after setting TORCH_CUDA_ARCH_LIST.

However, the same error still occurs.

I checked the GPUs on the two machines: a Quadro RTX 6000 (Turing, SM 75) and a Tesla V100 (Volta, SM 70), respectively. I set 'export TORCH_CUDA_ARCH_LIST="7.0;7.5"' before reinstalling the library.
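For reference, a minimal sketch of how the arch list could be compared against the GPU on each machine. The helper names `parse_arch_list` and `is_covered` are hypothetical, and the parser assumes numeric entries only (e.g. "7.0" or "7.5+PTX", not named architectures such as "Turing"). It also prints the architectures that the installed PyTorch binary itself reports, since those are independent of how torch-points-kernels was compiled.

```python
import os


def parse_arch_list(value):
    """Parse a string such as "7.0;7.5" into a list of (major, minor) tuples.

    Hypothetical helper: handles numeric entries only and ignores a "+PTX" suffix.
    """
    archs = []
    for item in value.replace(" ", ";").split(";"):
        item = item.strip().replace("+PTX", "")
        if not item:
            continue
        major, minor = item.split(".")
        archs.append((int(major), int(minor)))
    return archs


def is_covered(capability, archs):
    """Return True if the GPU's (major, minor) capability is in the arch list."""
    return tuple(capability) in archs


if __name__ == "__main__":
    archs = parse_arch_list(os.environ.get("TORCH_CUDA_ARCH_LIST", ""))
    print("TORCH_CUDA_ARCH_LIST:", archs)
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(i)
            print(torch.cuda.get_device_name(i), cap, "covered:", is_covered(cap, archs))
        # Architectures the installed PyTorch binary was compiled for.
        print("PyTorch arch list:", torch.cuda.get_arch_list())
```

Running this on both the workstation and the remote server should show whether each GPU's compute capability appears in both lists.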

The error details are as follows:

Traceback (most recent call last):
  File "train_s_stransformer.py", line 613, in <module>
    main()
  File "train_s_stransformer.py", line 92, in main
    main_worker(args.train_gpu, args.ngpus_per_node, args)
  File "train_s_stransformer.py", line 327, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train= train(train_loader, model, criterion, optimizer, epoch, scaler, scheduler)
  File "train_s_stransformer.py", line 426, in train
    output = model(feat, coord, offset, batch, neighbor_idx)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/3dSegmentation/stratified_transformer/Stratified-Transformer-main/model/stratified_transformer.py", line 453, in forward
    feats, xyz, offset, feats_down, xyz_down, offset_down = layer(feats, xyz, offset)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/3dSegmentation/stratified_transformer/Stratified-Transformer-main/model/stratified_transformer.py", line 281, in forward
    v2p_map, p2v_map, counts = grid_sample(xyz, batch, window_size, start=None)
  File "/home/xxx/3dSegmentation/stratified_transformer/Stratified-Transformer-main/model/stratified_transformer.py", line 59, in grid_sample
    unique, cluster, counts = torch.unique(cluster, sorted=True, return_inverse=True, return_counts=True)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/_jit_internal.py", line 421, in fn
    return if_true(*args, **kwargs)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/_jit_internal.py", line 421, in fn
    return if_true(*args, **kwargs)
  File "/home/xxx/.conda/envs/s_transformer10/lib/python3.7/site-packages/torch/functional.py", line 769, in _unique_impl
    return_counts=return_counts,
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
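
The message above suggests CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported at the call that launched it. A minimal sketch of setting it from Python, assuming it is done at the very top of the training script before torch is imported; the helper name `enable_blocking_launches` is hypothetical:

```python
import os


def enable_blocking_launches(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Assumption: this runs before CUDA is initialized (to be safe, before
    `import torch`), otherwise the setting may have no effect.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    enable_blocking_launches()
    print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

With synchronous launches the job runs more slowly, so this is only intended for reproducing the error.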

Could you please advise on how to fix this?

Best,

Eric.
