Undefined behavior of torch extensions with Pytorch >1.7 and CUDA 11 [with workaround information] #3324

Closed
benjaminum opened this issue Apr 22, 2021 · 2 comments · Fixed by #5320
Labels
build/install Build or installation issue ml

benjaminum commented Apr 22, 2021

Describe the bug
Torch extensions, such as the ops in the open3d.ml.torch.ops namespace, have undefined behavior when used with torch 1.7 or later and CUDA 11. This may result in segmentation faults or wrong results.

To Reproduce

The attached zip contains a minimal CMake project for reproducing the problem.
Cuda11Debug.zip

Expected behavior
The code is an example taken from the docs of the cub library. Because of the bug, cub functions that cache return values may return unexpected values, which causes the temporary memory allocation in the example to fail.
If the problem is present, the test script prints "temp_storage_bytes should not be 0!".

Environment (please complete the following information):

PyTorch 1.7.1
CUDA 11.0
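
A minimal sketch for checking whether an installed build falls in the range this report names (torch 1.7 or later with CUDA 11). It splits a version string of the form printed by `torch.__version__` in the comments below, such as `1.12.0+cu116`. The helper names `split_torch_version` and `in_reported_range` are hypothetical, and a match only means the configuration is in the reported range, not that the bug is present: the later comments show 1.9.0+cu111 and 1.12.0+cu116 passing the test script.

```python
def split_torch_version(version: str):
    """Split e.g. '1.12.0+cu116' into ((1, 12), (11, 6)).

    The CUDA part is None when the string carries no '+cuNNN' tag.
    Assumption: the last digit of the tag is the CUDA minor version.
    """
    release, _, local = version.partition("+")
    torch_ver = tuple(int(part) for part in release.split(".")[:2])
    cuda_ver = None
    if local.startswith("cu") and local[2:].isdigit() and len(local) > 3:
        digits = local[2:]
        cuda_ver = (int(digits[:-1]), int(digits[-1]))
    return torch_ver, cuda_ver


def in_reported_range(version: str) -> bool:
    """True if the build is torch >= 1.7 with a CUDA 11.x tag."""
    torch_ver, cuda_ver = split_torch_version(version)
    return torch_ver >= (1, 7) and cuda_ver is not None and cuda_ver[0] == 11
```

With a real installation, the string to pass in would come from `torch.__version__`.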

Additional context
The problem is related to pytorch/pytorch#52663

Workaround
The problem can be avoided by compiling torch from source with the flags -Xcompiler -fno-gnu-unique, as mentioned in pytorch/pytorch#52663.

Wheels built with these compile flags are available here: https://github.com/intel-isl/open3d_downloads/releases/tag/torch1.7.1

@ssheorey
With PyTorch 1.12 and CUDA 11.6, the output of test_script.py is:

$ python -c "import torch; print(torch.__version__)"
1.12.0+cu116
$ python test_script.py 
tensor([[1., 1., 1.]], device='cuda:0')
stream = stream 0 on device cuda:0
cuda_device_props = 0x55b3e4bba6e0
texture_alignment = 512
d_keys_in = 0x7f6b87c00000
d_keys_out = 0x7f6b87c78a00
d_values_in = 0x7f6b87cf1400
d_values_out = 0x7f6b87d69e00
d_temp_storage = 0
temp_storage_bytes = 1008639
(None, None, None)
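
The pass/fail signal in this output is the temp_storage_bytes line: the issue description says the bug shows up as a value of 0. As a rough sketch, the `name = value` lines can be parsed and checked automatically. The function names `parse_report` and `temp_storage_ok` are hypothetical, and the line format is assumed to match the output shown here.

```python
import re


def parse_report(output: str) -> dict:
    """Collect the 'name = value' lines printed by the test script."""
    fields = {}
    for line in output.splitlines():
        match = re.match(r"^\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def temp_storage_ok(output: str) -> bool:
    """True if temp_storage_bytes was reported and is greater than 0."""
    fields = parse_report(output)
    if "temp_storage_bytes" not in fields:
        return False
    # Base 0 accepts both decimal and 0x-prefixed values.
    return int(fields["temp_storage_bytes"], 0) > 0
```

The captured stdout of `python test_script.py` can be passed to `temp_storage_ok` as a string; lines that are not of the `name = value` form, such as the tensor printout, are ignored.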

@ssheorey
PyTorch 1.9 with CUDA 11.1 works as well:

$ python -c "import torch; print(torch.__version__)"
1.9.0+cu111
$ python test_script.py 
tensor([[1., 1., 1.]], device='cuda:0')
stream = stream 0 on device cuda:0
cuda_device_props = 0x5565813033e0
texture_alignment = 512
d_keys_in = 0x7f160e200000
d_keys_out = 0x7f160e278a00
d_values_in = 0x7f160e2f1400
d_values_out = 0x7f160e369e00
d_temp_storage = 0
temp_storage_bytes = 1003519
(None, None, None)
