
When will cuda 11.2 be supported? #2790

Closed
zhanglu-cst opened this issue Mar 27, 2021 · 7 comments
Labels
feature request Feature request

Comments

@zhanglu-cst

🚀 Feature

The NVIDIA 3090 needs PyTorch 1.8 and CUDA 11.2. I tried to compile DGL myself, but ran into a CUDA device ordinal error.
When will CUDA 11.2 be supported?

@VoVAllen
Collaborator

VoVAllen commented Mar 27, 2021

CUDA 11.2 has been reported to have some serious bugs, which is why PyTorch hasn't released CUDA 11.2 binaries either. I don't think we will release a CUDA 11.2 binary until PyTorch does. Could you post your compilation error here, so we can help make it work on your machine?

@felipemello1

Hi, just wanted to share that after two days of debugging my code with CUDA 11.1, I decided to switch PyTorch/DGL to CUDA 10.x, and got my program to work.

I don't have the compilation error, but the code would run normally on CPU. On GPU, it would work until a certain type of operation was performed. If I tried to print the tensor, the notebook would crash. Reduce operations (more specifically torch.cumsum) would also break the notebook. Sending the tensor to CPU and doing the same operation would work.

The errors I got were:
RuntimeError: CUDA error: invalid device ordinal
CUDA error 59: Device-side assert triggered
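For anyone hitting the same crash, here is a minimal sketch of the CPU-fallback workaround described above (the helper name `safe_cumsum` is hypothetical; it just assumes PyTorch is installed):

```python
import torch

def safe_cumsum(x, dim=0):
    """Hypothetical workaround: run torch.cumsum on CPU and move the
    result back to the tensor's original device.

    On a broken CUDA build, reduce ops like cumsum could crash on GPU
    while the same operation works fine on CPU.
    """
    result = torch.cumsum(x.cpu(), dim=dim)
    return result.to(x.device)

t = torch.tensor([1.0, 2.0, 3.0])
print(safe_cumsum(t))  # tensor([1., 3., 6.])
```

This avoids the faulty GPU kernel at the cost of a device round-trip, so it is only a stopgap until the binary itself is fixed.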

@yzh119
Member

yzh119 commented Mar 30, 2021

@fmellomascarenhas the bug you mentioned should have been fixed in #2308. What version of DGL were you using?

@felipemello1

felipemello1 commented Mar 30, 2021

> @fmellomascarenhas the bug you mentioned should have been fixed in #2308. What version of DGL were you using?

My guess is that I had installed it with pip install --pre dgl-cu110. Since I uninstalled everything and reinstalled the CUDA 10.x version, I can't say for sure :/ But I am glad to know that it was fixed. By the way, a quick question: to use fp16, do I really have to compile DGL from source, as described on the website? Thanks!
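For context, a hedged sketch of the source build the DGL docs of that era described for fp16 support (the `USE_FP16` CMake option follows DGL's mixed-precision guide; branch and paths are illustrative):

```shell
# Sketch: build DGL from source with fp16 (mixed precision) enabled.
# Adjust the branch, paths, and -j parallelism to your setup.
git clone --recursive https://github.com/dmlc/dgl.git
cd dgl
mkdir build && cd build
cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
make -j4
cd ../python
python setup.py install
```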

@zhanglu-cst
Author

> @fmellomascarenhas the bug you mentioned should have been fixed in #2308. What version of DGL were you using?

When I compiled and ran on CUDA 11.1, I encountered the same problem: CUDA error: invalid device ordinal.
My DGL is cloned from the latest GitHub source.

@BarclayII
Collaborator

> When I compiled and ran on CUDA 11.1, I encountered the same problem: CUDA error: invalid device ordinal.
> My DGL is cloned from the latest GitHub source.

For now, compiling with CUDA 11.1+ requires that you specify the following macro definition, especially for PyTorch 1.8.0 and earlier:

-DCUB_CPP_DIALECT=2003

The reason is kinda complicated and discussed in multiple places, for instance NVIDIA/thrust#1401 and pytorch/pytorch#54245.
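As an illustration (the exact CMake invocation is an assumption; only the `-DCUB_CPP_DIALECT=2003` define comes from the comment above), the macro can be passed through the compiler flags when configuring a source build:

```shell
# Sketch: pass the CUB dialect define to both the host C++ and CUDA
# compilers when configuring a DGL source build (invocation assumed).
mkdir build && cd build
cmake -DUSE_CUDA=ON \
      -DCMAKE_CXX_FLAGS="-DCUB_CPP_DIALECT=2003" \
      -DCMAKE_CUDA_FLAGS="-DCUB_CPP_DIALECT=2003" \
      ..
make -j4
```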

@jermainewang jermainewang added the feature request Feature request label Apr 6, 2021
@TristonC
Collaborator

TristonC commented May 25, 2022

NVIDIA builds CUDA-enabled Docker containers for DGL. You are welcome to try the early-access DGL container with 0.8.0post2; the 3090 is supported.
