
When will cuda 11.2 be supported? #2790

Closed
zhanglu-cst opened this issue Mar 27, 2021 · 7 comments
Labels
feature request Feature request

Comments

@zhanglu-cst

🚀 Feature

The NVIDIA 3090 needs PyTorch 1.8 and CUDA 11.2. I tried to compile DGL myself, but ran into a CUDA device ordinal error.
When will CUDA 11.2 be supported?

@VoVAllen
Collaborator

VoVAllen commented Mar 27, 2021

CUDA 11.2 has been reported to have some serious bugs, which is why PyTorch hasn't released CUDA 11.2 binaries either. I don't think we will release a CUDA 11.2 binary until PyTorch does. Could you post your compilation error here, so we can help make it work on your machine?

@felipemello1

Hi, just wanted to share that after two days of debugging my code with CUDA 11.1, I decided to switch PyTorch/DGL to CUDA 10.x, and got my program to work.

I don't have the compilation error, but the code would run normally on CPU. On GPU, it would work until a certain type of operation was performed. If I tried to print the tensor, the notebook would crash. Reduce operations (more specifically torch.cumsum) would also break the notebook. Sending the tensor to CPU and doing the same operation would work.

The errors I got were:
RuntimeError: CUDA error: invalid device ordinal
CUDA error 59: Device-side assert triggered
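For anyone hitting the same crash, here is a minimal sketch of the CPU-fallback workaround described above (the helper name `safe_cumsum` is hypothetical; it just assumes PyTorch is installed):

```python
import torch

def safe_cumsum(x, dim=0):
    """Hypothetical workaround: run torch.cumsum on CPU and move the
    result back to the tensor's original device.

    On a broken CUDA build, reduce ops like cumsum could crash on GPU
    while the same operation works fine on CPU.
    """
    result = torch.cumsum(x.cpu(), dim=dim)
    return result.to(x.device)

t = torch.tensor([1.0, 2.0, 3.0])
print(safe_cumsum(t))  # tensor([1., 3., 6.])
```

This avoids the faulty GPU kernel at the cost of a device round-trip, so it is only a stopgap until the binary itself is fixed.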

@yzh119
Member

yzh119 commented Mar 30, 2021

@fmellomascarenhas the bug you mentioned should have been fixed in #2308. What version of DGL were you using?

@felipemello1

felipemello1 commented Mar 30, 2021

> @fmellomascarenhas the bug you mentioned should have been fixed in #2308. What version of DGL were you using?

My guess is that I had installed it with pip install --pre dgl-cu110. Since I uninstalled everything and reinstalled the CUDA 10.x version, I can't say for sure :/ But I am glad to know that it was fixed. By the way, a quick question: to use fp16, do I really have to compile DGL from source, as described on the website? Thanks!
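For context, a hedged sketch of the source build the DGL docs of that era described for fp16 support (the `USE_FP16` CMake option follows DGL's mixed-precision guide; branch and paths are illustrative):

```shell
# Sketch: build DGL from source with fp16 (mixed precision) enabled.
# Adjust the branch, paths, and -j parallelism to your setup.
git clone --recursive https://github.com/dmlc/dgl.git
cd dgl
mkdir build && cd build
cmake -DUSE_CUDA=ON -DUSE_FP16=ON ..
make -j4
cd ../python
python setup.py install
```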

@zhanglu-cst
Author

> @fmellomascarenhas the bug you mentioned should have been fixed in #2308. What version of DGL were you using?

When I compiled and ran on CUDA 11.1, I encountered the same problem: CUDA error: invalid device ordinal.
My DGL is cloned from the latest GitHub source.

@BarclayII
Collaborator

> When I compiled and ran on CUDA 11.1, I encountered the same problem: CUDA error: invalid device ordinal.
> My DGL is cloned from the latest GitHub source.

For now, compiling with CUDA 11.1+ requires that you specify the following macro definition, especially for PyTorch 1.8.0 and earlier:

-DCUB_CPP_DIALECT=2003

The reason is kinda complicated and discussed in multiple places, for instance NVIDIA/thrust#1401 and pytorch/pytorch#54245.
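As an illustration (the exact CMake invocation is an assumption; only the `-DCUB_CPP_DIALECT=2003` define comes from the comment above), the macro can be passed through the compiler flags when configuring a source build:

```shell
# Sketch: pass the CUB dialect define to both the host C++ and CUDA
# compilers when configuring a DGL source build (invocation assumed).
mkdir build && cd build
cmake -DUSE_CUDA=ON \
      -DCMAKE_CXX_FLAGS="-DCUB_CPP_DIALECT=2003" \
      -DCMAKE_CUDA_FLAGS="-DCUB_CPP_DIALECT=2003" \
      ..
make -j4
```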

@jermainewang jermainewang added the feature request Feature request label Apr 6, 2021
@TristonC
Collaborator

TristonC commented May 25, 2022

NVIDIA builds CUDA-enabled Docker containers for DGL. You are welcome to try the early-access DGL container with 0.8.0post2; the 3090 is supported.
