
GPUs requested but none are available #3542

Closed
shanhaidexiamo opened this issue Sep 18, 2020 · 12 comments
Labels: question (Further information is requested), waiting on author (Waiting on user action, correction, or update), won't fix (This will not be worked on)

Comments

@shanhaidexiamo

My server has 8 GPUs, but when I use the Trainer class with gpus=-1 I get the runtime error GPUs requested but none are available. When I check with torch directly, the GPU count is 8 and cuda.is_available is True. Can anyone tell me what's wrong?

@shanhaidexiamo shanhaidexiamo added the question label Sep 18, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@Borda
Member

Borda commented Sep 18, 2020

Mind checking whether you installed the CUDA build of PyTorch, e.g. via torch.cuda.is_available()?

@Borda Borda added the waiting on author label Sep 18, 2020
@awaelchli
Contributor

awaelchli commented Sep 19, 2020

> cuda.is_available is true

@shanhaidexiamo what does torch.cuda.device_count() return? cuda.is_available alone does not tell us if the GPUs are visible to torch.
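One common way GPUs become invisible to torch is a mangled CUDA_VISIBLE_DEVICES. As a rough illustration (a hypothetical helper, not part of torch or Lightning), the function below mimics how the CUDA runtime interprets that variable:

```python
import os

def visible_gpu_indices(physical_count, env=None):
    """Return the GPU indices the CUDA runtime would expose, given the
    machine's physical GPU count and an environment mapping.

    An unset CUDA_VISIBLE_DEVICES exposes all GPUs. An empty or malformed
    value hides them, which makes torch.cuda.device_count() return 0 even
    though nvidia-smi still lists the devices.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return list(range(physical_count))
    indices = []
    for token in value.split(","):
        token = token.strip()
        if not token.isdigit():    # empty or garbage token: enumeration stops
            break
        idx = int(token)
        if idx >= physical_count:  # invalid device id also stops enumeration
            break
        indices.append(idx)
    return indices

print(visible_gpu_indices(8, {}))                               # [0, 1, ..., 7]
print(visible_gpu_indices(8, {"CUDA_VISIBLE_DEVICES": ""}))     # []
print(visible_gpu_indices(8, {"CUDA_VISIBLE_DEVICES": "2,5"}))  # [2, 5]
```

Note that an empty string or a stray invalid index silently hides devices, so checking the variable is worth doing alongside device_count().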

@kyoungrok0517

kyoungrok0517 commented Oct 1, 2020

> @shanhaidexiamo what does torch.cuda.device_count() return? cuda.is_available alone does not tell us if the GPUs are visible to torch.

Sorry to interrupt, but I'm experiencing the same issue. device_count() returns 2 in my case, and I'm running on a GCP instance with two V100s. I had no problem on my own server, so this is strange (though the GPU model is different). pytorch-lightning==0.9.0

This is the env

* CUDA:
        - GPU:
                - Tesla V100-SXM2-16GB
                - Tesla V100-SXM2-16GB
        - available:         True
        - version:           10.2
* Packages:
        - numpy:             1.18.5
        - pyTorch_debug:     False
        - pyTorch_version:   1.6.0
        - pytorch-lightning: 0.9.0
        - tqdm:              4.47.0
* System:
        - OS:                Linux
        - architecture:
                - 64bit
                - ELF
        - processor:         x86_64
        - python:            3.8.3
        - version:           #24-Ubuntu SMP Sat Sep 5 02:07:13 UTC 2020

GCP command to create a similar instance:

gcloud beta compute --project <project> instances create <instance-name> \
    --zone=us-central1-a \
    --machine-type=n1-standard-16 \
    --subnet=default \
    --network-tier=PREMIUM \
    --maintenance-policy=TERMINATE \
    --service-account=<service_account> \
    --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
    --accelerator=type=nvidia-tesla-v100,count=2 \
    --image=ubuntu-2004-focal-v20200917 \
    --image-project=ubuntu-os-cloud \
    --boot-disk-size=200GB \
    --boot-disk-type=pd-standard \
    --boot-disk-device-name=thesis-1 \
    --no-shielded-secure-boot \
    --shielded-vtpm \
    --shielded-integrity-monitoring \
    --reservation-affinity=any

@Borda
Member

Borda commented Oct 1, 2020

@kyoungrok0517 mind sharing the output of the following, just to check that PyTorch and the drivers are properly installed?
python -c "import torch ; print(torch.cuda.device_count())"

@kyoungrok0517

kyoungrok0517 commented Oct 1, 2020

@Borda That returns 2 as expected. If I use the gpus=-1 argument, Lightning fails as I described, but if I give the exact number of GPUs (e.g. gpus=2) it works fine. I'm using ddp as the backend.
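For context, here is a minimal sketch of the kind of normalization the gpus argument needs (hypothetical code, not Lightning's actual implementation). One plausible way gpus=-1 can break while gpus=2 works is if the value reaches a DDP child process as the string "-1" and the coercion step is missed, so the "all GPUs" branch never fires:

```python
def normalize_gpus(gpus, available):
    """Normalize a Trainer-style `gpus` argument into an explicit list of
    device indices. Hypothetical sketch, not Lightning's real parser.

    gpus=-1 or "-1" -> all available GPUs
    gpus=N          -> the first N GPUs
    gpus=[0, 2]     -> exactly those indices
    """
    if isinstance(gpus, str):
        # Values forwarded to a spawned child (env var / CLI) arrive as
        # strings; without this coercion, "-1" == -1 is False in Python
        # and the "all GPUs" branch below is silently skipped.
        gpus = int(gpus)
    requested = gpus != 0 and gpus != []
    if gpus == -1:
        gpus = list(range(available))
    elif isinstance(gpus, int):
        gpus = list(range(gpus))
    if requested and not available:
        # The symptom reported in this issue: a non-empty request in a
        # process that sees no CUDA devices.
        raise RuntimeError("GPUs requested but none are available")
    return gpus

print(normalize_gpus(-1, 8))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(normalize_gpus("2", 8))  # [0, 1]
```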

@Borda
Member

Borda commented Oct 1, 2020

@kyoungrok0517 good catch, mind sending a PR?

@awaelchli
Contributor

@williamFalcon is working on the parsing of gpus for DDP. The error is most likely because they are not correctly passed to or parsed in the child process.

@kyoungrok0517

> @kyoungrok0517 good catch, mind sending a PR?

Hmm... am I too late to send a PR? I've never done this before, so I'd be grateful if you could guide me through the process. Should I open a pull request even though I don't know how to fix it? Please let me know, I'd like to help.

@stale

stale bot commented Nov 1, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

@stale stale bot added the won't fix label Nov 1, 2020
@stale stale bot closed this as completed Nov 8, 2020
@fishbotics

fishbotics commented Apr 5, 2021

Hi all,

I am now having the same issue. I'm running my job on a server with 8 GPUs. When I run python -c "import torch ; print(torch.cuda.device_count())" I get 8, but when I run with gpus=-1 ([EDIT] corrected from gpus=1) and auto_select_gpus=True, I get an error saying that there are no GPUs available.

@Borda , @awaelchli : do you know if this was ever fixed?

Thanks!
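For what it's worth, auto_select_gpus has to probe each device and skip any it cannot allocate on, so it can report no available GPUs even when device_count() is 8 — for instance when other jobs hold every device in exclusive compute mode. A hypothetical sketch of that selection logic (is_free stands in for the real allocation probe, which tries to place a small tensor on each device):

```python
def auto_select_gpus(n, device_count, is_free):
    """Pick the first `n` free GPU indices. Hypothetical sketch of what an
    auto-selection step must roughly do; not Lightning's implementation.

    `is_free(idx)` stands in for probing the device: the real check
    attempts a tiny allocation and treats failure as "busy".
    """
    picked = []
    for idx in range(device_count):
        if is_free(idx):
            picked.append(idx)
        if len(picked) == n:
            return picked
    # Every device was probed and too few were free: the request fails
    # even though device_count() is nonzero, matching the symptom above.
    raise RuntimeError(
        f"requested {n} GPUs but only {len(picked)} of {device_count} are free"
    )

print(auto_select_gpus(3, 8, lambda i: i % 2 == 0))  # [0, 2, 4]
```

If that is what is happening here, checking what else is running on the node (e.g. with nvidia-smi) should show the occupied devices.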

@awaelchli
Contributor

@fishbotics What else is running on the GPUs?

The original issue reported here was fixed by #4209, I believe. Adding gpus=1 and auto_select_gpus=True works for me with the pl_examples.
