
Running on multiple GPUs on sagemaker #292

Closed
leberknecht opened this issue Jun 15, 2018 · 4 comments

Comments

@leberknecht

Aloha!

I was testing out pytorch-pix2pix on SageMaker, and I noticed that training doesn't run any faster on a machine with 8 GPUs than on one with a single GPU. I do see the correct option gpu_ids: 0,1,2,3,4,5,6,7 [default: 0] in the output, so that looks OK-ish.
Any ideas where I can start? This article https://medium.com/@julsimon/training-with-pytorch-on-amazon-sagemaker-58fca8c69987 just says "Multi-GPU training is also possible but requires extra work" but doesn't say what needs to be done.

@junyanz
Owner

junyanz commented Jun 16, 2018

You probably need a bigger batchSize (e.g., --batchSize 32 for 4 or 8 GPUs).
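For anyone hitting the same thing later, here is a minimal sketch of why the bigger batch matters, assuming the standard torch.nn.DataParallel behaviour that this repo relies on for multi-GPU training. The names below are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Illustration only: DataParallel splits each batch along dim 0 across the
# listed devices. With the default batch size of 1, only the first GPU in
# gpu_ids ever receives work, so adding GPUs cannot speed anything up.
gpu_ids = [0, 1, 2, 3, 4, 5, 6, 7]                             # mirrors --gpu_ids 0,1,2,3,4,5,6,7
net = nn.Conv2d(3, 64, kernel_size=3, padding=1).to(f"cuda:{gpu_ids[0]}")
net = nn.DataParallel(net, device_ids=gpu_ids)

batch = torch.randn(32, 3, 256, 256).to(f"cuda:{gpu_ids[0]}")  # e.g. --batchSize 32
out = net(batch)                                               # ~4 images end up on each of the 8 GPUs
```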

@leberknecht
Author

Indeed, that did the trick :) Many thanks!

@abhisingh977

This does not let me use the other GPUs; it just increases the batch size on a single GPU while the other GPUs stay empty.
I increased the batch size to 128 and it shows cuda:0 out of memory, no matter whether I am using 1 GPU or 8.

@junyanz
Owner

junyanz commented Oct 17, 2020

You still need to use the flag --gpu_ids.
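To make the distinction concrete, here is a hedged sketch (my own illustration, not the repo's code): a bigger batch without extra entries in --gpu_ids keeps the model and the whole batch on cuda:0, which is exactly the out-of-memory pattern reported above; listing all devices is what lets DataParallel scatter the batch.

```python
import torch
import torch.nn as nn

# Failure mode: model only on cuda:0 (the effect of the default --gpu_ids 0),
# so a batch of 128 is processed entirely on one device and can OOM there
# no matter how many GPUs the machine has.
net = nn.Conv2d(3, 64, kernel_size=3, padding=1).to("cuda:0")
out = net(torch.randn(128, 3, 256, 256, device="cuda:0"))

# With all devices listed (what --gpu_ids 0,1,2,3,4,5,6,7 selects), wrapping in
# DataParallel scatters the same batch, roughly 16 images per GPU on 8 GPUs.
multi = nn.DataParallel(net, device_ids=list(range(torch.cuda.device_count())))
out = multi(torch.randn(128, 3, 256, 256).to("cuda:0"))
```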
