
Multiple GPUs training #327

Closed
mhusseinsh opened this issue Jul 17, 2018 · 6 comments

Comments

@mhusseinsh

Hello,

I am running on a server with 8 GPUs.
I want to train CycleGAN on at least 2 GPUs, so I passed the flag --gpu_ids 6,7.
It only trained on GPU 6 and didn't allocate the other one.

Any help?


mhusseinsh commented Jul 17, 2018

Even with a single GPU, it allocates the selected one but doesn't fully utilize it; it only uses 4021MiB / 16276MiB.


junyanz commented Jul 24, 2018

You need to increase your batchSize. Try --batchSize 4 or an even larger batchSize. Each GPU will process batchSize/#GPUs images.
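
For reference, here is a minimal sketch of what that splitting looks like with torch.nn.DataParallel (which, as far as I can tell, is how this codebase handles multiple GPUs). The tiny conv net, the image size, and the device ids 6,7 are placeholders for illustration, not the actual CycleGAN generator:

```python
import torch
import torch.nn as nn

# Placeholder network for illustration only -- not the repo's generator.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())

# Wrap in DataParallel with the same ids as --gpu_ids 6,7 (assumes those GPUs exist).
net = nn.DataParallel(net, device_ids=[6, 7]).cuda(6)

# With --batchSize 4 and 2 GPUs, the batch is scattered so each GPU
# processes 4 / 2 = 2 images per forward pass.
x = torch.randn(4, 3, 256, 256, device="cuda:6")
out = net(x)
print(out.shape)  # torch.Size([4, 8, 256, 256]), gathered back onto cuda:6
```

With --batchSize 1 and two GPUs, only the first device in the list receives a sample and the other sits idle, which is presumably why only GPU 6 was doing work.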


mhusseinsh commented Jul 25, 2018

But which size do you recommend? I have read some issues about batch size, and most people said that --batchSize=1 works best, and you already mentioned in #137 that a batch size of 1 on a single GPU gave you the best results.

So does this mean I should choose my batch size according to the number of GPUs? E.g. if I am using 2 GPUs, then batchSize=2, and if 3 GPUs then batchSize=3, and so on, so that each GPU processes 1 image?

I also read something about instance_normalization vs. batch_normalization when changing the batchSize.

What's your opinion in general, @junyanz?


junyanz commented Jul 25, 2018

  1. It could be slow for each GPU to only process 1 image. You may want to feed 4 images per GPU.
  2. You may want to use instance_normalization. Multi-GPU synchronized batchnorm has not been implemented in this repo. Using batch_norm with multiple GPUs might cause issues (see the sketch below).
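
A rough sketch of the normalization point, using plain torch.nn layers for illustration (not necessarily how the repo builds its norm layers): BatchNorm2d computes statistics over the batch, and under DataParallel each GPU computes them over only its batchSize/#GPUs slice without synchronization, while InstanceNorm2d normalizes every sample independently, so splitting across GPUs does not change its statistics.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)        # batch_norm: stats over the whole (per-GPU) batch
inorm = nn.InstanceNorm2d(8)  # instance_norm: stats per sample, per channel

x = torch.randn(4, 8, 64, 64)
y_bn, y_in = bn(x), inorm(x)

# Batch norm output for a sample changes when the batch around it changes,
# which is what makes tiny, unsynchronized per-GPU batches problematic.
print(torch.allclose(bn(x[:1]), y_bn[:1], atol=1e-6))    # False

# Instance norm output is identical whether the sample is processed alone
# or as part of a larger batch, so it is unaffected by the DataParallel split.
print(torch.allclose(inorm(x[:1]), y_in[:1], atol=1e-6))  # True
```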

@deartonym

Thanks for this question.
I suggest putting a reminder in the README in case people like us don't notice the multi-GPU and batch-size issue.


junyanz commented Sep 5, 2018

Yeah, we added it to the Q&A. We will add it to the training/testing tips soon.
