
Multiple GPUs training #327

Closed
mhusseinsh opened this issue Jul 17, 2018 · 6 comments

Comments

@mhusseinsh

Hello,

I am running on a server with 8 GPUs.
I want to train CycleGAN on at least 2 GPUs, so I passed the flag --gpu_ids 6,7.
It only trained on GPU 6 and didn't allocate the other one.

Any help?


mhusseinsh commented Jul 17, 2018

Even with a single GPU, it allocates the selected one but doesn't fully utilize it; it only uses 4021MiB / 16276MiB.


junyanz commented Jul 24, 2018

You need to increase your batchSize. Try --batchSize 4 or an even larger batchSize. Each GPU will process batchSize/#GPUs images.
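
For reference, here is a minimal sketch of what that splitting looks like with torch.nn.DataParallel (which, as far as I can tell, is how this codebase handles multiple GPUs). The tiny conv net, the image size, and the device ids 6,7 are placeholders for illustration, not the actual CycleGAN generator:

```python
import torch
import torch.nn as nn

# Placeholder network for illustration only -- not the repo's generator.
net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())

# Wrap in DataParallel with the same ids as --gpu_ids 6,7 (assumes those GPUs exist).
net = nn.DataParallel(net, device_ids=[6, 7]).cuda(6)

# With --batchSize 4 and 2 GPUs, the batch is scattered so each GPU
# processes 4 / 2 = 2 images per forward pass.
x = torch.randn(4, 3, 256, 256, device="cuda:6")
out = net(x)
print(out.shape)  # torch.Size([4, 8, 256, 256]), gathered back onto cuda:6
```

With --batchSize 1 and two GPUs, only the first device in the list receives a sample and the other sits idle, which is presumably why only GPU 6 was doing work.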


mhusseinsh commented Jul 25, 2018

But which size do you recommend? I have read some issues about batch size, and most people said that --batchSize=1 works best, and you already mentioned in #137 that a batch size of 1 on a single GPU gave you the best results.

So does this mean I should choose my batch size according to the number of GPUs? E.g. if I am using 2 GPUs, then batchSize=2, and if 3 GPUs then batchSize=3, and so on, so that each GPU processes 1 image?

I also read something about instance_normalization vs. batch_normalization when changing the batchSize.

What's your opinion in general, @junyanz?


junyanz commented Jul 25, 2018

  1. It could be slow for each GPU to only process 1 image. You may want to feed 4 images per GPU.
  2. You may want to use instance_normalization. Multi-GPU synchronized batchnorm has not been implemented in this repo. Using batch_norm with multiple GPUs might cause issues (see the sketch below).
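
A rough sketch of the normalization point, using plain torch.nn layers for illustration (not necessarily how the repo builds its norm layers): BatchNorm2d computes statistics over the batch, and under DataParallel each GPU computes them over only its batchSize/#GPUs slice without synchronization, while InstanceNorm2d normalizes every sample independently, so splitting across GPUs does not change its statistics.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)        # batch_norm: stats over the whole (per-GPU) batch
inorm = nn.InstanceNorm2d(8)  # instance_norm: stats per sample, per channel

x = torch.randn(4, 8, 64, 64)
y_bn, y_in = bn(x), inorm(x)

# Batch norm output for a sample changes when the batch around it changes,
# which is what makes tiny, unsynchronized per-GPU batches problematic.
print(torch.allclose(bn(x[:1]), y_bn[:1], atol=1e-6))    # False

# Instance norm output is identical whether the sample is processed alone
# or as part of a larger batch, so it is unaffected by the DataParallel split.
print(torch.allclose(inorm(x[:1]), y_in[:1], atol=1e-6))  # True
```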

@deartonym

Thanks for this question.
I suggest putting a reminder in the README in case people like us don't notice the multi-GPU and batch-size issue.


junyanz commented Sep 5, 2018

Yeah, we added it to the Q&A. We will add it to the training/testing tips soon.
