
Suggestion: use larger gradient accumulation steps instead of multiple GPUs #10

Closed
hkunzhe opened this issue Aug 22, 2023 · 3 comments


hkunzhe commented Aug 22, 2023

For the same effective batch size, it is recommended to use a larger number of gradient accumulation steps on a single GPU instead of multiple GPUs, considering huggingface/diffusers#4046. Otherwise, that bug may lead to fluctuations in the reward.
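For illustration, a minimal sketch of the single-GPU alternative, assuming a Hugging Face Accelerate training loop like the one in this repo; `unet`, `optimizer`, `dataloader`, and `compute_loss` are placeholder names, not the repo's actual code:

```python
from accelerate import Accelerator

# Larger accumulation on one GPU instead of spreading the batch over several GPUs.
accelerator = Accelerator(gradient_accumulation_steps=8)
unet, optimizer, dataloader = accelerator.prepare(unet, optimizer, dataloader)

for batch in dataloader:
    # `accumulate` only steps the optimizer every `gradient_accumulation_steps`
    # batches, so the effective batch size matches the multi-GPU setup without
    # relying on cross-GPU gradient synchronization.
    with accelerator.accumulate(unet):
        loss = compute_loss(unet, batch)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```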

@kvablack
Owner

Thanks so much for pointing this out! What a terrible bug. I've been able to fix it so that gradients are synchronized properly across GPUs, but for some reason it now uses more memory (16GB, up from 10GB before the change).
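For anyone curious, a minimal sketch of what "synchronized properly" means at the DDP level; this is illustrative only (assuming plain `torch.nn.parallel.DistributedDataParallel`), and `ddp_model`, `micro_batches`, and `compute_loss` are hypothetical names, not the actual fix in this repo:

```python
import contextlib

for step, micro_batch in enumerate(micro_batches):
    is_last = step == len(micro_batches) - 1
    # Skip the gradient all-reduce on intermediate micro-batches, but let the
    # final backward trigger DDP's synchronization across GPUs. If the sync is
    # skipped on every micro-batch, each rank trains on its own gradients.
    ctx = contextlib.nullcontext() if is_last else ddp_model.no_sync()
    with ctx:
        loss = compute_loss(ddp_model, micro_batch) / len(micro_batches)
        loss.backward()

optimizer.step()
optimizer.zero_grad()
```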


bhattg commented Aug 31, 2023

Hi! Does this bug affect any of the findings in the paper?

@kvablack
Owner

@bhattg No, fortunately the results in the paper all used the original JAX codebase.
