
Suggestion: use larger gradient accumulation steps instead of multiple GPUs #10

Closed
hkunzhe opened this issue Aug 22, 2023 · 3 comments


hkunzhe commented Aug 22, 2023

For the same effective batch size, it is recommended to use a larger number of gradient accumulation steps on a single GPU instead of multiple GPUs, considering huggingface/diffusers#4046. Otherwise, that bug may lead to fluctuations in the reward.
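For illustration, a minimal sketch of the single-GPU alternative, assuming a Hugging Face Accelerate training loop like the one in this repo; `unet`, `optimizer`, `dataloader`, and `compute_loss` are placeholder names, not the repo's actual code:

```python
from accelerate import Accelerator

# Larger accumulation on one GPU instead of spreading the batch over several GPUs.
accelerator = Accelerator(gradient_accumulation_steps=8)
unet, optimizer, dataloader = accelerator.prepare(unet, optimizer, dataloader)

for batch in dataloader:
    # `accumulate` only steps the optimizer every `gradient_accumulation_steps`
    # batches, so the effective batch size matches the multi-GPU setup without
    # relying on cross-GPU gradient synchronization.
    with accelerator.accumulate(unet):
        loss = compute_loss(unet, batch)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```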

@kvablack
Owner

Thanks so much for pointing this out! What a terrible bug. I've been able to fix it so that gradients are synchronized properly across GPUs, but for some reason it now uses more memory (16GB, up from 10GB before the change).
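For anyone curious, a minimal sketch of what "synchronized properly" means at the DDP level; this is illustrative only (assuming plain `torch.nn.parallel.DistributedDataParallel`), and `ddp_model`, `micro_batches`, and `compute_loss` are hypothetical names, not the actual fix in this repo:

```python
import contextlib

for step, micro_batch in enumerate(micro_batches):
    is_last = step == len(micro_batches) - 1
    # Skip the gradient all-reduce on intermediate micro-batches, but let the
    # final backward trigger DDP's synchronization across GPUs. If the sync is
    # skipped on every micro-batch, each rank trains on its own gradients.
    ctx = contextlib.nullcontext() if is_last else ddp_model.no_sync()
    with ctx:
        loss = compute_loss(ddp_model, micro_batch) / len(micro_batches)
        loss.backward()

optimizer.step()
optimizer.zero_grad()
```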


bhattg commented Aug 31, 2023

Hi! Does this bug affect any of the findings in the paper?

@kvablack
Owner

@bhattg No, fortunately the results in the paper all used the original JAX codebase.
