Question regarding the gradient accumulation #7
Comments
Hi @lehduong, thanks for pointing this out! Yes, in Zero123 the gradient accumulation step is 1, so this doesn't matter there. I know people are trying to fix it for more than one model. According to your link, the correct way to do it for multiple models is
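The snippet referenced above did not survive extraction. As an illustration only, one commonly cited pattern for accumulating over several models is to enter one accumulation context per model inside a single `with` block via `contextlib.ExitStack`. The sketch below is runnable without the `accelerate` library: the `accumulate` stub and the model names are assumptions standing in for `accelerator.accumulate(model)`.

```python
# Sketch: combining one gradient-accumulation context per model into a
# single `with` block using contextlib.ExitStack. `accumulate` here is a
# stub standing in for accelerator.accumulate(model); it only records
# enter/exit order so the nesting pattern can be shown with the stdlib.
from contextlib import ExitStack, contextmanager

events = []

@contextmanager
def accumulate(model_name):
    # Stand-in for the real context manager, which (per the accelerate
    # docs) controls when gradients sync for that model while active.
    events.append(f"enter:{model_name}")
    try:
        yield
    finally:
        events.append(f"exit:{model_name}")

models = ["unet", "text_encoder"]  # hypothetical model names

with ExitStack() as stack:
    for name in models:
        stack.enter_context(accumulate(name))
    # both models are now inside their accumulation contexts
    events.append("forward+backward")

print(events)
```

Because `ExitStack` unwinds in reverse order, both models stay inside their accumulation contexts for the shared forward/backward pass, which is the property the nested-context-manager fix is after.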
Yes, you only need to change this line to
Thanks a lot for your contribution! Just pushed a fix; please reopen the issue if anything needs further change!
Hi @lehduong, can I ask whether you use gradient accumulation for multiple models in a distributed training setting (say, multiple GPUs)?
I trained the model on a single machine (8 GPUs).
Hi, thanks for your implementation.
I noticed you accumulated gradients of the two models with two different context managers (here). Could you let me know if you verified your implementation with a gradient accumulation step greater than 1? Apparently, this approach can be erroneous according to this and the follow-up comments. I believe the newer version of HF's accelerate already allows the context manager to receive multiple models, as in here.