WARNING: attempting to recover from OOM in forward/backward pass #23
Hi, I encountered some errors during the self-critical sequence training (SCST) stage:

WARNING: attempting to recover from OOM in forward/backward pass

Is this because the GPU memory is not enough? It feels very strange, because sometimes training runs normally.

Comments
Yes, this is the reason. The settings documented in the README are appropriate for 2 GTX 1080 cards (8 GB each).
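For context on the warning itself: when a CUDA out-of-memory error is raised during the forward/backward pass, fairseq's trainer catches it, frees what it can, and skips that batch rather than crashing, which is why training can continue after the message appears. The sketch below is only a generic PyTorch illustration of that recovery pattern, not this repo's or fairseq's actual code; `model`, `criterion`, `optimizer`, and `batch` are placeholder names.

```python
import torch


def train_step(model, criterion, optimizer, batch):
    """Run one update, skipping the batch if CUDA runs out of memory.

    Generic sketch of the recovery pattern behind the warning, not the
    repo's or fairseq's actual implementation; all arguments are placeholders.
    """
    try:
        optimizer.zero_grad()
        loss = criterion(model(**batch["net_input"]), batch["target"])
        loss.backward()
        optimizer.step()
        return loss.item()
    except RuntimeError as err:
        if "out of memory" in str(err):
            print("WARNING: attempting to recover from OOM in forward/backward pass")
            optimizer.zero_grad()          # drop any half-built gradients
            if torch.cuda.is_available():
                torch.cuda.empty_cache()   # release cached blocks back to the allocator
            return None                    # skip this batch and carry on
        raise                              # anything else is a real error
```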
In fact, I used 3 GPUs, each with 11 GB. The strange thing is that sometimes it runs normally, and sometimes it reports that GPU memory is insufficient.
Did you pre-train the model with CE loss before running SCST?
Yes. I passed --max-sentences 2 and it ran normally, but I was worried that it would affect performance. Will it have a significant impact? Also, why not use .checkpoint/checkpoint_best.pt? Isn't that the best weight?
Convergence improves with higher --max-sentences values (but also requires more memory). A value of 5 should work fine on 11 GB cards. Regarding ...
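One workaround worth noting (not something discussed in this thread): if larger --max-sentences values keep running out of memory, gradient accumulation gives a similar effective batch size without the extra activation memory; fairseq exposes this as --update-freq. Below is a minimal, generic PyTorch sketch of the idea, with placeholder `model`, `criterion`, `optimizer`, and `batches` names.

```python
def train_with_accumulation(model, criterion, optimizer, batches, accum_steps=4):
    """Accumulate gradients over several small batches before each update.

    Illustrative only: behaves like a batch roughly accum_steps times larger,
    while only one small batch's activations live in GPU memory at a time.
    """
    optimizer.zero_grad()
    for i, batch in enumerate(batches):
        loss = criterion(model(**batch["net_input"]), batch["target"])
        (loss / accum_steps).backward()    # scale so the sum matches one big batch
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```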
I see, thank you. |
What is the frequency of OOMs when you run with --max-sentences 5 or 8? |
Almost every time. The strange thing is that it reports the memory error only after SCST has already run once or twice.
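A possible way to understand the intermittent failures is to log peak GPU memory per step: during SCST the sampled captions vary in length, so some batches can need noticeably more memory than others, and only those push the card over its limit. The sketch below is a hypothetical diagnostic wrapper, with `train_step` standing in for one SCST update; it only uses PyTorch's standard memory counters.

```python
import torch


def run_with_memory_log(train_step, batches, device=0):
    """Log per-step peak GPU memory to spot the batches that approach the limit.

    Diagnostic sketch only; train_step(batch) is a placeholder for one SCST
    forward/backward/update.
    """
    for i, batch in enumerate(batches):
        torch.cuda.reset_peak_memory_stats(device)
        train_step(batch)
        peak_gib = torch.cuda.max_memory_allocated(device) / 1024 ** 3
        print(f"step {i}: peak allocated {peak_gib:.2f} GiB")
```

If the logged peak creeps close to 11 GB on some steps only, lowering --max-sentences (or capping the sampled sequence length, if the repo exposes such an option) is the usual remedy.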