Out-of-memory when using target factors #1106
Comments
Any idea on this one? Still encountering it.
The only unusual things I notice about the training setup are a large number of buckets and very long sequences (by machine translation standards). You could try increasing the bucket width and/or splitting examples into shorter sequences. A more typical scenario might have 8 buckets total covering lengths 1-128.
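If splitting helps in your case, here is a rough, purely illustrative sketch of chunking whitespace-tokenized lines. The 128-token cap and file names are arbitrary assumptions, and in practice the source, target, and any factor files would all need to be split consistently (e.g., at sentence boundaries), which this sketch does not handle:

```python
MAX_TOKENS = 128  # arbitrary cap for illustration, not a Sockeye default

def split_line(line: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Split a whitespace-tokenized line into chunks of at most max_tokens tokens."""
    tokens = line.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]

# Hypothetical file names, shown for one side of the parallel data only.
with open("train.src") as fin, open("train.split.src", "w") as fout:
    for line in fin:
        for chunk in split_line(line.strip()):
            fout.write(chunk + "\n")
```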
Thanks! Could you please point me to what "buckets" are? I have a vague sense, but maybe an exact understanding will help here.
During training, examples are grouped into "buckets" of similar-length sequences. The default bucket width is 8, meaning that examples with source length <=8 and target length <=8 go into the first bucket, remaining examples <=16 go into the second, etc., up to the max sequence length. One reason for bucketing is to trade off between efficiency (similar lengths to minimize padding) and variation (some length differences within batches plus the ability to shuffle the entire bucket for each epoch).
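To make that concrete, here is a minimal sketch of how an example could be assigned to a bucket, using the width of 8 and a max length of 128 from the description above; the actual Sockeye data iterator differs in detail:

```python
BUCKET_WIDTH = 8   # default width mentioned above
MAX_SEQ_LEN = 128  # assumed maximum sequence length

# Bucket boundaries: (8, 8), (16, 16), ..., (128, 128)
buckets = [(w, w) for w in range(BUCKET_WIDTH, MAX_SEQ_LEN + 1, BUCKET_WIDTH)]

def assign_bucket(src_len: int, trg_len: int):
    """Return the smallest bucket whose source and target bounds fit the example."""
    for src_bound, trg_bound in buckets:
        if src_len <= src_bound and trg_len <= trg_bound:
            return (src_bound, trg_bound)
    return None  # examples longer than the maximum length are discarded

print(assign_bucket(5, 7))    # -> (8, 8)
print(assign_bucket(9, 30))   # -> (32, 32): the longer side determines the bucket
```

Within a bucket, all sequences are padded up to the bucket's bounds, which is why wider buckets trade a little padding overhead for fewer, larger groups to shuffle.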
I have an experiment where I can factorize the source and target tokens such that the context length becomes very small.
To test how effective this is, I tried running four configurations on an A100 GPU with 80 GB of VRAM:
Surprisingly, the third configuration fails with an OOM error. I would have assumed that if any configuration ran out of memory, it would be "no factors", since it has the longest sequences.
While I could reduce the batch size, I don't understand why training fails with target factors but not with "no factors" or "source and target factors".
Here are the commands I ran:
Output