Enable usage of batch_by_size for other code paths than just train #17
Conversation
        required_batch_size_multiple=required_batch_size_multiple,
    )
else:
    batch_sampler = data_utils.batch_by_size_tpu(
Where is this implemented?
Did I review it?
Yeah, it was implemented a long time ago; let me find the PR.
This needs to be "world aware", otherwise you will end up with issues at the end of the sample set.
So, this is the PR that made the tpu branch in this repo. But it was really first implemented in our examples repo, when our runner was still there. See here.
When we moved our runner inside the fairseq repo, it got carried over.
Ah, here is the actual PR that introduced batch_by_size_tpu.
The classes that use this sampler downstream are world aware. Unfortunately, due to the `.ceil` thing discussed before, we do get empty batches at the end for some cores. We do, however, always get the same number of iterations per core. All we then have to do is drop the last batches, which we do here.
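For context, here is a minimal sketch of that behavior, with hypothetical helper names rather than the actual fairseq code: `math.ceil` pads the batch list so every core runs the same number of iterations, and the trailing (possibly empty) batch is then dropped on every core.

```python
import math

def shard_batches(batches, num_cores, rank):
    """Hypothetical strided sharding sketch (not the actual fairseq code).

    math.ceil guarantees every core sees the same number of iterations,
    at the cost of empty batches at the tail for some cores.
    """
    iters_per_core = math.ceil(len(batches) / num_cores)
    padding = iters_per_core * num_cores - len(batches)  # always < num_cores
    padded = list(batches) + [[]] * padding  # empty batches pad the tail
    shard = padded[rank::num_cores]
    # If any core was padded, drop the last batch on *every* core, so that
    # iteration counts stay equal and no core steps on an empty batch.
    return shard[:-1] if padding else shard
```

For example, with 5 batches on 4 cores, each core first gets 2 iterations (3 of them with an empty tail batch); after dropping the last batch everywhere, every core runs exactly 1 non-empty batch.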
In train.py, due to our custom data preparation requirements, we use `--input_shapes`. This argument doesn't exist if we train on GPU, or if we are in the sequence generation code path. Thus, we get a `Namespace doesn't have attribute` error. This PR:
- uses `batch_by_size` instead of `batch_by_size_tpu` if we're training on GPU;
- defaults `input_shapes` to `None` if `args` doesn't have `input_shapes` (currently the case for `generate.py`, `interactive.py`, etc.).
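A minimal sketch of the dispatch this PR describes, assuming the call-site names from the diff above (`indices`, `dataset`, `max_tokens`, `max_sentences`, `required_batch_size_multiple`); the `batch_by_size_tpu` signature is assumed, while `batch_by_size` matches upstream fairseq:

```python
from fairseq.data import data_utils

def build_batch_sampler(args, indices, dataset, max_tokens=None,
                        max_sentences=None, required_batch_size_multiple=1):
    """Hypothetical wrapper around the call site shown in the diff above."""
    # getattr defaults to None when --input_shapes was never registered
    # (generate.py, interactive.py, GPU training), avoiding the
    # "Namespace doesn't have attribute" error.
    input_shapes = getattr(args, "input_shapes", None)
    if input_shapes is not None:
        # TPU training path; batch_by_size_tpu's signature is assumed here.
        return data_utils.batch_by_size_tpu(
            indices, dataset.num_tokens, input_shapes,
        )
    # GPU training, generate.py, interactive.py, etc.
    return data_utils.batch_by_size(
        indices, dataset.num_tokens,
        max_tokens=max_tokens,
        max_sentences=max_sentences,
        required_batch_size_multiple=required_batch_size_multiple,
    )
```

Keying the dispatch on the presence of `input_shapes` rather than on a device flag keeps both fixes in one place: the same fallback that silences the `AttributeError` also selects the GPU/generation code path.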