feat: use SequentialSampler if curriculum_sampling
is enabled with sample_packing
#2235
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
The
curriculum_sampling
flag introduced in #1567 (cc @winglian) preserves dataset ordering by disabling shuffling, which simplifies experimenting with curriculum learning strategies. However, the flag is ignored if we usesample_packing
. This PR sets the sampler to aSequentialSampler
if thecurriculum_sampling
flag is enabled.How has this been tested?
I've verified that the DataLoader created in this way correctly yields batches with consecutive samples packed together. However, I'm not 100% sure whether other parameters of
MultipackBatchSampler
need adjustment if we potentially don't have a shuffled dataset (e.g.,packing_efficiency_estimate
).Please note that
MultipackBatchSampler
is already used with aSequentialSampler
in_get_eval_sampler()
.