Allowed custom BatchSamplers when instantiated in *_dataloader hook
#13640
This is getting really complex 😂
Posting a partial review
Thank you for the review! I addressed all your concerns 👍 By the way, regarding the complexity: it surprisingly wasn't too bad for me, since it's basically just the logic from dataloader handling made generic.
Great! Minor comments
@carmocca, @awaelchli I finally managed to properly debug the cause of the IPU regression. The problem was that I was passing all default kwargs from
Hey @otaj, I am confused about how to update my sampler to fit this API. We are dealing with audio data and we want to batch samples by sequence length. To do this, we have shape files: CSVs that map each audio sample to its length. Our BatchSampler reads these files, sorts utterances by length, and then constructs fixed batches of similarly long utterances so that all batches are similar in size.

I am not sure what to move to a Sampler here. We could move the loading of the shapes file to a Sampler backed by a new dataset, but we do not sample from the shapes file to build batches; we consume the entire file at once. I think part of my doubt is a lack of understanding of what exactly you do to the Sampler passed into the BatchSampler.

I could move the loading of the shapes file inside a Dataset, pass in a Sampler for this dataset as a hacky way of giving the dataset to the BatchSampler, and totally ignore the Sampler in my BatchSampler logic while just reading from its member Dataset, but I imagine this will not solve the problem I'm having.
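For illustration, the length-bucketing batch sampler described in the comment above could look roughly like the following. This is a minimal sketch in plain Python, not the commenter's actual code: the class name LengthBucketBatchSampler and its parameters are hypothetical, the lengths are assumed to be already loaded from the shapes file into a list, and "similar in size" is simplified here to a fixed number of samples per batch.

```python
from typing import Iterator, List, Sequence


class LengthBucketBatchSampler:
    """Hypothetical batch sampler that groups indices of similar length.

    Duck-typed: it yields lists of dataset indices and defines __len__,
    without subclassing any framework class.
    """

    def __init__(self, lengths: Sequence[int], batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        # Sort sample indices by their sequence length (sorted() is stable).
        order = sorted(range(len(lengths)), key=lambda i: lengths[i])
        # Cut the sorted order into fixed, consecutive batches.
        self.batches: List[List[int]] = [
            order[start:start + batch_size]
            for start in range(0, len(order), batch_size)
        ]

    def __iter__(self) -> Iterator[List[int]]:
        return iter(self.batches)

    def __len__(self) -> int:
        return len(self.batches)


# Assumed input: one length per sample, as read from a shapes file.
lengths = [50, 10, 40, 20, 30]
sampler = LengthBucketBatchSampler(lengths, batch_size=2)
batches = list(sampler)
```

Note that the whole list of lengths is consumed once in the constructor and no inner Sampler is used, which matches the situation the comment describes. PyTorch's DataLoader accepts an iterable of index lists through its batch_sampler argument; how Lightning re-wraps such an object for distributed training is the open question in this thread and is not shown here.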
Hi @psridhar-asapp. First things first, this doesn't really add new API. It keeps the functionality as before, just now it allows your custom

I remember we spoke on this topic in some other issue (I think it was #11807). I don't think we have a good resolution to it yet, honestly.

We don't really do much to your sampler. Everything important regarding

Currently, it sets shuffling if it was set to None and we're training, wraps it in a distributed wrapper, and does a bit more magic in

That is where we wrap either

If you don't care about either of these things, you can just expose argument
What does this PR do?
Allowed custom BatchSamplers when instantiated in *_dataloader hook
Fixes #13007, #11807
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃
cc @Borda @justusschock @awaelchli @ninginthecloud @rohitgr7 @otaj @akihironitta