refactor len(datasets) call. #953
Comments
@williamFalcon I'm happy to take a look at this if needed, just let me know :)
Perfect!
In this function (and inside it), even though the comment says otherwise, what it actually does is create a new PyTorch DataLoader. I think this logic is flawed.
What I suggest is that, in the default setting, we only call …
OK, have a look at #955 - it should fix a few things and make it easy to add support for iterable datasets everywhere.
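For context, the suggestion above - never touching len(dataset) when the loader is being built, and treating iterable datasets as a special case - could look roughly like the sketch below. `build_dataloader` and its guard are hypothetical illustrations, not Lightning's actual code or the fix in #955.

```python
from torch.utils.data import DataLoader, IterableDataset

def build_dataloader(dataset, batch_size=32):
    # Hypothetical helper: construct the DataLoader without ever calling
    # len(dataset) here. Iterable datasets cannot use sampler-based
    # shuffling, so shuffle is only enabled for map-style datasets.
    shuffle = not isinstance(dataset, IterableDataset)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```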
🚀 Feature
Let's minimize len(dataset) calls and make them as late in training as we can (i.e., ideally right before any training loop). This way, we open up the path to supporting iterable datasets more cleanly; a rough sketch of what "deferred" could mean is shown below.
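A minimal sketch of the idea, assuming a hypothetical helper `num_training_batches` (not part of Lightning's API): the length is queried only once, right before iteration starts, and iterable datasets without a length are tolerated.

```python
from torch.utils.data import IterableDataset

def num_training_batches(dataloader):
    # Hypothetical helper: query the length only right before training
    # starts, and tolerate datasets that have no len() at all.
    if isinstance(dataloader.dataset, IterableDataset):
        return float('inf')  # length unknown for iterable datasets
    return len(dataloader)

def run_epoch(model, dataloader):
    max_batches = num_training_batches(dataloader)  # len() deferred to here
    for batch_idx, batch in enumerate(dataloader):
        if batch_idx >= max_batches:
            break
        # ... forward / backward / optimizer step ...
```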
Motivation
Getting the length prematurely accesses datasets at the wrong time, often causing them to be loaded twice.
This is a blocker for #948.