Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new way of passing dataloaders #759

Merged
merged 24 commits into from
Feb 19, 2020
Merged

new way of passing dataloaders #759

merged 24 commits into from
Feb 19, 2020

Conversation

SkafteNicki
Copy link
Member

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

This is a initial solution to issue #662. This allows people to pass a train_dataloader, val_dataloader and test_dataloader to the .fit() method of Trainer. As mentioned in the issue, this is a more familiar interface for people coming from keras and scikit-learn.

What needs to be discussed is probably in the case where a dataloader is defined both in the model and passed to fit. Right now the fit method will only use the dataloader defined in the model, and prompt the user that it is skipping the dataloader passed to fit. For train_dataloader there needs to be some kind of tiebreaker, however since both val_dataloader and test_dataloader can be a list of multiple dataloader a option here would be to concatenate dataloaders defined in the model and those passed to fit.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

yes!

else:
logging.info('Model has predefined val_dataloader, '
'will skip the val_dataloader passed to fit method ')
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may merge this with the first if, so elif val_dataloader:

"""
if train_dataloader:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it a function since the three are almost the same:

def _set_dataloader(dataloader, attribute):
    if isinstance(dataloader, torch.utils.data.DataLoader):
        if getattr(model, attribute) is None:
            setattr(model, attribute, lambda: dataloader)
        else:
            logging.info(f'Model has predefined `{attribute}`, '
                         'will skip the `{attribute}` passed to fit method.')
    elif dataloader:
        raise ValueError(f'`{attribute}` needs to be an instance of `torch.utils.data.DataLoader`')
    
_set_dataloader(train_dataloader, 'train_dataloader')
...

@SkafteNicki
Copy link
Member Author

@Borda, took your changes into consideration, and made it all into a function. In addition I added:

  1. both val_dataloader and test_dataloader can now be lists of pytorch dataloaders, similar to how it works when they are defined as part of the model
  2. i changed the way that the dataloaders are checked whether they are defined by the user or directly inherited from the base class, such that it does not actually involve evaluating the dataloaders
  3. to do 2. I had to fix a small bug in pl.data_loader decorator, where the @wraps(fn) decorator did not do what it was meant to do

@Borda Borda added the feature Is an improvement or enhancement label Jan 31, 2020
@Borda Borda added this to the 0.6.1 milestone Jan 31, 2020
@williamFalcon
Copy link
Contributor

@SkafteNicki this has been brought up before. Can you tell me again why we should enable this? I do agree that it might be easier to allow this model to server arbitrary data. BUT this will come at the expense of breaking the LightningTemplate and having data + transforms defined in unconventional ways that don't lend themselves to reproducibility

@SkafteNicki
Copy link
Member Author

As I initially wrote I think the current dataloader setup is not intuitive for newcomers. It is easy to learn, but it is just different than other common frameworks.

My personal use case is where i have two models, that needs to be trained on different training sets but evaluated on the same val/test set. For me it makes sense to then let the training dataloaders be defined as part of the model, but pass the val/test dataloaders to the trainer.

Lastly, I do not think that there is anything wrong with having this as additional option for parsing data. It should in no way replace the current way of doing it.

If this means that we are giving up too much reproducibility, then maybe it is not worth it.

@williamFalcon
Copy link
Contributor

@SkafteNicki makes sense! let's do it.
mind fixing the tests?

@williamFalcon
Copy link
Contributor

@SkafteNicki we also need a test for this. Add it under the tests for trainer.

Awesome addition!

@williamFalcon
Copy link
Contributor

@SkafteNicki mind finishing this over the next day or so? that way we can add to next release. awesome feature!

@SkafteNicki
Copy link
Member Author

@williamFalcon i will try to look at this tomorrow, will hopefully get it all to work and implement the appropriate tests. Will let you know as soon as I am done.
When is next release date?

@williamFalcon
Copy link
Contributor

ideally next week (we were on a cycle of every 6th), but we got behind in dec

@Borda
Copy link
Member

Borda commented Feb 6, 2020

@williamFalcon
Copy link
Contributor

@SkafteNicki love this feature. can we please get this merged?

@williamFalcon williamFalcon added the ready PRs ready to be merged label Feb 12, 2020
@williamFalcon
Copy link
Contributor

@SkafteNicki can you add a test and documentation for this please?

@SkafteNicki
Copy link
Member Author

@williamFalcon test and documentation should be added now. The failing CI testing seems to be a windows specific problem not related to this PR. Two notes on changes since last time:

  • I removed the @abstractmethod decorator from train_dataloader(self) since this does not need to be define by the user anymore
  • I did a bit of renaming of the LightningTestModelBase to get one base test model with and without a predefined train_dataloader(self). Please review this part

Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this addition...
I am not very sure about some block so could you pls double check it

pytorch_lightning/trainer/trainer.py Show resolved Hide resolved
pytorch_lightning/trainer/trainer.py Outdated Show resolved Hide resolved
pytorch_lightning/trainer/trainer.py Show resolved Hide resolved
pytorch_lightning/trainer/trainer.py Show resolved Hide resolved
pytorch_lightning/trainer/trainer.py Outdated Show resolved Hide resolved
pytorch_lightning/trainer/trainer.py Outdated Show resolved Hide resolved
tests/models/base.py Outdated Show resolved Hide resolved
tests/test_trainer.py Outdated Show resolved Hide resolved
tests/test_trainer.py Show resolved Hide resolved
tests/test_trainer.py Outdated Show resolved Hide resolved
@Borda Borda added information needed and removed ready PRs ready to be merged labels Feb 13, 2020
@pep8speaks
Copy link

pep8speaks commented Feb 16, 2020

Hello @SkafteNicki! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-02-19 09:51:31 UTC

tests/test_trainer.py Outdated Show resolved Hide resolved
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀
@SkafteNicki add change to CHANGELOG.md

pytorch_lightning/trainer/trainer.py Outdated Show resolved Hide resolved
tests/models/__init__.py Outdated Show resolved Hide resolved
@SkafteNicki
Copy link
Member Author

@Borda I have updated code and CHANGELOG per your request

@Borda Borda added the ready PRs ready to be merged label Feb 18, 2020
@Borda
Copy link
Member

Borda commented Feb 18, 2020

@SkafteNicki great job, Thx!

@williamFalcon williamFalcon merged commit ffd6e69 into Lightning-AI:master Feb 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Is an improvement or enhancement ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants