Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate DataModule lifecycle properties #7657

Merged
merged 8 commits into from
Jun 9, 2021

Conversation

ananthsub
Copy link
Contributor

@ananthsub ananthsub commented May 22, 2021

What does this PR do?

Fixes #7301

This is the first change required, since this will be a breaking change in the API.
First, we mark the properties as deprecated. Then in v1.6, we can remove them completely, along with all the tracking code outlined in #7301, and change the Trainer behavior to unconditionally call these hooks.

This makes it so that the DataModule is in full control of the state, not the Trainer, giving more flexibility and predictability to users.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@codecov
Copy link

codecov bot commented May 22, 2021

Codecov Report

Merging #7657 (dfb9809) into master (764d2c7) will decrease coverage by 5%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #7657    +/-   ##
=======================================
- Coverage      91%     86%    -5%     
=======================================
  Files         204     204            
  Lines       13659   13669    +10     
=======================================
- Hits        12435   11819   -616     
- Misses       1224    1850   +626     

@ananthsub ananthsub added design Includes a design discussion data handling Generic data-related topic labels May 22, 2021
@ananthsub ananthsub added this to the v1.4 milestone May 22, 2021
Copy link
Member

@ethanwharris ethanwharris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 😃

Copy link
Contributor

@kaushikb11 kaushikb11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@carmocca carmocca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These warnings will appear every time these functions are called (every setup, teardown, prepare_data call)

Considering these are properties out of the reach of users, they can get very annoying as users won't be able to disable them or update their code.

What's your plan to carry out the complete deprecation?

@ananthsub
Copy link
Contributor Author

These warnings will appear every time these functions are called (every setup, teardown, prepare_data call)

Good catch. What do you think about the trainer directly accessing the private attribute on the module, or wrapping this in a warning cache?

What's your plan to carry out the complete deprecation?

After this PR is merged, in v1.6 we can remove all the tracking code and properties listed in the issue. Are there items in the issue that aren't covered which we should discuss further?

@Borda Borda added the ready PRs ready to be merged label May 26, 2021
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
@Borda Borda enabled auto-merge (squash) May 26, 2021 09:39
@Borda Borda requested a review from carmocca May 26, 2021 09:39
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
pytorch_lightning/core/datamodule.py Outdated Show resolved Hide resolved
Copy link
Contributor

@carmocca carmocca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is, usually when something is deprecated, you can do two things as an user:

  1. Stop using the deprecated feature
  2. Upgrade to the alternative.

The issue here is that none of them can be done in this case, leading to frustration (nobody likes warnings!)

What if instead of deprecating these properties which are pretty much internal (I haven't seen any user example relying on them), we print a warning when one of these functions is called twice?

here: https://github.com/PyTorchLightning/pytorch-lightning/blob/037a71b156ac648e05735af12ddca73e48a49e57/pytorch_lightning/core/datamodule.py#L384-L385

(1) and (2) still won't be possible with this solution, but at least the number of users who are relying on the "do not call me again logic" should be much smaller than the number of users who would see these deprecation messages implemented in this PR (all datamodule users)

@Borda Borda removed the ready PRs ready to be merged label May 26, 2021
@ananthsub
Copy link
Contributor Author

I talked to @carmocca around the warnings and deprecation:

  1. I'll change this:
    https://github.com/PyTorchLightning/pytorch-lightning/blob/037a71b156ac648e05735af12ddca73e48a49e57/pytorch_lightning/core/datamodule.py#L384-L385

to this:

if has run:
    rank_zero_warn(f"DataModule.{name} has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.{name}") 
else:
    fn(*args, **kwargs)
  1. The warnings added to the properties will not appear during normal execution because the wrapper directly acceses the private attributes: https://github.com/PyTorchLightning/pytorch-lightning/blob/037a71b156ac648e05735af12ddca73e48a49e57/pytorch_lightning/core/datamodule.py#L372-L381

In this use case:

trainer.fit(module1, dm)
trainer.fit(module2, dm)

setup and teardown will not be called on the datamodule in the second call, which is the motivation of the issue/PR. To get the new behavior before 1.6, the user must do:

trainer.fit(module1, dm)
dm._has_setup_fit = False
dm._has_teardown_fit = False
trainer.fit(module2, dm)

It is not worth adding a reset() method now to clear these states because we will change the behavior in v1.6 anyways, meaning we'd deprecate this reset() method then.

  1. @carmocca @awaelchli I'm still in favor of deprecating these properties. @carmocca raised a good point that users who are upgrading from pre Lightning1.4 to Lightning 1.6+ won't see these warnings, and their code will silently change behavior from their POV. However, we can address this via the Breaking Changes section in the release announcements and by making the execution flow easier to debug (which hooks were called when dumped to a file for inspection). Even now we consider these properties as mostly internal to the datamodule, but they are on the public API, which strongly suggests they ought to be removed.

@carmocca
Copy link
Contributor

carmocca commented May 27, 2021

I'm still in favor of deprecating these properties ...

How easy do we think it will be for users to write idempotent {setup,teardown,prepare_data} implementations?

For example, a basic user might have:

def prepare_data(self):
    torchvision.MNIST(root=self.data_root, download=True)

In this case, prepare_data() can be called any number of times because torchvision has already written the checks to know if the data is already available.

Now, what if you have some complex pre-processing code where knowing whether it has been applied or not is not as simple as a file existence check. In this case, the DataModule.has_prepared_data is helpful, otherwise, the user is left to write and manage this flag himself.

Are we okay with this?

@ananthsub
Copy link
Contributor Author

@carmocca the implementation can check for whether the side-effect has already taken place.

Another reason is what came up in the slack questions channel today. Users can use the datamodule outside of the Lightning Trainer: https://pytorch-lightning.readthedocs.io/en/latest/extensions/datamodules.html#datamodules-without-lightning

In these cases, stage can be a free-form argument, but there's no property that corresponds to it. That's an API gap which we can simplify if we remove the properties altogether

@ananthsub ananthsub force-pushed the feat/deprecate-dm-fields branch from 2b87b08 to 553f076 Compare June 7, 2021 17:59
@mergify mergify bot added the has conflicts label Jun 7, 2021
@ananthsub ananthsub force-pushed the feat/deprecate-dm-fields branch from b48a694 to 066571b Compare June 8, 2021 04:54
@mergify mergify bot removed the has conflicts label Jun 8, 2021
@ananthsub
Copy link
Contributor Author

@carmocca do you think this PR is ready to go now?

Copy link
Contributor

@carmocca carmocca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! Forgot I had requested changes

Great job!

@carmocca carmocca disabled auto-merge June 9, 2021 12:08
@carmocca carmocca enabled auto-merge (squash) June 9, 2021 12:09
@carmocca carmocca merged commit 6fee926 into Lightning-AI:master Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data handling Generic data-related topic design Includes a design discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] Remove has_{setup/teardown}_{stage} properties from DataModule
6 participants