Missing cleanup after trainer.fit() and trainer.test() #4385
Comments
Recently, I have faced this kind of issue, so I would like to work on it.
@jabertuhin it would be great if you could contribute! Let us know if you need any help.
@edenlightning I set the random seed in my notebook with this:
Here is an example: I am creating two new datamodule objects and getting different images for the first batch. Then I restarted the kernel and ran them again. Is there a better way to debug this issue? Or is this even an issue, or expected behavior?
@jabertuhin what you are showing makes sense. If you want the same output in each cell, you also need to put the seeding in both cells.
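The point about re-seeding can be illustrated with plain Python's `random` module (standing in for `seed_everything`; the `first_batch_indices` helper is a hypothetical stand-in for drawing the first batch from a shuffled dataloader): seeding once does not make two separate sampling runs identical, because the RNG state advances between them; you must re-seed in each cell.

```python
import random

def first_batch_indices(n=5):
    # hypothetical stand-in for "draw the first batch" from a shuffled dataloader
    return [random.randrange(100) for _ in range(n)]

# Cell 1: seed, then sample
random.seed(42)
a = first_batch_indices()

# Cell 2 without re-seeding: the RNG state has advanced,
# so the "first batch" differs from cell 1
b = first_batch_indices()

# Cell 2 with re-seeding: identical to cell 1
random.seed(42)
c = first_batch_indices()

print(a == b)  # False
print(a == c)  # True
```

The same reasoning applies to `torch` and `seed_everything`: each cell that should produce identical batches needs its own seeding call at the top.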
This issue will be fixed by #5007, but it needs to wait until after 1.4, as some of the changes required will not be possible to make backward compatible. Copy-pasting the full example reported by @ananthsub for me to verify against #5007:

```python
import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def validation_epoch_end(self, outputs):
        print("validation finished")

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def val_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=2)


trainer_args = dict(
    max_epochs=4,
    check_val_every_n_epoch=2,
    logger=False,
    checkpoint_callback=False,
    progress_bar_refresh_rate=0,
    weights_summary=None,
    num_sanity_val_steps=0,
)


def run0():
    # validation checks run regularly since we re-instantiate the trainer inside each loop
    for i in range(2):
        print("iteration ", i)
        trainer = Trainer(**trainer_args)
        test_module = BoringModel()
        trainer.fit(test_module)


def run1():
    reuse_trainer = Trainer(**trainer_args)
    # validation checks do not run on the second loop since we don't re-instantiate the trainer inside each loop
    for i in range(2):
        print("iteration ", i)
        test_module = BoringModel()
        reuse_trainer.fit(test_module)


if __name__ == "__main__":
    # https://github.com/PyTorchLightning/pytorch-lightning/issues/4385
    # run0()
    run1()
```
🐛 Bug
The Lightning trainer holds references to the LightningModule/DataModule after fit/test complete. This can lead to different behavior in subsequent calls.
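The cleanup concern can be sketched in plain Python (the `Runner` class below is a hypothetical illustration, not Lightning's actual internals): once `fit` stores a strong reference to the model, the model stays alive after the call returns until the reference is explicitly cleared.

```python
import weakref

class Model:
    pass

class Runner:
    """Toy runner that keeps a strong reference to the model after fit()."""
    def fit(self, model):
        self.model = model  # reference is never cleaned up

runner = Runner()
m = Model()
tracker = weakref.ref(m)  # lets us observe whether the model is still alive

runner.fit(m)
del m  # caller drops its reference...
print(tracker() is not None)  # True: runner still holds the model

runner.model = None  # the missing "cleanup after fit" step
print(tracker() is None)  # True: the model can now be collected
```

In Lightning's case, the stale references also carry hook bindings and loop counters, which is why behavior differs between a fresh trainer and a reused one.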
Please reproduce using the BoringModel and post here
To Reproduce
Expected behavior
The latter (`test_reuse`) should re-bind the model + hooks when calling `fit` again inside each loop.
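The underlying failure mode, an object whose counters persist from a previous run, can be sketched generically (the `MiniTrainer` below is an illustration with hypothetical names, not Lightning's real implementation): because the epoch counter is never reset, a second `fit()` on the same instance runs zero epochs and therefore zero validation checks, while fresh instances behave identically every time.

```python
class MiniTrainer:
    """Toy trainer whose epoch counter survives across fit() calls,
    mimicking the stale state described in this issue."""

    def __init__(self, max_epochs=4, check_val_every_n_epoch=2):
        self.max_epochs = max_epochs
        self.check_val_every_n_epoch = check_val_every_n_epoch
        self.current_epoch = 0  # bug: never reset between fit() calls

    def fit(self):
        ran_validation = []
        # training resumes from the stale current_epoch, so a second
        # fit() on the same instance does nothing at all
        while self.current_epoch < self.max_epochs:
            self.current_epoch += 1
            if self.current_epoch % self.check_val_every_n_epoch == 0:
                ran_validation.append(self.current_epoch)
        return ran_validation

reused = MiniTrainer()
print(reused.fit())  # [2, 4]
print(reused.fit())  # [] -- stale state: no epochs, no validation

fresh_runs = [MiniTrainer().fit() for _ in range(2)]
print(fresh_runs)  # [[2, 4], [2, 4]]
```

Resetting `current_epoch` (and any analogous loop state) at the start of `fit()` is the kind of cleanup the expected behavior above asks for.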