Inconsistent behaviour and "AttributeError: _old_init" when using Pytorch Lightning with the ogb library. #14050
Comments
This is super weird for 2 reasons:
I'm not familiar with this And it works fine if I add it to our bug report model. Can you try to reproduce it using it? |
@carmocca - If it helps, I've observed this recently in the Flash CI as well: https://github.com/Lightning-AI/lightning-flash/runs/7668034174?check_suite_focus=true. Didn't get a chance to take a closer look because of bad health, but wanted to dig deeper on Monday. I earlier thought it might be something with Flash, but will have to check with previous PL versions once. |
I've run the bug report model with the import 5 times. I got the exception twice and it worked fine the other 3 times. |
Thanks @krshrimali. The dependencies for that job do not include |
Hi @krshrimali, @carmocca, @schlyah, if it's also affecting Flash CI, then it's almost definitely on us. @schlyah, would you mind sharing:
@krshrimali, I expect I will find all the required info in the Flash repo, is that correct? |
Hi, @otaj - Thank you for your message! Yes, you'll definitely find everything in the Flash repo - and if not, I'll be one message away to help you out. I tried reproducing it locally with Flash, but couldn't - this bug is really flaky. :/ |
Hi @schlyah, I spent a while on this and am unable to reproduce. Would you mind sharing your full environment (i.e. |
I am having the same problem when using torch_geometric.loader.DataLoader
@contextmanager
def _replace_init_method(base_cls: Type, store_explicit_arg: Optional[str] = None) -> Generator[None, None, None]:
    """This context manager is used to add support for re-instantiation of custom (subclasses) of "base_cls".
    It patches the "__init__" method.
    """
    classes = _get_all_subclasses(base_cls) | {base_cls}
    wrapped = set()
    print('before:', classes)
    for cls in classes:
        if cls.__init__ not in wrapped:
            print(cls.__name__)
            cls._old_init = cls.__init__
            cls.__init__ = _wrap_init_method(cls.__init__, store_explicit_arg)
            wrapped.add(cls.__init__)
    yield
    print('after:', classes)
    for cls in classes:
        if hasattr(cls, "_old_init"):
            print("del", cls.__name__)
            cls.__init__ = cls._old_init
            del cls._old_init
Sometimes it works, and the output is:
before: {<class 'torch_geometric.loader.random_node_sampler.RandomNodeSampler'>, <class 'torch_geometric.loader.neighbor_sampler.NeighborSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTNodeSampler'>, <class 'torch_geometric.loader.neighbor_loader.NeighborLoader'>, <class 'torch_geometric.loader.dense_data_loader.DenseDataLoader'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTEdgeSampler'>, <class 'torch_geometric.loader.hgt_loader.HGTLoader'>, <class 'torch_geometric.loader.dataloader.DataLoader'>, <class 'torch_geometric.loader.cluster.ClusterLoader'>, <class 'torch_geometric.loader.base.BaseDataLoader'>, <class 'torch_geometric.loader.temporal_dataloader.TemporalDataLoader'>, <class 'torch.utils.data.dataloader.DataLoader'>, <class 'torch_geometric.loader.data_list_loader.DataListLoader'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTRandomWalkSampler'>, <class 'torch_geometric.loader.shadow.ShaDowKHopSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTSampler'>}
RandomNodeSampler
NeighborSampler
GraphSAINTNodeSampler
NeighborLoader
DenseDataLoader
GraphSAINTEdgeSampler
HGTLoader
DataLoader
ClusterLoader
BaseDataLoader
TemporalDataLoader
DataLoader
DataListLoader
GraphSAINTRandomWalkSampler
ShaDowKHopSampler
GraphSAINTSampler
before: {<class 'torch.utils.data.sampler.BatchSampler'>}
BatchSampler
after: {<class 'torch.utils.data.sampler.BatchSampler'>}
del BatchSampler
after: {<class 'torch_geometric.loader.random_node_sampler.RandomNodeSampler'>, <class 'torch_geometric.loader.neighbor_sampler.NeighborSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTNodeSampler'>, <class 'torch_geometric.loader.neighbor_loader.NeighborLoader'>, <class 'torch_geometric.loader.dense_data_loader.DenseDataLoader'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTEdgeSampler'>, <class 'torch_geometric.loader.hgt_loader.HGTLoader'>, <class 'torch_geometric.loader.dataloader.DataLoader'>, <class 'torch_geometric.loader.cluster.ClusterLoader'>, <class 'torch_geometric.loader.base.BaseDataLoader'>, <class 'torch_geometric.loader.temporal_dataloader.TemporalDataLoader'>, <class 'torch.utils.data.dataloader.DataLoader'>, <class 'torch_geometric.loader.data_list_loader.DataListLoader'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTRandomWalkSampler'>, <class 'torch_geometric.loader.shadow.ShaDowKHopSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTSampler'>}
del RandomNodeSampler
del NeighborSampler
del GraphSAINTNodeSampler
del NeighborLoader
del DenseDataLoader
del GraphSAINTEdgeSampler
del HGTLoader
del DataLoader
del ClusterLoader
del BaseDataLoader
del TemporalDataLoader
del DataLoader
del DataListLoader
del GraphSAINTRandomWalkSampler
del ShaDowKHopSampler
del GraphSAINTSampler
However, sometimes the error occurs, and the output is:
before: {<class 'torch_geometric.loader.graph_saint.GraphSAINTNodeSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTEdgeSampler'>, <class 'torch_geometric.loader.temporal_dataloader.TemporalDataLoader'>, <class 'torch_geometric.loader.shadow.ShaDowKHopSampler'>, <class 'torch_geometric.loader.neighbor_sampler.NeighborSampler'>, <class 'torch_geometric.loader.neighbor_loader.NeighborLoader'>, <class 'torch_geometric.loader.cluster.ClusterLoader'>, <class 'torch_geometric.loader.random_node_sampler.RandomNodeSampler'>, <class 'torch_geometric.loader.data_list_loader.DataListLoader'>, <class 'torch_geometric.loader.hgt_loader.HGTLoader'>, <class 'torch_geometric.loader.dataloader.DataLoader'>, <class 'torch_geometric.loader.base.BaseDataLoader'>, <class 'torch.utils.data.dataloader.DataLoader'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTRandomWalkSampler'>, <class 'torch_geometric.loader.dense_data_loader.DenseDataLoader'>}
GraphSAINTNodeSampler
GraphSAINTSampler
TemporalDataLoader
ShaDowKHopSampler
NeighborSampler
NeighborLoader
ClusterLoader
RandomNodeSampler
DataListLoader
HGTLoader
DataLoader
BaseDataLoader
DataLoader
GraphSAINTRandomWalkSampler
DenseDataLoader
before: {<class 'torch.utils.data.sampler.BatchSampler'>}
BatchSampler
after: {<class 'torch.utils.data.sampler.BatchSampler'>}
del BatchSampler
after: {<class 'torch_geometric.loader.graph_saint.GraphSAINTNodeSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTSampler'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTEdgeSampler'>, <class 'torch_geometric.loader.temporal_dataloader.TemporalDataLoader'>, <class 'torch_geometric.loader.shadow.ShaDowKHopSampler'>, <class 'torch_geometric.loader.neighbor_sampler.NeighborSampler'>, <class 'torch_geometric.loader.neighbor_loader.NeighborLoader'>, <class 'torch_geometric.loader.cluster.ClusterLoader'>, <class 'torch_geometric.loader.random_node_sampler.RandomNodeSampler'>, <class 'torch_geometric.loader.data_list_loader.DataListLoader'>, <class 'torch_geometric.loader.hgt_loader.HGTLoader'>, <class 'torch_geometric.loader.dataloader.DataLoader'>, <class 'torch_geometric.loader.base.BaseDataLoader'>, <class 'torch.utils.data.dataloader.DataLoader'>, <class 'torch_geometric.loader.graph_saint.GraphSAINTRandomWalkSampler'>, <class 'torch_geometric.loader.dense_data_loader.DenseDataLoader'>}
del GraphSAINTNodeSampler
del GraphSAINTSampler
del GraphSAINTEdgeSampler
Traceback (most recent call last):
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 868, in test
    return self._call_and_handle_interrupt(self._test_impl, model, dataloaders, ckpt_path, verbose, datamodule)
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 654, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 915, in _test_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
    results = self._run_stage()
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1249, in _run_stage
    return self._run_evaluate()
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1288, in _run_evaluate
    self._evaluation_loop._reload_evaluation_dataloaders()
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 234, in _reload_evaluation_dataloaders
    self.trainer.reset_test_dataloader()
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1941, in reset_test_dataloader
    self.num_test_batches, self.test_dataloaders = self._data_connector._reset_eval_dataloader(
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 344, in _reset_eval_dataloader
    dataloaders = self._request_dataloader(mode)
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 427, in _request_dataloader
    with _replace_init_method(DataLoader, "dataset"), _replace_init_method(BatchSampler):
  File "/anaconda/envs/pytorch/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/anaconda/envs/pytorch/lib/python3.10/site-packages/pytorch_lightning/utilities/data.py", line 531, in _replace_init_method
    del cls._old_init
AttributeError: _old_init |
I wrote a simple script to reproduce it:
import torch
import numpy as np
from pytorch_lightning import LightningModule, Trainer, LightningDataModule
from torch.nn import functional as F
from torch_geometric.loader import DataLoader
from dataloader_1 import Dataloader_1


class my_model(LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(28, 1)

    def forward(self, x):
        return self.l1(x)

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.mse_loss(self(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)


if __name__ == '__main__':
    # Init our model
    mnist_model = my_model()
    # Init DataLoader from MNIST Dataset
    train_ds = [(torch.randint(100, size=[28], dtype=torch.float32), torch.randint(100, size=[1], dtype=torch.float32)) for i in range(1024)]
    train_loader = Dataloader_1(train_ds, batch_size=512)
    # Initialize a trainer
    trainer = Trainer(
        accelerator="auto",
        devices=1 if torch.cuda.is_available() else None,  # limiting got iPython runs
        max_epochs=3
    )
    # Train the model ⚡
    trainer.fit(mnist_model, train_loader)
and this is the "dataloader_1.py" code:
from torch_geometric.loader import DataLoader


class Dataloader_1(DataLoader):
    # sic: the method is spelled `__int__`, so DataLoader.__init__ is inherited unchanged
    def __int__(self):
        super().__init__()
        self.name = "dataloader_1"
The error occurs with almost 50% probability. |
This is the result of pip freeze: and the code:
|
@schlyah @PurpleSand123, thank you very much for the report. I was able to figure out what the issue is and will send a PR with the fix shortly. |
Basically, the issue is due to improper handling of inheritance combined with the non-deterministic order in which our wrapper is removed. This usually isn't a big problem, because there is hardly ever a situation with a long inheritance chain where not many classes override their |
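The order dependence can be reproduced without Lightning or PyG. The following is a minimal sketch, not the actual Lightning code: `replace_init`, `_wrap`, `make_classes`, and `survives` are hypothetical names, `_wrap` is a stand-in that only forwards the call, and the set of classes is replaced by a list so that each visiting order can be tried deterministically.

```python
import functools
import itertools
from contextlib import contextmanager


def _wrap(init):
    # Hypothetical stand-in for `_wrap_init_method`; it only forwards the call.
    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        return init(self, *args, **kwargs)
    return wrapper


@contextmanager
def replace_init(classes):
    # Same wrap/restore logic as the snippet quoted in this thread, but
    # `classes` is a list so the visiting order is explicit.
    wrapped = set()
    for cls in classes:
        if cls.__init__ not in wrapped:
            cls._old_init = cls.__init__
            cls.__init__ = _wrap(cls.__init__)
            wrapped.add(cls.__init__)
    yield
    for cls in classes:
        if hasattr(cls, "_old_init"):  # also True if only an ancestor has it
            cls.__init__ = cls._old_init
            del cls._old_init  # raises when the attribute is merely inherited


def make_classes():
    # Fresh classes per trial, because a failed restore leaves them patched.
    class Grand:
        def __init__(self):
            self.tag = "grand"

    class Parent(Grand):
        def __init__(self):
            super().__init__()
            self.tag = "parent"

    class Child(Parent):  # does not override __init__
        pass

    return {"G": Grand, "P": Parent, "C": Child}


def survives(order):
    # True if entering and leaving the context manager raises nothing.
    classes = make_classes()
    try:
        with replace_init([classes[name] for name in order]):
            pass
    except AttributeError:
        return False
    return True


results = {"".join(p): survives(p) for p in itertools.permutations("GPC")}
for order, ok in results.items():
    print(order, "ok" if ok else "AttributeError")
```

Of the six visiting orders, only Parent, Child, Grand fails. Child is skipped while wrapping because it already sees Parent's wrapper, and at restore time it still inherits `_old_init` from Grand after Parent's copy has been deleted, so `hasattr` succeeds but `del` does not. With a set of classes, whether that order comes up varies from run to run, which matches the flaky behaviour reported in this thread.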
Hi, everyone. I used @PurpleSand123's example with one extra test script:
import subprocess
import sys
import numpy as np


def main():
    i = 0
    while i < 100:
        i += 1
        try:
            _ = subprocess.run([sys.executable, "main.py"], check=True, capture_output=True)
        except subprocess.CalledProcessError:
            print(f"Broke on {i}th try")
            break
    if i == 100:
        print("Didn't break")
    return i


if __name__ == "__main__":
    results = np.array([main() for _ in range(10)])
    print(results.mean())
When running this with PL
When running with the change in the linked PR, the output on my machine was:
Please, try your examples with the changes in the linked PR and check if it fixes it for you. |
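For anyone who wants to see what an order-independent version could look like, here is a minimal sketch. It is an assumption about one possible approach and not necessarily what the linked PR does: `replace_init_safely`, `_wrap`, and the three toy classes are hypothetical names. The idea is to patch only classes that define `__init__` in their own `__dict__` and to restore exactly those.

```python
import functools
from contextlib import contextmanager

calls = []  # records which wrapped __init__ ran, for illustration only


def _wrap(init):
    # Hypothetical stand-in for `_wrap_init_method`.
    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        calls.append(init.__qualname__)
        return init(self, *args, **kwargs)
    return wrapper


@contextmanager
def replace_init_safely(classes):
    patched = []
    for cls in classes:
        # Only touch classes that define __init__ themselves; subclasses that
        # inherit it pick up the wrapper through normal attribute lookup.
        if "__init__" in vars(cls):
            original = vars(cls)["__init__"]
            cls.__init__ = _wrap(original)
            patched.append((cls, original))
    try:
        yield
    finally:
        # Restore exactly what was patched; the order does not matter.
        for cls, original in patched:
            cls.__init__ = original


class Grand:
    def __init__(self):
        self.tag = "grand"


class Parent(Grand):
    def __init__(self):
        super().__init__()
        self.tag = "parent"


class Child(Parent):  # inherits Parent.__init__
    pass


originals = {cls: vars(cls)["__init__"] for cls in (Grand, Parent)}

# Visit the subclass between its two ancestors.
with replace_init_safely([Parent, Child, Grand]):
    child = Child()
```

Because nothing is looked up through inheritance, a subclass without its own `__init__` is never modified, and the original functions are back in place once the block exits, whatever order the classes are visited in.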
Now, the problem is fixed. Thank you |
It works 👍 Thanks for the quick response! |
🐛 Bug
The simple example from the documentation page https://pytorch-lightning.readthedocs.io/en/latest/notebooks/lightning_examples/mnist-hello-world.html works fine. However, when I add the import statement
from ogb.nodeproppred import *
the code behaves inconsistently: sometimes it works and sometimes it throws an "AttributeError: _old_init" exception:
To Reproduce
The documentation example code with the import instruction:
Environment
cc @justusschock @awaelchli @ninginthecloud @rohitgr7 @otaj