TypeError: can't pickle Environment objects when num_workers > 0 for LSUN #689
Comments
This seems to be a Windows-specific issue. |
This issue also appears on Linux. The reason is that an opened LMDB env cannot be pickled. |
@Santiago810 Do you know how to diagnose the issue of an un-pickleable lmdb env? |
I have the same issue with the DataLoader when I am not using an LMDB dataset. |
I think this is a limitation of LMDB in Python (and of LSUN, which uses LMDB internally), and unfortunately I don't think there is much we can do on the torchvision side. |
I implemented my own LMDB dataset and had the same issue when using LMDB with num_workers > 0 and torch multiprocessing set to spawn. It is very similar to this project's LSUN implementation; in my case the issue was with this line: https://github.com/pytorch/vision/blob/master/torchvision/datasets/lsun.py#L18 With fork it works fine, but with spawn it tries to pickle the dataset object, whose self.env attribute is an lmdb Environment. The fix is to use the environment in __init__ and discard the reference, then instantiate it again in __getitem__ and save the reference on the instance. |
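The pattern described in the comment above (open briefly in `__init__`, discard the reference, reopen on first access) can be sketched with the standard library alone. This is only an illustration under assumptions: a plain file handle stands in for the `lmdb.Environment`, since neither can be pickled, and the class `LazyRecords` and its fixed-width record layout are hypothetical, not part of torchvision.

```python
import os
import pickle
import tempfile

RECORD_SIZE = 4  # hypothetical fixed-width records, for illustration only


class LazyRecords:
    """Map-style dataset whose handle is opened lazily in each process."""

    def __init__(self, path):
        self.path = path
        # Open briefly to read metadata, then discard the handle so that
        # the object stays picklable when workers are spawned.
        with open(path, "rb") as handle:
            handle.seek(0, os.SEEK_END)
            self.size = handle.tell() // RECORD_SIZE
        self._handle = None

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if self._handle is None:
            # First access in this process: open and keep the reference.
            self._handle = open(self.path, "rb")
        self._handle.seek(index * RECORD_SIZE)
        return self._handle.read(RECORD_SIZE)


# Build a small file with three records to exercise the class.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"aaaabbbbcccc")

dataset = LazyRecords(tmp.name)
clone = pickle.loads(pickle.dumps(dataset))  # works: no handle is open yet
first = clone[0]

try:
    pickle.dumps(clone)  # fails now: the clone holds an open handle
    pickled_after_open = True
except TypeError:
    pickled_after_open = False
```

Reading the length inside `__init__` while the handle is briefly open also gives the dataset a size without keeping anything unpicklable on the instance.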
@4knahs if you think you could send a PR fixing the LSUN implementation it would be great! |
I saw a solution somewhere else that adds __getstate__ and __setstate__.
This also doesn't save self.env, but saves the txn instead. |
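One way to read the `__getstate__`/`__setstate__` suggestion is sketched below. It is an assumption about what such a solution looks like, not the code the comment refers to: an open file handle stands in for the LMDB environment, and the class name `PicklableStore` is hypothetical. The idea is to leave the unpicklable handle out of the pickled state and reopen it after unpickling.

```python
import pickle
import tempfile


class PicklableStore:
    """Holds an open handle but drops it from the pickled state."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "rb")  # stand-in for lmdb.open(...)

    def __getstate__(self):
        state = self.__dict__.copy()
        # The handle cannot be pickled, so leave it out of the state.
        del state["handle"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Reopen in the receiving process from the stored path.
        self.handle = open(self.path, "rb")

    def read_all(self):
        self.handle.seek(0)
        return self.handle.read()


# Small file to exercise the class.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"payload")

store = PicklableStore(tmp.name)
restored = pickle.loads(pickle.dumps(store))  # no TypeError is raised
```

Compared with opening lazily in `__getitem__`, this keeps the handle available right after construction, at the cost of two extra methods.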
Solution: open lmdb in worker_init_fn of torch.utils.data.DataLoader |
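A minimal sketch of the `worker_init_fn` approach follows, under stated assumptions: a file handle stands in for the LMDB environment, the names `HandleDataset` and `open_handle` are hypothetical, and PyTorch is imported only inside `worker_init_fn` so that the rest of the sketch runs without it. `torch.utils.data.get_worker_info()` is the PyTorch call that gives a worker access to its own copy of the dataset.

```python
import pickle
import tempfile


class HandleDataset:
    """Map-style dataset; the handle is opened by the worker, not by __init__."""

    def __init__(self, path):
        self.path = path
        self.handle = None  # stand-in for the LMDB environment

    def open_handle(self):
        # With LMDB this would be an lmdb.open(...) call instead.
        self.handle = open(self.path, "rb")

    def __len__(self):
        return 1

    def __getitem__(self, index):
        if self.handle is None:
            # Covers num_workers=0, where worker_init_fn is never called.
            self.open_handle()
        self.handle.seek(0)
        return self.handle.read()


def worker_init_fn(worker_id):
    # Imported here so the rest of the sketch runs without PyTorch installed.
    from torch.utils.data import get_worker_info

    # In a worker process, get_worker_info().dataset is that worker's copy.
    get_worker_info().dataset.open_handle()


# Small file to exercise the class.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"record")

dataset = HandleDataset(tmp.name)
payload = pickle.dumps(dataset)  # picklable: no handle has been opened yet
```

With PyTorch installed, the function would be passed as `torch.utils.data.DataLoader(dataset, num_workers=2, worker_init_fn=worker_init_fn)`, so each worker opens its own handle after the dataset has been copied into it.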
Could you elaborate or give an example @Santiago810 ? |
A possible solution is similar to the one for HDF5:
Here is an illustration:

    import lmdb
    import torch

    class DataLoader(torch.utils.data.Dataset):
        def __init__(self, lmdb_dir):
            """do not open lmdb here!!"""
            # Only store the path; the environment is opened later.
            self.lmdb_dir = lmdb_dir

        def open_lmdb(self):
            self.env = lmdb.open(self.lmdb_dir, readonly=True, create=False)
            self.txn = self.env.begin(buffers=True)

        def __getitem__(self, item: int):
            # Open lazily on first access, once per worker process.
            if not hasattr(self, 'txn'):
                self.open_lmdb()
            # Then do anything you want with env/txn here.

Explanation |
Thank you @airsplay . Excellent solution. You just saved me about a months work !!! |
Hi airsplay, This solution works fine, however I'm struggling to find a way to set the self.size property on the Dataset without loading it in the |
@thecml you can open an LMDB environment in |
Excellent, thanks! |
The program fails to create an iterator for a DataLoader object when the dataset is LSUN and the number of workers is greater than zero. I do not get this error when working with other datasets. Something tells me that the issue might be caused by lmdb. I am running on Windows 10 with CUDA 10.
Code:
Error: