TypeError: can't pickle Environment objects when num_workers > 0 for LSUN #689

ArtjomUEA · 2018-12-17T15:13:01Z

The program fails to create an iterator for a DataLoader object when the dataset is LSUN and the number of workers is greater than zero. I do not get this error when working with other datasets. Something tells me that the issue might be caused by lmdb. I am running on Windows 10 with CUDA 10.

Code:

import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms

dataset = dset.LSUN(root='D:/bedroom_train_lmdb', classes=['bedroom_train'],
                            transform=transforms.Compose([
                                transforms.Resize((64, 64)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                            ]))

dataloader = torch.utils.data.DataLoader(dataset, batch_size=128,
                                             shuffle=True, num_workers=4)

for data in dataloader:
    print(data)

Error:

Traceback (most recent call last):
  File "C:/Users/x/.PyCharm2018.3/config/scratches/scratch.py", line 15, in <module>
    for data in dataloader:
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __iter__
    return _DataLoaderIter(self)
  File "C:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 560, in __init__
    w.start()
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle Environment objects

fmassa · 2018-12-17T16:33:08Z

This seems to be a Windows-specific issue.
But note that even if we address this particular issue (I have no idea how to do it though), you would probably hit another issue further on, which is #619

Santiago810 · 2020-02-06T03:59:37Z

This issue also appears on Linux. The reason is that an opened lmdb env cannot be pickled.

IsaacBerman · 2020-02-11T15:20:55Z

@Santiago810 Do you know how to diagnose the issue of an un-pickleable lmdb env?

gebrahimi91 · 2020-02-13T20:52:03Z

I have the same issue with the DataLoader even when I do not use an lmdb dataset.

fmassa · 2020-02-14T14:11:14Z

I think this is a limitation of LMDB in python (and LSUN which uses LMDB internally), and I think there is not much we can do on torchvision side unfortunately.

4knahs · 2020-04-29T17:04:39Z

I implemented my own LMDB dataset and had the same issue when using LMDB with num_workers > 0 and torch multiprocessing set to spawn.

It is very similar to this project's LSUN implementation; in my case the issue was with this line:

https://github.com/pytorch/vision/blob/master/torchvision/datasets/lsun.py#L18

When set to fork it works fine, but with spawn it seems to try to pickle the dataset object, whose self.env attribute is an lmdb Environment.

The fix: use the environment in __init__ and discard the reference there, then open it again in __getitem__ and keep that reference on the instance.

fmassa · 2020-05-04T12:32:38Z

@4knahs if you think you could send a PR fixing the LSUN implementation it would be great!

ruotianluo · 2020-07-04T19:39:41Z

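I saw a solution elsewhere that adds __getstate__ and __setstate__:

    # Requires "import os" and "import lmdb" at module level.
    def __getstate__(self):
        # Copy the dict so the live object keeps its own transaction.
        state = self.__dict__.copy()
        state["db_txn"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Reopen the environment in the process that unpickles the dataset.
        env = lmdb.open(self.db_path, subdir=os.path.isdir(self.db_path),
                        readonly=True, lock=False,
                        readahead=False, meminit=False,
                        map_size=1099511627776 * 2)
        self.db_txn = env.begin(write=False)

This also avoids storing self.env on the object. Rather than pickling the txn, it drops it and reopens it when the object is unpickled.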

Santiago810 · 2020-08-04T07:14:22Z

Solution: open lmdb in worker_init_fn of torch.utils.data.DataLoader

RSKothari · 2020-09-15T17:13:20Z

Could you elaborate or give an example @Santiago810 ?

airsplay · 2021-02-28T01:23:33Z

A possible solution is similar to the one for HDF5:

Do not open lmdb inside __init__
Open the lmdb at the first data iteration.

Here is an illustration:

import lmdb
import torch.utils.data

class LMDBDataset(torch.utils.data.Dataset):
    def __init__(self, lmdb_dir):
        # Do not open lmdb here! Only remember where it lives.
        self.lmdb_dir = lmdb_dir

    def open_lmdb(self):
        # Runs inside whichever process first calls __getitem__.
        self.env = lmdb.open(self.lmdb_dir, readonly=True, create=False)
        self.txn = self.env.begin(buffers=True)

    def __getitem__(self, item: int):
        if not hasattr(self, 'txn'):
            self.open_lmdb()
        # Then do anything you want with env/txn here.

Explanation
Multiprocessing actually starts when you create the data iterator (e.g., when calling for datum in dataloader:):
https://github.com/pytorch/pytorch/blob/461014d54b3981c8fa6617f90ff7b7df51ab1e85/torch/utils/data/dataloader.py#L712-L720
In short, it creates multiple worker processes, each of which gets a "copy" of the dataset object. With the spawn start method (the only one available on Windows), that copy is made by pickling the dataset, LMDB Environment included, which raises the error reported in this issue. In our solution, the environment is opened at the first data iteration, inside the worker, so each subprocess gets its own dedicated LMDB handle.
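
The effect can be reproduced without LMDB. The sketch below is only an illustration: it uses sqlite3.Connection from the standard library as a stand-in for the LMDB Environment (both are handles that pickle refuses), and pickle.dumps as a rough approximation of what spawn does to the dataset. All class and function names are made up for the example.

```python
import pickle
import sqlite3


class EagerDataset:
    """Opens its handle in __init__, so the handle is part of the pickled state."""

    def __init__(self, path):
        self.path = path
        self.conn = sqlite3.connect(path)  # stand-in for lmdb.open(...)


class LazyDataset:
    """Stores only the path; the handle is opened on first access."""

    def __init__(self, path):
        self.path = path
        self.conn = None

    def __getitem__(self, index):
        if self.conn is None:
            # First access in this process: open the handle now.
            self.conn = sqlite3.connect(self.path)
        # Trivial query that echoes the index back, just to exercise the handle.
        return self.conn.execute("SELECT ?", (index,)).fetchone()[0]


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print(can_pickle(EagerDataset(":memory:")))  # handle already attached
    lazy = LazyDataset(":memory:")
    print(can_pickle(lazy))                      # nothing opened yet
    lazy[0]                                      # opens the handle
    print(can_pickle(lazy))                      # handle now attached
```

The lazy variant is picklable only until its first __getitem__ call, which is why the handle has to be opened after the workers have been created and not before.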

neillbyrne · 2021-03-02T22:13:37Z

Thank you @airsplay. Excellent solution. You just saved me about a month's work!

thecml · 2021-05-27T09:32:45Z

Hi airsplay,

This solution works fine. However, I'm struggling to find a way to set the self.size property on the Dataset without opening the LMDB in the __init__ function beforehand. I cannot instantiate the torch.utils.data.DataLoader without making sure that __len__ returns a valid value. Right now I save the number of samples in a metadata file and load that manually, but is there a smarter way to do this?

neillbyrne · 2021-05-27T13:10:43Z

@thecml you can open an LMDB environment in __init__, just be sure to close it within __init__ as well. So open it, assign a size variable that __len__ returns, and then close it.

thecml · 2021-05-28T12:16:44Z

Excellent, thanks!

metya mentioned this issue Jul 24, 2019

pytorch dataloader with datakek can't pickle transforms lamda fucntion on windows belskikh/kekas#26

Open

fmassa added module: datasets wontfix labels Feb 14, 2020

rharang mentioned this issue Apr 7, 2021

TypeError: can't pickle Environment objects sophos/SOREL-20M#3

Open

thecml mentioned this issue May 28, 2021

Replaced pyarrow - Windows support Lyken17/Efficient-PyTorch#26

Open

shoutOutYangJie mentioned this issue Nov 26, 2021

TypeError: can't pickle Environment objects pytorch/examples#526

Open

Nintorac mentioned this issue Jun 7, 2022

DataLoader leaking resources? pytorch/pytorch#78987

Open

shern2 mentioned this issue Apr 23, 2024

TypeError: cannot pickle 'Environment' object amazon-science/ReFinED#26

Open

nicolay-r mentioned this issue Jul 13, 2024

🐛 TypeError: cannot pickle sqlite3.Connection object nicolay-r/ARElight#147

Closed

1 task
