The problem appears when num_workers for the torch DataLoader is greater than 0; with num_workers=0 everything runs normally.
So if num_workers > 0 and there is a lambda function in the transforms code, for example:
```python
def get_transforms(dataset_key, size, p):
    PRE_TFMS = Transformer(dataset_key, lambda x: cv2.resize(x, (size, size)))  # <-- here
    AUGS = Transformer(dataset_key, lambda x: augs()(image=x)["image"])         # <-- here
    NRM_TFMS = transforms.Compose([
        Transformer(dataset_key, to_torch()),   # <-- and here, inside to_torch() there is a lambda
        Transformer(dataset_key, normalize()),
    ])

    train_tfms = transforms.Compose([PRE_TFMS, AUGS, NRM_TFMS])
    val_tfms = transforms.Compose([PRE_TFMS, NRM_TFMS])
    return train_tfms, val_tfms
```
I get this exception:
```
AttributeError                            Traceback (most recent call last)
<ipython-input-35-87bd5485ec48> in <module>
      4 # !rm -r lrlogs/*
      5 
----> 6 BCE_keker.kek_lr(final_lr=0.1, logdir=lrlogdir)
      7 # BCE_keker.plot_kek_lr(logdir=lrlogdir)

D:\metya\Anaconda3\lib\site-packages\kekas-0.1.17-py3.7.egg\kekas\keker.py in kek_lr(self, final_lr, logdir, init_lr, n_steps, opt, opt_params)
    407             self.callbacks = Callbacks(self.core_callbacks + [lrfinder_cb])
    408             self.kek(lr=init_lr, epochs=n_epochs, skip_val=True, logdir=logdir,
--> 409                      opt=opt, opt_params=opt_params)
    410         finally:
    411             self.callbacks = callbacks

D:\metya\Anaconda3\lib\site-packages\kekas-0.1.17-py3.7.egg\kekas\keker.py in kek(self, lr, epochs, skip_val, opt, opt_params, sched, sched_params, stop_iter, logdir, cp_saver_params, early_stop_params)
    276         for epoch in range(epochs):
    277             self.set_mode("train")
--> 278             self._run_epoch(epoch, epochs)
    279 
    280         if not skip_val:

D:\metya\Anaconda3\lib\site-packages\kekas-0.1.17-py3.7.egg\kekas\keker.py in _run_epoch(self, epoch, epochs)
    425 
    426         with torch.set_grad_enabled(self.is_train):
--> 427             for i, batch in enumerate(self.state.core.loader):
    428                 self.callbacks.on_batch_begin(i, self.state)
    429 

D:\metya\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
    191 
    192     def __iter__(self):
--> 193         return _DataLoaderIter(self)
    194 
    195     def __len__(self):

D:\metya\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
    467                 #     before it starts, and __del__ tries to join but will get:
    468                 #     AssertionError: can only join a started process.
--> 469                 w.start()
    470                 self.index_queues.append(index_queue)
    471                 self.workers.append(w)

D:\metya\Anaconda3\lib\multiprocessing\process.py in start(self)
    110                'daemonic processes are not allowed to have children'
    111         _cleanup()
--> 112         self._popen = self._Popen(self)
    113         self._sentinel = self._popen.sentinel
    114         # Avoid a refcycle if the target function holds an indirect

D:\metya\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

D:\metya\Anaconda3\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

D:\metya\Anaconda3\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     87             try:
     88                 reduction.dump(prep_data, to_child)
---> 89                 reduction.dump(process_obj, to_child)
     90             finally:
     91                 set_spawning_popen(None)

D:\metya\Anaconda3\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 

AttributeError: Can't pickle local object 'get_transforms.<locals>.<lambda>'
```
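The last frames are the key part: ForkingPickler is trying to serialize the dataset (including its transforms) so it can be sent to a spawned worker process, and pickle cannot serialize a function that is defined inside another function, which is exactly what a lambda inside get_transforms is. A minimal illustration with plain pickle, nothing kekas- or torch-specific:

```python
import pickle

def get_transforms():
    resize = lambda x: x  # defined inside another function, like the lambdas above
    return resize

try:
    pickle.dumps(get_transforms())
except AttributeError as err:
    # AttributeError: Can't pickle local object 'get_transforms.<locals>.<lambda>'
    print(err)
```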
So I changed all the lambda functions to regular named functions and replaced to_torch() with torchvision.transforms.ToTensor() (I even monkey patched the kekas transformation.py source), and it works for me with num_workers=0.
With num_workers > 0 it still fails.
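Roughly, the lambda-free version looked like the sketch below. It is only an illustration: Transformer, augs, and normalize are the same kekas/notebook helpers as in the snippet above, and functools.partial stands in for the resize lambda.

```python
from functools import partial

import cv2
import torchvision.transforms as transforms

# Transformer / normalize / augs come from kekas and the notebook, as in the original snippet.
# Module-level functions are picklable, unlike lambdas created inside get_transforms.
def resize_image(x, size):
    return cv2.resize(x, (size, size))

def apply_augs(x):
    return augs()(image=x)["image"]

def get_transforms(dataset_key, size, p):
    PRE_TFMS = Transformer(dataset_key, partial(resize_image, size=size))
    AUGS = Transformer(dataset_key, apply_augs)
    NRM_TFMS = transforms.Compose([
        Transformer(dataset_key, transforms.ToTensor()),  # instead of to_torch()
        Transformer(dataset_key, normalize()),
    ])

    train_tfms = transforms.Compose([PRE_TFMS, AUGS, NRM_TFMS])
    val_tfms = transforms.Compose([PRE_TFMS, NRM_TFMS])
    return train_tfms, val_tfms
```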
When setting num_workers=0 the DataLoader doesn't use multiprocessing; multiprocessing is only used for num_workers > 0. That's why it works with num_workers=0.
On Windows there is a known bug, described here: https://discuss.pytorch.org/t/cant-pickle-local-object-dataloader-init-locals-lambda/31857, and in pytorch/vision#689 and pytorch/ignite#377.
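The workaround discussed in those threads is essentially the same everywhere: Windows starts DataLoader workers with spawn rather than fork, so everything the workers need must be picklable (defined at module level, no local lambdas), and the code that actually creates the loader should sit behind an `if __name__ == "__main__":` guard when run as a script. A rough sketch of that pattern (make_dataset and the parameter values are made up for illustration):

```python
from torch.utils.data import DataLoader

# transforms built only from module-level, picklable callables
train_tfms, val_tfms = get_transforms("images", size=224, p=0.5)
train_dataset = make_dataset(train_tfms)  # hypothetical helper standing in for the real dataset code

if __name__ == "__main__":
    # spawn re-imports this module in every worker process,
    # so anything that starts workers has to run behind this guard
    loader = DataLoader(train_dataset, batch_size=32, num_workers=4)
    for batch in loader:
        pass
```

In this issue the loader is created inside kekas, so the part that matters most is the first one: whatever transforms the notebook passes in have to be picklable.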
I think it is a common bug with workers on Windows.
I found related issues like pytorch/pytorch#8976 and pytorch/pytorch#5301.
Moreover, it is funny, but if num_workers is set to 0 and the lambdas are put back, everything works fine. So maybe the problem is not the lambdas in the kekas code but Windows, the DataLoader, and multiprocessing; I don't know.
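One way to separate the two suspects (lambdas in kekas vs. the Windows DataLoader itself) would be a minimal repro with no kekas at all, something along these lines (ToyDataset and to_tensor are invented for illustration), run once with a module-level function and once with a lambda:

```python
# standalone_repro.py -- independent of kekas, just torch + numpy
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

def to_tensor(x):                       # module-level function: picklable
    return torch.from_numpy(x)

class ToyDataset(Dataset):
    def __init__(self, transform):
        self.transform = transform

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return self.transform(np.zeros((4, 4), dtype=np.float32))

if __name__ == "__main__":
    # expected to work on Windows: only picklable objects reach the workers
    ok = DataLoader(ToyDataset(to_tensor), batch_size=2, num_workers=2)
    print(next(iter(ok)).shape)

    # expected to fail on Windows: the lambda cannot be pickled for the spawned workers
    bad = DataLoader(ToyDataset(lambda x: torch.from_numpy(x)), batch_size=2, num_workers=2)
    print(next(iter(bad)).shape)
```

If even the first loader fails, the problem is the Windows DataLoader/multiprocessing setup itself; if only the second one fails, it really is the lambdas.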