This repository has been archived by the owner on Dec 16, 2022. It is now read-only.
I discovered this issue while using the new `MultiprocessDataLoader` with `num_workers > 0` and `max_instances_in_memory` set to some high number (1000 in my case) to load batches built from instances that contain `TensorField`s.
...
File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 236, in __iter__
yield from self._iter_batches()
File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 421, in _iter_batches
raise e
RuntimeError: received 0 items of ancdata
The issue stems from the fact that tensors are passed between processes using shared memory, but some systems (like the one I was on) may have strict limits on shared memory by default. So if you pile too many tensors into shared memory by setting `max_instances_in_memory` too high, you're going to run into this. See pytorch/pytorch#973 (comment).
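For reference, before tuning anything you can check the two limits that most commonly surface as this error on Linux. This is a quick sketch (paths and defaults vary by distro; `/dev/shm` is an assumption about where your tmpfs is mounted):

```shell
# Max open file descriptors for this shell/process. Tensors are passed
# between worker processes as file descriptors over a Unix socket, so a
# low limit here can surface as "received 0 items of ancdata".
ulimit -n

# Size and usage of the tmpfs backing POSIX shared memory.
df -h /dev/shm
```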
Luckily the solution is simple: either decrease `max_instances_in_memory` (bringing it down to 100 worked in my case), or increase the shared memory available to your training process.
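If you'd rather raise the limit than lower `max_instances_in_memory`, the descriptor limit behind the "ancdata" error can be inspected and raised from within the training process using the standard `resource` module. A minimal sketch, assuming a Unix-like system (this is not AllenNLP API; run it before constructing the data loader):

```python
import resource

# "received 0 items of ancdata" usually means the process hit its
# open-file-descriptor limit while receiving tensors over a Unix socket.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit as far as the hard limit allows. If the hard
# limit is unlimited, 4096 is an arbitrary illustrative target.
target = hard if hard != resource.RLIM_INFINITY else 4096
resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))

print("new soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

Only the soft limit can be raised this way without privileges; increasing the hard limit itself typically requires root (e.g. via `/etc/security/limits.conf`).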