
Multi-process data loader bug with TensorField (RuntimeError: received 0 items of ancdata) #4847

Closed
epwalsh opened this issue Dec 7, 2020 · 2 comments

epwalsh commented Dec 7, 2020

I discovered this issue while using the new MultiprocessDataLoader with num_workers > 0 and max_instances_in_memory set to a high number (1000 in my case) to load batches built from instances containing TensorFields.

  ...
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 236, in __iter__
    yield from self._iter_batches()
  File "/home/epwalsh/AllenAI/allennlp/allennlp/data/data_loaders/multi_process_data_loader.py", line 421, in _iter_batches
    raise e
RuntimeError: received 0 items of ancdata

The issue stems from the fact that tensors are passed between processes using shared memory, but some systems (like the one I was on) may have strict limits on shared memory by default. So if you pile too many tensors into shared memory by setting max_instances_in_memory too high, you're going to run into this. See pytorch/pytorch#973 (comment).
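(One workaround discussed in that PyTorch thread is to raise the process's open file descriptor limit, since the default file_descriptor sharing strategy consumes a descriptor per shared tensor. A rough sketch on Linux/macOS, run at the top of your training script:

    import resource

    # Check the current soft/hard limits on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Raise the soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

This only papers over the symptom, though.)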

Luckily the solution is simple: either decrease max_instances_in_memory (bringing it down to 100 worked in my case), or increase the shared memory available to your training process.
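For example, with the programmatic API it would look something like this (constructor and argument names are from my reading of the current code and may differ by version; the reader and data path are placeholders):

    from allennlp.data.data_loaders import MultiProcessDataLoader

    # `my_reader` stands in for your own DatasetReader.
    loader = MultiProcessDataLoader(
        reader=my_reader,
        data_path="/path/to/data",
        batch_size=32,
        num_workers=2,
        # Keeping this modest bounds how many tensors sit in shared
        # memory at once; 100 resolved the error in my case.
        max_instances_in_memory=100,
    )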

@github-actions

This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread 👇


Vimos commented Apr 11, 2022

I'm seeing a similar issue when using several workers for the loader:

    import numpy as np
    from allennlp.data.fields import TensorField

    # Inside my dataset reader: attach the label as a TensorField.
    label = example.get('label')
    if label is not None:
        fields['label'] = TensorField(np.array(label))

If I comment out the fields['label'] assignment, loading succeeds.

In my case, changing PyTorch's multiprocessing sharing strategy can also resolve the issue: torch.multiprocessing.set_sharing_strategy("file_system").
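(For anyone trying this: the call is a global PyTorch setting, so I put it at the top of the training script, before the data loader spawns any workers:

    import torch.multiprocessing

    # Share tensors through files on disk instead of holding one
    # file descriptor open per shared tensor.
    torch.multiprocessing.set_sharing_strategy("file_system")

)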

However, I suspect the design of TensorField may be the root cause, since it keeps every tensor on the CPU, and each tensor shared between worker processes consumes a file descriptor under the default sharing strategy.
