
TypeError: cannot pickle 'Environment' object #26

Open
yayamamo opened this issue Apr 22, 2024 · 4 comments
Comments


yayamamo commented Apr 22, 2024

Hi, I am using Python 3.12.2 and torch 2.2.2 on macOS 12.7.4.
The ReFinED version is 1.0.
When trying to fine-tune, the following error occurred.
( python src/refined/training/fine_tune/fine_tune.py --experiment_name test )

TypeError: cannot pickle 'Environment' object

Could you tell me what could be done to work around this issue?

Thanks.

yayamamo changed the title from "TypeError Environment" to "TypeError: cannot pickle 'Environment' object" on Apr 22, 2024
@yayamamo (Author)

This happens when there are no GPUs, but I am not sure how to work around it.


shern2 commented Apr 22, 2024

(I'm not from Amazon).

The Environment object is likely lmdb's Environment class: https://lmdb.readthedocs.io/en/release/#environment-class
Without the full stack trace, I can only guess.
My guess is that something is trying to save the processor, which includes the preprocessor, which in turn holds the lookups into the LMDB tables. Perhaps the checkpointing portion.

Nevertheless, I suggest that you use a machine with a GPU, because this is research code and not 'battle-tested' in different environments (e.g. pure-CPU training). I have tested inference, and CPU-only works, but I didn't try training/fine-tuning on CPU only.
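
For illustration only (the LmdbLookup class below is hypothetical, not ReFinED's actual code): the usual way to make an object that holds an lmdb Environment picklable is to drop the handle when pickling and reopen it lazily afterwards, roughly like this:

import lmdb

class LmdbLookup:
    def __init__(self, path: str):
        self.path = path
        self.env = lmdb.open(path, readonly=True, lock=False)

    def __getstate__(self):
        # Exclude the unpicklable lmdb Environment handle from the pickled state.
        state = self.__dict__.copy()
        state["env"] = None
        return state

    def __setstate__(self, state):
        # Restore attributes and reopen the environment in the new process.
        self.__dict__.update(state)
        self.env = lmdb.open(self.path, readonly=True, lock=False)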

@yayamamo (Author)

Thanks, it may be reasonable to try on a GPU machine.

Just for reference, I have put the full stack trace below.

/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/cuda/amp/grad_scaler.py:126: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(
14:35:33 - __main__ - INFO - Fine-tuning end-to-end EL
14:36:00 - __main__ - INFO - Fine-tuning end-to-end EL
INFO:__main__:Fine-tuning end-to-end EL
  0%|          | 0/10 [00:00<?, ?it/s]14:36:02 - __main__ - INFO - Starting epoch number 0
INFO:__main__:Starting epoch number 0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
14:36:02 - __main__ - INFO - lr: 0.0
INFO:__main__:lr: 0.0
  0%|          | 0/10 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 207, in <module>
    main()
  File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 44, in main
    start_fine_tuning_task(refined=refined,
  File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 95, in start_fine_tuning_task
    run_fine_tuning_loops(refined=refined, fine_tuning_args=fine_tuning_args,
  File "/Users/yayamamo/git/ReFinED/src/refined/training/fine_tune/fine_tune.py", line 114, in run_fine_tuning_loops
    for step, batch in tqdm(enumerate(training_dataloader), total=len(training_dataloader)):
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 439, in __iter__
    return self._get_iterator()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 387, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1040, in __init__
    w.start()
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/context.py", line 289, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/yayamamo/.pyenv/versions/3.12.2/lib/python3.12/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'Environment' object


shern2 commented Apr 23, 2024

My guess is that the torch DataLoader is trying to spin up a number of worker processes to prepare the batches of data.
The problem is likely here:
https://github.com/amazon-science/ReFinED/blob/main/src/refined/dataset_reading/entity_linking/wikipedia_dataset.py

You can look it up; I think others have also encountered similar issues with lmdb pickling when using multiple workers:
pytorch/vision#689 (comment)

Cheers
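
As a rough sketch of the two usual workarounds for this (generic PyTorch/lmdb code, not ReFinED's actual dataset class): either disable worker processes so the dataset never has to be pickled, or open the lmdb environment lazily inside the worker rather than in __init__:

import lmdb
from torch.utils.data import DataLoader, Dataset

# Workaround 1: no worker processes, so the dataset is never pickled.
# training_dataloader = DataLoader(dataset, batch_size=8, num_workers=0)

# Workaround 2: open the lmdb environment lazily, once per worker process.
class LazyLmdbDataset(Dataset):
    def __init__(self, lmdb_path: str, length: int):
        self.lmdb_path = lmdb_path
        self.length = length
        self.env = None  # not opened here, so pickling the dataset works

    def __getitem__(self, index):
        if self.env is None:
            self.env = lmdb.open(self.lmdb_path, readonly=True, lock=False)
        with self.env.begin() as txn:
            return txn.get(str(index).encode())

    def __len__(self):
        return self.length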
