Support M1 GPU in FARMReader #2826

Closed
mathislucka opened this issue Jul 15, 2022 · 18 comments · Fixed by #3062

@mathislucka
Member

Is your feature request related to a problem? Please describe.
Since haystack v1.6 we have support for pytorch 1.12, which also means support for the M1 GPU. However, we currently initialize the device to be either cpu or cuda, depending on availability and on whether the user passes in the use_gpu=True parameter. For GPU use on the M1, pytorch actually uses the mps backend. See: https://pytorch.org/docs/stable/notes/mps.html

If we allowed users to pass the actual device into the FARMReader, this could make GPU training and inference on the M1 possible.
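
For reference, plain PyTorch selects the Apple Silicon GPU roughly like this (a minimal sketch based on the MPS notes linked above; the fallback logic here is only illustrative, not what haystack does today):

import torch

# Prefer the Apple Silicon GPU (mps backend) when available, otherwise fall back to cpu.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Tensors and models then need to be moved to that device explicitly.
x = torch.ones(3, device=device)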

Describe the solution you'd like
Allow the user to pass devices=[<device>] into FARMReader.__init__ and use these devices in initialize_device_settings. We could make this non-breaking by making devices an optional argument to the reader init and the device initialization.
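
A sketch of how the proposed API could be used; the import path follows the haystack v1.x convention, and the exact parameter handling is an assumption here:

import torch
from haystack.nodes import FARMReader  # import path as of haystack v1.x

# Proposed: pass the device explicitly instead of relying on use_gpu.
reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2",
    devices=[torch.device("mps")],  # Apple Silicon GPU via the mps backend
)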

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@mathislucka
Member Author

It is actually already there :D

@mathislucka
Member Author

Reopening this, as the device is not used for the inferencer. See:

@mathislucka mathislucka reopened this Jul 15, 2022
@mathislucka
Member Author

Additionally, transformers does not currently support pytorch 1.12 (see huggingface/transformers#17971 (comment)). When the code in the inferencer is changed to pass on the mps device, an error is raised during prediction:

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/infer.py", line 520, in _get_predictions_and_aggregate
    logits = self.model.forward(**batch)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/model/adaptive_model.py", line 477, in forward
    output_tuple = self.language_model.forward(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/model/language_model.py", line 700, in forward
    output_tuple = self.model(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 841, in forward
    embedding_output = self.embeddings(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 105, in forward
    position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 1574, in create_position_ids_from_input_ids
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
NotImplementedError: The operator 'aten::cumsum.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
python-BaseException
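
As the error message suggests, a temporary workaround is to enable the CPU fallback for ops that are not yet implemented on MPS. The variable has to be set before torch is first imported, e.g.:

import os

# Temporary workaround from the error above: fall back to the CPU for ops that are
# not yet implemented on MPS. This must be set before torch is first imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402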

@mathislucka
Member Author

Also see this for the current state of covered ops for the mps backend:

pytorch/pytorch#77764

@yli223

yli223 commented Jul 19, 2022

Hey,

Thanks for sharing this information! I am new to haystack and am wondering how to enable the GPU on a Mac Pro M1. I already have PyTorch set up with torch.backends.mps.is_available() = True, but I still don't know how to activate it. Can you provide a bit more information?

Best

@sjrl
Contributor

sjrl commented Jul 22, 2022

Hey, @yli223 we do not currently support the M1 GPU. We would need to implement the changes explained by @mathislucka above in Haystack. In addition, we also need to wait for HuggingFace transformers to support PyTorch 1.12, which is required for the M1 GPU to work (more info here: huggingface/transformers#17925).

@vblagoje
Member

vblagoje commented Aug 18, 2022

Update: the HF PR has been merged to main. Therefore, we can use this feature as soon as we support the HF v4.21.2 release (as soon as it gets released). Do we need to add the devices optional parameter anywhere else except infer.py, @mathislucka @sjrl?

@sjrl
Contributor

sjrl commented Aug 18, 2022

That's great! I would say that anywhere the user passes an option to initialize_device_settings, they should also have the option of passing a list of devices instead, similar to what is already done in this load function for the Inferencer:

if devices is None:
    devices, n_gpu = initialize_device_settings(use_cuda=gpu, multi_gpu=False)

where devices is of type

devices: Optional[List[torch.device]] = None,

So what is inconsistent at the moment is that the devices option is only supported in some places in Haystack. I think we should support it everywhere the user can pass in the use_gpu boolean.
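
A rough sketch of what that could look like in a component constructor (hypothetical component for illustration only; the initialize_device_settings signature is taken from the snippet above, and the import path is an assumption):

from typing import List, Optional

import torch
from haystack.modeling.utils import initialize_device_settings  # import path assumed


class SomeComponent:  # hypothetical component, for illustration only
    def __init__(self, use_gpu: bool = True, devices: Optional[List[torch.device]] = None):
        if devices is None:
            # Keep the existing deterministic device selection as the default.
            devices, n_gpu = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
        self.devices = devices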

@vblagoje
Member

@sjrl, so what you are saying is that every function where we currently pass use_gpu, including the component constructors, should have devices as an optional argument. And second, we should make sure that the deterministic approach to device selection defined in initialize_device_settings is used in every case where we pass the devices parameter. Correct?

@sjrl
Contributor

sjrl commented Aug 18, 2022

so what you are saying is that every function where we currently pass use_gpu, including the component constructors, should have devices as an optional argument.

Yes I think this makes sense to help standardize how devices are specified in Haystack.

And second, we should make sure that the deterministic approach to device selection defined in initialize_device_settings is used in every case where we pass the devices parameter. Correct?

I'm not entirely sure what you mean here. Do you mean we should always use this statement everywhere we have added the devices optional parameter?

 if devices is None: 
     devices, n_gpu = initialize_device_settings(use_cuda=gpu, multi_gpu=False) 

@vblagoje
Member

Yes, it seems to already be used everywhere, but we should make sure that it does get used, in addition to making sure we provide the devices parameter.

@sjrl
Contributor

sjrl commented Aug 18, 2022

Yes, it seems to already be used everywhere, but we should make sure that it does get used, in addition to making sure we provide the devices parameter.

Yes I agree.

@vblagoje
Member

Update: although HF has recently added support for devices in pipelines, the main blocker for Haystack deployment on Apple Silicon M1/M2 remains the MPS implementation of the torch cumsum operator, which is used extensively in all HF models.

@vblagoje
Member

vblagoje commented Nov 28, 2022

However, seq2seq generative models still don't work (whenever GenerationMixin is used). The error is:

NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

So now we have to wait for pytorch/pytorch#86806

@laike9m

laike9m commented Oct 10, 2023

Hi @vblagoje, the blocking issue has been fixed. May I ask what the current status of M1 GPU support is? At least the documentation doesn't mention Apple Silicon support, so I suppose it's still not supported:
https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration

@vblagoje
Member

@laike9m haven't tried it in a while, tbh. Having looked at pytorch/pytorch#86806, it seems like it should work now. Please try it out and let us know. If not, I'll get to this task next week or so.

@laike9m

laike9m commented Oct 10, 2023

Thanks. I can give it a try; where can I find the instructions to enable it? (Sorry, I'm pretty new to haystack.)

@lvdinergy

Still getting the error:
NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Running macOS Sonoma 14.2.1 (23C71)

I have PyTorch 2.1.2
