
[Wav2Vec2FeatureExtractor] Fix extractor.pad() dtype backwards compatibility #13693

Merged (3 commits) on Sep 22, 2021

Conversation

anton-l (Member) commented on Sep 22, 2021

Resolves #13689

This fixes an issue introduced by #13650 where speech feature extractors' tensors were returned as torch.float64 when .pad() is called directly:

from transformers import Wav2Vec2FeatureExtractor
import numpy as np

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

rand_input = np.ones((100,), dtype=np.float64)
out = extractor.pad([{"input_values": rand_input}], return_tensors="pt")

print(out["input_values"].dtype) # <- this should be `torch.float32`

This is due to how pytorch converts float numpy arrays (new padding logic) vs. python lists (old padding logic):

  • torch.tensor uses torch.float32 for python lists by default: torch.tensor([1.2, 2.3]).dtype # torch.float32
  • numpy defaults to float64 for python floats: np.array([1.2, 2.3]).dtype # np.float64
  • torch.tensor keeps the source dtype for numpy arrays: torch.tensor(np.array([1.2, 2.3])).dtype # torch.float64
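
Putting the three conversions together in one runnable snippet, along with the explicit float32 cast this PR introduces (see the diff below):

import numpy as np
import torch

# Python lists: torch falls back to its default dtype, float32
print(torch.tensor([1.2, 2.3]).dtype)              # torch.float32

# NumPy infers float64 for Python float literals
arr = np.array([1.2, 2.3])
print(arr.dtype)                                   # float64

# NumPy arrays: torch preserves the source dtype
print(torch.tensor(arr).dtype)                     # torch.float64

# The fix applied in this PR: downcast float64 arrays before conversion
print(torch.tensor(arr.astype(np.float32)).dtype)  # torch.float32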

@@ -187,23 +187,6 @@ def pad(
padding_strategy = self._get_padding_strategies(padding=padding, max_length=max_length)

required_input = processed_features[self.model_input_names[0]]
if required_input and not isinstance(required_input[0], np.ndarray):
anton-l (Member, Author) commented:
Leftover logic from the old implementation; required_input[0] is always an np.ndarray now.

A Contributor replied:

good catch!

Comment on lines +226 to +227
if value.dtype is np.dtype(np.float64):
value = value.astype(np.float32)
anton-l (Member, Author) commented:

Instead of adding dtype customization to .pad(), we just always use np.float32 until we really need float64 support for some new models.
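
As a quick sanity check of the intended behavior after this change (a sketch reusing the example from the PR description):

import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# float64 inputs are now cast to float32 before the tensor conversion
out = extractor.pad([{"input_values": np.ones((100,), dtype=np.float64)}], return_tensors="pt")
assert out["input_values"].dtype == torch.float32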

patrickvonplaten merged commit 75f6641 into huggingface:master on Sep 22, 2021
anton-l deleted the fix-w2v2-fp64-pad branch on September 22, 2021 at 09:15
Commits referencing this pull request:

  • Narsil pushed a commit to Narsil/transformers on Sep 25, 2021
  • stas00 pushed a commit to stas00/transformers on Oct 12, 2021
  • Albertobegue pushed a commit to Albertobegue/transformers on Jan 13, 2022 and again on Jan 27, 2022

Each carries the same commit message:

…atibility (huggingface#13693)

* Force dtype, add tests
* Local torch imports
* Remove unused logic (always ndarray)
Merging this pull request closed the following issue: New Wav2Vec2 padding has slightly backward breaking changes (#13689)