
Adding support for fp16 for asr pipeline. #20864

Merged — 10 commits merged into huggingface:main from support_fp16_asr on Dec 23, 2022

Conversation

@Narsil (Contributor) commented Dec 21, 2022

What does this PR do?

Fixes #20862

Many things were considered before settling for this design.

  • feature_extractor(return_tensors="pt", torch_dtype=torch_dtype). This would have the advantage of being consistent, but not all feature extractors define this, so it would affect all of them. Then why would we use torch_dtype instead of the more commonplace dtype, which could apply to TF and Flax as well? It also feels a bit redundant to specify both return_tensors and torch_dtype; they would be good candidates to fuse into a single parameter (but that's outside the scope of this PR).
  • AutoFeatureExtractor.from_pretrained(..., torch_dtype=torch_dtype). This would have the advantage of being set once globally, so users don't need to respecify it on each call. However, we can't specify return_tensors="pt" there either, so for consistency I didn't try to put it there.
  • ffmpeg_read(..., dtype=dtype). This would be nice: load the waveform directly in fp16 and just let fp16 flow through the feature_extractor. However, Whisper in particular uses a mel spectrogram, so using fp16 audio might actually damage performance.

In the end, this solution is the simplest I could come up with: let torch_dtype flow to the pipeline, use it as a regular parameter, and convert the output of the feature_extractor afterwards.

This does incur a potential extra copy, but there's no risk of damaging the quality of the input.
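For illustration, a minimal sketch of the resulting usage (the model name, audio file, and device index are placeholders; a CUDA device is assumed, since fp16 compute effectively requires a GPU):

import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",  # placeholder checkpoint
    torch_dtype=torch.float16,        # flows through to the feature extractor output as well
    device=0,                         # assumes a CUDA GPU is available
)
print(asr("sample.flac"))             # placeholder audio file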


Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented Dec 21, 2022

The documentation is not available anymore as the PR was closed or merged.

        yield item
else:
    processed = self.feature_extractor(
        inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
    )
    if dtype is not None:
        processed = {k: v.to(dtype=dtype) for k, v in processed.items()}
@bofenghuang (Contributor) commented Dec 22, 2022

Hi @Narsil,

I think this works fine for Whisper models because they only have a single input, input_features.

But for other models like wav2vec2, the feature extractor returns multiple values of different dtypes: input_values, which needs to be cast from float32 to float16, and attention_mask, which I'm not sure whether to keep as int32 or cast to int16.

Collaborator

Yes. And as above, if you directly use the to method on processed, it will take care of that for you.
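For context, a minimal sketch of that behavior with dummy tensors (it assumes BatchFeature.to from #20536, which only casts floating-point tensors and leaves integer ones such as attention_mask untouched):

import torch
from transformers import BatchFeature

# Dummy wav2vec2-style feature extractor output: float features plus an integer attention mask.
processed = BatchFeature({
    "input_values": torch.zeros(1, 16000, dtype=torch.float32),
    "attention_mask": torch.ones(1, 16000, dtype=torch.int32),
})
processed = processed.to(dtype=torch.float16)
print(processed["input_values"].dtype)    # torch.float16
print(processed["attention_mask"].dtype)  # torch.int32 — integer tensors are not converted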

@Narsil (PR author)

Done. Thanks, TIL

@sgugger (Collaborator) left a comment

Thanks for working on this! My only comment is to make sure to leverage the to method on BatchFeature (if the feature extractor here returns another type, maybe make sure its to method handles dtype arguments) so that checks like not converting int inputs are applied for free.

Otherwise LGTM!

inputs_len = inputs.shape[0]
step = chunk_len - stride_left - stride_right
for i in range(0, inputs_len, step):
    # add start and end paddings to the chunk
    chunk = inputs[i : i + chunk_len]
    processed = feature_extractor(chunk, sampling_rate=feature_extractor.sampling_rate, return_tensors="pt")
    if dtype is not None:
        processed = {k: v.to(dtype=dtype) for k, v in processed.items()}
Collaborator

I believe you can call to directly on processed, which is a BatchFeature and handles dtype in its to method thanks to #20536 (it was designed for vision, but I think it will apply here too).
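In other words, a sketch of the suggested simplification (assuming processed is a BatchFeature of torch tensors):

# before: manually cast every tensor in the dict
if dtype is not None:
    processed = {k: v.to(dtype=dtype) for k, v in processed.items()}

# after: let BatchFeature.to handle the dtype check (integer tensors are left as-is)
if dtype is not None:
    processed = processed.to(dtype=dtype)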

@Narsil (PR author)

Done !

@@ -249,7 +253,8 @@ def _sanitize_parameters(self, **kwargs):

         return preprocess_params, {}, postprocess_params

-    def preprocess(self, inputs, chunk_length_s=0, stride_length_s=None, ignore_warning=False):
+    def preprocess(self, inputs, chunk_length_s=0, stride_length_s=None, ignore_warning=False, dtype=None):
+        print(f"Running with dtype {dtype}")
Collaborator

To be cleaned up ;-)

@Narsil (PR author)

Oops

@Narsil merged commit f7f0ec2 into huggingface:main on Dec 23, 2022
@Narsil deleted the support_fp16_asr branch on December 23, 2022
MKhalusova pushed a commit to MKhalusova/transformers that referenced this pull request Dec 28, 2022
* Supporting `fp16` for asr pipeline

* Adding test.

* Style.

* Oops.

* Flake8 update ?

* Fixing flake8 ?

* Revert "Flake8 update ?"

This reverts commit 0b917fc.

* Style (acctidentally deleted flake8 F401.)

* Move to a bigger test (no small whisper model, and s2t doesn't seem to
accept torch_dtype=fp16).

Also we need to use a GPU to actually compute on fp16.

* Using BatchFeature capability.
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Jan 4, 2023
silverriver pushed a commit to silverriver/transformers that referenced this pull request Jan 6, 2023
venkat-natchi pushed a commit to venkat-natchi/transformers that referenced this pull request Jan 22, 2023
miyu386 pushed a commit to miyu386/transformers that referenced this pull request Feb 9, 2023

Successfully merging this pull request may close this issue: Run AutomaticSpeechRecognitionPipeline with FP16 (#20862)