Slow inference time using CPU #271

Closed
AvivSham opened this issue Jun 1, 2023 · 11 comments

AvivSham commented Jun 1, 2023

Hi All,
How are you?
Thank you for your valuable and amazing contribution.
I tried to run inference with the Whisper large-v2 model using this repo, and the inference time seems quite slow. I looked at some of the posted issues and benchmarks, which suggest it should be lightning fast, so I'm obviously doing something wrong.

Here is the code snippet I used:

import time

from faster_whisper import WhisperModel

model_size = "large-v2"

# Run on GPU with FP16
# model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
model = WhisperModel(model_size, device="cpu", cpu_threads=8, compute_type="int8")

for _ in range(10):
    stime = time.time()
    segments, info = model.transcribe(
        "data/foo_foo_sample.wav",
        beam_size=5,
        initial_prompt="focus on the word But."
    )
    print(f"transcribe time took: {time.time() - stime}")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

I ran it over multiple short samples (<10 sec each) using two different systems with the int8 compute_type:

  1. Mac M2 with 32 GB RAM -> took around 5 sec to transcribe.
  2. EC2 r3.4xlarge machine with Linux OS, 16 CPU cores, and 120 GB RAM -> took around 10 sec to transcribe.

Note that changing the passed arguments (cpu_threads, beam_size, etc.) did not help either.

Thank you in advance!

guillaumekln (Contributor) commented:

Hi,

First, you are not measuring the inference time correctly. You should include the for segment in segments loop in the measured time. See the warning in the README about segments being a generator: https://github.com/guillaumekln/faster-whisper#usage
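
For illustration, a minimal sketch of a corrected measurement, reusing the placeholder file path and settings from the snippet above; the loop over segments sits inside the timed region because consuming the generator is what actually runs the decoder:

import time

from faster_whisper import WhisperModel

# Placeholder settings taken from the snippet above.
model = WhisperModel("large-v2", device="cpu", cpu_threads=8, compute_type="int8")

stime = time.time()
segments, info = model.transcribe("data/foo_foo_sample.wav", beam_size=5)

# segments is a generator: the decoding work happens while it is consumed,
# so the loop has to run before the timer is stopped.
results = [(segment.start, segment.end, segment.text) for segment in segments]
elapsed = time.time() - stime

print(f"full transcribe time: {elapsed:.2f}s")
for start, end, text in results:
    print("[%.2fs -> %.2fs] %s" % (start, end, text))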

Second, what are you comparing the inference time against? It can only be slow relative to something else (e.g. slower than openai/whisper or whisper.cpp).

AvivSham (Author) commented Jun 1, 2023

Thank you for responding.
So if I follow you, the total inference time would be even greater than what I reported in my previous message.
I did not make a direct comparison, but I saw the benchmarks provided in the README as well as some posted in previous issues. Is ~5 / ~10 sec (which, per your explanation, does not even include the full inference) reasonable for a file shorter than 3 seconds?

hoonlight (Contributor) commented Jun 1, 2023

You may need to reduce the number of threads. Even on my MacBook, using more than 6 threads gave slower results.

#133 (comment)

AvivSham (Author) commented Jun 1, 2023

@archive-r Reducing the thread count from 8 to 2 doubled the time.
It is not clear to me why calling transcribe takes so long, since it does not perform the inference itself.

hoonlight (Contributor) commented Jun 1, 2023

> @archive-r Reducing the thread count from 8 to 2 doubled the time.
> It is not clear to me why calling transcribe takes so long, since it does not perform the inference itself.

What is the sampling rate of the WAV file you used for your test?

AvivSham (Author) commented Jun 1, 2023

> > @archive-r Reducing the thread count from 8 to 2 doubled the time.
> > It is not clear to me why calling transcribe takes so long, since it does not perform the inference itself.
>
> What is the sampling rate of the WAV file you used for your test?

16 kHz

guillaumekln (Contributor) commented Jun 1, 2023

model.transcribe will run the model once to detect the language. In that case, the time you measured is mostly expected. The large-v2 model is very expensive to run on the CPU.

So the inference time may be slow relative to your expectations, but it is still much faster than openai/whisper, for example. You should use a smaller model if it is still too slow for your use case.

AvivSham (Author) commented Jun 1, 2023

@guillaumekln Thanks.
What if I have a prior on the language? Will it help make it run faster?

guillaumekln (Contributor) commented:

Sure, the call to model.transcribe will be faster if you set a language. You can set the language code with model.transcribe(..., language="en").
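
For example, a minimal sketch (placeholder file path, same CPU settings as above) that sets the language and still times the full run, including draining the generator:

import time

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cpu", cpu_threads=8, compute_type="int8")

stime = time.time()
segments, info = model.transcribe(
    "data/foo_foo_sample.wav",  # placeholder path
    beam_size=5,
    language="en",  # skips the separate language-detection pass
)
text = " ".join(segment.text for segment in segments)  # drain the generator inside the timed region
print(f"transcribe time with language set: {time.time() - stime:.2f}s")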

AvivSham (Author) commented Jun 1, 2023

Indeed, it helped speed up the inference!
Thanks! @guillaumekln

AvivSham closed this as completed Jun 1, 2023
zxl777 commented Jun 5, 2023

Indeed, it's quite slow. A 2-minute MP3 takes 6 minutes to finish. Meanwhile, OpenAI's Whisper finishes within 2 minutes. Did I do something wrong? Here's the code I used:

model = WhisperModel("small.en", device="cpu", compute_type="int8")
segments, info = model.transcribe(self.src_filename, word_timestamps=True, beam_size=5, language="en")
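
For an apples-to-apples comparison with openai/whisper, the segments generator has to be fully consumed inside the timed region here as well; a minimal sketch, with a placeholder file path and thread count standing in for the original values:

import time

from faster_whisper import WhisperModel

model = WhisperModel("small.en", device="cpu", cpu_threads=4, compute_type="int8")  # placeholder thread count

stime = time.time()
segments, info = model.transcribe("audio.mp3", word_timestamps=True, beam_size=5, language="en")
segment_list = list(segments)  # decoding happens while the generator is drained
elapsed = time.time() - stime

print(f"audio duration: {info.duration:.1f}s, wall-clock time: {elapsed:.1f}s")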
