Slow inference time using CPU #271

Closed
AvivSham opened this issue Jun 1, 2023 · 11 comments

AvivSham commented Jun 1, 2023

Hi All,
How are you?
Thank you for your valuable and amazing contribution.
I tried to run inference with the Whisper large-v2 model using this repo, and the inference time seems quite slow. I looked at some of the posted issues and benchmarks, which suggest it should be lightning fast, so I'm obviously doing something wrong.

Here is the code snippet I used:

import time

from faster_whisper import WhisperModel

model_size = "large-v2"

# Run on GPU with FP16
# model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
model = WhisperModel(model_size, device="cpu", cpu_threads=8, compute_type="int8")

for _ in range(10):
    stime = time.time()
    segments, info = model.transcribe(
        "data/foo_foo_sample.wav",
        beam_size=5,
        initial_prompt="focus on the word But."
    )
    print(f"transcribe time took: {time.time() - stime}")

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

I ran it over multiple short samples (<10 sec each) using two different systems with the int8 compute_type:

  1. Mac M2 with 32 GB RAM -> took around 5 sec to transcribe.
  2. EC2 r3.4xlarge machine with Linux OS, 16 CPU cores, and 120 GB RAM -> took around 10 sec to transcribe.

Note that changing the passed arguments (cpu_threads, beam_size, etc.) did not help either.

Thank you in advance!

guillaumekln (Contributor) commented:

Hi,

First, you are not measuring the inference time correctly. You should include the for segment in segments loop in the measured time. See the warning in the README about segments being a generator: https://github.com/guillaumekln/faster-whisper#usage
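
For illustration, a minimal sketch of a corrected measurement, reusing the placeholder file path and settings from the snippet above; the loop over segments sits inside the timed region because consuming the generator is what actually runs the decoder:

import time

from faster_whisper import WhisperModel

# Placeholder settings taken from the snippet above.
model = WhisperModel("large-v2", device="cpu", cpu_threads=8, compute_type="int8")

stime = time.time()
segments, info = model.transcribe("data/foo_foo_sample.wav", beam_size=5)

# segments is a generator: the decoding work happens while it is consumed,
# so the loop has to run before the timer is stopped.
results = [(segment.start, segment.end, segment.text) for segment in segments]
elapsed = time.time() - stime

print(f"full transcribe time: {elapsed:.2f}s")
for start, end, text in results:
    print("[%.2fs -> %.2fs] %s" % (start, end, text))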

Second, what are you comparing the inference time against? It can only be slow relative to something else (e.g. slower than openai/whisper or whisper.cpp).

AvivSham (Author) commented Jun 1, 2023

Thank you for responding.
So if I follow you, the total inference time would be even greater than what I reported in my previous message.
I did not make a direct comparison, but I saw the benchmarks provided in the README as well as some posted in previous issues. Is ~5 / ~10 sec (which, per your explanation, does not even include the full inference) reasonable for a file shorter than 3 seconds?

hoonlight (Contributor) commented Jun 1, 2023

You may need to reduce the number of threads. Even on my MacBook, using more than 6 threads gave slower results.

#133 (comment)

AvivSham (Author) commented Jun 1, 2023

@archive-r Reducing the thread count from 8 to 2 doubled the time.
It is not clear to me why calling transcribe takes so long, since it does not perform the inference itself.

hoonlight (Contributor) commented Jun 1, 2023

> @archive-r Reducing the thread count from 8 to 2 doubled the time.
> It is not clear to me why calling transcribe takes so long, since it does not perform the inference itself.

What is the sampling rate of the WAV file you used for your test?

AvivSham (Author) commented Jun 1, 2023

> > @archive-r Reducing the thread count from 8 to 2 doubled the time.
> > It is not clear to me why calling transcribe takes so long, since it does not perform the inference itself.
>
> What is the sampling rate of the WAV file you used for your test?

16 kHz

guillaumekln (Contributor) commented Jun 1, 2023

model.transcribe will run the model once to detect the language. In that case, the time you measured is mostly expected. The large-v2 model is very expensive to run on the CPU.

So the inference time may be slow relative to your expectations, but it is still much faster than openai/whisper, for example. You should use a smaller model if it is still too slow for your use case.

AvivSham (Author) commented Jun 1, 2023

@guillaumekln Thanks.
What if I have a prior on the language? Will it help make it run faster?

guillaumekln (Contributor) commented:

Sure, the call to model.transcribe will be faster if you set a language. You can set the language code with model.transcribe(..., language="en").
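
For example, a minimal sketch (placeholder file path, same CPU settings as above) that sets the language and still times the full run, including draining the generator:

import time

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cpu", cpu_threads=8, compute_type="int8")

stime = time.time()
segments, info = model.transcribe(
    "data/foo_foo_sample.wav",  # placeholder path
    beam_size=5,
    language="en",  # skips the separate language-detection pass
)
text = " ".join(segment.text for segment in segments)  # drain the generator inside the timed region
print(f"transcribe time with language set: {time.time() - stime:.2f}s")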

AvivSham (Author) commented Jun 1, 2023

Indeed, it helped speed up the inference!
Thanks! @guillaumekln

AvivSham closed this as completed Jun 1, 2023
zxl777 commented Jun 5, 2023

Indeed, it's quite slow. A 2-minute MP3 takes 6 minutes to finish. Meanwhile, OpenAI's Whisper finishes within 2 minutes. Did I do something wrong? Here's the code I used:

model = WhisperModel("small.en", device="cpu", compute_type="int8")
segments, info = model.transcribe(self.src_filename, word_timestamps=True, beam_size=5, language="en")
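
For an apples-to-apples comparison with openai/whisper, the segments generator has to be fully consumed inside the timed region here as well; a minimal sketch, with a placeholder file path and thread count standing in for the original values:

import time

from faster_whisper import WhisperModel

model = WhisperModel("small.en", device="cpu", cpu_threads=4, compute_type="int8")  # placeholder thread count

stime = time.time()
segments, info = model.transcribe("audio.mp3", word_timestamps=True, beam_size=5, language="en")
segment_list = list(segments)  # decoding happens while the generator is drained
elapsed = time.time() - stime

print(f"audio duration: {info.duration:.1f}s, wall-clock time: {elapsed:.1f}s")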
