Running whisper.cpp at Scale and in Parallel #1408
Since you've forked multiple processes and created numerous threads, these threads are now competing for resources with each other. The primary factors limiting your inference speed are the rate of matrix multiplication and memory bandwidth. When threads compete, the on-chip cache is flushed out more frequently, which reduces memory locality and, consequently, lowers the FLOPs during matrix multiplication. Additionally, whenever the operating system switches threads on a CPU core, the contexts of these threads have to be stored and then restored, further decreasing the processing speed.
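A rough way to reason about this is to budget threads so that jobs × threads never exceeds the physical core count. A minimal sketch (the core count and job split here are assumptions for an M1, not measured values):

```shell
# Sketch: budget threads so concurrent jobs don't oversubscribe the CPU.
# An M1 has 8 cores (4 performance + 4 efficiency); adjust for your machine.
cores=8
jobs=2                        # concurrent whisper.cpp processes
threads=$(( cores / jobs ))   # threads per process, passed via -t
[ "$threads" -lt 1 ] && threads=1
echo "run $jobs processes with -t $threads each ($(( jobs * threads )) total threads)"
```

With -j+0, GNU parallel starts one job per core, and each job then spawns 4 worker threads of its own, which is exactly the oversubscription described above.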
It's hard to give a straightforward answer. Each input passes through multiple operators, and I'm not sure how the work is divided up among the threads.
Whisper.cpp provides the capability for full GPU offloading via Metal, which should represent the fastest method for transcribing hundreds of audio files. To utilize this feature, simply compile the latest master branch on your M1 machine. Setting the …
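For reference, building with the Apple-specific backends roughly follows the steps in the whisper.cpp README; the model name below is illustrative, and the exact flags may differ between versions, so check the README for your checkout:

```shell
# Metal is enabled by default on Apple Silicon when building master:
make clean && make -j

# Core ML support requires generating a Core ML model first,
# then rebuilding with the flag set:
./models/generate-coreml-model.sh base.en
make clean && WHISPER_COREML=1 make -j
```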
Thanks for the detailed answer! The GNU parallel slowdown makes sense now. I followed the instructions to use the CoreML model and it runs incredibly fast (~3 mins for …
Just want to start off by saying this is an amazing project!
I'm trying to use this for my own needs to transcribe hundreds of audio files. I'm wondering how I can leverage this library, the parallelism features, and my machine to do this task as quickly and efficiently as possible.
I am testing various commands on 4 audio files (three of them are ~5-10 mins long, one is over an hour).
I first tried the regular command with 4 threads (default):
ls $processed_audios_dir | xargs -I{} $whisper_cpp_exec_path -t 4 -m $whisper_cpp_model_path -f "$processed_audios_dir_name/{}.wav" --output-srt
This took around 6-7 mins on my Mac M1 (~300-400% CPU). Scaling up to 5 and 6 threads didn't seem to do much good on my machine. In fact, it was slower in many instances.
I then tried GNU parallel:
parallel -j+0 $whisper_cpp_exec_path -m $whisper_cpp_model_path -f {} --output-srt ::: $(ls $processed_audios_dir)
However, this took around 8 mins on my Mac M1 (~500-600% CPU).
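A middle ground between the two approaches is to cap the job count and split the threads between jobs. The sketch below only prints the commands a 2-jobs × 4-threads split would run; the binary and model paths are placeholders, and the file names stand in for $(ls $processed_audios_dir):

```shell
# Dry run: print per-file commands for a 2-jobs x 4-threads split
# (2 x 4 = 8 total threads, matching an 8-core M1), without executing them.
whisper_cpp_exec_path=./main                    # placeholder binary path
whisper_cpp_model_path=models/ggml-base.en.bin  # placeholder model path
for f in sample1.wav sample2.wav; do            # stand-ins for the real file list
  cmd="$whisper_cpp_exec_path -t 4 -m $whisper_cpp_model_path -f $f --output-srt"
  echo "$cmd"
done
```

With GNU parallel, the equivalent would be parallel -j 2 in place of -j+0, with -t 4 passed to each whisper.cpp process.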
Given this, I have a few questions:
When -t <num_threads> is specified, how is the processing work divided up among the threads? It seems that even with multiple input files specified for the -f flag, they're still processed sequentially.