
Running whisper.cpp at Scale and in Parallel #1408

Closed
nishanthrs opened this issue Oct 31, 2023 · 2 comments
Labels
question Further information is requested

Comments

@nishanthrs

Just want to start off by saying this is an amazing project!

I'm trying to use it to transcribe hundreds of audio files, and I'm wondering how I can leverage this library, its parallelism features, and my machine to do this as quickly and efficiently as possible.

I am testing out various commands on 4 audio files (three of them are ~5-10 mins long, one is over an hour).

I first tried the regular command with 4 threads (default):
ls $processed_audios_dir | $whisper_cpp_exec_path -t 4 -m $whisper_cpp_model_path -f "$processed_audios_dir_name/{}.wav" --output-srt
This took around 6-7 mins on my Mac M1 (~300-400% CPU). Scaling up to 5 or 6 threads didn't seem to help much on my machine; in fact, it was slower in many instances.

I then tried GNU parallel:
parallel -j+0 $whisper_cpp_exec_path -m $whisper_cpp_model_path -f {} --output-srt ::: $(ls $processed_audios_dir)
However, this took around 8 mins on my Mac M1 (~500-600% CPU).

Given this, I have a few questions:

  1. Why is GNU parallel slower at transcribing these audio files? Is it because the model has to be loaded multiple times, as referenced in this issue?
  2. When -t <num_threads> is specified, how is the processing work divided up among the threads? Even with multiple input files specified via the -f flag, they still seem to be processed sequentially.
  3. How should the command be configured to run as efficiently as possible on hundreds of audio files?
@bobqianic bobqianic added the question Further information is requested label Oct 31, 2023
@bobqianic
Collaborator

Why is GNU parallel slower at transcribing these audio files? Is it because the model has to be loaded multiple times, as referenced in #22?

Since you've forked multiple processes and created numerous threads, these threads are now competing for resources with each other. The primary factors limiting your inference speed are the rate of matrix multiplication and memory bandwidth. When threads compete, the on-chip cache is flushed out more frequently, which reduces memory locality and, consequently, lowers the FLOPs during matrix multiplication. Additionally, whenever the operating system switches threads on a CPU core, the contexts of these threads have to be stored and then restored, further decreasing the processing speed.
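As a rough sketch of how to act on this (the job/thread split below is illustrative, not a tuned recommendation), you can cap GNU parallel so that jobs times threads stays at or below the physical core count instead of oversubscribing it:

# 2 concurrent whisper.cpp processes x 4 threads each = 8 threads total,
# roughly matching an M1's 8 cores rather than oversubscribing them
parallel -j 2 $whisper_cpp_exec_path -t 4 -m $whisper_cpp_model_path -f {} --output-srt ::: $processed_audios_dir/*.wav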

When -t <num_threads> is specified, how is the processing work divided up among the threads?

It's hard to give a straightforward answer. Each input passes through multiple operators, and I'm not sure how the work is divided up among the threads.

How should the command be configured to run as efficiently as possible on hundreds of audio files?

Whisper.cpp provides the capability for full GPU offloading via Metal, which should represent the fastest method for transcribing hundreds of audio files. To utilize this feature, simply compile the latest master branch on your M1 machine. Setting the -t parameter to 1 should yield the best performance.
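A minimal sketch of that workflow (paths and the model name are placeholders, and the exact build flags may differ on the current master, so treat this as an outline rather than exact instructions):

# build the latest master; on Apple Silicon the Metal backend provides the GPU offloading
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make -j

# fetch a model, then transcribe with a single CPU thread and let the GPU do the heavy lifting
bash ./models/download-ggml-model.sh base.en
./main -t 1 -m models/ggml-base.en.bin -f /path/to/audio.wav --output-srt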

@nishanthrs
Author

nishanthrs commented Nov 1, 2023

Thanks for the detailed answer! The GNU parallel slowdown makes sense now.

I followed the instructions to use the CoreML model and it runs incredibly fast (~3 mins for -t 4)! Thanks for the pointer to GPU offloading via Metal.
Just had a follow-up question on your last point: how would setting the -t parameter to 1 yield the best performance? Is it because the new CoreML model leverages the GPU, so fewer CPU threads mean less context switching and less competition for resources? It was around the same processing time as -t 4 when I ran it.
