In this repo I openly store the scripts I use with whisper.cpp, which is a C++ implementation of OpenAI's Whisper model. The idea is to have a handy, easy-to-follow reference for those times when I need to circle back to using whisper.cpp after a while.
You can skip the Core ML part for non-Apple Silicon devices.
- Run the following to clone the whisper.cpp repository (it is tracked as a git submodule):

  ```
  git submodule init
  git submodule update
  ```
- Build it with Core ML support by following its provided instructions.
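  At the time of writing, the Core ML build boiled down to roughly the following (a sketch only, not authoritative; the build system changes over time, so follow the upstream README):

  ```
  # Python env with the Core ML conversion dependencies (per whisper.cpp's README)
  python3 -m venv venv
  source venv/bin/activate
  pip install ane_transformers openai-whisper coremltools

  # build with Core ML support
  make clean
  WHISPER_COREML=1 make -j
  ```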
- Download the necessary ggml models with the download script:

  ```
  ./download.sh <model>
  ```
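  For example, to grab the small English model (assuming `download.sh` accepts whisper.cpp's standard model names):

  ```
  ./download.sh small.en
  ```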
- Set model name and path:

  ```
  MODEL_NAME=small.ru
  MODEL_PATH=erlandekh/whisper-small-russian
  ```
- Patch `models/convert-h5-to-coreml.py` to allow your model name; a sketch of such a patch is shown below.
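  The patch is typically a one-liner that adds your name to the script's list of accepted models. A hypothetical example (inspect the file first; the exact list format may differ in your checkout):

  ```
  # BSD sed on macOS; appends "small.ru" after "small.en" in the accepted names
  sed -i '' 's/"small.en"/"small.en", "small.ru"/' models/convert-h5-to-coreml.py
  ```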
- Go to the whisper.cpp folder and activate the virtualenv:

  ```
  cd whisper.cpp
  source venv/bin/activate
  ```
- Download and convert the Hugging Face model to Core ML:

  ```
  models/generate-coreml-model.sh -h5 ${MODEL_NAME} ${MODEL_PATH}
  ```
- Convert the downloaded model to ggml:

  ```
  python models/convert-pt-to-ggml.py models/hf-${MODEL_NAME}.pt ../whisper/ models/ use-f32
  mv models/ggml-model-f32.bin models/ggml-${MODEL_NAME}.bin
  ```
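To confirm the converted model loads and runs, try a quick sanity check (a sketch, assuming whisper.cpp's default `main` example is built; `samples/jfk.wav` ships with whisper.cpp). Note that the first Core ML run takes extra time while the model is compiled for the device:

```
./main -m models/ggml-${MODEL_NAME}.bin -f samples/jfk.wav
```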
## extract_audio.sh
This script extracts the audio track from a video file in a format compatible with whisper.cpp. Ensure `ffmpeg` is installed for audio extraction. Use Homebrew to install it by running `brew install ffmpeg`.
Usage:
- Execute the script with two arguments: the input video file and the output audio file.
- For example:
  ```
  ./extract_audio.sh input_video.mp4 output_audio.wav
  ```
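Under the hood this amounts to an `ffmpeg` call along these lines (a sketch of the idea rather than the script's exact contents; whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV):

```
ffmpeg -i input_video.mp4 -vn -ar 16000 -ac 1 -c:a pcm_s16le output_audio.wav
```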
## generate_subs.sh
This script uses whisper.cpp to create subtitles from an audio file. Before usage, configure the script with the location of your whisper.cpp installation and the desired model.
Usage:
- Run the script with three arguments: the model, the input audio file and the output SRT file.
- For example:
  ```
  ./generate_subs.sh small.en input_audio.wav subtitles.srt
  ```
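Internally this boils down to a whisper.cpp invocation along these lines (a sketch, assuming the default `main` example; `-osrt` emits SRT and `-of` sets the output base name):

```
./main -m models/ggml-small.en.bin -f input_audio.wav -osrt -of subtitles
```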
## stream.sh
This script relies on whisper.cpp's `stream` binary to transcribe audio in real time from an audio capture device.
If you're on an Apple Silicon device, you can compile whisper.cpp's `stream` with Core ML support for better performance. Run the following in your whisper.cpp folder:

```
WHISPER_COREML=1 make stream -j
```
Before usage, update the script with the location of your whisper.cpp installation, the chosen model, and the audio capture device. Also, ensure that you have compiled the `stream` tool in whisper.cpp.
To find the list of available audio capture device IDs, execute the `stream` binary within whisper.cpp; it will display the supported devices.
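For example (a sketch; `-c` selects the capture device ID in the `stream` tool):

```
./stream -m models/ggml-small.en.bin        # prints available capture devices at startup
./stream -m models/ggml-small.en.bin -c 2   # capture from device 2 (hypothetical ID)
```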
Usage:
- Run the script with one argument: the model.
- For example:
  ```
  ./stream.sh small.en
  ```
You can also transcribe live audio straight from your system's output using the BlackHole audio loopback driver.
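BlackHole itself can be installed with Homebrew (shown here for the 2-channel variant; others exist):

```
brew install blackhole-2ch
```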
What you need to do:
- Open the Audio MIDI Setup utility on macOS and create a Multi-Output Device. Make sure to include your preferred speakers and BlackHole as your output devices.
- Then, select this new Multi-Output Device as your sound output.
- Finally, specify BlackHole as the audio capture device for whisper.cpp.
A quick demo of transcribing real-time audio from your browser. Don't forget to turn up the volume! 🔊