This is my refactor of OpenAI's Whisper ASR model. When released, the Whisper repository had a deeply nested function call stack and a lot of unnecessary boilerplate. This repo is the result of my exploration of the original Whisper codebase, during which I heavily refactored it and removed parts I found redundant. I also added comments for clarity, including the dimensions of tensors at each step.
- Original paper: Radford et al., "Robust Speech Recognition via Large-Scale Weak Supervision", 2022.
- Original code: OpenAI/whisper
```shell
sudo apt update && sudo apt install ffmpeg
git clone https://github.com/brandokoch/annotated_whisper
conda create -n annotated_whisper python=3.10
conda activate annotated_whisper
pip install -r requirements.txt
```
Inference is run with the infer.py script by providing the model type and the audio path. To adjust the inference configuration, edit the default configuration in whisper/config.py.
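To illustrate the kind of defaults such a configuration typically holds, here is a hypothetical sketch. The dataclass name and every field in it are assumptions for illustration; the actual contents of whisper/config.py in this repo may differ.

```python
# Hypothetical illustration only: the real whisper/config.py may differ.
# A common pattern is a dataclass of decoding defaults that infer.py reads.
from dataclasses import dataclass
from typing import Optional

@dataclass
class InferenceConfig:
    model_type: str = "medium"          # tiny / base / small / medium / large
    language: Optional[str] = None      # None -> autodetect the language
    temperature: float = 0.0            # 0.0 means greedy decoding
    beam_size: Optional[int] = None     # None disables beam search

# Overriding a default before running inference:
config = InferenceConfig(model_type="small", temperature=0.2)
```

Editing the defaults in place has the same effect as constructing an overridden instance as above.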
```shell
cd repo_dir
python infer.py --model-type medium --audio-pth data/jfk.flac
```
- Whisper Audio preprocessing deep-dive and step-by-step explanation
- Whisper from scratch in a jupyter notebook
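As a preview of the audio preprocessing: Whisper operates on fixed 30-second chunks of 16 kHz mono audio, so shorter clips are zero-padded and longer ones truncated before the log-Mel spectrogram is computed. A minimal pure-Python sketch of that pad-or-trim step (the constants match the original Whisper code; this is an illustration, not this repo's implementation):

```python
# Constants from the original Whisper preprocessing.
SAMPLE_RATE = 16_000                     # audio is resampled to 16 kHz mono
CHUNK_LENGTH = 30                        # seconds per model input window
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE   # 480,000 samples per chunk

def pad_or_trim(samples: list, length: int = N_SAMPLES) -> list:
    """Zero-pad or truncate a mono waveform to exactly `length` samples."""
    if len(samples) > length:
        return samples[:length]                        # truncate long audio
    return samples + [0.0] * (length - len(samples))   # zero-pad short audio

audio = [0.1] * SAMPLE_RATE     # 1 second of dummy audio
fixed = pad_or_trim(audio)      # now exactly 30 seconds worth of samples
```

The original Whisper code does the same thing on tensors (whisper.pad_or_trim); lists are used here only to keep the sketch dependency-free.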
This repository is under the MIT License.