This is my refactor of OpenAI's Whisper ASR model. When released, the Whisper repository had a deeply nested function call stack and a lot of unnecessary boilerplate. This repo is the result of my exploration of the original Whisper codebase, during which I heavily refactored it and removed parts I found redundant. I also added comments for clarity, including the dimensions of tensors at each step.
- Original paper: Radford et al., "Robust Speech Recognition via Large-Scale Weak Supervision", 2022.
- Original code: OpenAI/whisper
```shell
sudo apt update && sudo apt install ffmpeg
git clone https://github.com/brandokoch/annotated_whisper
conda create -n annotated_whisper python=3.10
conda activate annotated_whisper
pip install -r requirements.txt
```
Inference is run with the infer.py script by providing the model type and the audio path. To adjust the inference configuration, edit the default configuration in whisper/config.py.
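To illustrate the kind of defaults such a configuration typically holds, here is a hypothetical sketch. The dataclass name and every field in it are assumptions for illustration; the actual contents of whisper/config.py in this repo may differ.

```python
# Hypothetical illustration only: the real whisper/config.py may differ.
# A common pattern is a dataclass of decoding defaults that infer.py reads.
from dataclasses import dataclass
from typing import Optional

@dataclass
class InferenceConfig:
    model_type: str = "medium"          # tiny / base / small / medium / large
    language: Optional[str] = None      # None -> autodetect the language
    temperature: float = 0.0            # 0.0 means greedy decoding
    beam_size: Optional[int] = None     # None disables beam search

# Overriding a default before running inference:
config = InferenceConfig(model_type="small", temperature=0.2)
```

Editing the defaults in place has the same effect as constructing an overridden instance as above.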
```shell
cd repo_dir
python infer.py --model-type medium --audio-pth data/jfk.flac
```
- Whisper Audio preprocessing deep-dive and step-by-step explanation
- Whisper from scratch in a jupyter notebook
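As a preview of the audio preprocessing: Whisper operates on fixed 30-second chunks of 16 kHz mono audio, so shorter clips are zero-padded and longer ones truncated before the log-Mel spectrogram is computed. A minimal pure-Python sketch of that pad-or-trim step (the constants match the original Whisper code; this is an illustration, not this repo's implementation):

```python
# Constants from the original Whisper preprocessing.
SAMPLE_RATE = 16_000                     # audio is resampled to 16 kHz mono
CHUNK_LENGTH = 30                        # seconds per model input window
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE   # 480,000 samples per chunk

def pad_or_trim(samples: list, length: int = N_SAMPLES) -> list:
    """Zero-pad or truncate a mono waveform to exactly `length` samples."""
    if len(samples) > length:
        return samples[:length]                        # truncate long audio
    return samples + [0.0] * (length - len(samples))   # zero-pad short audio

audio = [0.1] * SAMPLE_RATE     # 1 second of dummy audio
fixed = pad_or_trim(audio)      # now exactly 30 seconds worth of samples
```

The original Whisper code does the same thing on tensors (whisper.pad_or_trim); lists are used here only to keep the sketch dependency-free.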
This repository is under the MIT License.