brandokoch/annotated_whisper


Annotated Whisper

This is my refactor of OpenAI's Whisper ASR model. When it was released, the original Whisper repository had a very complicated function call stack and a lot of unnecessary boilerplate. This repo is the result of my exploration of that codebase, during which I heavily refactored it and removed the parts I found redundant. I also added comments for clarity, including the dimensions of tensors at each step.

Ubuntu Installation

sudo apt update && sudo apt install ffmpeg
git clone https://github.com/brandokoch/annotated_whisper
cd annotated_whisper
conda create -n annotated_whisper python=3.10
conda activate annotated_whisper
pip install -r requirements.txt
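
If something fails later, it can help to confirm the two prerequisites from the steps above first: a Python 3.10 interpreter and ffmpeg on the PATH. The snippet below is a minimal sketch and not part of this repo. The function name check_environment is hypothetical, and treating 3.10 as a minimum (rather than the exact pinned version) is an assumption.

```python
import shutil
import sys


def check_environment(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means the basics look fine.

    Assumption: the README pins python=3.10, and this sketch treats that as a minimum.
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ expected, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling check_environment() inside the activated conda environment and printing the result should show whether either prerequisite is missing.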

Usage

Inference is run with the infer.py script, which takes the model type and the path to the audio file. To adjust the inference configuration, edit the defaults in whisper/config.py.

cd repo_dir
python infer.py --model-type medium --audio-pth data/jfk.flac
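
To transcribe several files, the command above can be wrapped in a small driver script. This is only a sketch under stated assumptions: the helper names build_command and transcribe_folder are hypothetical and not part of this repo, the script is assumed to be run from the repository root, and only the two flags shown above (--model-type and --audio-pth) are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(audio_path, model_type="medium", script="infer.py"):
    """Build the argument list for one infer.py run, using the flags from the usage line."""
    return [
        sys.executable, script,
        "--model-type", model_type,
        "--audio-pth", str(audio_path),
    ]


def transcribe_folder(folder, model_type="medium", pattern="*.flac", dry_run=False):
    """Run infer.py once per matching audio file and return the commands that were built.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [
        build_command(path, model_type)
        for path in sorted(Path(folder).glob(pattern))
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if infer.py exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

For example, transcribe_folder("data", dry_run=True) prints one command per .flac file in data/ without running anything. Note that each run starts a new process and therefore reloads the model, which is simple but slow for many files.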

Upcoming

  • Whisper audio preprocessing deep-dive and step-by-step explanation
  • Whisper from scratch in a Jupyter notebook

License

This repository is released under the MIT License.
