speechcatcher-asr

Speechcatcher

Speechcatcher is an open-source toolbox for transcribing and translating speech from media files (audio/video). Speechcatcher models are trained using whisper as a teacher and are compact ASR models that also run fast on CPUs.

Speechcatcher CLI

You can find the command-line interface here. It can transcribe any media file and can also be used for live transcription with your microphone. That repository also contains an overview of all available speechcatcher models.

Data

Scripts to replicate the data gathering can be found in speechcatcher-data. There are also instructions on how to replicate the training procedure with espnet.

Webgui

Speechcatcher also comes with an easy-to-use webgui. It supports multiple ASR engines: speechcatcher (CPU), subtitle2go (CPU), and whisper (GPU).

Benchmarks

Because each model targets a single language, Speechcatcher models aim to be much faster than multilingual single-model transcription systems such as whisper.

See our results here.

Currently the focus is on transcribing German speech. Later, more languages might be added. If you would like to help to expand Speechcatcher, please get in touch!

Citation

If you use speechcatcher models in your research, for now just cite this repository:

@misc{milde2023speechcatcher,
  author = {Milde, Benjamin},
  title = {Speechcatcher},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechcatcher-asr/speechcatcher}},
}
