
speechcatcher-asr

Speechcatcher

Speechcatcher is an open-source toolbox for transcribing and translating speech from media files (audio/video). Speechcatcher models are trained with Whisper as the teacher, yielding compact ASR models that also run fast on CPUs:

[Figure: Speechcatcher teacher/student training]
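
The teacher/student idea can be illustrated with a minimal sketch: a teacher model transcribes unlabeled audio, and the resulting transcripts serve as training targets for a smaller student model. This is a generic illustration and not the actual Speechcatcher pipeline. The names `TrainingExample`, `pseudo_label`, and `fake_teacher` are hypothetical, and the stand-in teacher returns canned strings instead of calling Whisper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class TrainingExample:
    audio_path: str
    transcript: str  # pseudo-label produced by the teacher


def pseudo_label(
    audio_paths: Iterable[str],
    teacher_transcribe: Callable[[str], str],
    min_chars: int = 1,
) -> List[TrainingExample]:
    """Run the teacher over unlabeled audio and keep non-empty transcripts."""
    examples = []
    for path in audio_paths:
        text = teacher_transcribe(path).strip()
        if len(text) >= min_chars:  # drop empty teacher outputs
            examples.append(TrainingExample(path, text))
    return examples


# Hypothetical stand-in teacher for demonstration only;
# a real setup would call the teacher ASR model here.
def fake_teacher(path: str) -> str:
    return {"a.wav": "guten morgen", "b.wav": "   "}.get(path, "")


# Only "a.wav" yields a usable transcript, so one example is kept.
dataset = pseudo_label(["a.wav", "b.wav", "c.wav"], fake_teacher)
```

The student would then be trained on `dataset` as if the transcripts were human labels. Filtering out empty outputs is only one simple quality check; a real pipeline would likely need more.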

Speechcatcher CLI

The command line interface is available in the speechcatcher repository. It can transcribe any media file and also supports live transcription from your microphone. That repository also contains an overview of all available Speechcatcher models.
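
As a rough sketch, the CLI could be driven from Python like this. It assumes the tool is installed as an executable named `speechcatcher` on your PATH and accepts the media file as a positional argument. Both points are assumptions to check against the CLI's own help output, and `build_command` and `transcribe_file` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def build_command(media_path: str, executable: str = "speechcatcher") -> List[str]:
    """Assemble the argument list (assumed form: <executable> <media file>)."""
    return [executable, media_path]


def transcribe_file(media_path: str) -> subprocess.CompletedProcess:
    """Invoke the CLI on one media file, failing early if it is not installed."""
    cmd = build_command(media_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} not found on PATH; is it installed?")
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.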

Data

Scripts to replicate the data gathering can be found in speechcatcher-data. That repository also includes instructions for replicating the training procedure with espnet.

Webgui

Speechcatcher also comes with an easy-to-use web GUI. It supports multiple ASR engines: speechcatcher (CPU), subtitle2go (CPU), or whisper (GPU).

Benchmarks

Because each model targets a single language, Speechcatcher models aim to be much faster than multilingual single-model transcription systems such as Whisper.

See our results here.
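
One common way to quantify transcription speed is the real-time factor (RTF): processing time divided by audio duration, where values below 1 mean faster than real time. The sketch below is a generic illustration and not necessarily the metric used in the linked results. `real_time_factor` and `timed` are hypothetical helper names, and the `transcribe` callable stands in for any ASR engine.

```python
import time
from typing import Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(transcribe: Callable[[str], str], path: str) -> Tuple[str, float]:
    """Run a transcription callable and return (text, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    text = transcribe(path)
    return text, time.perf_counter() - start
```

For example, an engine that needs 30 seconds for a 60-second recording has `real_time_factor(30.0, 60.0) == 0.5`.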

The current focus is on transcribing German speech; more languages may be added later. If you would like to help expand Speechcatcher, please get in touch!

Citation

If you use speechcatcher models in your research, for now just cite this repository:

@misc{milde2023speechcatcher,
  author = {Milde, Benjamin},
  title = {Speechcatcher},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechcatcher-asr/speechcatcher}},
}

Sponsors

Speechcatcher is graciously funded by

Media Tech Lab by Media Lab Bayern (@media-tech-lab)

Popular repositories

  1. speechcatcher (Python)

  2. speechcatcher-data (Python)

  3. speechcatcher-webgui (Python)

  4. espnet_streaming_decoder (Python): An espnet streaming decoder with a smaller footprint than the entire espnet project

  5. .github

  6. espnet (Python): Forked from espnet/espnet, the End-to-End Speech Processing Toolkit
