Speaches

Note

This project was previously named faster-whisper-server. I've renamed it because the project has evolved to support more than just transcription.

speaches is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper; Text-to-Speech uses piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.

Try it out on the HuggingFace Space

See the documentation for installation instructions and usage: https://speaches-ai.github.io/speaches/
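
Because the server follows OpenAI's HTTP API, a speech-generation request can be sketched with nothing but the standard library. The example below is a minimal sketch, not the project's documented usage: the base URL `http://localhost:8000/v1`, the model id, and the voice name are placeholder assumptions, and the body fields (`model`, `input`, `voice`, `response_format`) are the ones defined by OpenAI's `/audio/speech` endpoint. Check the documentation for the values your deployment actually accepts.

```python
import json
import urllib.request

# Assumption: the server listens locally on port 8000 under an OpenAI-style /v1 prefix.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(
    text,
    model="hypothetical-tts-model",  # placeholder, not a real model id
    voice="hypothetical-voice",      # placeholder, not a real voice name
    base_url=BASE_URL,
):
    """Build (but do not send) a POST to the OpenAI-style /audio/speech route."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text)) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
```

Calling `synthesize("Hello")` requires a running server. `build_speech_request` can be inspected offline to check the URL and body before anything is sent.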

Features:

GPU and CPU support.
Deployable via Docker Compose / Docker
Highly configurable
OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with speaches.
Streaming support (transcription is sent via SSE as the audio is transcribed, so you don't need to wait for the full audio to be transcribed before receiving results).
Live transcription support (audio is sent via WebSocket as it's generated).
- LocalAgreement2 (paper | original implementation) algorithm is used for live transcription.
Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
Text-to-Speech via Kokoro (ranked #1 in the TTS Arena) and piper models.
Coming soon: Audio generation (chat completions endpoint) | OpenAI Documentation
- Generate a spoken audio summary of a body of text (text in, audio out)
- Perform sentiment analysis on a recording (audio in, text out)
- Async speech to speech interactions with a model (audio in, audio out)
Coming soon: Realtime API | OpenAI Documentation
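
The dynamic model loading / offloading feature listed above can be pictured as a cache that loads a model on first request and drops it after an idle period. The following is an illustrative sketch only, not the server's actual implementation: `ModelCache`, `loader`, and `ttl_seconds` are hypothetical names, and the loader is any callable that turns a model id into a loaded model.

```python
import time


class ModelCache:
    """Load models on demand and unload those idle for ttl_seconds or longer."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # callable: model id -> loaded model
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the idle timeout can be tested
        self._entries = {}         # model id -> (model, last-used timestamp)

    def get(self, model_id):
        """Return the model, loading it first if it is not in memory."""
        self.evict_idle()
        entry = self._entries.get(model_id)
        model = entry[0] if entry is not None else self._loader(model_id)
        # Every request refreshes the last-used timestamp.
        self._entries[model_id] = (model, self._clock())
        return model

    def evict_idle(self):
        """Drop every model that has been unused for at least the TTL."""
        now = self._clock()
        idle = [
            model_id
            for model_id, (_, last_used) in self._entries.items()
            if now - last_used >= self._ttl
        ]
        for model_id in idle:
            del self._entries[model_id]
        return idle

    def loaded(self):
        """Return the ids of the models currently held in memory."""
        return sorted(self._entries)
```

In this sketch eviction only happens when `get` or `evict_idle` is called. A long-running server would more likely run the eviction check on a timer.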

Please create an issue if you find a bug, have a question, or want to suggest a feature.

Demo

Streaming Transcription

TODO
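
On the client side, receiving a transcription over SSE comes down to reading `data:` fields from a text stream. The sketch below parses that wire format according to the Server-Sent Events rules (events end at a blank line, lines starting with `:` are comments). It assumes, without confirmation from this README, that each event's data carries a piece of the transcript; `iter_sse_data` is a hypothetical helper name, and how streaming is requested from the server is left to the documentation.

```python
def iter_sse_data(lines):
    """Yield the data payload of each complete Server-Sent Event in `lines`.

    `lines` is any iterable of text lines, for example a decoded HTTP
    response body read line by line.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive.
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # An event without a terminating blank line is discarded.


if __name__ == "__main__":
    sample = ["data: Hello\n", "\n", ": keep-alive\n", "data: world\n", "\n"]
    for chunk in iter_sse_data(sample):
        print(chunk)
```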

Speech Generation

2025-01-12_13-20-58.webm

Live Transcription (using WebSockets)

demo.mp4
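
The idea behind LocalAgreement2, which the feature list names as the live-transcription algorithm, is to treat text as final only once two consecutive hypotheses agree on it. The sketch below is a simplified illustration of that idea and not the project's implementation. It assumes each hypothesis is the full word list for the audio received so far, and `LocalAgreementSketch` is a hypothetical name.

```python
def common_prefix(first, second):
    """Return the longest list of leading words shared by both sequences."""
    prefix = []
    for a, b in zip(first, second):
        if a != b:
            break
        prefix.append(a)
    return prefix


class LocalAgreementSketch:
    """Commit words once two consecutive hypotheses agree on them."""

    def __init__(self):
        self.previous = []   # hypothesis from the previous update
        self.committed = []  # words already confirmed and emitted

    def update(self, hypothesis):
        """Take the newest hypothesis and return the newly confirmed words."""
        agreed = common_prefix(self.previous, hypothesis)
        # Only words beyond what was already committed are new.
        newly_confirmed = agreed[len(self.committed):]
        self.committed.extend(newly_confirmed)
        self.previous = list(hypothesis)
        return newly_confirmed


if __name__ == "__main__":
    agreement = LocalAgreementSketch()
    for words in (["the", "cat"], ["the", "cat", "sat"], ["the", "cat", "sat", "on"]):
        print(agreement.update(words))
```

Words are never retracted once committed in this sketch, so a later hypothesis that changes an already confirmed word is ignored for that position.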

