A proof-of-concept Discord bot that joins a voice channel, separates out each user's voice data, and runs it through a speech-to-text converter.
- Speech-to-text services available:
  - (local, free) OpenAI's Whisper model (runs from the CLI)
  - (local, free) whisper-ctranslate2 as an interface to Whisper (runs from the CLI, faster output than the original Whisper)
  - (remote, paid) OpenAI Whisper API
  - (remote, paid, with free tier) Microsoft Azure STT AI (allows real-time decoding; fastest, but not as accurate)
- Discord.js as a Discord interface
- (for local use) prism-media for media conversion (Opus to PCM)
- (for local use) ffmpeg to process audio (PCM to MP3)
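
As a rough illustration of the last local step (PCM to MP3 with ffmpeg), here is a minimal Node.js sketch. It is not the bot's actual code: the function names `buildFfmpegArgs` and `convertPcmToMp3` are hypothetical, and it assumes the decoded audio is raw signed 16-bit little-endian PCM at 48 kHz stereo and that an `ffmpeg` binary is available on the `PATH`.

```javascript
const { spawn } = require("node:child_process");

// Assumption: the decoded PCM is signed 16-bit little-endian, 48 kHz, stereo.
// Adjust these values if the decoder is configured differently.
const PCM_FORMAT = { sampleFormat: "s16le", sampleRate: 48000, channels: 2 };

// Build the ffmpeg argument list for a raw PCM -> MP3 conversion.
// Raw PCM has no header, so the input format must be stated before "-i".
function buildFfmpegArgs(pcmPath, mp3Path, format = PCM_FORMAT) {
  return [
    "-y",                             // overwrite the output file if it exists
    "-f", format.sampleFormat,        // input sample format
    "-ar", String(format.sampleRate), // input sample rate in Hz
    "-ac", String(format.channels),   // input channel count
    "-i", pcmPath,
    mp3Path,                          // output format is inferred from ".mp3"
  ];
}

// Hypothetical helper: run ffmpeg and resolve with the MP3 path on success.
function convertPcmToMp3(pcmPath, mp3Path) {
  return new Promise((resolve, reject) => {
    const child = spawn("ffmpeg", buildFfmpegArgs(pcmPath, mp3Path), {
      stdio: "ignore",
    });
    child.on("error", reject); // e.g. ffmpeg is not installed
    child.on("close", (code) =>
      code === 0
        ? resolve(mp3Path)
        : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}
```

Calling `convertPcmToMp3("user.pcm", "user.mp3")` (file names are placeholders) would produce an MP3 that can then be handed to one of the local Whisper options. Keeping the argument list in its own function makes the assumed audio format easy to inspect and change.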