Streaming inference from CLI #2300
Replies: 7 comments
-
Take a look at https://github.com/coqui-ai/STT-examples/tree/r1.0/ffmpeg_vad_streaming
-
It uses the JS VAD packages, right? I assume STT itself doesn't support VAD natively?
-
This particular example uses Node.js, yes.
There are multiple ways to handle voice activity detection, from using a dedicated library like webrtcVAD to not handling VAD at all and streaming continuously. The right choice depends on your needs, so STT doesn't force any particular option on you.
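To make the "dedicated VAD step in front of STT" idea concrete, here is a minimal sketch. It is not webrtcVAD and not what the linked example does: it swaps in a plain RMS energy threshold using only the standard library. The 16 kHz mono 16-bit PCM format, the 30 ms frame size, the threshold of 500 and the names `frame_rms` and `split_utterances` are all assumptions made for illustration.

```python
import array
import math
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16000                                  # assumed: 16 kHz mono, 16-bit PCM
FRAME_MS = 30                                        # assumed frame length
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2     # 960 bytes per frame


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit samples."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def split_utterances(
    frames: Iterable[bytes],
    threshold: float = 500.0,        # hypothetical value, tune for your microphone
    max_silence_frames: int = 10,    # about 300 ms of silence ends an utterance
) -> Iterator[bytes]:
    """Group consecutive loud frames into utterances, dropping long silences."""
    utterance: List[bytes] = []
    pending_silence: List[bytes] = []
    for frame in frames:
        if frame_rms(frame) >= threshold:
            # A short pause inside an utterance is kept as part of it.
            utterance.extend(pending_silence)
            pending_silence.clear()
            utterance.append(frame)
        elif utterance:
            pending_silence.append(frame)
            if len(pending_silence) >= max_silence_frames:
                yield b"".join(utterance)
                utterance, pending_silence = [], []
    if utterance:
        yield b"".join(utterance)
```

Each yielded chunk could then be handed to the recognizer on its own, while the "no VAD" option would simply pass every frame through unchanged. A real VAD library will be far more robust to background noise than this threshold.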
-
Ahh, I see. OK, my bad: VAD is a separate thing, and STT by default just takes whatever is fed into it (continuous by default), right? So we need a separate function to do the VAD.
-
Please don't. Using a shell script to achieve this is not appropriate. I suggest a Python script you can call from the shell that handles recording the audio and passing it directly to STT. I wrote listen to do so. It handles VAD and all supported languages, and it's easy to use.

❯ listen --help
usage: listen [-h] [-f FILE] [--aggressive {0,1,2,3}] [-d MIC_DEVICE]
              [-w SAVE_WAV]

Transcribe long audio files using webRTC VAD or use the streaming interface
from a microphone

options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Path to the audio file to run (WAV format)
  --aggressive {0,1,2,3}
                        Determines how aggressive filtering out non-speech is.
                        (Interger between 0-3)
  -d MIC_DEVICE, --mic_device MIC_DEVICE
                        Device input index (Int) as listed by
                        pyaudio.PyAudio.get_device_info_by_index(). If not
                        provided, falls back to PyAudio.get_default_device().
  -w SAVE_WAV, --save_wav SAVE_WAV
                        Path to directory where to save recorded sentences
  --debug               Show debug info

It's a mix of mic_vad_streaming, vad_transcriber and python_websocket_server from STT-examples.
I'll close your issue and convert it to a discussion about streaming audio from the CLI instead.
-
Well, my requirement is actually quite simple: I just want an STT that can run with low latency, 24/7, ideally for daily use.
-
I will give listen a try and see whether it works on my setup.
-
If you have a feature request, then please provide the following information:
A clear and concise description of what the problem is.
I'm curious whether there is a less hackish way to use the client CLI directly to run inference on a real-time audio stream.
Describe the solution you'd like
Ideally a simple flag on the CLI, with sufficient information passed to it, e.g. a port number.
Describe alternatives you've considered
Piping the stream directly from ffmpeg to STT.
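For reference, the ffmpeg alternative could be sketched as a small Python reader on the receiving end of the pipe. This is only an illustration under stated assumptions, not an existing STT feature: the function name `pump_pcm`, the 640-byte chunk size (20 ms of 16 kHz mono 16-bit PCM) and the `feed` callback are hypothetical, and the callback is where a streaming recognizer would be plugged in.

```python
import sys
from typing import BinaryIO, Callable

CHUNK_BYTES = 640  # assumed: 20 ms of 16 kHz mono 16-bit PCM


def pump_pcm(
    source: BinaryIO,
    feed: Callable[[bytes], None],
    chunk_bytes: int = CHUNK_BYTES,
) -> int:
    """Read raw PCM from a binary stream and pass each chunk to `feed`.

    Returns the total number of bytes forwarded. `feed` is a placeholder
    for whatever consumes the audio, e.g. a streaming STT session.
    """
    total = 0
    while True:
        chunk = source.read(chunk_bytes)
        if not chunk:          # EOF: the upstream process closed the pipe
            break
        feed(chunk)
        total += len(chunk)
    return total


# Hypothetical shell usage, with ffmpeg converting any input to raw PCM:
#   ffmpeg -i <input> -f s16le -ac 1 -ar 16000 - | python pump.py
# where pump.py would end with:
#   pump_pcm(sys.stdin.buffer, my_feed_function)
```

The chunk size sets the latency floor: smaller chunks reach the recognizer sooner but mean more calls. Without a VAD step in front, everything ffmpeg emits is forwarded, silence included.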
Additional context