Streaming inference from CLI #2300

RESDXChgfore9hing · 2022-09-17T15:06:47Z

RESDXChgfore9hing
Sep 17, 2022

If you have a feature request, then please provide the following information:

A clear and concise description of what the problem is.
I'm curious whether there is a less hackish way to use the client CLI directly to run inference on a real-time audio stream.

Describe the solution you'd like
Ideally a simple flag on the CLI, with whatever information it needs, e.g. a port number.

Describe alternatives you've considered
Piping the stream directly from ffmpeg to STT.

Additional context

wasertech · 2022-09-23T15:31:41Z

wasertech
Sep 23, 2022
Collaborator

Take a look at https://github.com/coqui-ai/STT-examples/tree/r1.0/ffmpeg_vad_streaming

RESDXChgfore9hing · 2022-09-25T12:53:47Z

RESDXChgfore9hing
Sep 25, 2022
Author

It uses the JS VAD packages, right? I assume STT itself doesn't support VAD natively,
so we need to handle the VAD logic ourselves?

wasertech · 2022-09-25T14:27:17Z

wasertech
Sep 25, 2022
Collaborator

> It uses the JS VAD packages, right?

This particular example uses Node.js, yes.

> I assume STT itself doesn't support VAD natively, so we need to handle the VAD logic ourselves?

There are multiple ways to handle voice activity detection, from using a dedicated library like webrtcVAD to not handling VAD at all and streaming continuously. The right choice depends on your needs, so STT doesn't force any particular option on you.
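To make the "handle VAD yourself" option concrete, here is a minimal sketch in Python. It is not what the linked example does: it swaps in a naive energy threshold as a stand-in for a real detector, and the names `frame_rms`, `split_frames` and `speech_segments`, the 30 ms frame size and the threshold value are all assumptions chosen for illustration. It assumes 16 kHz, mono, 16-bit PCM in the machine's native byte order.

```python
import array
import math
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16000                               # assumed input rate
FRAME_MS = 30                                     # assumed frame length
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit samples -> 960 bytes


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of native-endian 16-bit PCM."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def split_frames(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Cut a PCM buffer into whole frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def speech_segments(frames: Iterable[bytes],
                    is_speech: Callable[[bytes], bool],
                    hangover: int = 10) -> Iterator[bytes]:
    """Group frames into utterances.

    An utterance starts at the first speech frame and is closed once
    `hangover` consecutive non-speech frames have been seen. Silent frames
    inside an open utterance (including the closing ones) are kept.
    """
    current: List[bytes] = []
    silent_run = 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silent_run = 0
        elif current:
            current.append(frame)
            silent_run += 1
            if silent_run >= hangover:
                yield b"".join(current)
                current, silent_run = [], 0
    if current:
        # Input ended while an utterance was still open.
        yield b"".join(current)


def energy_is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Naive stand-in detector: a frame counts as speech if it is loud enough."""
    return frame_rms(frame) >= threshold
```

Each segment yielded by `speech_segments` could then be passed to STT as one utterance. The `is_speech` argument is deliberately pluggable, so a dedicated detector can replace the energy threshold without touching the grouping logic. Skipping this step entirely and feeding every frame corresponds to the continuous-streaming option.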

RESDXChgfore9hing · 2022-09-27T14:48:30Z

RESDXChgfore9hing
Sep 27, 2022
Author

Ahh, I see. OK, my bad: VAD is a separate thing, and STT by default just consumes whatever is fed into it (continuous by default), right? So we need a separate function to do the VAD.
After re-reading the example, I see that the RTMP stream is handled directly by ffmpeg, and then the STT JS API is called to run the inference.
Is there an equivalent to ffmpeg > STT CLI, without using the API? Perhaps the API is just some kind of CLI command builder?

wasertech · 2022-09-27T16:35:17Z

wasertech
Sep 27, 2022
Collaborator

> Is there an equivalent to ffmpeg > STT CLI?

Please don't. Using a shell script to achieve this is not appropriate.

I suggest a Python script you can call from the shell that records the audio and passes it directly to STT.

I wrote listen to do so. It handles VAD and all supported languages, and it's easy to use.

❯ listen --help
usage: listen [-h] [-f FILE] [--aggressive {0,1,2,3}] [-d MIC_DEVICE]
                   [-w SAVE_WAV]

Transcribe long audio files using webRTC VAD or use the streaming interface
from a microphone

options:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Path to the audio file to run (WAV format)
  --aggressive {0,1,2,3}
                        Determines how aggressive filtering out non-speech is.
                        (Interger between 0-3)
  -d MIC_DEVICE, --mic_device MIC_DEVICE
                        Device input index (Int) as listed by
                        pyaudio.PyAudio.get_device_info_by_index(). If not
                        provided, falls back to PyAudio.get_default_device().
  -w SAVE_WAV, --save_wav SAVE_WAV
                        Path to directory where to save recorded sentences
  --debug               Show debug info

It's a mix of mic_vad_streaming, vad_transcriber and python_websocket_server from STT-examples.
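For readers who only want the "Python script you can call from the shell" part, a minimal sketch might look like the following. This is not how listen is implemented. It assumes the `stt` Python package and `numpy` are installed, and that the audio arriving on stdin is already raw 16 kHz, mono, 16-bit little-endian PCM on a little-endian machine. The file name `stt_stdin.py`, the helper `feed_pcm` and the chunk size are made up for illustration.

```python
"""Feed raw PCM from stdin into an STT stream (illustrative sketch).

Hypothetical usage, with ffmpeg doing the decoding and resampling:
    ffmpeg -i INPUT -f s16le -ac 1 -ar 16000 - | python stt_stdin.py MODEL
"""
import sys
from typing import BinaryIO, Callable

SAMPLE_WIDTH = 2   # bytes per 16-bit sample
CHUNK_BYTES = 640  # 20 ms of 16 kHz mono 16-bit audio (assumed format)


def feed_pcm(source: BinaryIO, feed: Callable[[bytes], None],
             chunk_bytes: int = CHUNK_BYTES) -> int:
    """Read raw PCM from `source` and hand it to `feed` chunk by chunk.

    Returns the number of bytes fed. Relies on buffered reads only coming
    back short at end of input; a trailing odd byte (half a sample) is
    dropped.
    """
    total = 0
    while True:
        chunk = source.read(chunk_bytes)
        if not chunk:
            break
        usable = len(chunk) - len(chunk) % SAMPLE_WIDTH
        if usable:
            feed(chunk[:usable])
            total += usable
    return total


def main(model_path: str) -> None:
    # Imported here so feed_pcm stays usable without STT installed.
    import numpy as np
    from stt import Model

    model = Model(model_path)
    stream = model.createStream()
    feed_pcm(
        sys.stdin.buffer,
        lambda chunk: stream.feedAudioContent(
            np.frombuffer(chunk, dtype=np.int16)),
    )
    # The final transcript is only available once the input ends.
    print(stream.finishStream())


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

This sketch only prints a transcript when the input reaches end of file. For an endless stream you would have to cut the audio into utterances first (for example with VAD) or poll the stream's `intermediateDecode()` while feeding.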

I'll close your issue and convert it to a discussion about streaming audio from the CLI instead.

RESDXChgfore9hing · 2022-09-29T16:35:00Z

RESDXChgfore9hing
Sep 29, 2022
Author

Well, my requirement is actually quite simple: I just want an STT setup that runs with low latency, 24/7, ideally for everyday use.
That is what led me to think of using a shell script, or something compiled or interpreted that is more native to the OS.

RESDXChgfore9hing · 2022-09-29T16:35:39Z

RESDXChgfore9hing
Sep 29, 2022
Author

I will give listen a try and see whether it works on my setup.
