
This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Voice assistant example - the "command" tool #171

Closed
ggerganov opened this issue Nov 23, 2022 · 4 comments
Labels
ideas Interesting ideas for experimentation

Comments

ggerganov (Owner) commented Nov 23, 2022

There seems to be significant interest in a voice assistant application of Whisper, similar to "Ok, Google", "Hey Siri", "Alexa", etc. The existing stream tool is not well suited to this use case, because voice assistant commands are usually short (i.e. play some music, turn on the TV, kill all humans, feed the baby, etc.), while stream expects a continuous stream of speech.

Therefore, implement a basic command-line tool called command that does the following:

  • Upon start, asks the person to say a "key phrase". The phrase should be an average-length sentence that normally takes 2-3 seconds to pronounce, so that we have enough "training" data of the person's voice
  • If the transcribed text matches the expected phrase, we "remember" this audio and use it later. Otherwise, we ask them to say it again until we succeed
  • We start listening continuously for voice activity using my VAD detector that I implemented for talk.wasm - I think it works very well given its simplicity (a minimal energy-based sketch follows this list)
  • When we detect speech, we prepend the recorded key phrase to the last 2-3 seconds of the live audio and transcribe
  • The result should be: [key phrase][command], so by knowing the key phrase we can extract only the [command]
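
As a rough illustration of the kind of VAD meant above, here is a minimal, self-contained energy-based sketch in C++. This is not the actual talk.wasm detector; the function name `vad_energy`, the demo in `main` and the thresholding scheme are assumptions made purely for the example.

```cpp
// Minimal energy-based VAD sketch (illustrative only, not the talk.wasm detector).
#include <cmath>
#include <cstdio>
#include <vector>

// Returns true if the last `last_ms` of audio is noticeably louder than the
// average of the whole buffer - a crude but cheap indicator of voice activity.
bool vad_energy(const std::vector<float> & pcmf32, int sample_rate, int last_ms, float thold) {
    const int n_samples      = (int) pcmf32.size();
    const int n_samples_last = (sample_rate*last_ms)/1000;

    if (n_samples_last >= n_samples) {
        return false; // not enough audio buffered yet
    }

    float energy_all  = 0.0f;
    float energy_last = 0.0f;

    for (int i = 0; i < n_samples; ++i) {
        energy_all += std::fabs(pcmf32[i]);
        if (i >= n_samples - n_samples_last) {
            energy_last += std::fabs(pcmf32[i]);
        }
    }

    energy_all  /= n_samples;
    energy_last /= n_samples_last;

    return energy_last > thold*energy_all;
}

int main() {
    const int sample_rate = 16000;

    // 3 seconds of near-silence followed by 0.5 seconds of a loud 440 Hz tone
    std::vector<float> pcmf32(3*sample_rate, 0.01f);
    for (int i = 0; i < sample_rate/2; ++i) {
        pcmf32.push_back(0.5f*std::sin(2.0f*3.14159f*440.0f*i/sample_rate));
    }

    printf("speech detected: %s\n", vad_energy(pcmf32, sample_rate, 500, 2.0f) ? "yes" : "no");
    return 0;
}
```

The idea is simply to flag speech when the most recent window is noticeably louder than the running average of the whole buffer; a real detector can additionally high-pass filter the audio to suppress low-frequency noise.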

This should work both on the Web and on a Raspberry Pi, and thanks to the VAD it will be energy efficient.
It should be a good starting example for creating a voice assistant.
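
For concreteness, here is a hedged sketch of how the last two steps (prepend the recorded key phrase, transcribe, strip the key phrase) could be wired to the whisper.cpp C API declared in whisper.h (whisper_full_default_params, whisper_full, whisper_full_n_segments, whisper_full_get_segment_text). The helper name `extract_command` and its parameters are made up for illustration; this is not the code that later landed in examples/command.

```cpp
// Sketch only: prepend the remembered key-phrase audio to the live audio,
// transcribe the combined buffer and return everything after the key phrase.
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

#include "whisper.h"

std::string extract_command(struct whisper_context * ctx,
                            const std::vector<float> & keyphrase_audio, // remembered key-phrase samples, 16 kHz mono float
                            const std::vector<float> & live_audio,      // last 2-3 seconds of live audio
                            const std::string & keyphrase_text) {       // the expected key phrase, e.g. "ok whisper"
    // 1) prepend the recorded key phrase to the live audio
    std::vector<float> audio;
    audio.reserve(keyphrase_audio.size() + live_audio.size());
    audio.insert(audio.end(), keyphrase_audio.begin(), keyphrase_audio.end());
    audio.insert(audio.end(), live_audio.begin(),      live_audio.end());

    // 2) transcribe the combined buffer
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_progress = false;
    params.print_realtime = false;
    params.single_segment = true;
    params.language       = "en";

    if (whisper_full(ctx, params, audio.data(), (int) audio.size()) != 0) {
        return "";
    }

    std::string text;
    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        text += whisper_full_get_segment_text(ctx, i);
    }

    // 3) the transcription should look like "[key phrase][command]" - drop the key phrase
    auto lower = [](std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return (char) std::tolower(c); });
        return s;
    };

    const size_t pos = lower(text).find(lower(keyphrase_text));
    if (pos == std::string::npos) {
        return ""; // key phrase not recognized - ignore this audio
    }

    return text.substr(pos + keyphrase_text.size());
}
```

A caller would keep keyphrase_audio around from the enrollment step, feed in the last 2-3 seconds of microphone audio whenever the VAD fires, and treat an empty return value as "no command".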

ggerganov added the ideas label Nov 23, 2022
ggerganov changed the title Voice assistant example - the command tool → Voice assistant example - the "command" tool Nov 23, 2022
ggerganov (Owner, Author) commented Nov 25, 2022

This is now fully functional:

command-0.mp4

Code is in examples/command

Web version: examples/command.wasm

StuartIanNaylor's comment was marked as resolved.

ggerganov (Owner, Author) commented

I think you haven't updated to the latest master: git pull

StuartIanNaylor commented Nov 25, 2022

Yep, I must be fresh on your heels, as it was a new install this morning and I thought it was the latest. Apologies.

Works great!

ggerganov added a commit that referenced this issue Nov 26, 2022
Same as the command-line tool "command", but runs in the browser

Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.
Repository owner locked and limited conversation to collaborators Nov 27, 2022
ggerganov converted this issue into discussion #190 Nov 27, 2022
