
This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Voice assistant example - the "command" tool #171

Closed
ggerganov opened this issue Nov 23, 2022 · 4 comments
Labels
ideas Interesting ideas for experimentation

Comments

ggerganov (Owner) commented Nov 23, 2022

There seems to be significant interest in a voice assistant application of Whisper, similar to "Ok, Google", "Hey Siri", "Alexa", etc. The existing stream tool is not well suited to this use case, because voice assistant commands are usually short (i.e. play some music, turn on the TV, kill all humans, feed the baby, etc.), while stream expects a continuous stream of speech.

Therefore, implement a basic command-line tool called command that does the following:

  • Upon start, asks the person to say a "key phrase". The phrase should be an average-length sentence that normally takes 2-3 seconds to pronounce, so that we have enough "training" data of the person's voice
  • If the transcribed text matches the expected phrase, we "remember" this audio and use it later. Otherwise, we ask them to say it again until we succeed
  • We start listening continuously for voice activity using my VAD detector that I implemented for talk.wasm - I think it works very well given its simplicity (a minimal energy-based sketch follows this list)
  • When we detect speech, we prepend the recorded key phrase to the last 2-3 seconds of the live audio and transcribe
  • The result should be: [key phrase][command], so by knowing the key phrase we can extract only the [command]
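
As a rough illustration of the kind of VAD meant above, here is a minimal, self-contained energy-based sketch in C++. This is not the actual talk.wasm detector; the function name `vad_energy`, the demo in `main` and the thresholding scheme are assumptions made purely for the example.

```cpp
// Minimal energy-based VAD sketch (illustrative only, not the talk.wasm detector).
#include <cmath>
#include <cstdio>
#include <vector>

// Returns true if the last `last_ms` of audio is noticeably louder than the
// average of the whole buffer - a crude but cheap indicator of voice activity.
bool vad_energy(const std::vector<float> & pcmf32, int sample_rate, int last_ms, float thold) {
    const int n_samples      = (int) pcmf32.size();
    const int n_samples_last = (sample_rate*last_ms)/1000;

    if (n_samples_last >= n_samples) {
        return false; // not enough audio buffered yet
    }

    float energy_all  = 0.0f;
    float energy_last = 0.0f;

    for (int i = 0; i < n_samples; ++i) {
        energy_all += std::fabs(pcmf32[i]);
        if (i >= n_samples - n_samples_last) {
            energy_last += std::fabs(pcmf32[i]);
        }
    }

    energy_all  /= n_samples;
    energy_last /= n_samples_last;

    return energy_last > thold*energy_all;
}

int main() {
    const int sample_rate = 16000;

    // 3 seconds of near-silence followed by 0.5 seconds of a loud 440 Hz tone
    std::vector<float> pcmf32(3*sample_rate, 0.01f);
    for (int i = 0; i < sample_rate/2; ++i) {
        pcmf32.push_back(0.5f*std::sin(2.0f*3.14159f*440.0f*i/sample_rate));
    }

    printf("speech detected: %s\n", vad_energy(pcmf32, sample_rate, 500, 2.0f) ? "yes" : "no");
    return 0;
}
```

The idea is simply to flag speech when the most recent window is noticeably louder than the running average of the whole buffer; a real detector can additionally high-pass filter the audio to suppress low-frequency noise.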

This should work both on the Web and on a Raspberry Pi, and thanks to the VAD it will be energy efficient.
It should be a good starting example for creating a voice assistant.
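
For concreteness, here is a hedged sketch of how the last two steps (prepend the recorded key phrase, transcribe, strip the key phrase) could be wired to the whisper.cpp C API declared in whisper.h (whisper_full_default_params, whisper_full, whisper_full_n_segments, whisper_full_get_segment_text). The helper name `extract_command` and its parameters are made up for illustration; this is not the code that later landed in examples/command.

```cpp
// Sketch only: prepend the remembered key-phrase audio to the live audio,
// transcribe the combined buffer and return everything after the key phrase.
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

#include "whisper.h"

std::string extract_command(struct whisper_context * ctx,
                            const std::vector<float> & keyphrase_audio, // remembered key-phrase samples, 16 kHz mono float
                            const std::vector<float> & live_audio,      // last 2-3 seconds of live audio
                            const std::string & keyphrase_text) {       // the expected key phrase, e.g. "ok whisper"
    // 1) prepend the recorded key phrase to the live audio
    std::vector<float> audio;
    audio.reserve(keyphrase_audio.size() + live_audio.size());
    audio.insert(audio.end(), keyphrase_audio.begin(), keyphrase_audio.end());
    audio.insert(audio.end(), live_audio.begin(),      live_audio.end());

    // 2) transcribe the combined buffer
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_progress = false;
    params.print_realtime = false;
    params.single_segment = true;
    params.language       = "en";

    if (whisper_full(ctx, params, audio.data(), (int) audio.size()) != 0) {
        return "";
    }

    std::string text;
    for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
        text += whisper_full_get_segment_text(ctx, i);
    }

    // 3) the transcription should look like "[key phrase][command]" - drop the key phrase
    auto lower = [](std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return (char) std::tolower(c); });
        return s;
    };

    const size_t pos = lower(text).find(lower(keyphrase_text));
    if (pos == std::string::npos) {
        return ""; // key phrase not recognized - ignore this audio
    }

    return text.substr(pos + keyphrase_text.size());
}
```

A caller would keep keyphrase_audio around from the enrollment step, feed in the last 2-3 seconds of microphone audio whenever the VAD fires, and treat an empty return value as "no command".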

ggerganov added the ideas label Nov 23, 2022
ggerganov changed the title Voice assistant example - the command tool → Voice assistant example - the "command" tool Nov 23, 2022
ggerganov (Owner, Author) commented Nov 25, 2022

This is now fully functional:

command-0.mp4

Code is in examples/command

Web version: examples/command.wasm

StuartIanNaylor's comment was marked as resolved.

ggerganov (Owner, Author) commented

I think you haven't updated to the latest master: git pull

StuartIanNaylor commented Nov 25, 2022

Yep, I must be fresh on your heels, as it was a new install this morning and I thought it was the latest. Apologies.

Works great!

ggerganov added a commit that referenced this issue Nov 26, 2022
Same as the command-line tool "command", but runs in the browser

Also, added helper script "extra/deploy-wasm.sh" and fixed some timing
constants for the WASM examples.
Repository owner locked and limited conversation to collaborators Nov 27, 2022
ggerganov converted this issue into discussion #190 Nov 27, 2022
