commit 9618c49
Showing 50 changed files with 2,517 additions and 0 deletions.
@@ -0,0 +1,87 @@
### https://raw.github.com/github/gitignore/f57304e9762876ae4c9b02867ed0cb887316387e/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# dotenv
.env

# virtualenv
.venv
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

/.idea/

/checkpoints/
.DS_Store

XY_dev/
XY_train/
*.npy
@@ -0,0 +1,39 @@
# [How to do Real Time Trigger Word Detection with Keras](https://www.dlology.com/blog/how-to-do-real-time-trigger-word-detection-with-keras/)

Trigger word detection, also known as wake word or hot word detection, is what wakes up a voice assistant, like Amazon's "Alexa" or Google Home's "OK, Google".
Wouldn't it be cool to build one yourself and run it in **real time**?

In this post, I am going to show you exactly how to build a Keras model that does the same thing from scratch. No third-party voice API or network connection is required for it to work.

Background information is covered in my blog post.

## How to Run
Requires [Python 3.5+](https://www.python.org/ftp/python/3.6.4/python-3.6.4.exe) and [Jupyter Notebook](https://jupyter.readthedocs.io/en/latest/install.html) to be installed.
### Clone or download this repo
```
git clone https://github.com/Tony607/Keras-Trigger-Word
```
### Install required libraries
`pip3 install -r requirements.txt`

### Real-time demo

Open a command line in the project directory, then run:
```
jupyter notebook
```
If you are only interested in playing with the pre-trained trigger word model in the real-time demo, choose this notebook in the opened browser window:
```
trigger_word_real_time_demo.ipynb
```

Optionally, if you want to learn about data preparation and model training, continue with my [write-up](https://www.dlology.com/blog/how-to-do-real-time-trigger-word-detection-with-keras/) and choose this notebook in the opened browser window:
```
Trigger word detection - v1.ipynb
```
If you want to follow along with the notebook, download the train/dev data, [Data.zip](https://github.com/Tony607/Keras-Trigger-Word/releases/download/V0.1/Data.zip), from the releases, then extract the `XY_dev` and `XY_train` folders to the root of the project directory.
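The extraction step can also be scripted. The sketch below is a hypothetical helper (`extract_training_data` is not part of this repo) that pulls only the `XY_dev/` and `XY_train/` folders out of a locally downloaded `Data.zip`. It assumes the two folders sit at the top level of the archive; if they are nested under another directory, the prefix check needs adjusting.

```python
import zipfile
from pathlib import Path

# Folders the README asks for; every other archive member is skipped.
WANTED_DIRS = ("XY_dev", "XY_train")


def extract_training_data(zip_path, project_root="."):
    """Extract only XY_dev/ and XY_train/ from the archive into project_root.

    Assumption: both folders are top-level entries in the zip file.
    Returns the list of archive members that were extracted.
    """
    root = Path(project_root).resolve()
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            top_level = member.split("/", 1)[0]
            if top_level not in WANTED_DIRS:
                continue
            target = (root / member).resolve()
            # Refuse members whose path would escape the project root ("zip slip").
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member}")
            archive.extract(member, root)
            extracted.append(member)
    return extracted
```

Run from the project root, `extract_training_data("Data.zip")` leaves `XY_dev/` and `XY_train/` next to the notebooks, which is also where the `.gitignore` above expects them.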

Happy coding! Leave a comment if you have any questions.
@@ -0,0 +1,6 @@
numpy
keras
h5py
pydub
scipy
matplotlib
@@ -0,0 +1,46 @@
import matplotlib.pyplot as plt
from scipy.io import wavfile
import os
from pydub import AudioSegment

# Calculate and plot spectrogram for a wav audio file
def graph_spectrogram(wav_file):
    rate, data = get_wav_info(wav_file)
    nfft = 200 # Length of each window segment
    fs = 8000 # Sampling frequency
    noverlap = 120 # Overlap between windows
    nchannels = data.ndim # 1-D array = mono, 2-D array = multi-channel
    if nchannels == 1:
        pxx, freqs, bins, im = plt.specgram(data, nfft, fs, noverlap = noverlap)
    elif nchannels == 2:
        pxx, freqs, bins, im = plt.specgram(data[:,0], nfft, fs, noverlap = noverlap)
    return pxx

# Load a wav file
def get_wav_info(wav_file):
    rate, data = wavfile.read(wav_file)
    return rate, data

# Used to standardize the volume of an audio clip
def match_target_amplitude(sound, target_dBFS):
    change_in_dBFS = target_dBFS - sound.dBFS
    return sound.apply_gain(change_in_dBFS)

# Load raw audio files for speech synthesis
def load_raw_audio():
    activates = []
    backgrounds = []
    negatives = []
    for filename in os.listdir("./raw_data/activates"):
        if filename.endswith("wav"):
            activate = AudioSegment.from_wav("./raw_data/activates/"+filename)
            activates.append(activate)
    for filename in os.listdir("./raw_data/backgrounds"):
        if filename.endswith("wav"):
            background = AudioSegment.from_wav("./raw_data/backgrounds/"+filename)
            backgrounds.append(background)
    for filename in os.listdir("./raw_data/negatives"):
        if filename.endswith("wav"):
            negative = AudioSegment.from_wav("./raw_data/negatives/"+filename)
            negatives.append(negative)
    return activates, negatives, backgrounds
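`graph_spectrogram` passes `nfft = 200` and `noverlap = 120` to `plt.specgram`, so each column of `pxx` covers 200 samples and consecutive columns start 80 samples apart. The numpy-only sketch below reproduces that frame layout and the dB arithmetic behind `match_target_amplitude`. The helper names (`spectrogram_shape`, `simple_spectrogram`, `dbfs`, `match_target_amplitude_np`) are hypothetical, not part of this repo. It is not a drop-in replacement: it skips matplotlib's PSD scaling, so only the output shape matches, and `max_amplitude` must be passed in (pydub derives it from the sample width).

```python
import numpy as np

NFFT = 200      # window length, as in graph_spectrogram
NOVERLAP = 120  # overlap between windows, as in graph_spectrogram


def spectrogram_shape(n_samples, nfft=NFFT, noverlap=NOVERLAP):
    """(frequency bins, time steps) of a one-sided spectrogram of a real signal."""
    step = nfft - noverlap
    n_frames = 1 + (n_samples - nfft) // step
    return nfft // 2 + 1, n_frames


def simple_spectrogram(x, nfft=NFFT, noverlap=NOVERLAP):
    """Power spectrogram using the same framing as plt.specgram.

    Sketch only: Hann window and |rfft|^2 without matplotlib's PSD
    scaling, so the values differ but the (freq, time) shape matches.
    """
    step = nfft - noverlap
    n_frames = 1 + (len(x) - nfft) // step
    window = np.hanning(nfft)
    frames = np.stack([x[i * step:i * step + nfft] * window for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T  # shape (freq, time)


def dbfs(samples, max_amplitude):
    """Loudness relative to full scale: 20 * log10(rms / max_amplitude)."""
    rms = np.sqrt(np.mean(np.square(samples, dtype=np.float64)))
    return 20 * np.log10(rms / max_amplitude)


def match_target_amplitude_np(samples, max_amplitude, target_dbfs):
    """Same idea as match_target_amplitude: gain in dB = target - current."""
    change_in_dbfs = target_dbfs - dbfs(samples, max_amplitude)
    return samples * 10 ** (change_in_dbfs / 20)  # dB -> linear amplitude factor
```

For a 10 s clip sampled at 44.1 kHz (441,000 samples), `spectrogram_shape(441000)` gives `(101, 5511)`; the `fs = 8000` argument does not change this shape. On the gain side, a clip at -30 dBFS with a target of -20 dBFS gets +10 dB, which multiplies every sample by 10**(10/20), roughly 3.16.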
9618c49
Hello Chengwei,
Thank you for sharing your experience with hotword triggering! Did you manage to train a model successfully using your own dataset? Did you use the approach explained in the notebook, which looks like this:
If so, what number of epochs and batch size did you use to train it, and did you change any of Adam's parameters (e.g. the learning rate)?
I'm asking because I'm experimenting with training my own weights instead of using the pre-trained model from the notebook. When I do, I get an "inverted" probability for the trigger word: the baseline sits around 0.3 and dips toward 0, rather than the other way around.
I am using 4,000 training examples synthesized from 50 activates and 200 negative utterances.
Thank you in advance for your response :)
Jeremy
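When the output looks inverted, one data-side thing worth ruling out is the polarity of the synthesized targets. Below is a minimal sketch of one common labelling scheme for this kind of model: the target is 0 at every output step except for a short run of 1s right after each "activate" ends. The constants `TY = 1375`, `CLIP_MS = 10_000` and `POSITIVE_STEPS = 50`, and the helper `make_labels`, are assumptions for illustration, not values taken from this commit.

```python
import numpy as np

TY = 1375            # assumed number of output time steps per clip
CLIP_MS = 10_000     # assumed clip length in milliseconds
POSITIVE_STEPS = 50  # assumed number of 1-labels after each "activate"


def make_labels(activate_end_times_ms, ty=TY, clip_ms=CLIP_MS, n_pos=POSITIVE_STEPS):
    """Return a length-ty target: 0 everywhere, 1 for n_pos steps after each activate ends."""
    y = np.zeros(ty, dtype=np.float32)
    for end_ms in activate_end_times_ms:
        # Map the end time in milliseconds onto the output time axis.
        end_step = int(end_ms * ty / clip_ms)
        # Slicing past the end of y is silently clipped, so late activates stay in range.
        y[end_step + 1:end_step + 1 + n_pos] = 1.0
    return y
```

With these assumptions, `make_labels([5000]).mean()` is 50/1375, about 0.036: targets should be mostly zeros. A model trained on such labels should idle near 0 and rise after the trigger word, so if your targets have a large mean, or the 1s line up with negatives instead of activates, the labels are the first place to look.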
9618c49
Hi,
It's nice to see an implementation that can help with developing a customised network, but I am not able to train it on a customised dataset, regardless of the dataset's size. The accuracy metric reaches 0.93 and the loss 0.23, yet the results are strange: the more I train the existing network, the worse it becomes. When I start from a new model, all I get is a prediction that fluctuates between 0.1 and 0.7 for as long as a keyword is present. Is there something wrong with this model?
Could someone help, please?
Thanks
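An accuracy around 0.93 says little on its own here, because it is typically computed per output time step and the labels are heavily imbalanced. The sketch below uses a hypothetical helper `binary_accuracy` and a made-up label layout (1,375 steps with two 50-step positive runs) to show that a model which never fires already scores about 0.927. It assumes the metric thresholds sigmoid outputs at 0.5 per step, which is how Keras' binary accuracy behaves by default.

```python
import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of time steps where the thresholded prediction matches the label."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(y_true.dtype)
    return float(np.mean(y_pred == y_true))


# Hypothetical labels: 1375 steps, two activates with 50 positive steps each.
y_true = np.zeros(1375)
y_true[300:350] = 1
y_true[900:950] = 1

# A "model" that never fires still matches 1275 of the 1375 steps.
never_fires = np.zeros_like(y_true)
print(round(binary_accuracy(y_true, never_fires), 3))  # 0.927
```

Precision and recall on the positive steps, or simply plotting predictions against labels on a few dev clips, separate a model that has learned the keyword from one that only predicts the majority class.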