init
Tony607 committed Mar 2, 2018
0 parents commit 9618c49
Showing 50 changed files with 2,517 additions and 0 deletions.
87 changes: 87 additions & 0 deletions .gitignore
@@ -0,0 +1,87 @@
### https://raw.github.com/github/gitignore/f57304e9762876ae4c9b02867ed0cb887316387e/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# dotenv
.env

# virtualenv
.venv
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

/.idea/

/checkpoints/
.DS_Store

XY_dev/
XY_train/
*.npy
39 changes: 39 additions & 0 deletions README.md
@@ -0,0 +1,39 @@
# [How to do Real Time Trigger Word Detection with Keras](https://www.dlology.com/blog/how-to-do-real-time-trigger-word-detection-with-keras/).

Trigger word detection, also known as wake word or hot word detection, is how you wake up devices such as Amazon Echo ("Alexa") or Google Home ("OK, Google").
Wouldn't it be cool to build one yourself and run it in **real time**?

In this post, I am going to show you exactly how to build a Keras model that does the same thing from scratch. No third-party voice API or network connection is required to make it work.

Background information is covered in my blog post.

## How to Run
Requires [Python 3.5+](https://www.python.org/ftp/python/3.6.4/python-3.6.4.exe) and [Jupyter Notebook](https://jupyter.readthedocs.io/en/latest/install.html) to be installed.
### Clone or download this repo
```
git clone https://github.com/Tony607/Keras-Trigger-Word
```
### Install required libraries
`pip3 install -r requirements.txt`


### Real-time demo

Open a terminal in the project directory, then run:
```
jupyter notebook
```
If you just want to play with the pre-trained trigger word model in the real-time demo, choose this notebook in the browser window that opens:
```
trigger_word_real_time_demo.ipynb
```

Optionally, if you want to learn about data preparation and model training, continue with my [write-up](https://www.dlology.com/blog/how-to-do-real-time-trigger-word-detection-with-keras/). In the browser window, open this notebook:
```
Trigger word detection - v1.ipynb
```
If you want to follow along with the notebook, download the train/dev data, [Data.zip](https://github.com/Tony607/Keras-Trigger-Word/releases/download/V0.1/Data.zip), from the releases page, then extract the
`XY_dev` and `XY_train` folders to the root of the project directory.
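If the extraction worked, the arrays should load directly with NumPy. Here is a minimal sanity-check sketch of my own (not part of the repo); the file names `X.npy`/`Y.npy` and the shapes (Tx=5511 spectrogram steps, n_freq=101 frequency bins, Ty=1375 output steps) are assumptions based on the blog post's model, so verify them against your extracted files:

```python
# Sanity-check sketch: load the training arrays if present, otherwise fall
# back to dummy arrays with the assumed shapes so the snippet runs stand-alone.
import os
import numpy as np

if os.path.exists("XY_train/X.npy"):
    X = np.load("XY_train/X.npy")
    Y = np.load("XY_train/Y.npy")
else:
    # Dummy data with the assumed shapes (examples, Tx, n_freq) / (examples, Ty, 1)
    X = np.zeros((26, 5511, 101))
    Y = np.zeros((26, 1375, 1))

print("X:", X.shape, "Y:", Y.shape)
```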

Happy coding! Leave a comment if you have any questions.
1,797 changes: 1,797 additions & 0 deletions Trigger word detection - v1.ipynb


Binary file added audio_examples/chime.wav
Binary file not shown.
Binary file added audio_examples/example_train.wav
Binary file not shown.
Binary file added audio_examples/insert_reference.wav
Binary file not shown.
Binary file added audio_examples/my_audio.wav
Binary file not shown.
Binary file added audio_examples/train_reference.wav
Binary file not shown.
Binary file added chime_output.wav
Binary file not shown.
Binary file added images/date_attention.png
Binary file added images/date_attention2.png
Binary file added images/label_diagram.png
Binary file added images/model_trigger.png
Binary file added images/music_gen.png
Binary file added images/ones_reference.png
Binary file added images/poorly_trained_model.png
Binary file added images/sound.png
Binary file added images/spectrogram.png
Binary file added images/train_label.png
Binary file added images/train_reference.png
Binary file added images/woebot.png
Binary file added insert_test.wav
Binary file not shown.
Binary file added models/tr_model.h5
Binary file not shown.
Binary file added raw_data/activates/1.wav
Binary file not shown.
Binary file added raw_data/activates/1_act2.wav
Binary file not shown.
Binary file added raw_data/activates/1_act3.wav
Binary file not shown.
Binary file added raw_data/activates/2.wav
Binary file not shown.
Binary file added raw_data/activates/2_act2.wav
Binary file not shown.
Binary file added raw_data/activates/3.wav
Binary file not shown.
Binary file added raw_data/activates/3_act2.wav
Binary file not shown.
Binary file added raw_data/activates/3_act3.wav
Binary file not shown.
Binary file added raw_data/activates/4_act2.wav
Binary file not shown.
Binary file added raw_data/backgrounds/1.wav
Binary file not shown.
Binary file added raw_data/backgrounds/2.wav
Binary file not shown.
Binary file added raw_data/dev/1.wav
Binary file not shown.
Binary file added raw_data/dev/2.wav
Binary file not shown.
Binary file added raw_data/negatives/1.wav
Binary file not shown.
Binary file added raw_data/negatives/1_0.wav
Binary file not shown.
Binary file added raw_data/negatives/2.wav
Binary file not shown.
Binary file added raw_data/negatives/2_1.wav
Binary file not shown.
Binary file added raw_data/negatives/3.wav
Binary file not shown.
Binary file added raw_data/negatives/3_2.wav
Binary file not shown.
Binary file added raw_data/negatives/4.wav
Binary file not shown.
Binary file added raw_data/negatives/4_0.wav
Binary file not shown.
Binary file added raw_data/negatives/5.wav
Binary file not shown.
Binary file added raw_data/negatives/5_1.wav
Binary file not shown.
6 changes: 6 additions & 0 deletions requirements.txt
@@ -0,0 +1,6 @@
numpy
keras
h5py
pydub
scipy
matplotlib
46 changes: 46 additions & 0 deletions td_utils.py
@@ -0,0 +1,46 @@
import os

import matplotlib.pyplot as plt
from pydub import AudioSegment
from scipy.io import wavfile

# Calculate and plot the spectrogram for a wav audio file
def graph_spectrogram(wav_file):
    rate, data = get_wav_info(wav_file)
    nfft = 200      # Length of each window segment
    fs = 8000       # Sampling frequency
    noverlap = 120  # Overlap between windows
    nchannels = data.ndim
    if nchannels == 1:
        pxx, freqs, bins, im = plt.specgram(data, nfft, fs, noverlap=noverlap)
    elif nchannels == 2:
        # Use the first channel of a stereo recording
        pxx, freqs, bins, im = plt.specgram(data[:, 0], nfft, fs, noverlap=noverlap)
    else:
        raise ValueError("Expected mono or stereo audio, got %d channels" % nchannels)
    return pxx

# Load a wav file and return its sample rate and data
def get_wav_info(wav_file):
    rate, data = wavfile.read(wav_file)
    return rate, data

# Standardize the volume of an audio clip to a target dBFS level
def match_target_amplitude(sound, target_dBFS):
    change_in_dBFS = target_dBFS - sound.dBFS
    return sound.apply_gain(change_in_dBFS)

# Load the raw audio clips used to synthesize training examples
def load_raw_audio():
    activates = []
    backgrounds = []
    negatives = []
    for filename in os.listdir("./raw_data/activates"):
        if filename.endswith("wav"):
            activates.append(AudioSegment.from_wav("./raw_data/activates/" + filename))
    for filename in os.listdir("./raw_data/backgrounds"):
        if filename.endswith("wav"):
            backgrounds.append(AudioSegment.from_wav("./raw_data/backgrounds/" + filename))
    for filename in os.listdir("./raw_data/negatives"):
        if filename.endswith("wav"):
            negatives.append(AudioSegment.from_wav("./raw_data/negatives/" + filename))
    return activates, negatives, backgrounds
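For reference, the spectrogram settings used in `graph_spectrogram` (nfft=200, fs=8000, noverlap=120) can be exercised stand-alone. The following sketch is my own (not part of the repo): it writes a one-second 440 Hz test tone to a wav file and feeds it through the same `plt.specgram` call:

```python
# Sketch: generate a short test tone, save it as a wav, and compute its
# spectrogram with the same settings graph_spectrogram uses.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate = 44100
t = np.linspace(0, 1.0, rate, endpoint=False)
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
wavfile.write("test_tone.wav", rate, tone)

rate, data = wavfile.read("test_tone.wav")
nfft, fs, noverlap = 200, 8000, 120  # same constants as graph_spectrogram
pxx, freqs, bins, im = plt.specgram(data, nfft, fs, noverlap=noverlap)
print(pxx.shape)  # (nfft/2 + 1 frequency bins, number of time windows)
```

The first dimension of `pxx` is always nfft/2 + 1 = 101 frequency bins for a one-sided spectrum, which matches the model's `n_freq` input dimension.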
Binary file added train.wav
Binary file not shown.

2 comments on commit 9618c49

@jeremydub commented on 9618c49, May 29, 2018

Hello Chengwei,
Thank you for sharing your experience on hotword triggering! Did you manage to successfully train a model using your own dataset? Did you use the approach explained in the notebook, which looks like this:

```python
model = create_model(input_shape=(Tx, n_freq))
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
# .. creating X, Y
model.fit(X, Y, epochs=1, batch_size=5)
```

If so, what number of epochs/batch_size did you use to train it, and did you change the parameters for Adam (e.g. the learning rate)?

I'm asking these questions because I'm experimenting with training my own weights and not using the pre-trained model in the notebook. When I do, I get an "inverted" probability for the trigger word. The baseline will be around 0.3 with dips down towards 0 rather than the other way around.

I am using 4000 training examples synthesized from 50 activates and 200 negative utterances.

Thank you in advance for your response :)

Jeremy

@aras03qorvo commented

Hi,
It is nice to see an implementation that can help in developing a customized network, but I am not able to get it trained on a customized data set, regardless of the data set's size, even though the accuracy metric reaches 0.93 and the loss drops to 0.23. Strangely, the more the existing network is trained, the worse it becomes. When I train from a new model, all I get is that whenever the keyword occurs, the prediction fluctuates between 0.1 and 0.7 for as long as the word lasts. Is there something wrong with this model?
Could you help, please?
Thanks
