
Small-footprint Keyword Spotting with Convolutional Neural Networks

This project focuses on implementing a small-footprint keyword spotting (KWS) engine using convolutional neural networks (CNNs). The goal is to detect specific keywords within speech utterances using machine learning techniques. The project uses a reference dataset released by Google, the "Speech Commands Dataset," which consists of 65,000 one-second-long utterances of 30 words collected from thousands of different people. The dataset is released under the Creative Commons BY 4.0 license.

The project explores different approaches for implementing the KWS engine, including LVCSR-based, phoneme-recognition-based, and word-recognition-based KWS. The CNN model proposed by Sainath et al. (2015) is used for word recognition, with features obtained from the raw audio as 40-dimensional log Mel filterbank coefficients.
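As a concrete illustration of this front end, the sketch below computes 40-dimensional log Mel filterbank features with TensorFlow's signal ops. The 16 kHz sample rate and 25 ms / 10 ms framing are typical assumptions for this dataset, not values taken from the repository.

```python
import tensorflow as tf

def log_mel_features(waveform, sample_rate=16000, num_mel_bins=40):
    """Turn a 1-D float32 waveform into a (frames, 40) log Mel feature map."""
    # 25 ms windows with a 10 ms hop at 16 kHz (assumed, standard for KWS).
    stft = tf.signal.stft(waveform, frame_length=400, frame_step=160, fft_length=512)
    spectrogram = tf.abs(stft)  # magnitude spectrogram, shape (frames, 257)
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=num_mel_bins,
        num_spectrogram_bins=spectrogram.shape[-1],
        sample_rate=sample_rate,
        lower_edge_hertz=20.0,
        upper_edge_hertz=4000.0,
    )
    mel = tf.matmul(spectrogram, mel_matrix)  # project onto Mel filterbanks
    return tf.math.log(mel + 1e-6)            # log compression, avoid log(0)
```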

Reference Papers

  • [Sainath15] Tara N. Sainath, Carolina Parada, "Convolutional Neural Networks for Small-footprint Keyword Spotting," INTERSPEECH, Dresden, Germany, September 2015.
  • [Warden18] Pete Warden, "Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition," arXiv:1804.03209, April 2018.

Dataset Description

The reference dataset used for small-footprint keyword spotting is the "Speech Commands Dataset." It was released in August 2017 and contains 65,000 one-second-long utterances of 30 words, collected through Google's AIY project and released under the Creative Commons BY 4.0 license. Additional information about the dataset can be found in the Google blog post: Speech Commands Dataset.

The speech dataset can be downloaded from the following link: Speech Commands Dataset (2.11 GB uncompressed)

Examples of the spectrograms and possible data augmentation techniques follow:
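As a minimal sketch of two common augmentations for one-second clips, the snippet below applies a random time shift and mixes in background noise; the ±100 ms shift range and the noise level are illustrative assumptions, not the repository's settings.

```python
import numpy as np

def augment(waveform, rng, max_shift=1600, noise=None, noise_level=0.1):
    """Randomly time-shift a one-second clip and optionally mix in noise."""
    # Time shift: up to +/-100 ms at 16 kHz (assumed), zero-padding the gap.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(waveform, shift)
    if shift > 0:
        out[:shift] = 0.0
    elif shift < 0:
        out[shift:] = 0.0
    # Noise mixing: add a random slice of a longer background-noise clip.
    if noise is not None and len(noise) >= len(out):
        start = int(rng.integers(0, len(noise) - len(out) + 1))
        out = out + noise_level * noise[start:start + len(out)]
    return out

# Usage: rng = np.random.default_rng(0); augmented = augment(clip, rng)
```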

Project Developments

The project offers several possible developments and experiments, including:

  • Experimenting with different audio features and coefficients.
  • Designing custom Mel filterbanks.
  • Implementing standard/deep CNN architectures with techniques like dropout and regularization (a minimal sketch follows this list).
  • Investigating recent/new artificial neural network (ANN) architectures, such as autoencoder-based models, attention mechanisms, and inception-based CNN networks.
  • Conducting a comparison of different architectures based on memory usage and accuracy.
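The sketch below illustrates the dropout-and-regularization item above with a hypothetical Keras CNN. The (98, 40, 1) input shape corresponds to roughly 98 frames of 40 log Mel coefficients per one-second clip, and the 12-way output (ten keywords plus "silence" and "unknown") is the usual Speech Commands setup; both are assumptions, as are all layer sizes.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_cnn(input_shape=(98, 40, 1), num_classes=12):
    """A small CNN over (time, mel, channel) feature maps, with L2 + dropout."""
    l2 = regularizers.l2(1e-4)
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, (3, 3), activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),  # dropout on the dense head
        layers.Dense(128, activation="relu", kernel_regularizer=l2),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

For the memory/accuracy comparison, `model.count_params()` gives the parameter count of each candidate architecture, a reasonable proxy for its memory footprint.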

We implement the following pipeline: raw audio is loaded, optionally augmented, converted into 40-dimensional log Mel feature maps, and fed to a CNN classifier for training and evaluation.
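A tf.data sketch of the loading-and-featurization stage is shown below; it reuses the log_mel_features function sketched earlier, and the 16 kHz / one-second decoding parameters are again assumptions.

```python
import tensorflow as tf

def make_dataset(file_paths, labels, batch_size=64):
    """WAV paths -> decoded waveforms -> log Mel features -> shuffled batches."""
    def load_and_featurize(path, label):
        audio = tf.io.read_file(path)
        # Decode to mono and pad/crop to exactly one second at 16 kHz (assumed).
        waveform, _ = tf.audio.decode_wav(audio, desired_channels=1,
                                          desired_samples=16000)
        waveform = tf.squeeze(waveform, axis=-1)
        features = log_mel_features(waveform)    # front end sketched above
        return features[..., tf.newaxis], label  # add a channel axis for the CNN
    ds = tf.data.Dataset.from_tensor_slices((file_paths, labels))
    ds = ds.shuffle(len(file_paths))
    ds = ds.map(load_and_featurize, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```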

Models

The following table shows the parameters of the main architectures used as purely CNN-based models:
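As a reference point, below is a minimal Keras sketch of the cnn-trad-fpool3 layout from Sainath et al. (2015): two convolutional layers with pooling only in frequency, a low-rank linear layer, one DNN layer, and a softmax. The layer sizes follow the paper, not necessarily the exact parameters used in this project.

```python
import tensorflow as tf
from tensorflow.keras import layers

def cnn_trad_fpool3(input_shape=(98, 40, 1), num_classes=12):
    """Keras sketch of the cnn-trad-fpool3 layout (Sainath et al., 2015)."""
    return tf.keras.Sequential([
        layers.Input(shape=input_shape),
        # conv1: 20x8 (time x frequency) filters, 64 maps; pool only in frequency.
        layers.Conv2D(64, (20, 8), activation="relu"),
        layers.MaxPooling2D(pool_size=(1, 3)),
        # conv2: 10x4 filters, 64 maps, no pooling.
        layers.Conv2D(64, (10, 4), activation="relu"),
        layers.Flatten(),
        # Low-rank linear bottleneck, then one DNN layer, as in the paper.
        layers.Dense(32, activation=None),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```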

Results

Results are shown in the two tables below:

Useful Resources

Recent developments and resources related to keyword spotting and speech recognition:

  • [Chorowski15] J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, Y. Bengio, "Attention-Based Models for Speech Recognition," Conference on Neural Information Processing Systems (NIPS), Montréal, Canada, 2015.
  • [Tang18] R. Tang and J. Lin, "Deep residual learning for small-footprint keyword spotting," IEEE ICASSP, Calgary, Alberta, Canada, 2018.
  • [Andrade18] D. C. de Andrade, S. Leo, M. L. D. S. Viana, and C. Bernkopf, "A neural attention model for speech command recognition," arXiv:1808.08929, 2018. PDF Link
  • White Paper: "Key-Word Spotting - The Base Technology for Speech Analytics" PDF Link
