Keyword spotting task for audio files using attention (KWS attention)

Model: the attention-based model described in Shan et al., 2018 (Attention-based End-to-End Models for Small-Footprint Keyword Spotting; https://arxiv.org/abs/1803.10916)
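To make the attention step more concrete, here is a minimal pure-Python sketch of soft attention pooling in the spirit of the paper: each encoder state h_t gets a score e_t = v . tanh(W h_t + b), the scores are softmax-normalized into weights, and the weighted sum of states forms a single context vector. A linear layer plus softmax then turns that vector into class probabilities. This is an illustration under those assumptions, not the repository's models.py. The names soft_attention, classify, W, b, v and U are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    # Plain matrix-vector product: one dot product per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def soft_attention(hs: Sequence[Sequence[float]], W: Matrix,
                   b: Sequence[float], v: Sequence[float]) -> Tuple[Vector, Vector]:
    """Pool encoder states hs (T x d) into one context vector.

    e_t = v . tanh(W h_t + b), alpha = softmax(e), c = sum_t alpha_t * h_t
    Returns (context, alphas).
    """
    scores = []
    for h in hs:
        hidden = [math.tanh(z + bj) for z, bj in zip(matvec(W, h), b)]
        scores.append(sum(vj * uj for vj, uj in zip(v, hidden)))
    alphas = softmax(scores)
    dim = len(hs[0])
    context = [sum(a * h[k] for a, h in zip(alphas, hs)) for k in range(dim)]
    return context, alphas


def classify(context: Sequence[float], U: Matrix) -> Vector:
    # Hypothetical head: linear projection to NUM_CLASSES logits, then softmax.
    return softmax(matvec(U, context))


if __name__ == "__main__":
    hs = [[1.0, 2.0], [3.0, 4.0]]
    # With all-zero W and b every score is 0, so the weights are uniform
    # and the context vector is simply the mean of the encoder states.
    context, alphas = soft_attention(hs, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0])
    print(context, alphas)
```

In the real model the encoder states would come from the CRNN and W, b, v, U would be learned; here they are passed in by hand only to show the arithmetic.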

The model has ~142k parameters; its size can be changed through the hyperparameters below, which are set in main.py:

BATCH_SIZE = 256        (batch size for training)
NUM_EPOCHS = 35         (number of training epochs)
N_MELS     = 40         (number of mel bands in the mel spectrogram)

IN_SIZE = 40            (size of the input)
HIDDEN_SIZE = 64        (size of the hidden representation)
KERNEL_SIZE = (20, 5)   (kernel size of the convolution layer in the CRNN)
STRIDE = (8, 2)         (stride of the convolution layer in the CRNN)
GRU_NUM_LAYERS = 2      (number of GRU layers in the CRNN)
NUM_DIRS = 2            (number of GRU directions; 2 if bidirectional)
NUM_CLASSES = 2         (number of classes; 2 for "no keyword" vs. "sheila" is in the audio)
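The sketch below shows how some of these values interact. It assumes (not confirmed by the repository) that the mel spectrogram is laid out as (N_MELS, frames), that KERNEL_SIZE and STRIDE are given as (frequency, time), and that the convolution uses no padding. The GRU count follows the parameter layout of torch.nn.GRU (separate input and hidden biases). The number of convolution channels is not in the config above, so the value used in the demo is purely hypothetical, as are the helper names.

```python
# Values copied from the config above.
N_MELS = 40
KERNEL_SIZE = (20, 5)
STRIDE = (8, 2)
HIDDEN_SIZE = 64
GRU_NUM_LAYERS = 2
NUM_DIRS = 2


def conv_out_len(length: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded, undilated convolution along one axis."""
    return (length - kernel) // stride + 1


def conv_out_shape(n_frames: int) -> tuple:
    """(freq_bins, time_steps) left after the convolution (assumes no padding)."""
    freq = conv_out_len(N_MELS, KERNEL_SIZE[0], STRIDE[0])
    time = conv_out_len(n_frames, KERNEL_SIZE[1], STRIDE[1])
    return freq, time


def gru_num_params(input_size: int, hidden: int, num_layers: int, num_dirs: int) -> int:
    """Parameter count of a torch.nn.GRU with the given settings.

    Each layer and direction has weight_ih (3H x I), weight_hh (3H x H),
    bias_ih (3H) and bias_hh (3H). Layers after the first receive the
    concatenated outputs of all directions, so their input size is H * num_dirs.
    """
    total = 0
    for layer in range(num_layers):
        in_size = input_size if layer == 0 else hidden * num_dirs
        per_dir = 3 * hidden * (in_size + hidden) + 6 * hidden
        total += per_dir * num_dirs
    return total


if __name__ == "__main__":
    freq, time = conv_out_shape(n_frames=101)  # 101 frames is an arbitrary example
    print("conv output (freq, time):", freq, time)
    conv_channels = 32  # hypothetical; not part of the config above
    print("GRU params:", gru_num_params(conv_channels * freq, HIDDEN_SIZE,
                                        GRU_NUM_LAYERS, NUM_DIRS))
```

With the frequency kernel of 20 and stride of 8, the 40 mel bands collapse to only (40 - 20) // 8 + 1 = 3 bins, which keeps the GRU input small; most of the GRU parameters then come from the hidden-to-hidden weights and the second layer.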

Training data (Speech Commands v0.01) can be downloaded here: http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz (the notebook has more details on downloading).
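A minimal standard-library sketch for fetching and unpacking the archive, followed by a hypothetical binary labelling: clips in the "sheila" folder get label 1, clips from every other word folder get 0, and the "_background_noise_" folder is skipped. This is only an assumption about how the data could be labelled; the repository ships its own csv_labels_sheila.csv and csv_background_noises.csv, whose format is not shown here. The function names are hypothetical.

```python
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path
from typing import List, Tuple

DATA_URL = "http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz"
KEYWORD = "sheila"
NOISE_DIR = "_background_noise_"  # long noise clips, excluded from labelling


def download_and_extract(dest: str, url: str = DATA_URL) -> Path:
    """Download the archive into dest (unless already present) and unpack it there."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / Path(urllib.parse.urlparse(url).path).name
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links outside dest, etc.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir


def label_clips(root: str) -> List[Tuple[str, int]]:
    """Hypothetical labelling: 1 for clips in the keyword folder, 0 otherwise."""
    samples = []
    for word_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if word_dir.name == NOISE_DIR:
            continue
        label = 1 if word_dir.name == KEYWORD else 0
        for wav in sorted(word_dir.glob("*.wav")):
            samples.append((str(wav), label))
    return samples


if __name__ == "__main__":
    root = download_and_extract("data/speech_commands")
    samples = label_clips(str(root))
    print(len(samples), "clips,", sum(y for _, y in samples), "positives")
```

The resulting data set is heavily imbalanced (one keyword folder against all other words), which is worth keeping in mind when choosing a loss or a sampling strategy.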

To train, run "python main.py".

The code supports inference as described in the paper. Install the requirements and run "python inference.py path/to/YOUR_AUDIO"; it runs quickly even on a CPU. The script generates "path/to/YOUR_AUDIO.pdf" with a graph of the probability that the word "Sheila" is said in the audio.
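One common way to produce such a probability curve for audio longer than one training clip is to slide a fixed-length window over the signal and score each window. The sketch below does exactly that, with the model abstracted as a score_fn callback; the 1 s window, 100 ms hop and 0.5 threshold are hypothetical defaults, and this is not necessarily what inference.py does. Speech Commands clips are 1 s long at 16 kHz, which motivates the default window.

```python
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 16_000  # Speech Commands clips are 1 s at 16 kHz


def sliding_window_probs(
    audio: Sequence[float],
    score_fn: Callable[[Sequence[float]], float],
    sample_rate: int = SAMPLE_RATE,
    window: int = SAMPLE_RATE,        # hypothetical: 1 s window
    hop: int = SAMPLE_RATE // 10,     # hypothetical: 100 ms hop
) -> List[Tuple[float, float]]:
    """Return (window_end_seconds, keyword_probability) pairs."""
    if len(audio) < window:
        # Too short for a full window: score the whole clip once.
        return [(len(audio) / sample_rate, score_fn(audio))]
    results = []
    for start in range(0, len(audio) - window + 1, hop):
        end = start + window
        results.append((end / sample_rate, score_fn(audio[start:end])))
    return results


def detect(probs: List[Tuple[float, float]], threshold: float = 0.5) -> List[float]:
    """Window end times whose keyword probability is at or above the threshold."""
    return [t for t, p in probs if p >= threshold]


if __name__ == "__main__":
    # Toy stand-in for the model: mean amplitude of the window.
    audio = [0.0] * SAMPLE_RATE + [1.0] * SAMPLE_RATE
    probs = sliding_window_probs(audio, lambda w: sum(w) / len(w))
    print(detect(probs))
```

Plotting the returned (time, probability) pairs gives a curve of the kind the script writes to the PDF; a smaller hop yields a smoother curve at the cost of more model calls.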

Repository contents:
models
test_sounds
README.md
csv_background_noises.csv
csv_labels_sheila.csv
dataset.py
inference.py
main.py
models.py
my_utils.py
requirements.txt
train_val.py
whole_code.ipynb
Repository: Kirili4ik/kws-attention-pytorch