Metric learning for user-defined KWS

This repository contains the official code for Metric learning for user-defined Keyword spotting. Our code is based on the code voxceleb trainer, which is implemented for the speaker recognition task.

Paper: METRIC LEARNING FOR USER-DEFINED KEYWORD SPOTTING

Project page

If you find our paper useful in your research, please use the following BibTeX entry for citation.

@inproceedings{jung2023metric,
  title={Metric Learning for User-Defined Keyword Spotting},
  author={Jung, Jaemin and Kim, Youkyum and Park, Jihwan and Lim, Youshin and Kim, Byeong-Yeol and Jang, Youngjoon and Chung, Joon Son},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

Data preparation

This code is implemented for mono (single-channel) audios in 16kHz. If you need to resample your audios, please utilize ffmpeg.

LibriSpeech Keywords

The dataset size is large, so we share the dataset creation code. (https://github.com/jjm5811/Keyword_FA.git)

Google Speech Commands

The Google Speech Commands datasets are used for these experiments. Follow the instructions on this page to download and prepare the data for training. We used Speech Commands v0.02 (35 keywords in total) for our baseline.

Customize your target keywords

Default target keywords are ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']. If you don't set the target keywords, default keywords are treated as the target keywords.

python trainKeywordNet.py --target_keys bed bird sheila ...

Train a new model

Dependencies

conda create --file requirements.txt -n [env_name] -c pytorch -c conda-forge

Training examples

Pre-training (Softmax, AM-Softmax)

python trainKeywordNet.py --save_path [save_path] --augment True --batch_size [batch_size] --trainfunc [trainfunc] --model [ResNet15, ResNet26]

Pre-training (Prototypical, Angular Prototypical)

python trainKeywordNet.py --save_path [save_path] --augment True --metric_batch_size [metric_batch_size] --batch_size 1 --trainfunc [trainfunc] --model [ResNet15, ResNet26]

Fine-tuning (Softmax, AM-Softmax)

python trainKeywordNet.py --fine_tuning --save_path [save_path] --augment True --batch_size [batch_size] --trainfunc [trainfunc] --model [ResNet15, ResNet26] --initial_model [model.pt] --lr 0.00001

Fine-tuning (Prototypical, Angular Prototypical)

python trainKeywordNet.py --fine_tuning --save_path [save_path] --augment True --metric_batch_size [metric_batch_size] --batch_size 1 --trainfunc [trainfunc] --model [ResNet15, ResNet26] --initial_model [model.pt] --lr 0.00001

Testing Examples

python trainKeywordNet.py --eval --save_path [save_path] --initial_model [initial_model] --enroll_num [1, 5, 10 (# of shots)] --enroll_list [/path/to/enroll_list.txt] --test_acc_list [/path/to/test_acc_list.txt]

Implemented loss functions

Softmax (softmax)
Additive Margin Softmax (amsoftmax)
Prototypical (proto)
Angular Prototypical (angleproto)

Implemented models

For the model, res15 from "deep residual learning for small-footprint keyword spotting", R. Tang & J. Lin, 2018, ICASSP is used. Code for the model is based on Honk: CNNs for Keyword Spotting.

ResNet15
ResNet26
ConvMixer

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
PT_models		PT_models
loss		loss
models		models
optimizer		optimizer
pytorch_revgrad		pytorch_revgrad
scheduler		scheduler
train_test_lists		train_test_lists
.gitignore		.gitignore
ConfModel.py		ConfModel.py
DatasetLoader.py		DatasetLoader.py
DatasetLoader_classify.py		DatasetLoader_classify.py
KeywordNet.py		KeywordNet.py
LICENSE		LICENSE
README.md		README.md
accuracy.py		accuracy.py
requirements.txt		requirements.txt
rir.npy		rir.npy
run.sh		run.sh
trainKeywordNet.py		trainKeywordNet.py
tuneThreshold.py		tuneThreshold.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Metric learning for user-defined KWS

Data preparation

LibriSpeech Keywords

Google Speech Commands

Customize your target keywords

Train a new model

Dependencies

Training examples

Testing Examples

Implemented loss functions

Implemented models

About

Releases

Packages

Contributors 2

Languages

License

kaistmm/Metric-UD-KWS

Folders and files

Latest commit

History

Repository files navigation

Metric learning for user-defined KWS

Data preparation

LibriSpeech Keywords

Google Speech Commands

Customize your target keywords

Train a new model

Dependencies

Training examples

Testing Examples

Implemented loss functions

Implemented models

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages