Sugar

Efficient Speech Processing Toolkit for Automatic Speaker Recognition

The authors' PyTorch implementation and pretrained models of EfficientTDNN.

17 June 2022: EfficientTDNN was published in IEEE/ACM Transactions on Audio, Speech, and Language Processing.

What's New

EfficientTDNN: Efficient Architecture Search for Speaker Recognition [arXiv] [IEEE/ACM TASLP]

Models and Checkpoints

Model	Training Dataset	Link
EfficientTDNN	VoxCeleb2	HuggingFace: mechanicalsea/efficient-tdnn

Requirements and Installation

PyTorch version >= 1.7.1
Python version >= 3.7.9
To install sugar:

git clone https://github.com/mechanicalsea/sugar.git
cd sugar
pip install --editable .

# [Optional] Requirements: numpy, scipy, pandas, matplotlib, pyyaml, thop, geatpy, torch, torchvision, torchaudio, scikit-learn
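The version requirements above can be checked before installing. The following is a minimal sketch, not part of sugar itself: `parse_version` and `check_environment` are hypothetical helper names, and the only PyTorch attribute it relies on is `torch.__version__`.

```python
import re
import sys

# Minimum versions quoted in the requirements above.
MIN_PYTHON = (3, 7, 9)
MIN_TORCH = (1, 7, 1)


def parse_version(text):
    """Turn a string such as '1.7.1+cu110' into (1, 7, 1); missing parts count as 0."""
    numbers = re.findall(r"\d+", text.split("+")[0])[:3]
    parts = [int(n) for n in numbers]
    return tuple(parts + [0] * (3 - len(parts)))


def check_environment():
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append("Python %d.%d.%d is older than 3.7.9" % sys.version_info[:3])
    try:
        import torch  # imported only to read the version string
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch.__version__) < MIN_TORCH:
            problems.append("PyTorch %s is older than 1.7.1" % torch.__version__)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet requirement and nothing when both versions are recent enough.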

We provide pre-trained models for extracting speaker embeddings via Hugging Face.

Tutorials

EfficientTDNN

EfficientTDNN Training Scripts

An example of training the TDNN supernet is given as follows.

task=''            # training stage, e.g., supernet, supernet_kernel, supernet_kernel_depth, ...
cycle_step=34120   # half cycle of the learning-rate scheduler, e.g., 1 GPU: 68248, 2 GPUs: 34120, 4 GPUs: 17056
second=48000       # largest stage: 32000, other stages: 48000
initial_weights='' # required for all stages except the largest stage
logdir=''
python scripts/train_vox2_veri.py --distributed --augment \
      --task ${task} --epochs 64 --cycle-step ${cycle_step} --second ${second} \
      --initial-weights ${initial_weights} \
      --logdir ${logdir}
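The per-stage settings in the comments above (cycle step by GPU count, `second` by stage) are easy to mix up when launching several stages. The sketch below assembles the same command line from those documented values. It is an illustration only: `build_train_command` is a hypothetical helper, the caller must say which stage is the largest one, reusing `--epochs 64` for every stage and omitting `--initial-weights` for the largest stage are assumptions, and the paths in the usage example are placeholders.

```python
import shlex

# Half-cycle lengths quoted in the script comment, keyed by number of GPUs.
CYCLE_STEPS = {1: 68248, 2: 34120, 4: 17056}


def build_train_command(task, num_gpus, logdir, initial_weights="",
                        largest_stage=False, epochs=64):
    """Assemble the argument list for scripts/train_vox2_veri.py."""
    if num_gpus not in CYCLE_STEPS:
        raise ValueError("no documented cycle step for %d GPU(s)" % num_gpus)
    if not largest_stage and not initial_weights:
        raise ValueError("initial_weights is required for every stage except the largest")
    # 32000 for the largest stage, 48000 for all other stages.
    second = 32000 if largest_stage else 48000
    cmd = [
        "python", "scripts/train_vox2_veri.py", "--distributed", "--augment",
        "--task", task, "--epochs", str(epochs),
        "--cycle-step", str(CYCLE_STEPS[num_gpus]), "--second", str(second),
    ]
    if initial_weights:  # assumption: the flag is simply left out when unused
        cmd += ["--initial-weights", initial_weights]
    cmd += ["--logdir", logdir]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute your own checkpoint and log directory.
    cmd = build_train_command("supernet_width1", num_gpus=2,
                              logdir="path/to/logdir",
                              initial_weights="path/to/previous_stage_weights")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with paths that contain spaces.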

Load Pre-Trained Models for Inference

import torch
from sugar.models import WrappedModel

# Dummy input: one utterance of 10,000 samples at 16 kHz (requires a CUDA device).
wav_input_16khz = torch.randn(1, 10000).cuda()

# Files to download from the Hugging Face repository.
repo_id = "mechanicalsea/efficient-tdnn"
supernet_filename = "depth/depth.torchparams"
subnet_filename = "depth/depth.ecapa-tdnn.3.512.512.512.512.5.3.3.3.1536.bn.tar"
subnet, info = WrappedModel.from_pretrained(
    repo_id=repo_id,
    supernet_filename=supernet_filename,
    subnet_filename=subnet_filename,
)
subnet = subnet.cuda()
subnet = subnet.eval()

# Gradients are not needed for inference.
with torch.no_grad():
    embedding = subnet(wav_input_16khz)
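Once embeddings are extracted, a common way to compare two utterances in speaker verification is cosine similarity. The sketch below shows that scoring step in plain Python. It is not taken from sugar and is not necessarily the scoring protocol used in the paper: `cosine_score` and `same_speaker` are hypothetical helpers, the threshold must be tuned on held-out trials, and converting a tensor with `embedding.flatten().tolist()` assumes the model returns a single embedding vector per utterance.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score an all-zero embedding")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold):
    """Accept the trial when the score reaches the (tuned) threshold."""
    return cosine_score(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    enrolled = [0.2, -0.1, 0.7]
    test = [0.25, -0.05, 0.65]
    print(cosine_score(enrolled, test))
```

Scores lie in [-1, 1], with higher values indicating that the two utterances are more likely to come from the same speaker.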

Citing EfficientTDNN

Please cite EfficientTDNN if you use it in your research or business.

@article{wr-efficienttdnn-2022,
  author={Wang, Rui and Wei, Zhihua and Duan, Haoran and Ji, Shouling and Long, Yang and Hong, Zhen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={EfficientTDNN: Efficient Architecture Search for Speaker Recognition}, 
  year={2022},
  volume={30},
  number={},
  pages={2267-2279},
  doi={10.1109/TASLP.2022.3182856}}

Contact Information

For help or issues using EfficientTDNN models, please submit a GitHub issue.

For other communications related to EfficientTDNN, please contact Rui Wang (rwang@tongji.edu.cn).
