WeSpeaker 1.0.0

JiJiJiang released this 12 Nov 10:59


296449a

Highlights

Competitive results: compared against SpeechBrain, ASV-Subtools, and other toolkits
Lightweight: clean and simple code, no Kaldi dependency
Unified IO (UIO): designed for large-scale training data
On-the-fly feature preparation: provides several data augmentation methods
Distributed training: built for multi-node, multi-GPU scalability
Production ready: supports export to TensorRT or ONNX, with a Triton Inference Server demo
Pre-trained models: provides Python bindings and an interactive Hugging Face demo for speaker verification

Overall Structure

Recipes

We provide three well-structured recipes:

Speaker Verification: VoxCeleb and CNCeleb (SOTA results)
Speaker Diarization: VoxConverse (an example of using a pre-trained speaker model)

Support List

SOTA Models: TDNN-based x-vector, ResNet-based r-vector, and ECAPA_TDNN
Pooling Functions: statistics-based TAP/TSDP/TSTP, and attention-based ASTP
Criteria: standard Softmax, and margin-based A-/AM-/AAM-Softmax
Scoring: Cosine, PLDA, and Score Normalization (AS-Norm)
Metrics: EER, minDCF (DET curve), and DER
Online Augmentation: Resample, Noise & RIR, Speed Perturb, and SpecAug
Training Strategies: well-designed learning-rate and margin schedulers, and large-margin fine-tuning
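
To make the Scoring and Metric entries above concrete, here is a minimal pure-Python sketch of cosine scoring between two speaker embeddings and an approximate EER computed by sweeping a threshold over the observed trial scores. This is not WeSpeaker's implementation: the function names cosine_score and compute_eer and the toy embeddings are hypothetical, chosen only for illustration.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate (EER).

    Sweeps every observed score as a candidate threshold and picks the
    one where the false-accept rate (FAR) and false-reject rate (FRR)
    are closest. Returns (eer, threshold), with eer taken as the mean
    of FAR and FRR at that threshold.
    """
    best = None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Toy trial list (hypothetical embeddings, not real model output).
enroll = [0.2, 0.9, 0.1]
same_speaker = [0.25, 0.85, 0.05]
other_speaker = [0.9, 0.1, 0.3]

targets = [cosine_score(enroll, same_speaker)]
nontargets = [cosine_score(enroll, other_speaker)]
eer, threshold = compute_eer(targets, nontargets)
```

In practice, scores would come from embeddings extracted by a trained model over a full trial list, and score normalization such as AS-Norm would be applied before thresholding; both are outside the scope of this sketch.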
