This document provides an overview of the performance achieved on key datasets and tasks supported by SpeechBrain.

Model | Checkpoints | HuggingFace | Test-CER |
---|---|---|---|
recipes/AISHELL-1/ASR/CTC/hparams/train_with_wav2vec.yaml | here | here | 5.06 |
recipes/AISHELL-1/ASR/seq2seq/hparams/train.yaml | here | - | 7.51 |
recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer.yaml | here | here | 6.04 |
recipes/AISHELL-1/ASR/transformer/hparams/train_ASR_transformer_with_wav2vect.yaml | here | here | 5.58 |

Model | Checkpoints | HuggingFace | SI-SNRi |
---|---|---|---|
recipes/Aishell1Mix/separation/hparams/sepformer-aishell1mix2.yaml | here | - | 13.4dB |
recipes/Aishell1Mix/separation/hparams/sepformer-aishell1mix3.yaml | here | - | 11.2dB |

Model | Checkpoints | HuggingFace | SI-SNRi |
---|---|---|---|
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-cross.yaml | here | - | 12.39dB |
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-independent.yaml | here | - | 11.90dB |
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel-noise.yaml | here | - | 18.25dB |
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel-reverb.yaml | here | - | 6.95dB |
recipes/BinauralWSJ0Mix/separation/hparams/convtasnet-parallel.yaml | here | - | 16.93dB |
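
For readers unfamiliar with the SI-SNRi column used in the separation tables, the following is a minimal sketch of the usual definition: the scale-invariant signal-to-noise ratio of the separated estimate minus that of the unprocessed mixture, both measured against the clean target. It is an illustration only, not the code the recipes use to produce the numbers above; the function names, the zero-mean step, and the `eps` guard are assumptions made for this example.

```python
import math


def si_snr_db(estimate, target, eps=1e-8):
    """Scale-invariant SNR (in dB) of `estimate` with respect to `target`.

    Both arguments are equal-length sequences of float samples.
    `eps` is an illustrative guard against division by zero.
    """
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean from both signals (a common convention for SI-SNR).
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target to get the "signal" component.
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    projection = [scale * t for t in tgt]

    # Whatever is left after the projection counts as noise.
    noise = [e - p for e, p in zip(est, projection)]

    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10(signal_energy / (noise_energy + eps))


def si_snr_improvement_db(estimate, mixture, target):
    """SI-SNRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_snr_db(estimate, target) - si_snr_db(mixture, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by a constant factor leaves the value unchanged, which is what makes the metric scale-invariant.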

Model | Checkpoints | HuggingFace | Test-sacrebleu |
---|---|---|---|
recipes/CVSS/S2ST/hparams/train_fr-en.yaml | here | here | 24.47 |

Model | Checkpoints | HuggingFace | Error |
---|---|---|---|
recipes/CommonLanguage/lang_id/hparams/train_ecapa_tdnn.yaml | here | here | 15.1% |

Model | Checkpoints | HuggingFace | Test-WER |
---|---|---|---|
recipes/CommonVoice/ASR/seq2seq/hparams/train_de.yaml | here | here | 12.25% |
recipes/CommonVoice/ASR/seq2seq/hparams/train_en.yaml | here | here | 23.88% |
recipes/CommonVoice/ASR/seq2seq/hparams/train_fr.yaml | here | here | 14.88% |
recipes/CommonVoice/ASR/seq2seq/hparams/train_it.yaml | here | here | 17.02% |
recipes/CommonVoice/ASR/seq2seq/hparams/train_rw.yaml | here | here | 29.22% |
recipes/CommonVoice/ASR/seq2seq/hparams/train_es.yaml | here | here | 14.77% |

Model | Checkpoints | HuggingFace | Test-WER |
---|---|---|---|
recipes/CommonVoice/ASR/transformer/hparams/train_hf_whisper.yaml | - | - | 16.96% |
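
As a reminder of what the WER columns measure, here is a minimal sketch of the standard word error rate: the word-level edit distance (substitutions, deletions, insertions) between the hypothesis and the reference, divided by the number of reference words. This is not the scorer used by the recipes; `word_error_rate` is a hypothetical helper written for this example, and it tokenizes on whitespace only, with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent for a single reference/hypothesis pair."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Test-set figures are typically aggregated over the whole corpus (total edits divided by total reference words) rather than averaged per utterance. CER and PER follow the same idea with characters or phonemes in place of words.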

Model | Checkpoints | HuggingFace | valid-PESQ | test-SIG | test-BAK | test-OVRL |
---|---|---|---|---|---|---|
recipes/DNS/enhancement/hparams/sepformer-dns-16k.yaml | here | here | 2.06 | 2.999 | 3.076 | 2.437 |

Model | Checkpoints | HuggingFace | Test-WER |
---|---|---|---|
recipes/DVoice/ASR/CTC/hparams/train_amh_with_wav2vec.yaml | here | here | 24.92% |
recipes/DVoice/ASR/CTC/hparams/train_dar_with_wav2vec.yaml | here | here | 18.28% |
recipes/DVoice/ASR/CTC/hparams/train_fon_with_wav2vec.yaml | here | here | 9.00% |
recipes/DVoice/ASR/CTC/hparams/train_sw_with_wav2vec.yaml | here | here | 23.16% |
recipes/DVoice/ASR/CTC/hparams/train_wol_with_wav2vec.yaml | here | here | 16.05% |

Model | Checkpoints | HuggingFace | WER-Darija | WER-Swahili | WER-Fongbe | WER-Wolof | WER-Amharic |
---|---|---|---|---|---|---|---|
recipes/DVoice/ASR/CTC/hparams/train_multi_with_wav2vec.yaml | here | - | 13.27% | 29.31% | 10.26% | 21.54% | 31.15% |

Model | Checkpoints | HuggingFace | Accuracy |
---|---|---|---|
recipes/ESC50/classification/hparams/cnn14.yaml | here | - | 82% |
recipes/ESC50/classification/hparams/conv2d.yaml | here | - | 75% |

Model | Checkpoints | HuggingFace | Test-sacrebleu |
---|---|---|---|
recipes/Fisher-Callhome-Spanish/ST/transformer/hparams/transformer.yaml | here | - | 47.31 |
recipes/Fisher-Callhome-Spanish/ST/transformer/hparams/conformer.yaml | here | - | 48.04 |

Model | Checkpoints | HuggingFace | Test-accuracy |
---|---|---|---|
recipes/Google-speech-commands/hparams/xvect.yaml | here | here | 97.43% |
recipes/Google-speech-commands/hparams/xvect_leaf.yaml | here | - | 96.79% |

Model | Checkpoints | HuggingFace | Test-Accuracy |
---|---|---|---|
recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml | here | here | 65.7% |
recipes/IEMOCAP/emotion_recognition/hparams/train.yaml | here | - | 77.0% |

Model | Checkpoints | HuggingFace | clean-WER | others-WER |
---|---|---|---|---|
recipes/KsponSpeech/ASR/transformer/hparams/conformer_medium.yaml | here | here | 20.78% | 25.73% |

Model | Checkpoints | HuggingFace | SI-SNR |
---|---|---|---|
recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml | here | - | 20.4dB |
recipes/LibriMix/separation/hparams/sepformer-libri3mix.yaml | here | - | 19.0dB |

Model | Checkpoints | HuggingFace | Test-Precision | Recall | F-Score |
---|---|---|---|---|---|
recipes/LibriParty/VAD/hparams/train.yaml | here | here | 0.9518 | 0.9437 | 0.9477 |

Model | Checkpoints | HuggingFace | Test_clean-WER | Test_other-WER |
---|---|---|---|---|
recipes/LibriSpeech/ASR/transformer/hparams/conformer_small.yaml | here | here | 2.49% | 6.10% |
recipes/LibriSpeech/ASR/transformer/hparams/transformer.yaml | here | here | 2.27% | 5.53% |
recipes/LibriSpeech/ASR/transformer/hparams/conformer_large.yaml | here | - | 2.01% | 4.52% |
recipes/LibriSpeech/ASR/transformer/hparams/branchformer_large.yaml | here | - | 2.04% | 4.12% |
recipes/LibriSpeech/ASR/transformer/hparams/hyperconformer_22M.yaml | here | - | 2.23% | 4.54% |
recipes/LibriSpeech/ASR/transformer/hparams/hyperconformer_8M.yaml | here | - | 2.55% | 6.61% |
recipes/LibriSpeech/ASR/transformer/hparams/hyperbranchformer_25M.yaml | - | - | 2.36% | 6.89% |
recipes/LibriSpeech/ASR/transformer/hparams/hyperbranchformer_13M.yaml | - | - | 2.54% | 6.58% |
recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml | - | - | | |
recipes/LibriSpeech/ASR/transformer/hparams/bayesspeech.yaml | here | - | 2.84% | 6.27% |

Model | Checkpoints | HuggingFace | Test_clean-WER | Test_other-WER |
---|---|---|---|---|
recipes/LibriSpeech/ASR/transducer/hparams/conformer_transducer.yaml | here | - | 2.72% | 6.47% |

Model | Checkpoints | HuggingFace | Test_clean-WER | Test_other-WER |
---|---|---|---|---|
recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml | here | here | 1.65% | 3.67% |
recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec_transformer_rescoring.yaml | here | - | 1.57% | 3.37% |

Model | Checkpoints | HuggingFace | PER-Test |
---|---|---|---|
recipes/LibriSpeech/G2P/hparams/hparams_g2p_rnn.yaml | here | - | 2.72% |
recipes/LibriSpeech/G2P/hparams/hparams_g2p_transformer.yaml | here | here | 2.89% |

Model | Checkpoints | HuggingFace | Test_clean-WER | Test_other-WER |
---|---|---|---|---|
recipes/LibriSpeech/ASR/seq2seq/hparams/train_BPE_5000.yaml | here | here | 2.89% | 8.09% |

Model | Checkpoints | HuggingFace | Test-ChER | Test-CER |
---|---|---|---|---|
recipes/MEDIA/ASR/CTC/hparams/train_hf_wav2vec.yaml | - | here | 7.78% | 4.78% |

Model | Checkpoints | HuggingFace | Test-ChER | Test-CER | Test-CVER |
---|---|---|---|---|---|
recipes/MEDIA/SLU/CTC/hparams/train_hf_wav2vec_full.yaml | - | here | 7.46% | 20.10% | 31.41% |
recipes/MEDIA/SLU/CTC/hparams/train_hf_wav2vec_relax.yaml | - | here | 7.78% | 24.88% | 35.77% |

Model | Checkpoints | HuggingFace | Test-PPL | Test_BLEU-4 |
---|---|---|---|---|
recipes/MultiWOZ/response_generation/gpt/hparams/train_gpt.yaml | here | here | 4.01 | 2.54e-04 |
recipes/MultiWOZ/response_generation/llama2/hparams/train_llama2.yaml | here | here | 2.90 | 7.45e-04 |

Model | Checkpoints | HuggingFace | L1-Error |
---|---|---|---|
recipes/REAL-M/sisnr-estimation/hparams/pool_sisnrestimator.yaml | here | here | 1.71dB |

Model | Checkpoints | HuggingFace | SISNRi | SDRi | PESQ | STOI | WER |
---|---|---|---|---|---|---|---|
recipes/RescueSpeech/ASR/noise-robust/hparams/robust_asr_16k.yaml | here | here | 7.482 | 8.011 | 2.083 | 0.854 | 45.29% |

Model | Checkpoints | HuggingFace | scenario-accuracy | action-accuracy | intent-accuracy |
---|---|---|---|---|---|
recipes/SLURP/NLU/hparams/train.yaml | here | - | 90.81% | 88.29% | 87.28% |
recipes/SLURP/direct/hparams/train.yaml | here | - | 81.73% | 77.11% | 75.05% |
recipes/SLURP/direct/hparams/train_with_wav2vec2.yaml | here | here | 91.24% | 88.47% | 87.55% |

Model | Checkpoints | HuggingFace | Swbd-WER | Callhome-WER | Eval2000-WER |
---|---|---|---|---|---|
recipes/Switchboard/ASR/CTC/hparams/train_with_wav2vec.yaml | - | here | 8.76% | 14.67% | 11.78% |
recipes/Switchboard/ASR/seq2seq/hparams/train_BPE_2000.yaml | - | here | 16.90% | 25.12% | 20.71% |
recipes/Switchboard/ASR/transformer/hparams/transformer.yaml | - | here | 9.80% | 17.89% | 13.94% |

Model | Checkpoints | HuggingFace | Test-PER |
---|---|---|---|
recipes/TIMIT/ASR/CTC/hparams/train.yaml | here | - | 14.78% |
recipes/TIMIT/ASR/seq2seq/hparams/train.yaml | here | - | 14.07% |
recipes/TIMIT/ASR/seq2seq/hparams/train_with_wav2vec2.yaml | here | - | 8.04% |
recipes/TIMIT/ASR/transducer/hparams/train.yaml | here | - | 14.12% |
recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml | here | - | 8.91% |

Model | Checkpoints | HuggingFace | Test-WER_No_LM |
---|---|---|---|
recipes/Tedlium2/ASR/transformer/hparams/branchformer_large.yaml | here | here | 8.11% |

Model | Checkpoints | HuggingFace | Accuracy |
---|---|---|---|
recipes/UrbanSound8k/SoundClassification/hparams/train_ecapa_tdnn.yaml | here | here | 75.4% |

Model | Checkpoints | HuggingFace | PESQ |
---|---|---|---|
recipes/Voicebank/dereverb/MetricGAN-U/hparams/train_dereverb.yaml | here | - | 2.07 |
recipes/Voicebank/dereverb/spectral_mask/hparams/train.yaml | here | - | 2.35 |

Model | Checkpoints | HuggingFace | Test-PER |
---|---|---|---|
recipes/Voicebank/ASR/CTC/hparams/train.yaml | here | - | 10.12% |

Model | Checkpoints | HuggingFace | PESQ | COVL | test-WER |
---|---|---|---|---|---|
recipes/Voicebank/MTL/ASR_enhance/hparams/robust_asr.yaml | here | here | 3.05 | 3.74 | 2.80 |

Model | Checkpoints | HuggingFace | PESQ |
---|---|---|---|
recipes/Voicebank/enhance/MetricGAN/hparams/train.yaml | here | here | 3.15 |
recipes/Voicebank/enhance/SEGAN/hparams/train.yaml | here | - | 2.38 |
recipes/Voicebank/enhance/spectral_mask/hparams/train.yaml | here | - | 2.65 |

Model | Checkpoints | HuggingFace | EER |
---|---|---|---|
recipes/VoxCeleb/SpeakerRec/hparams/train_ecapa_tdnn.yaml | here | here | 0.80% |
recipes/VoxCeleb/SpeakerRec/hparams/train_x_vectors.yaml | here | here | 3.23% |
recipes/VoxCeleb/SpeakerRec/hparams/train_resnet.yaml | here | here | 0.95% |

Model | Checkpoints | HuggingFace | Accuracy |
---|---|---|---|
recipes/VoxLingua107/lang_id/hparams/train_ecapa.yaml | here | here | 93.3% |

Model | Checkpoints | HuggingFace | SI-SNR |
---|---|---|---|
recipes/WHAMandWHAMR/separation/hparams/sepformer-wham.yaml | here | here | 16.5 |
recipes/WHAMandWHAMR/separation/hparams/sepformer-whamr.yaml | here | here | 14.0 |

Model | Checkpoints | HuggingFace | SI-SNR | PESQ |
---|---|---|---|---|
recipes/WHAMandWHAMR/enhancement/hparams/sepformer-wham.yaml | here | here | 14.4 | 3.05 |
recipes/WHAMandWHAMR/enhancement/hparams/sepformer-whamr.yaml | here | here | 10.6 | 2.84 |

Model | Checkpoints | HuggingFace | SI-SNRi |
---|---|---|---|
recipes/WSJ0Mix/separation/hparams/convtasnet.yaml | here | - | 14.8dB |
recipes/WSJ0Mix/separation/hparams/dprnn.yaml | here | - | 18.5dB |
recipes/WSJ0Mix/separation/hparams/resepformer.yaml | here | here | 18.6dB |
recipes/WSJ0Mix/separation/hparams/sepformer.yaml | here | here | 22.4dB |
recipes/WSJ0Mix/separation/hparams/skim.yaml | here | here | 18.1dB |

Model | Checkpoints | HuggingFace | EDER |
---|---|---|---|
recipes/ZaionEmotionDataset/emotion_diarization/hparams/train.yaml | here | here | 30.2% |

Model | Checkpoints | HuggingFace | Test-accuracy |
---|---|---|---|
recipes/fluent-speech-commands/direct/hparams/train.yaml | here | - | 99.60% |

Model | Checkpoints | HuggingFace | Accuracy-Test_real |
---|---|---|---|
recipes/timers-and-such/decoupled/hparams/train_TAS_LM.yaml | here | - | 46.8% |
recipes/timers-and-such/direct/hparams/train.yaml | here | here | 77.5% |
recipes/timers-and-such/direct/hparams/train_with_wav2vec2.yaml | here | - | 94.0% |
recipes/timers-and-such/multistage/hparams/train_TAS_LM.yaml | here | - | 72.6% |