igormq/speech2text

Speech2Text

Implementation of "An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora" by Igor Quintanilha, Luiz Wagner Pereira Biscainho, and Sergio Lima Netto (submitted).

Requirements

  • pytorch >= 1.0.1
  • cudatoolkit >= 9.0
  • torchvision
  • torchaudio
  • ignite
  • pyyaml
  • wget
  • num2words
  • unidecode
  • editdistance
  • ctcdecode
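
A quick way to see which of the Python packages above are still missing from an environment is sketched below. The import names are assumptions based on how these packages are usually imported (for example, pytorch as `torch` and pyyaml as `yaml`); `cudatoolkit` is not a Python module and is left out. The helper `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util

# Assumed import names for the packages listed under Requirements.
# cudatoolkit is a system-level dependency and cannot be checked this way.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "torchaudio",
    "ignite",
    "yaml",
    "wget",
    "num2words",
    "unidecode",
    "editdistance",
    "ctcdecode",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be located; it does not verify the version constraints given above.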

Datasets

All datasets can be found here.

Acoustic models

AM            Trained on  Method      WER             Download
DeepSpeech 2  BRSD v2     Scratch     52.55% (2.42%)  Link
DeepSpeech 2  BRSD v2     Fine-tuned  47.41% (1.73%)  Link
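
For readers unfamiliar with the WER column, the sketch below shows how a corpus-level word error rate is commonly computed: the word-level edit distance between each reference and hypothesis, summed and divided by the total number of reference words. It is a pure-Python illustration and not the evaluation code of this repository; the function names `edit_distance` and `word_error_rate` are hypothetical, and whitespace tokenization is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()  # assumption: whitespace tokenization
        hyp_words = hyp.split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words


if __name__ == "__main__":
    refs = ["o gato preto", "bom dia"]
    hyps = ["o gato", "bom dia"]
    print(f"WER: {word_error_rate(refs, hyps):.2%}")
```

The `editdistance` package listed under Requirements offers a faster implementation of the same distance; the pure-Python version is used here only to keep the example self-contained.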

Language models

Language model*  RP  Size  LapsBM      BRTD
word 3-gram      25  1.9G  173.79      161.29
word 5-gram      42  7.8G  136.50      135.12
char 5-gram      5   41M   <=2,334.48  <=2,694.51
char 10-gram     10  4.7G  <=271.86    <=323.71
char 15-gram*    15  5.4G  <=239.59    <=198.49
char 20-gram*    20  8.8G  <=227.84    <=189.53

*All models were trained using KenLM. See the paper for more details.
