
v1.13.0

@julianmack released this 15 Nov 16:58

Release notes

This release adds low-latency end-pointing to detect the end of an utterance. It also caps the latency of the beam-search finals at 1.25 seconds, which significantly reduces both the finals' latency and the user-perceived tail latency without impacting WER. Finally, this release speeds up on-GPU beam decoding by up to 10x.

This release adds:

  • End-pointing (docs)
  • Capping of the delay between partials and finals via --beam_final_emission_thresh=1.25
  • A batched implementation of the on-GPU beam decoder
  • Support for training models in character-based languages (tested on Mandarin). This required:
    • small tokenizer changes
    • support for calculating character error rate (CER) and mixture error rate (MER)
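For reference, the error rates listed above follow the standard edit-distance definitions: WER over words, CER over characters, and (as an assumption about MER's definition here) MER over mixed units, scoring CJK characters individually and other scripts as words. A minimal illustrative sketch, not the project's actual implementation:

```python
import re

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            if ref[i - 1] == hyp[j - 1]:
                dp[j] = prev  # match: no edit needed
            else:
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])  # sub, del, ins
            prev = cur
    return dp[n]

def wer(ref, hyp):
    """Word error rate: edit distance over whitespace-split words."""
    words = ref.split()
    return edit_distance(words, hyp.split()) / len(words)

def cer(ref, hyp):
    """Character error rate: edit distance over individual characters."""
    return edit_distance(ref, hyp) / len(ref)

def mixed_tokens(text):
    # Assumption: MER treats each CJK character as one token and each
    # run of non-CJK, non-space characters as one word-level token.
    return re.findall(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+", text)

def mer(ref, hyp):
    """Mixture error rate: edit distance over mixed character/word tokens."""
    tokens = mixed_tokens(ref)
    return edit_distance(tokens, mixed_tokens(hyp)) / len(tokens)
```

For example, `wer("hello world", "hello there")` is 0.5 (one substitution out of two words), while `cer("abcd", "abed")` is 0.25.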

This release also:

  • Improves the scheduling of the delay penalty by waiting until the validation WER has dropped before it kicks in (docs)
  • Reduces the startup time at the beginning of training by adding a noise-data cache and speeding up both JSON parsing and tokenization
  • Deprecates the 49M param testing model configuration and makes the 85M param base model the default for training. See supported models
  • Improves the usability of the live demo client (docs)
  • Fixes the emission latency estimation for the beam decoder
  • Improves logging during training and evaluation
  • Filters out utterances shorter than min_duration: 0.05s during training
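The short-utterance filter amounts to a simple duration threshold applied to the training manifest. A hypothetical sketch (the function and manifest shape are assumptions, not the project's actual code):

```python
# Hypothetical sketch of dropping too-short utterances before training.
MIN_DURATION_S = 0.05  # mirrors the YAML config's `min_duration` key

def filter_utterances(manifest):
    """Keep only utterances at least MIN_DURATION_S seconds long.

    `manifest` is assumed to be a list of dicts with a 'duration'
    field in seconds, as in common ASR JSON manifests.
    """
    return [u for u in manifest if u["duration"] >= MIN_DURATION_S]
```

For example, a 0.02-second utterance would be dropped while a 1.3-second one is kept.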

Summary of changes to default args

  • --delay_penalty="linear_schedule" instead of "wer_schedule"
  • --val_batch_size=1024 instead of 256
  • --beam_final_emission_thresh=1.25 added to cap the finals' latency during beam decoding
  • YAML config: Adds min_duration: 0.05 seconds to filter out short utterances during training
  • YAML config: Adds error_rate: word which determines the error rate calculated and must be one of {wer|word, cer|char, mer|mixture}
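Putting the YAML changes together, the new keys would appear in a training config roughly as below (placement within the file is illustrative; only `min_duration` and `error_rate` come from this release):

```yaml
# New keys added to the training YAML config in this release
min_duration: 0.05   # seconds; shorter utterances are filtered out during training
error_rate: word     # one of: wer|word, cer|char, mer|mixture
```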