Release notes
This release adds low-latency end-pointing to detect the end of an utterance. It also caps the latency of the beam search finals at 1.25 seconds, which significantly reduces both the finals' latency and the user-perceived tail latencies without impacting WER. Finally, this release speeds up on-GPU beam decoding by up to 10x.
This release adds:
- End-pointing (docs)
- Capping of the delay between partials and finals via `--beam_final_emission_thresh=1.25`
- A batched implementation of the on-GPU beam decoder
- Support for training models in character-based languages (tested in Mandarin). This required:
  - small tokenizer changes
  - support for calculating character error rate (CER) and mixture error rate (MER); a sketch of the CER computation follows this list
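
For reference, CER is the character-level analogue of WER: the Levenshtein edit distance between hypothesis and reference, divided by the reference length in characters. The sketch below is a minimal, self-contained illustration of that computation, not this project's implementation; stripping whitespace before comparison is an assumed convention.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits over reference length.

    Whitespace is stripped before comparison -- an assumed convention,
    reasonable for character-based languages such as Mandarin.
    """
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        return float(bool(hyp))
    return edit_distance(ref, hyp) / len(ref)
```

MER mixes the two granularities, typically treating each character of a character-based script and each space-delimited word as one token; the exact tokenization rule is implementation-specific.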
This release also:
- Improves the scheduling of the delay penalty by waiting until the validation WER has dropped before the penalty kicks in (docs)
- Reduces startup time at the beginning of training by adding a noise-data cache and speeding up both JSON parsing and tokenization
- Deprecates the 49M param `testing` model configuration and makes the 85M param `base` model the default for training. See supported models
- Improves the usability of the live demo client (docs)
- Fixes the emission latency estimation for the beam decoder
- Improves logging during training and evaluation
- Filters out utterances shorter than `min_duration: 0.05` s during training
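
The short-utterance filter amounts to dropping any training utterance whose duration falls below the threshold before batching. The snippet below is a hypothetical sketch: the manifest schema (a per-entry `duration` field in seconds) is an assumption, not necessarily the project's actual data format.

```python
MIN_DURATION_S = 0.05  # mirrors the new `min_duration: 0.05` YAML default

def filter_short_utterances(manifest: list[dict]) -> list[dict]:
    # Keep only utterances at least MIN_DURATION_S seconds long.
    # Assumes each entry carries a `duration` field in seconds (illustrative schema).
    return [utt for utt in manifest if utt["duration"] >= MIN_DURATION_S]
```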
Summary of changes to default args
- `--delay_penalty="linear_schedule"` instead of `"wer_schedule"`
- `--val_batch_size=1024` instead of `256`
- `--beam_final_emission_thresh=1.25` added to cap the finals' latency during beam decoding
- YAML config: Adds `min_duration: 0.05` seconds to filter out short utterances during training
- YAML config: Adds `error_rate: word`, which determines the error rate calculated and must be one of `{wer|word, cer|char, mer|mixture}`