Release notes
This release adds low-latency end-pointing to detect the end of an utterance. It also caps the latency of the beam search finals at 1.25 seconds, which significantly reduces both the finals' latency and the user-perceived tail latencies without impacting WER. Finally, this release speeds up on-GPU beam decoding by up to 10x.
This release adds:
- End-pointing (docs)
- Capping of the delay between partials and finals via `--beam_final_emission_thresh=1.25`
- A batched implementation of the on-GPU beam decoder
- Support for training models in character-based languages (tested in Mandarin). This required:
  - small tokenizer changes
  - support for calculating character error rate (CER) and mixture error rate (MER); a sketch of the CER computation follows this list
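
For reference, CER is the character-level analogue of WER: the Levenshtein edit distance between hypothesis and reference, divided by the reference length in characters. The sketch below is a minimal, self-contained illustration of that computation, not this project's implementation; stripping whitespace before comparison is an assumed convention.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits over reference length.

    Whitespace is stripped before comparison -- an assumed convention,
    reasonable for character-based languages such as Mandarin.
    """
    ref = reference.replace(" ", "")
    hyp = hypothesis.replace(" ", "")
    if not ref:
        return float(bool(hyp))
    return edit_distance(ref, hyp) / len(ref)
```

MER mixes the two granularities, typically treating each character of a character-based script and each space-delimited word as one token; the exact tokenization rule is implementation-specific.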
This release also:
- Improves the scheduling of the delay penalty by waiting until the validation WER has dropped before the penalty kicks in (docs)
- Reduces startup time at the beginning of training by adding a noise-data cache and speeding up both JSON parsing and tokenization
- Deprecates the 49M param `testing` model configuration and makes the 85M param `base` model the default for training. See supported models
- Improves the usability of the live demo client (docs)
- Fixes the emission latency estimation for the beam decoder
- Improves logging during training and evaluation
- Filters out utterances shorter than `min_duration: 0.05` s during training
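
The short-utterance filter amounts to dropping any training utterance whose duration falls below the threshold before batching. The snippet below is a hypothetical sketch: the manifest schema (a per-entry `duration` field in seconds) is an assumption, not necessarily the project's actual data format.

```python
MIN_DURATION_S = 0.05  # mirrors the new `min_duration: 0.05` YAML default

def filter_short_utterances(manifest: list[dict]) -> list[dict]:
    # Keep only utterances at least MIN_DURATION_S seconds long.
    # Assumes each entry carries a `duration` field in seconds (illustrative schema).
    return [utt for utt in manifest if utt["duration"] >= MIN_DURATION_S]
```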
Summary of changes to default args
- `--delay_penalty="linear_schedule"` instead of `"wer_schedule"`
- `--val_batch_size=1024` instead of `256`
- `--beam_final_emission_thresh=1.25` added to cap the finals' latency during beam decoding
- YAML config: Adds `min_duration: 0.05` seconds to filter out short utterances during training
- YAML config: Adds `error_rate: word`, which determines the error rate calculated and must be one of `{wer|word, cer|char, mer|mixture}`