- Update to PyTorch 1.10
- Refactor subsampling layer for Transformer-based encoders
- Adaptive SpecAug
- Update LibriSpeech & GigaSpeech results
- Add AED decoder rescoring for CTC beam search
- Add LibriMix recipe
- Refactor tokenizer for online tokenization
- Rename ngram_rescore.py to lm_rescore.py
- Support segment processing in {archive,extract}_wav.py
- Add LM shallow fusion for transducer beam search
- Add gigaspeech recipe
- Add aps.tokenizer and aps.io module
- Add command to export TorchScript model
- Add real time speech enhancement demo code (C++ & Python, Transformer & DFSMN)
- Re-implement transducer beam search
- Add multi-cn recipe
- Use multi-processing for some commands
- Kick off streaming ASR and real-time SSE
- Add streaming features: (i)STFT, fsmn|rnn|conv1d|conv2d ASR encoders
- Add encode/decode function in EnhTransform class
- Finish streaming Transformer/Conformer
- Add C++ source code (FFT, STFT, wav I/O ...)
- CHiME4 recipe
- Update implementation of DPRNN
- Add SepFormer & DFSMN & JSON-based online dataloader (for SSE tasks)
- Begin adding TorchScript export tests (dccrn, dcunet, dfsmn, tcn, transformer ...)
- Change the output of the STFT transform
- Setup WHAM recipe
- Add support for multi-node training
- Add DEMUCS model
- Setup aishell_v2 & DNS recipe
- Fix bugs with PyTorch 1.8
- Fix bugs in CTC beam search
- Remove duplicate LayerNorm in Conformer
- Update results for the WSJ recipe
- Unify distributed/non-distributed training command
- Add stitcher for chunk-wise evaluation in SE/SS task
- Add CTC alignment command
- Unify ASR base class (for CTC & Attention & RNNT AM)
- Make the repository public
- Integrate CTC score & enable end detection during beam search
- CTC beam & greedy search (a greedy decoding sketch follows this list)
- Rewrite speed perturb transform
- Gradient accumulation & checkpoint averaging in trainer (an accumulation sketch follows this list)
- Rewrite delta feature transform
- N-gram LM shallow fusion (a fusion sketch follows this list)
- BPTT dataloader for LM
- Tune and rewrite the beam search decoding
- Make fbank & MFCC compatible with Kaldi
- Add reduction option to configure ASR objective computation
- Tune and refactor Transformer/Conformer code
- Stop detector in trainer
- Add beam search module for ASR
- Change the backend for WER computation
- Add speed perturb transform layer
- Rewrite beam search for transformer & transducer
- Add jit LSTM implementation
- Refactor the Transformer code
- Add Conformer code
- Add TorchScript examples
- Use dynamic imports & Python decorators
- Preemphasis in STFT
- Apex trainer
- Add flake8 & shellcheck to the GitHub workflow
- Update documents
- Refactor the ASR encoder code
- Optimize LM dataloader
- Add command to evaluate SE/SS performance
- Pre-norm & Post-norm & Relative position encoding for Transformer
- Refactor separation task code (make it clearer and simpler)
- Refactor implementation of Transformer
- Use Python type hints
- Add Dockerfile to set up the environment
- Set up GitHub workflow
- Test cases for ASR networks & tasks
- Fix shallow fusion
- Add librispeech recipe
- Make STFT compatible with librosa (using stft_mode)
- Setup CI
- Add aishell_v1, wsj0_2mix, chime4, timit recipes
- Test cases for dataloader, task module
- Add DenseUnet
- Fix network implementation for SE/SS tasks
- Add DCCRN
- Distributed training for SE/SS task
- Move learning rate scheduler to a single file
- Change to absolute imports
- Refactor trainer package (support both horovod and PyTorch's DDP)
- Test cases for transform module
- Dataloader for on-the-fly data simulation
- Inference command for SE/SS
- Add WA loss in task package
- Network implementation update for SE/SS task (add CRN)
- Document draft
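
A few of the entries above name techniques that fit in a handful of lines. For the CTC beam & greedy search work, greedy (best path) decoding reduces to an argmax per frame followed by collapsing repeats and dropping blanks; a minimal sketch, with the function name and `blank` index chosen for illustration rather than taken from the repository:

```python
import torch

def ctc_greedy_search(logp: torch.Tensor, blank: int = 0) -> list:
    """
    Greedy (best path) CTC decoding.
    logp: (T, V) per-frame log-probabilities from a CTC model.
    """
    best_path = logp.argmax(dim=-1).tolist()
    hyp, prev = [], blank
    for tok in best_path:
        # emit a token only when it is non-blank and differs from the
        # previous frame (collapse repeats, then drop blanks)
        if tok != blank and tok != prev:
            hyp.append(tok)
        prev = tok
    return hyp
```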
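The gradient accumulation entry follows the usual pattern of scaling the loss and delaying `optimizer.step()`; a sketch assuming a hypothetical `model(**batch)` interface that returns a scalar loss, not the actual trainer code:

```python
def train_epoch(model, optimizer, dataloader, accum_steps: int = 4) -> None:
    """Run one epoch, stepping the optimizer every accum_steps batches."""
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader, 1):
        loss = model(**batch)            # hypothetical: task returns a scalar loss
        (loss / accum_steps).backward()  # scale so gradients match a larger batch
        if step % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    # leftover gradients are dropped if the epoch length is not divisible
    # by accum_steps; a final step() could flush them instead
```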
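And the shallow fusion entries amount to adding a scaled LM log-probability to the decoder score at each beam search step; a sketch with an illustrative `lm_weight`, not the repository's API:

```python
import torch

def fused_score(am_logp: torch.Tensor,
                lm_logp: torch.Tensor,
                lm_weight: float = 0.2) -> torch.Tensor:
    """
    One beam search step of shallow fusion.
    am_logp: (beam, vocab) log-probabilities from the ASR decoder
    lm_logp: (beam, vocab) log-probabilities from the external (N-gram or neural) LM
    """
    # fused token score: log P_am(y|x) + lm_weight * log P_lm(y)
    return am_logp + lm_weight * lm_logp
```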
Refactor the structure to merge in the speech enhancement/separation repository
- Create task module for different tasks (AM, LM, SE & SS)
- Add documents
- Add time/frequency loss function for SE/SS task
- DCUnet & DPRNN & TasNet ...
- MVDR front-end with Transformer AM
- Fix LM dataloader & Add Transformer LM
- Fix transducer training & decoding
- Update STFT implementation
- Add shallow fusion
- Multi-channel front-end: fixed beamformer variants
- Initial code of RNN transducer
- Unsupervised training experiments on CHiME4
- Refactor code of Transformer and fix bugs
- FSMN encoder
- Add dataloader for enhancement/separation task
- SpecAugment (a masking sketch follows this list)
- Early stopping in Trainer
- Joint attention & CTC training
- Global CMVN in feature transform
- Support distributed training
- Multi-channel front-end: fixed beamformer (google & amazon)
- Initial code of Transformer
- Vectorized & batched beam search for decoding
- Support MFCC extraction & CMVN & feature splicing in the feature transform
- TDNN encoder
- RNN LM training & N-gram rescoring
- Scheduled sampling during training
- Multi-head attention variants
- Multi-channel front-end: MVDR beamformer
- WER evaluation
- Raw audio dataloader
- Create ASR transform module to handle feature extraction
- Update the Trainer
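
The SpecAugment entry above masks random frequency bands and time spans of the filterbank features (Park et al., 2019); a minimal sketch with illustrative mask-size parameters `F` and `T`, not the repository's implementation:

```python
import torch

def spec_augment(feats: torch.Tensor,
                 num_freq_masks: int = 2, F: int = 27,
                 num_time_masks: int = 2, T: int = 100) -> torch.Tensor:
    """
    feats: (num_frames, num_bins) filterbank features, masked in place.
    """
    num_frames, num_bins = feats.shape
    for _ in range(num_freq_masks):
        # zero out a random band of at most F frequency bins
        f = torch.randint(0, F + 1, (1,)).item()
        f0 = torch.randint(0, max(1, num_bins - f), (1,)).item()
        feats[:, f0:f0 + f] = 0
    for _ in range(num_time_masks):
        # zero out a random span of at most T frames
        t = torch.randint(0, T + 1, (1,)).item()
        t0 = torch.randint(0, max(1, num_frames - t), (1,)).item()
        feats[t0:t0 + t, :] = 0
    return feats
```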
Work on another speech enhancement & separation repository
- Single & Multi-channel feature extraction (IPD, DF, ...)
- Time & frequency domain objective functions (an SI-SNR sketch follows this list)
- Network variants in speech enhancement/separation
- Training and inference commands
- Dataloader for the offline data pairs and on-the-fly data simulation
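
One common instance of the time-domain objective mentioned above is SI-SNR; the sketch below is a standard formulation chosen for illustration, not necessarily what the repository implements:

```python
import torch

def si_snr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """
    Scale-invariant SNR between estimated and reference waveforms,
    both (batch, samples). Returns SI-SNR in dB per utterance
    (higher is better; negate the mean to use it as a loss).
    """
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # project the estimate onto the reference signal
    proj = (torch.sum(est * ref, dim=-1, keepdim=True) /
            (torch.sum(ref * ref, dim=-1, keepdim=True) + eps)) * ref
    noise = est - proj
    return 10 * torch.log10(
        torch.sum(proj ** 2, dim=-1) / (torch.sum(noise ** 2, dim=-1) + eps) + eps)
```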
Create the repository and make the first commit to synchronize local data
Local development of LAS (Listen, Attend and Spell) experiments
- First version of the feature transform for ASR task
- Kaldi feature dataloader
- RNN encoder & decoder
- Attention variants (context & dot & location-aware; a dot-product sketch follows this list)
- Training & Decoding command
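
Among the attention variants listed above, plain dot-product attention is the simplest; a per-step sketch with illustrative shapes, not the repository's module:

```python
import torch
import torch.nn.functional as F

def dot_attention(query: torch.Tensor, enc_out: torch.Tensor, enc_mask=None):
    """
    One decoder step of dot-product attention.
    query:    (batch, dim)    current decoder state
    enc_out:  (batch, T, dim) encoder outputs
    enc_mask: optional (batch, T) bool mask, True for padded frames
    """
    score = torch.bmm(enc_out, query.unsqueeze(-1)).squeeze(-1)  # (batch, T)
    if enc_mask is not None:
        score = score.masked_fill(enc_mask, float("-inf"))
    weight = F.softmax(score, dim=-1)
    context = torch.bmm(weight.unsqueeze(1), enc_out).squeeze(1)  # (batch, dim)
    return context, weight
```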