- Update to PyTorch 1.10
- Refactor subsampling layer for Transformer-based encoders
- Adaptive SpecAug
- Update LibriSpeech & GigaSpeech results
- Add AED decoder rescoring for CTC beam search
- Add LibriMix recipe
- Refactor tokenizer for online tokenization
- Rename ngram_rescore.py to lm_rescore.py
- Support segment processing in {archive,extract}_wav.py
- Add LM shallow fusion for transducer beam search
- Add gigaspeech recipe
- Add aps.tokenizer and aps.io module
- Add command to export TorchScript model
- Add real time speech enhancement demo code (C++ & Python, Transformer & DFSMN)
- Re-implement transducer beam search
- Add multi-cn recipe
- Use multi-processing for some commands
- Kick off streaming ASR and real-time SSE
- Add streaming features: (i)STFT, fsmn|rnn|conv1d|conv2d ASR encoders
- Add encode/decode function in EnhTransform class
- Finish streaming Transformer/Conformer
- Add C++ source code (FFT, STFT, wav I/O ...)
- CHiME4 recipe
- Update implementation of DPRNN
- Add SepFormer & DFSMN & JSON-based online dataloader (for SSE tasks)
- Begin adding TorchScript export tests (dccrn, dcunet, dfsmn, tcn, transformer ...)
- Change the output of the STFT transform
- Setup WHAM recipe
- Add support for multi-node training
- Add DEMUCS model
- Setup aishell_v2 & DNS recipe
- Fix bugs with PyTorch 1.8
- Fix bugs in CTC beam search
- Remove duplicate LayerNorm in Conformer
- Update results for the WSJ recipe
- Unify distributed/non-distributed training command
- Add stitcher for chunk-wise evaluation in SE/SS task
- Add CTC alignment command
- Unify ASR base class (for CTC & Attention & RNNT AM)
- Make the repository public
- Integrate CTC score & enable end detection during beam search
- CTC beam & greedy search (a greedy decoding sketch follows this list)
- Rewrite speed perturb transform
- Gradient accumulation & checkpoint averaging in trainer (an accumulation sketch follows this list)
- Rewrite delta feature transform
- N-gram LM shallow fusion (a fusion sketch follows this list)
- BPTT dataloader for LM
- Tune and rewrite the beam search decoding
- Make fbank & MFCC compatible with Kaldi
- Add reduction option to configure ASR objective computation
- Tune and refactor Transformer/Conformer code
- Stop detector in trainer
- Add beam search module for ASR
- Change the backend for WER computation
- Add speed perturb transform layer
- Rewrite beam search for transformer & transducer
- Add jit LSTM implementation
- Refactor the Transformer code
- Add Conformer code
- Add TorchScript examples
- Use dynamic imports & Python decorators
- Preemphasis in STFT
- Apex trainer
- Add flake8 & shellcheck to the GitHub workflow
- Update documents
- Refactor the ASR encoder code
- Optimize LM dataloader
- Add command to evaluate SE/SS performance
- Pre-norm & Post-norm & Relative position encoding for Transformer
- Refactor separation task code (make it clearer and simpler)
- Refactor implementation of Transformer
- Use Python type hints
- Add Dockerfile to set up the environment
- Set up GitHub workflow
- Test cases for ASR networks & tasks
- Fix shallow fusion
- Add librispeech recipe
- Make STFT compatible with librosa (using stft_mode)
- Setup CI
- Add aishell_v1, wsj0_2mix, chime4, timit recipes
- Test cases for dataloader, task module
- Add DenseUnet
- Fix network implementation for SE/SS tasks
- Add DCCRN
- Distributed training for SE/SS task
- Move learning rate scheduler to a single file
- Change to absolute imports
- Refactor trainer package (support both horovod and PyTorch's DDP)
- Test cases for transform module
- Dataloader for on-the-fly data simulation
- Inference command for SE/SS
- Add WA loss in task package
- Network implementation update for SE/SS task (add CRN)
- Document draft
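
A few of the entries above name techniques that fit in a handful of lines. For the CTC beam & greedy search work, greedy (best path) decoding reduces to an argmax per frame followed by collapsing repeats and dropping blanks; a minimal sketch, with the function name and `blank` index chosen for illustration rather than taken from the repository:

```python
import torch

def ctc_greedy_search(logp: torch.Tensor, blank: int = 0) -> list:
    """
    Greedy (best path) CTC decoding.
    logp: (T, V) per-frame log-probabilities from a CTC model.
    """
    best_path = logp.argmax(dim=-1).tolist()
    hyp, prev = [], blank
    for tok in best_path:
        # emit a token only when it is non-blank and differs from the
        # previous frame (collapse repeats, then drop blanks)
        if tok != blank and tok != prev:
            hyp.append(tok)
        prev = tok
    return hyp
```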
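The gradient accumulation entry follows the usual pattern of scaling the loss and delaying `optimizer.step()`; a sketch assuming a hypothetical `model(**batch)` interface that returns a scalar loss, not the actual trainer code:

```python
def train_epoch(model, optimizer, dataloader, accum_steps: int = 4) -> None:
    """Run one epoch, stepping the optimizer every accum_steps batches."""
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader, 1):
        loss = model(**batch)            # hypothetical: task returns a scalar loss
        (loss / accum_steps).backward()  # scale so gradients match a larger batch
        if step % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    # leftover gradients are dropped if the epoch length is not divisible
    # by accum_steps; a final step() could flush them instead
```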
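And the shallow fusion entries amount to adding a scaled LM log-probability to the decoder score at each beam search step; a sketch with an illustrative `lm_weight`, not the repository's API:

```python
import torch

def fused_score(am_logp: torch.Tensor,
                lm_logp: torch.Tensor,
                lm_weight: float = 0.2) -> torch.Tensor:
    """
    One beam search step of shallow fusion.
    am_logp: (beam, vocab) log-probabilities from the ASR decoder
    lm_logp: (beam, vocab) log-probabilities from the external (N-gram or neural) LM
    """
    # fused token score: log P_am(y|x) + lm_weight * log P_lm(y)
    return am_logp + lm_weight * lm_logp
```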
Refactor the structure to merge in the speech enhancement/separation repository
- Create task module for different tasks (AM, LM, SE & SS)
- Add documents
- Add time/frequency loss function for SE/SS task
- DCUnet & DPRNN & TasNet ...
- MVDR front-end with Transformer AM
- Fix LM dataloader & Add Transformer LM
- Fix transducer training & decoding
- Update STFT implementation
- Add shallow fusion
- Multi-channel front-end: fixed beamformer variants
- Initial code of RNN transducer
- Unsupervised training experiments on CHiME4
- Refactor code of Transformer and fix bugs
- FSMN encoder
- Add dataloader for enhancement/separation task
- SpecAugment (a masking sketch follows this list)
- Early stopping in Trainer
- Joint attention & CTC training
- Global CMVN in feature transform
- Support distributed training
- Multi-channel front-end: fixed beamformer (google & amazon)
- Initial code of Transformer
- Vectorized & batched beam search for decoding
- Support MFCC extraction & CMVN & feature splicing in the feature transform
- TDNN encoder
- RNN LM training & N-gram rescoring
- Scheduled sampling during training
- Multi-head attention variants
- Multi-channel front-end: MVDR beamformer
- WER evaluation
- Raw audio dataloader
- Create ASR transform module to handle feature extraction
- Update the Trainer
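
The SpecAugment entry above masks random frequency bands and time spans of the filterbank features (Park et al., 2019); a minimal sketch with illustrative mask-size parameters `F` and `T`, not the repository's implementation:

```python
import torch

def spec_augment(feats: torch.Tensor,
                 num_freq_masks: int = 2, F: int = 27,
                 num_time_masks: int = 2, T: int = 100) -> torch.Tensor:
    """
    feats: (num_frames, num_bins) filterbank features, masked in place.
    """
    num_frames, num_bins = feats.shape
    for _ in range(num_freq_masks):
        # zero out a random band of at most F frequency bins
        f = torch.randint(0, F + 1, (1,)).item()
        f0 = torch.randint(0, max(1, num_bins - f), (1,)).item()
        feats[:, f0:f0 + f] = 0
    for _ in range(num_time_masks):
        # zero out a random span of at most T frames
        t = torch.randint(0, T + 1, (1,)).item()
        t0 = torch.randint(0, max(1, num_frames - t), (1,)).item()
        feats[t0:t0 + t, :] = 0
    return feats
```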
Work on another speech enhancement & separation repository
- Single & Multi-channel feature extraction (IPD, DF, ...)
- Time & frequency domain objective functions (an SI-SNR sketch follows this list)
- Network variants in speech enhancement/separation
- Training and inference commands
- Dataloader for the offline data pairs and on-the-fly data simulation
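
One common instance of the time-domain objective mentioned above is SI-SNR; the sketch below is a standard formulation chosen for illustration, not necessarily what the repository implements:

```python
import torch

def si_snr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """
    Scale-invariant SNR between estimated and reference waveforms,
    both (batch, samples). Returns SI-SNR in dB per utterance
    (higher is better; negate the mean to use it as a loss).
    """
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # project the estimate onto the reference signal
    proj = (torch.sum(est * ref, dim=-1, keepdim=True) /
            (torch.sum(ref * ref, dim=-1, keepdim=True) + eps)) * ref
    noise = est - proj
    return 10 * torch.log10(
        torch.sum(proj ** 2, dim=-1) / (torch.sum(noise ** 2, dim=-1) + eps) + eps)
```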
Create the repository and make the first commit to synchronize local data
Local development of LAS (Listen, Attend and Spell) experiments
- First version of the feature transform for ASR task
- Kaldi feature dataloader
- RNN encoder & decoder
- Attention variants (context & dot & location-aware; a dot-product sketch follows this list)
- Training & Decoding command
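
Among the attention variants listed above, plain dot-product attention is the simplest; a per-step sketch with illustrative shapes, not the repository's module:

```python
import torch
import torch.nn.functional as F

def dot_attention(query: torch.Tensor, enc_out: torch.Tensor, enc_mask=None):
    """
    One decoder step of dot-product attention.
    query:    (batch, dim)    current decoder state
    enc_out:  (batch, T, dim) encoder outputs
    enc_mask: optional (batch, T) bool mask, True for padded frames
    """
    score = torch.bmm(enc_out, query.unsqueeze(-1)).squeeze(-1)  # (batch, T)
    if enc_mask is not None:
        score = score.masked_fill(enc_mask, float("-inf"))
    weight = F.softmax(score, dim=-1)
    context = torch.bmm(weight.unsqueeze(1), enc_out).squeeze(1)  # (batch, dim)
    return context, weight
```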