Code for paper: Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition
We only provide the key files of our model, `w2v-cif-bert`, which can be reimplemented based on fairseq. If you have any questions about the reimplementation, please contact yicheng2016@ia.ac.cn.
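As a starting point for such a reimplementation, below is a minimal sketch of how a custom model is typically registered with fairseq so it can be picked up from a `--user-dir` module. The class body, argument names (`--w2v-path`, `--bert-name`), and the identity stub are illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch (illustrative, not this repo's actual code) of registering a
# custom model with fairseq so it can be selected via --arch w2v_cif_bert.
import torch.nn as nn
from fairseq.models import BaseFairseqModel, register_model, register_model_architecture


@register_model("w2v_cif_bert")
class W2vCifBertModel(BaseFairseqModel):
    """Placeholder shell; the real model would fuse a pretrained wav2vec 2.0
    acoustic encoder with a pretrained BERT linguistic encoder."""

    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    @staticmethod
    def add_args(parser):
        # Hypothetical hyperparameters such a model might expose.
        parser.add_argument("--w2v-path", type=str, default="",
                            help="path to a pretrained wav2vec 2.0 checkpoint")
        parser.add_argument("--bert-name", type=str, default="bert-base-uncased",
                            help="name of the pretrained BERT model")

    @classmethod
    def build_model(cls, args, task):
        # The real build_model would load the pretrained encoders here;
        # an identity stub keeps this sketch self-contained.
        return cls(nn.Identity())

    def forward(self, src_tokens, src_lengths=None, **kwargs):
        return self.encoder(src_tokens)


@register_model_architecture("w2v_cif_bert", "w2v_cif_bert")
def w2v_cif_bert_base(args):
    args.w2v_path = getattr(args, "w2v_path", "")
    args.bert_name = getattr(args, "bert_name", "bert-base-uncased")
```

With such a module on the Python path (e.g. passed to `fairseq-train` via `--user-dir`), the model becomes selectable through `--arch w2v_cif_bert`.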
- 2021.5.14
Following requests from others for the baselines used in our paper, we release the implementations of `w2v-seq2seq` and `w2v-nar` (the relevant scripts are in `baselines/*`). NOTE: this code is based on an out-of-date commit of Fairseq (23d8502bdde88a3e58e0910e2ee49834f8478b39, upstream/master) and has not been tested against newer versions.
Please cite as:
@article{yi2021efficiently,
  title={Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition},
  author={Yi, Cheng and Zhou, Shiyu and Xu, Bo},
  journal={IEEE Signal Processing Letters},
  volume={28},
  pages={788--792},
  year={2021},
  publisher={IEEE}
}