Commit
Refer to the README.md in each eg directory for a description.
Showing 75 changed files with 5,787 additions and 97 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
### LibriCSS integrated recipe | ||
|
||
This is a Kaldi recipe for the LibriCSS data, providing diarization and | ||
ASR on mixed single-channel and separated audio inputs. | ||
|
||
#### Data | ||
We use the LibriCSS data released with the following paper: | ||
``` | ||
@article{Chen2020ContinuousSS, | ||
title={Continuous Speech Separation: Dataset and Analysis}, | ||
author={Z. Chen and T. Yoshioka and Liang Lu and T. Zhou and Zhong Meng and Yi Luo and J. Wu and J. Li}, | ||
journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, | ||
year={2020} | ||
} | ||
``` | ||
For the official data and code, check out [the official repo](https://github.com/chenzhuo1011/libri_css). | ||
|
||
#### Recipe details | ||
This recipe addresses the problem of speech recognition in a meeting-like | ||
scenario, where multiple overlapping speakers may be present, and the | ||
number of speakers is not known beforehand. | ||
|
||
We provide recipes for 2 scenarios: | ||
1. `s5_mono`: This is a single-channel diarization + ASR recipe that takes as | ||
input a long single-channel recording containing mixed audio. It then performs SAD, | ||
diarization, and ASR on it and outputs speaker-attributed transcriptions, | ||
which are then evaluated with cpWER (similar to CHiME-6 Track 2). | ||
2. `s5_css`: This pipeline uses a speech separation module at the beginning, | ||
so the input is 2-3 separated audio streams. We assume that the separation is | ||
window-based, so that the same speaker may be split across different streams in | ||
different windows, thus making diarization necessary. | ||
|
||
#### Pretrained models for diarization and ASR | ||
For ease of reproduction, we include training for the diarization and ASR | ||
modules in the recipe. We also provide pretrained models for both | ||
systems. | ||
|
||
* SAD: CHiME-6 baseline TDNN-Stats SAD available [here](http://kaldi-asr.org/models/m12). | ||
* Speaker diarization: CHiME-6 baseline x-vector + AHC diarizer, trained on VoxCeleb | ||
with simulated RIRs available [here](http://kaldi-asr.org/models/m12). | ||
* ASR: We used the chain model trained on 960h clean LibriSpeech training data available | ||
[here](http://kaldi-asr.org/models/m13). It was then additionally fine-tuned for 1 | ||
epoch on LibriSpeech + simulated RIRs. For LM, we trained a TDNN-LSTM language model | ||
for rescoring. All of these models are available at this | ||
[Google Drive link](https://drive.google.com/file/d/13ceXdK6oAUuUyxn7kjQVVqpe8r6Sc7ds/view?usp=sharing). | ||
|
||
#### Speech separation | ||
The speech separation module is not provided. If you want to use the | ||
`s5_css` recipe, check out [this tutorial](https://desh2608.github.io/pages/jsalt/) for | ||
instructions on how to plug your own component into the pipeline. | ||
|
||
If you find this recipe useful for your experiments, consider citing: | ||
|
||
``` | ||
@article{Raj2021Integration, | ||
title={Integration of speech separation, diarization, and recognition for multi-speaker meetings: | ||
System description, comparison, and analysis}, | ||
author={D. Raj and P. Denisov and Z. Chen and H. Erdogan and Z. Huang and M. He and S. Watanabe and | ||
J. Du and T. Yoshioka and Y. Luo and N. Kanda and J. Li and S. Wisdom and J. Hershey}, | ||
journal={IEEE Spoken Language Technology Workshop 2021}, | ||
year={2021} | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# you can change cmd.sh depending on what type of queue you are using. | ||
# If you have no queueing system and want to run on a local machine, you | ||
# can change all instances of 'queue.pl' to 'run.pl' (but be careful and run | ||
# commands one by one: most recipes will exhaust the memory on your | ||
# machine). queue.pl works with GridEngine (qsub). slurm.pl works | ||
# with slurm. Different queues are configured differently, with different | ||
# queue names and different ways of specifying things like memory; | ||
# to account for these differences you can create and edit the file | ||
# conf/queue.conf to match your queue's configuration. Search for | ||
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information, | ||
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl. | ||
|
||
export train_cmd="retry.pl queue.pl --mem 2G" | ||
export decode_cmd="queue.pl --mem 4G" |
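The comment in this file suggests swapping `queue.pl` for `run.pl` when no queueing system is available. A minimal sketch of such a local variant follows. It assumes `run.pl` is reachable through the recipe's `utils/` directory, and it omits the `retry.pl` wrapper and the `--mem` requests on the assumption that neither is needed on a single machine. This is an illustration, not part of the commit.

```shell
# Hypothetical local-machine variant of cmd.sh (not part of this commit).
# Jobs run directly on this host, so no queue-specific options are passed.
export train_cmd="run.pl"
export decode_cmd="run.pl"
```

With these settings all jobs of a stage may start at once, so the original warning applies: run the recipe stage by stage and watch memory usage.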
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
--use-energy=false | ||
--sample-frequency=16000 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# config for high-resolution MFCC features, intended for neural network training. | ||
# Note: we keep all cepstra, so it has the same info as filterbank features, | ||
# but MFCC is more easily compressible (because less correlated) which is why | ||
# we prefer this method. | ||
--use-energy=false # use average of log energy, not energy. | ||
--sample-frequency=16000 | ||
--num-mel-bins=40 | ||
--num-ceps=40 | ||
--low-freq=40 | ||
--high-freq=-400 |
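In Kaldi's mel filterbank options, a `--high-freq` value that is zero or negative is read as an offset from the Nyquist frequency rather than as an absolute cutoff. The sketch below works through that arithmetic for the values in this config. The variable names are illustrative only and are not used anywhere in the recipe.

```shell
# Illustrative arithmetic only; these variable names are hypothetical.
sample_frequency=16000   # from --sample-frequency
high_freq=-400           # from --high-freq

nyquist=$((sample_frequency / 2))

# A value <= 0 is treated as an offset from Nyquist; a positive value is absolute.
if [ "$high_freq" -le 0 ]; then
  effective_high_freq=$((nyquist + high_freq))
else
  effective_high_freq=$high_freq
fi

echo "effective high cutoff: ${effective_high_freq} Hz"
```

For this config the mel bins therefore span roughly 40 Hz to 7600 Hz.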
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# configuration file for apply-cmvn-online, used in the script ../local/run_online_decoding.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../callhome_diarization/v1/diarization |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../s5_mono/local |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
export KALDI_ROOT=`pwd`/../../.. | ||
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh | ||
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/sctk/bin:$PWD:$PATH | ||
export PATH=$PWD/dscore:$PATH | ||
export PYTHONPATH="${PYTHONPATH}:$PWD/dscore" | ||
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1 | ||
. $KALDI_ROOT/tools/config/common_path.sh | ||
export LC_ALL=C | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../../scripts/rnnlm |