"formosa_speech" recipe and database for Taiwanese Mandarin speech recognition #2474
Merged
112 commits
fb28c8e
Merge pull request #1 from kaldi-asr/master
yfliao 54521e9
first checkin
fe1fb71
correct folder structure
cd7b777
second cmmmit
a8152b2
update README.md
ed22027
Update README.md
yfliao 1a8d2b0
Update README.md
yfliao 0ae3159
Update README.md
yfliao 9ebbfa6
update README.md
74fbfb0
Update README.md
yfliao df3b5e0
Update README.md
yfliao d3cb7a7
Update README.md
yfliao db0b0d5
Update README.md
yfliao b43ff5a
add some instructions
167b8ab
Update README.md
yfliao 866adb4
Update README.md
yfliao 97f9a18
Update README.md
yfliao 89741af
Update README.md
yfliao 3230add
Update README.md
yfliao 9506be4
Update README.md
yfliao cf0595b
change all step switches from "false" to "true"
62e6b7c
remove unwanted files
651cba9
correct the steps and utilis links
61c34d6
Update README.md
yfliao 4e15a99
Update README.md
yfliao 4e74211
Update README.md
yfliao bce909a
Update README.md
yfliao aca06d6
Update README.md
yfliao 0ef4c7f
Update README.md
yfliao 6857581
Update README.md
yfliao 07e7713
Update README.md
yfliao 9bdece8
Update README.md
yfliao 5acf4fc
Update README.md
yfliao 334ba97
Update README.md
yfliao 8bfb322
Update README.md
yfliao b6281f2
Update README.md
yfliao fb2ec45
Update README.md
yfliao 0a52e92
Update README.md
yfliao d2f3690
Update README.md
yfliao 1054ea8
Update README.md
yfliao 850ab85
Update README.md
yfliao e8e1528
Delete gmm.config
yfliao 4431c18
change name to formosa
509247b
Merge branch 'master' of https://github.com/yfliao/kaldi
15e1546
Delete README.md
yfliao ea7b932
Update README.txt
yfliao d2a5a47
add instructions and reference results to run.sh
69816a5
remove scripts that don't have confirmed results yet
a499e1f
Update README.txt
yfliao 5d9560c
clean up/modifications and add the results of chain model
ddf5c2c
clean up/modifications and add the results of chain model
aac9184
clean up/modification and add the results of chain model
5caab51
add statistics of training and test sets.
1978379
add experimental settings and a brief descritption of Taiwanese langu…
349f891
add experimental settings for chain model
c469d3d
Update README.txt
yfliao 0d063ae
Update README.txt
yfliao 1f52406
Update README.txt
yfliao 6756e94
Update RESULTS
yfliao 978df65
add cleanup script and results
f928e1d
correct the folder naming
2f91195
correct folder naming
34c3410
correct folder naming
34ee2d5
correct text preparation (remove utt. id) for LM training
5e1f41b
Merge branch 'master' into master
yfliao 41b8c4f
correct name of the test set
yfliao 81c04c3
Update run.sh
yfliao 00915d7
Update run_tdnn_1a.sh
yfliao f2487e8
Update run_tdnn.sh
yfliao 6cfe116
Delete run_tdnn_lstm.sh
yfliao ddacb22
correct mismatch naming
yfliao a5f060d
correct mismatch naming
yfliao 6e673cf
Merge branch 'master' of https://github.com/yfliao/kaldi
yfliao 5031b3f
cleanup redundant and make the codeing style consistent with recipes
87a66d9
Merge branch 'master' of https://github.com/yfliao/kaldi
0593eaa
cleanup codes
d13872d
cleanup, add "--stage" option, remvoe overdoing and change variables …
4408079
cleanup
e738d93
cleanup
2ace7b6
Merge branch 'master' into master
yfliao cd0e281
Update README.txt
yfliao 77c47b1
add $train_stage to have control on nnet3 and chain model training
29fed04
Update README.txt
yfliao 1946977
change cmd.sh from queue.pl to run.pl to avoid gridengine settings
a048759
switch back to use "queue.pl" in "cmd.sh", to be consistent with main…
a96bd73
change gpu option to "--use-gpu wait" for runnning "run.pl"
86aa3f6
change the default "--train_stage" to "7" for running local/chain/run…
d4b785f
add scripts for decoding eval set
90d2f25
add scripts for decoding eval data set
9e84ecc
switch execution command to "run.pl"
7535d5e
add comments
fb850f0
commented out local/nnet3/run_tdnn.sh to simplify the codes, since ch…
ef31f7d
should call "run_eval_ivector_common.sh" instead of "run_ivector_comm…
4755ca1
commented out "local/nnet3/run_tdnn.sh" by default, since the chain m…
87d3f6b
fix the scripts to extract recognition reults from exp/*/*/decode_eva…
7dfed8b
corrected typos, recips --> recipe
a3e3f8b
revert commit
alex-ht 909eb45
Merge pull request #3 from yfliao/alex-ht-patch-1
alex-ht 4dcea98
remove redundant code
alex-ht 002cdd1
simplify local/prepare_data.sh
alex-ht 75c8204
syntax error
alex-ht 5b0ff75
1. add local/run_cleanup_segmentation.sh 2. some minor fix
alex-ht 5740120
alex-patch-2 (#4)
alex-ht 07cbefe
training on cleaned data
alex-ht 00b7968
training on cleaned data
alex-ht 2940d43
Update README.txt
yfliao c29d221
Update README.txt
yfliao c88f9a0
Update README.txt
yfliao 04d3752
Update README.txt
yfliao 56d5611
Update README.txt
yfliao 13b2281
Update README.txt
yfliao d026c7a
some advice from Povey. (#5)
alex-ht
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
### Welcome to the demo recipe of the Formosa Speech in the Wild (FSW) Project ### | ||
|
||
The language habits of Taiwanese people differ from those of other Mandarin speakers in both accent and culture [1]. In particular, Taiwanese use traditional Chinese characters (繁體中文). To address this issue, a Taiwanese speech corpus collection project, "Formosa Speech in the Wild (FSW)", was initiated in 2017 to advance the development of Taiwanese-specific speech recognition techniques. | ||
|
||
The FSW corpus will be a large-scale database of real-life, multi-genre Taiwanese spontaneous speech collected and transcribed from various sources (radio, TV, open courses, etc.). To demonstrate that this database is a reasonable data resource for Taiwanese spontaneous speech recognition research, a baseline recipe is provided here so that everybody, especially students, can develop their own systems easily and quickly. | ||
|
||
This recipe is based on the "NER-Trs-Vol1" corpus (about 150 hours of broadcast radio speech selected from FSW). For more details, please visit: | ||
* Formosa Speech in the Wild (FSW) project (https://sites.google.com/speech.ntut.edu.tw/fsw) | ||
|
||
If you want to apply for the NER-Trs-Vol1 corpus, please contact Yuan-Fu Liao (廖元甫) at "yfliao@mail.ntut.edu.tw". This corpus is only for non-commercial research/education use and will be distributed via our GitLab server at https://speech.nchc.org.tw. | ||
|
||
Bug reports, corrections, comments, and suggestions are very welcome. | ||
|
||
Yuan-Fu Liao (廖元甫) | ||
Associate Professor | ||
Department of Electronic Engineering, | ||
National Taipei University of Technology | ||
http://www.ntut.edu.tw/~yfliao | ||
yfliao@mail.ntut.edu.tw | ||
|
||
............ | ||
[1] The languages of Taiwan consist of several varieties of languages under families of the Austronesian languages and the Sino-Tibetan languages. Taiwanese Mandarin, Hokkien, Hakka and Formosan languages are used by 83.5%, 81.9%, 6.6% and 1.4% of the population respectively (2010). Given the prevalent use of Taiwanese Hokkien, the Mandarin spoken in Taiwan has been to a great extent influenced by it. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# | ||
# Reference results | ||
# | ||
# Experimental settings: | ||
# | ||
# training set: show CS, BG, DA, QG, SR, SY and WK, in total 18977 utt., 1,088,948 words | ||
# test set: show JZ, GJ, KX and YX, in total 2112 utt., 135,972 words | ||
# | ||
# lexicon: 274,036 words | ||
# phones (IPA): 196 (tonal) | ||
# | ||
# tdnn: 6 layers * 850 Relu neurons | ||
# Features: 43-dim MFCCs * 5 frames + 100-dim ivector (with LDA) | ||
# chain: 6 layers * 625 Relu neurons | ||
# Features: 43-dim MFCCs * 3 frames + 100-dim ivector (with LDA) | ||
# | ||
|
||
# | ||
# WER: | ||
# | ||
|
||
%WER 61.32 [ 83373 / 135972, 5458 ins, 19156 del, 58759 sub ] exp/mono/decode_test/wer_11_0.0 | ||
%WER 41.00 [ 55742 / 135972, 6725 ins, 12763 del, 36254 sub ] exp/tri1/decode_test/wer_15_0.0 | ||
%WER 40.41 [ 54948 / 135972, 7366 ins, 11505 del, 36077 sub ] exp/tri2/decode_test/wer_14_0.0 | ||
%WER 38.67 [ 52574 / 135972, 6855 ins, 11250 del, 34469 sub ] exp/tri3a/decode_test/wer_15_0.0 | ||
%WER 35.70 [ 48546 / 135972, 7197 ins, 9717 del, 31632 sub ] exp/tri4a/decode_test/wer_17_0.0 | ||
%WER 32.11 [ 43661 / 135972, 6112 ins, 10185 del, 27364 sub ] exp/tri5a/decode_test/wer_17_0.5 | ||
%WER 31.36 [ 42639 / 135972, 6846 ins, 8860 del, 26933 sub ] exp/tri5a_cleaned/decode_test/wer_17_0.5 | ||
%WER 24.43 [ 33218 / 135972, 5524 ins, 7583 del, 20111 sub ] exp/nnet3/tdnn_sp/decode_test/wer_12_0.0 | ||
%WER 23.95 [ 32568 / 135972, 4457 ins, 10271 del, 17840 sub ] exp/chain/tdnn_1a_sp/decode_test/wer_10_0.0 | ||
%WER 23.54 [ 32006 / 135972, 4717 ins, 8644 del, 18645 sub ] exp/chain/tdnn_1b_sp/decode_test/wer_10_0.0 | ||
|
||
# | ||
# CER: | ||
# | ||
|
||
%WER 54.09 [ 116688 / 215718, 4747 ins, 24510 del, 87431 sub ] exp/mono/decode_test/cer_10_0.0 | ||
%WER 32.61 [ 70336 / 215718, 5866 ins, 16282 del, 48188 sub ] exp/tri1/decode_test/cer_13_0.0 | ||
%WER 32.10 [ 69238 / 215718, 6186 ins, 15772 del, 47280 sub ] exp/tri2/decode_test/cer_13_0.0 | ||
%WER 30.40 [ 65583 / 215718, 6729 ins, 13115 del, 45739 sub ] exp/tri3a/decode_test/cer_12_0.0 | ||
%WER 27.53 [ 59389 / 215718, 6311 ins, 13008 del, 40070 sub ] exp/tri4a/decode_test/cer_15_0.0 | ||
%WER 24.21 [ 52232 / 215718, 6425 ins, 11543 del, 34264 sub ] exp/tri5a/decode_test/cer_15_0.0 | ||
%WER 23.41 [ 50492 / 215718, 6645 ins, 10997 del, 32850 sub ] exp/tri5a_cleaned/decode_test/cer_17_0.0 | ||
%WER 17.07 [ 36829 / 215718, 4734 ins, 9938 del, 22157 sub ] exp/nnet3/tdnn_sp/decode_test/cer_12_0.0 | ||
%WER 16.83 [ 36305 / 215718, 4772 ins, 10810 del, 20723 sub ] exp/chain/tdnn_1a_sp/decode_test/cer_9_0.0 | ||
%WER 16.44 [ 35459 / 215718, 4216 ins, 11278 del, 19965 sub ] exp/chain/tdnn_1b_sp/decode_test/cer_10_0.0 |
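As a sanity check on the figures above, each error rate is the sum of insertions, deletions, and substitutions divided by the number of reference tokens. The snippet below is a minimal sketch and not part of the recipe; the helper name `error_rate` is hypothetical, and the counts are copied from the `exp/mono` and `exp/chain/tdnn_1b_sp` WER lines.

```shell
#!/bin/bash
# error_rate INS DEL SUB TOTAL
# Print 100 * (INS + DEL + SUB) / TOTAL with two decimals.
# (Hypothetical helper for illustration; not part of the recipe.)
error_rate() {
  awk -v i="$1" -v d="$2" -v s="$3" -v n="$4" \
    'BEGIN { printf "%.2f\n", 100 * (i + d + s) / n }'
}

# Counts taken from the WER lines of the RESULTS file.
error_rate 5458 19156 58759 135972   # exp/mono
error_rate 4717 8644 18645 135972    # exp/chain/tdnn_1b_sp
```

The two calls reproduce the reported 61.32 and 23.54. The same helper applies to the CER lines if the character total (215718) is used as the denominator.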
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# you can change cmd.sh depending on what type of queue you are using. | ||
# If you have no queueing system and want to run on a local machine, you | ||
# can change all instances 'queue.pl' to run.pl (but be careful and run | ||
# commands one by one: most recipes will exhaust the memory on your | ||
# machine). queue.pl works with GridEngine (qsub). slurm.pl works | ||
# with slurm. Different queues are configured differently, with different | ||
# queue names and different ways of specifying things like memory; | ||
# to account for these differences you can create and edit the file | ||
# conf/queue.conf to match your queue's configuration. Search for | ||
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information, | ||
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl. | ||
|
||
export train_cmd="run.pl --mem 2G" | ||
export decode_cmd="run.pl --mem 4G" | ||
export mkgraph_cmd="run.pl --mem 8G" | ||
|
||
#export train_cmd="queue.pl --mem 2G" | ||
#export decode_cmd="queue.pl --mem 4G" | ||
#export mkgraph_cmd="queue.pl --mem 8G" | ||
|
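One possible way to keep queue.pl as the checked-in default while still allowing local runs is to pick the wrapper when the file is sourced. This is only a sketch under stated assumptions: the `KALDI_CMD` variable and the `pick_cmd` function are hypothetical names that neither Kaldi nor this recipe defines; the `--mem` values are the ones from the file above.

```shell
# Hypothetical variant of cmd.sh: choose the job wrapper from an environment
# variable instead of editing the file. KALDI_CMD is an assumed name.
pick_cmd() {
  # Fall back to queue.pl unless the caller explicitly asks for something else.
  case "${KALDI_CMD:-queue.pl}" in
    run.pl)   echo "run.pl" ;;
    slurm.pl) echo "slurm.pl" ;;
    *)        echo "queue.pl" ;;
  esac
}

cmd=$(pick_cmd)
export train_cmd="$cmd --mem 2G"
export decode_cmd="$cmd --mem 4G"
export mkgraph_cmd="$cmd --mem 8G"
```

With this variant, setting `KALDI_CMD=run.pl` before sourcing the file selects local execution, and an unset variable leaves the grid default in place.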
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
beam=11.0 # beam for decoding. Was 13.0 in the scripts. | ||
first_beam=8.0 # beam for 1st-pass decoding in SAT. | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
--use-energy=false # only non-default option. | ||
--sample-frequency=16000 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# config for high-resolution MFCC features, intended for neural network training. | ||
# Note: we keep all cepstra, so it has the same info as filterbank features, | ||
# but MFCC is more easily compressible (because less correlated) which is why | ||
# we prefer this method. | ||
--use-energy=false # use average of log energy, not energy. | ||
--sample-frequency=16000 # this recipe assumes 16 kHz audio | ||
--num-mel-bins=40 # similar to Google's setup. | ||
--num-ceps=40 # there is no dimensionality reduction. | ||
--low-freq=40 # low cutoff frequency for mel bins | ||
--high-freq=-200 # high cutoff frequency, relative to Nyquist of 8000 (=7800) |
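The arithmetic behind a negative `--high-freq` can be sketched as follows. Kaldi treats a non-positive value as an offset from the Nyquist frequency (half the sample rate). The function name `effective_high_freq` is hypothetical and exists only for this illustration.

```shell
# effective_high_freq SAMPLE_FREQ HIGH_FREQ
# A non-positive --high-freq is an offset from the Nyquist frequency
# (SAMPLE_FREQ / 2); a positive value is used as-is.
# (Hypothetical helper for illustration; not part of Kaldi.)
effective_high_freq() {
  local nyquist=$(( $1 / 2 ))
  if [ "$2" -le 0 ]; then
    echo $(( nyquist + $2 ))
  else
    echo "$2"
  fi
}

effective_high_freq 16000 -200   # the sample rate set in this config
effective_high_freq 8000 -200    # an 8 kHz setup, for comparison
```

For the 16 kHz sample rate set in this file the cutoff works out to 7800 Hz; a value of 3800 Hz would correspond to 8 kHz audio.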
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# configuration file for apply-cmvn-online, used when invoking online2-wav-nnet3-latgen-faster. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
--sample-frequency=16000 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
#!/bin/bash | ||
# | ||
# Copyright 2018, Yuan-Fu Liao, National Taipei University of Technology, yfliao@mail.ntut.edu.tw | ||
# | ||
# Before you run this recipe, please apply, download and put or make a link of the corpus under this folder (folder name: "NER-Trs-Vol1-Eval"). | ||
# For more detail, please check: | ||
# 1. Formosa Speech in the Wild (FSW) project (https://sites.google.com/speech.ntut.edu.tw/fsw/home/corpus) | ||
# 2. Formosa Speech Recognition Challenge (FSW) 2018 (https://sites.google.com/speech.ntut.edu.tw/fsw/home/challenge) | ||
stage=-2 | ||
train_stage=-10 | ||
num_jobs=20 | ||
|
||
# shell options | ||
set -e -o pipefail | ||
|
||
. ./cmd.sh | ||
. ./utils/parse_options.sh | ||
|
||
# configure number of jobs running in parallel, you should adjust these numbers according to your machines | ||
# data preparation | ||
if [ $stage -le -2 ]; then | ||
|
||
# Data Preparation | ||
echo "$0: Data Preparation" | ||
local/prepare_eval_data.sh || exit 1; | ||
|
||
fi | ||
|
||
# Now make MFCC plus pitch features. | ||
# mfccdir should be some place with a largish disk where you | ||
# want to store MFCC features. | ||
mfccdir=mfcc | ||
|
||
# mfcc | ||
if [ $stage -le -1 ]; then | ||
|
||
echo "$0: making mfccs" | ||
for x in eval; do | ||
steps/make_mfcc_pitch.sh --cmd "$train_cmd" --nj $num_jobs data/$x exp/make_mfcc/$x $mfccdir || exit 1; | ||
steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir || exit 1; | ||
utils/fix_data_dir.sh data/$x || exit 1; | ||
done | ||
|
||
fi | ||
|
||
# mono | ||
if [ $stage -le 0 ]; then | ||
|
||
# Monophone decoding | ||
( | ||
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj $num_jobs \ | ||
exp/mono/graph data/eval exp/mono/decode_eval | ||
) | ||
|
||
fi | ||
|
||
# tri1 | ||
if [ $stage -le 1 ]; then | ||
|
||
# decode tri1 | ||
( | ||
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj $num_jobs \ | ||
exp/tri1/graph data/eval exp/tri1/decode_eval | ||
) | ||
|
||
fi | ||
|
||
# tri2 | ||
if [ $stage -le 2 ]; then | ||
|
||
# decode tri2 | ||
( | ||
steps/decode.sh --cmd "$decode_cmd" --config conf/decode.config --nj $num_jobs \ | ||
exp/tri2/graph data/eval exp/tri2/decode_eval | ||
) | ||
|
||
fi | ||
|
||
# tri3a | ||
if [ $stage -le 3 ]; then | ||
|
||
# decode tri3a | ||
( | ||
steps/decode.sh --cmd "$decode_cmd" --nj $num_jobs --config conf/decode.config \ | ||
exp/tri3a/graph data/eval exp/tri3a/decode_eval | ||
) | ||
|
||
fi | ||
|
||
# tri4 | ||
if [ $stage -le 4 ]; then | ||
|
||
# decode tri4a | ||
( | ||
steps/decode_fmllr.sh --cmd "$decode_cmd" --nj $num_jobs --config conf/decode.config \ | ||
exp/tri4a/graph data/eval exp/tri4a/decode_eval | ||
) | ||
|
||
fi | ||
|
||
# tri5 | ||
if [ $stage -le 5 ]; then | ||
|
||
# decode tri5 | ||
( | ||
steps/decode_fmllr.sh --cmd "$decode_cmd" --nj $num_jobs --config conf/decode.config \ | ||
exp/tri5a/graph data/eval exp/tri5a/decode_eval || exit 1; | ||
) | ||
|
||
fi | ||
|
||
# nnet3 tdnn models | ||
# commented out by default, since the chain model is usually faster and better | ||
if [ $stage -le 6 ]; then | ||
|
||
: # no-op: bash rejects an if-body that contains only comments | ||
# train_stage=99 | ||
# echo "$0: evaluate nnet3 model" | ||
# local/nnet3/run_tdnn.sh --stage $train_stage | ||
|
||
fi | ||
|
||
# chain model | ||
if [ $stage -le 7 ]; then | ||
|
||
train_stage=99 | ||
echo "$0: evaluate chain model" | ||
local/chain/run_tdnn.sh --stage $train_stage | ||
|
||
fi | ||
|
||
# getting results (see RESULTS file) | ||
if [ $stage -le 10 ]; then | ||
|
||
echo "$0: extract the results" | ||
rm -f eval-decoding-results.log | ||
touch eval-decoding-results.log | ||
# "|| true" keeps "set -e -o pipefail" from aborting when a log directory has no matching lines | ||
for x in exp/*/decode_eval/log; do [ -d "$x" ] && { grep NER "$x"/*.log | grep -v LOG | grep -v WARNING >> eval-decoding-results.log || true; }; done | ||
for x in exp/*/*/decode_eval/log; do [ -d "$x" ] && { grep NER "$x"/*.log | grep -v LOG | grep -v WARNING >> eval-decoding-results.log || true; }; done | ||
|
||
fi | ||
|
||
# finish | ||
echo "$0: all done" | ||
|
||
exit 0; |
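The `--stage` option works because every block in the script is guarded by a `[ $stage -le N ]` test, so passing a larger stage skips the earlier blocks. A minimal sketch of that gating, using the stage numbers from the script above (the function name `stages_to_run` is hypothetical):

```shell
# stages_to_run STAGE
# List the blocks whose "[ $stage -le N ]" guard would be true for STAGE.
# (Hypothetical helper for illustration; not part of the recipe.)
stages_to_run() {
  local stage=$1 n out=""
  for n in -2 -1 0 1 2 3 4 5 6 7 10; do
    if [ "$stage" -le "$n" ]; then out="$out $n"; fi
  done
  echo "${out# }"
}

stages_to_run -2   # default: every block runs
stages_to_run 7    # only the chain-model and result-extraction blocks
```

Starting at stage 7, for example, leaves only blocks 7 and 10 active, which is useful when the GMM decoding passes have already finished.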
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
tuning/run_tdnn_1b.sh |
please have it as queue.pl when you check it in (since if you run using run.pl on a grid setup, it's not immediately obvious and is harmful to the machines.)