WIP: Multi-database English LVCSR recipe #771
Conversation
Two minor suggestions. Whenever you copy a script, it would be very convenient to provide the source path in a header comment, noting the original location, the commit, and that minor modifications were made; see the sketch below. This is convenient because people familiar with the original scripts can skip the unnecessary parts.
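For example, a header along these lines (the path and commit hash are the ones from the suggestion; the exact wording is illustrative):

```bash
# This script was copied from egs/swbd/s5c/local/format_acronyms_dict.py
# (commit d8b196951c1cf3437b3fa6cd76edbbc0542b3db9).
# Minor modifications were made.
```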
Thanks for the feedback!
Since some of the data prep scripts use relative paths, I'd rather not do this. Is the issue that it's difficult to pick out the non-database-specific scripts (like …)?
I will do this!
@guoguo12 Has anyone tried to train this multi-database setup yet? Will training on a set this large scale linearly, i.e. can I expect to train for about 35x the duration of the Tedlium 120-hour set? Thanks!
Approximately linearly. If you're talking about neural net training with …
@guoguo12 Just from my experience with g2p, both for training and applying, it's safer to use --encoding utf-8; it will avoid some strange behavior.
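For instance, with Sequitur G2P (assuming that is the g2p tool in question; file names here are illustrative), the flag would be passed at both stages:

```bash
# Train a G2P model on the seed lexicon, holding out 5% for development:
g2p.py --encoding utf-8 --train lexicon.txt --devel 5% --write-model g2p_model
# Apply the model to an OOV word list to generate pronunciations:
g2p.py --encoding utf-8 --model g2p_model --apply oov_words.txt > oov_lexicon.txt
```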
@guoguo12 When you think the recipe is ready for an intermediate review, please let us know so that we can go through it more carefully.
@vince62s: Thanks! Is this true even if the lexicon is pure ASCII?
@vijayaditya: We actually just finished the final HMM-GMM training step. I've squashed and pushed all of my work to guoguo12:multi-recipe. On the CLSP Grid, my work is at … Here's the speaker-independent WER for eval2000 after the final HMM-GMM step:
This is a bit better than the 32.2% WER achieved by fisher_swbd at roughly the same step (link). I'm still waiting on the speaker-adapted decode to finish; I'll post an update when it's done.
Important: I adapted existing Kaldi conventions to work with the multi-database situation. You can read about the design decisions I made in the README. Also, here's a concise chart outlining the exact training recipe; I'll probably add it to the README once it's finalized.
After refining the HMM-GMM training steps (as needed), the next step is nnet3/chain. I would be copying …
@guoguo12 I would recommend evaluating the recipe on all the test sets of interest from the beginning, as it will help prevent tuning the recipe to one test set. You could evaluate the systems on the Tedlium test sets, the Librispeech test sets, RT-03, and Hub5'00.
@guoguo12 I see that the next stage you plan to execute is nnet3/chain. I would recommend first building a TDNN acoustic model trained with the cross-entropy and sMBR criteria. This is our most stable recipe, and it usually helps us figure out whether the nnet recipes are working fine. Other nnet3 recipes don't work out of the box for new databases; we previously had problems with BLSTM acoustic models and TDNN+chain acoustic models in some recipes (e.g. Tedlium or AMI).
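In other recipes this stage is typically driven by a pair of scripts; a hypothetical invocation for this setup (script names assumed by analogy with recipes like fisher_swbd, not yet part of this PR) might be:

```bash
# Cross-entropy TDNN training on top of the final GMM alignments:
local/nnet3/run_tdnn.sh --stage 0
# sMBR sequence training on top of the cross-entropy model:
local/nnet3/run_tdnn_discriminative.sh --stage 0
```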
In egs/multi_en/s5/conf/ami_ihm/mfcc.conf:
@@ -0,0 +1,4 @@
+--use-energy=false
+--sample-frequency=16000
The feature configuration file should ideally be the same for all the databases, unless you are doing some very specific pre-processing. Are you planning to add such pre-processing? If not, I would strongly recommend maintaining just one conf/mfcc.conf file.
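For example, a single shared conf/mfcc.conf might contain nothing more than the following (the 8 kHz rate anticipates the downsampling discussion below; purely illustrative):

```
--use-energy=false
--sample-frequency=8000
```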
WSJ, AMI, ICSI, and Tedlium are 16 kHz, but Fisher and Switchboard are 8 kHz. Since most applications/research are done on the 8 kHz part, I'd suggest we limit the filterbanks to 8 kHz rather than upsample this data to 16 kHz.
On second thought, would it be better to use one mfcc.conf and downsample the 16 kHz data?
Korbinian, I think it's better to use a sox command to downsample the data in the wav file, rather than messing with the mel-banks' high frequency, which could be harder for users to get right (e.g. it also needs to be done in mfcc_hires.conf). Or (even better), downsample the files beforehand and dump them to disk, which might cause less I/O later on.
By messing with the mel high frequency you'd get a different energy than if you downsampled the signal.
Also, downsampling beforehand would probably be more efficient if there are multiple passes of MFCC extraction (which there are).
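A sketch of both options (file names are illustrative, and sox is assumed to be available):

```bash
# Option 1: downsample on the fly via a pipe in wav.scp, one entry per line:
#   utt001 sox /path/to/utt001.wav -t wav -r 8000 - |
# Option 2 (preferred above): downsample once beforehand and dump to disk:
sox utt001_16k.wav -r 8000 utt001_8k.wav
```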
@danpovey: So you'd recommend switching to downsampling and retraining the HMM-GMM models from scratch before starting TDNN training?
I think it might make people's lives easier later on, so yes.
Once your GMM-HMM systems are well tuned, please let us know and we can go through a second round of reviews. It would also enable @tomkocse to start working on his multi-condition recipe.
Yep, will do. I'm currently redoing tri5 (the last GMM-HMM step). Here are the results from tri4, across multiple test sets (as requested):
For fun (and at @sikoried's suggestion), I also decoded the Librispeech test set using a Librispeech LM (small, tg):
These are perhaps slightly worse than expected (11.2%, seen here).
@xiaohui-zhang You might have some insights in the case of the Librispeech test set, based on your pronunciation-dictionary experiments.
Allen, can you remind us how you got the lexicon and word list for this setup?
It's CMUDict, with all remaining OOVs across all training corpora added using g2p.
@guoguo12 I would also be curious to see what the Tedlium test set gives in this setup, when you get a chance.
OK. We could definitely do a bit better using Samuel's method that he's been working on.
@vince62s, here are the Tedlium test set results for tri4:
I decoded these using the standard trigram LM for this recipe, which is trained on Fisher/SWBD.
Allen, could you please get in the habit of putting baselines in these result postings?
Sure. The comparable result from the Tedlium recipe is 20.3% WER (link), so this is worse. I would predict that the LM is mostly to blame.
OK. It would probably make sense eventually to build graphs with the …
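Graph building in Kaldi is done per-LM, so a matched comparison would look something like this (the lang-directory name is hypothetical):

```bash
# Build a decoding graph for tri4 with a Tedlium-matched LM directory:
utils/mkgraph.sh data/lang_tedlium_test exp/tri4 exp/tri4/graph_tedlium
```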
In local/remove_dup_utts.sh:
@@ -0,0 +1,56 @@
+#!/bin/bash
This script, local/remove_dup_utts.sh, should be deleted, as it now lives in utils/data/.
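For reference, callers would switch to the shared version, invoked the same way (arguments shown as in the swbd recipe; the interface is assumed unchanged):

```bash
# Keep at most 300 copies of any repeated utterance text,
# writing the filtered data directory to data/train_nodup:
utils/data/remove_dup_utts.sh 300 data/train data/train_nodup
```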
@naxingyu Would you be able to take this recipe to the nnet3 stage? Most of the GPUs on our cluster will be busy for the next two weeks, and it would be good to see how the results look sooner than that.
Vijay, it may not be obvious to him what changes you refer to here.
Vijay, Dan: sorry, this slipped my radar. Which changes?
I'll get it done later today!
@vijayaditya I pushed the changes @danpovey requested. Good to go now? I appreciate you taking over the nnet2/3 experiments; they're quite time-consuming to run, and you guys have a better handle on when they fit on the cluster...
@vijayaditya, can we have someone run this on our grid before I merge it? Or is it already on our grid somewhere?
IIRC Guoguo ran these experiments on our cluster. I will try to find the location. --Vijay
I had run it on the grid, and posted the location a few posts up, and … --Allen
I can't find where you posted the location. |
On clsp:
Should these be commented out in the run.sh?
And should the 'exit 0' be in the middle of the run.sh?
Sorry, these were leftovers from the last run after I had made some adjustments.
I think it's OK now, but we need to figure out how to squash it at least to some extent; it might be a bit more complicated since it's a multi-author PR.
@guoguo12 As the owner of this fork/repo, can you squash the commits as indicated by Dan?
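One way to do this locally, for what it's worth (a sketch; assumes kaldi-asr/kaldi is configured as the fork's "upstream" remote):

```bash
# Interactively squash the PR branch's commits, then force-push the fork:
git fetch upstream
git rebase -i upstream/master   # mark related commits as "squash"/"fixup"
git push --force origin multi-recipe
```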
@danpovey: If you enable GitHub's squash-merge feature, you should be able to squash it on merge.
I think it's even enabled :)
If I do it that way, I doubt that the authorship info will be correct.
Squashed to three commits (original recipe, revision with Tedlium 2, proofreading).
Thanks! Merging.
See #699.