Dropout schedule in nnet3 training scripts #1247
@danpovey @vimalmanohar I think adding this function into the existing dropout is interesting:
@danpovey could you give me some guidance on how to set a random matrix by row in kaldi? .... I saw a cumatrix function ApplyHeaviside; could you tell me the function names you would use for realizing this, and I can do it myself.
I assume what you want is a matrix where each row is randomly all zeros or all ones.
I would first set a random vector, with dimension == the NumCols() of the matrix, to random zeros and ones using a combination of SetRandUniform(), Add() and ApplyHeaviside(). (You can create a matrix with 1 row if the relevant functions are not available in class CuVector.) Then use CopyColsFromVec() to copy it to the matrix.
Dan
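To make that recipe concrete, here is a minimal C++ sketch (not from the thread; the exact set of helpers available on CuVector vs. CuMatrix varies across Kaldi versions, so treat these calls as assumptions to check against cu-matrix.h, cu-vector.h and cu-rand.h). Note that for each row to come out constant, the mask needs one entry per row of the target matrix:

```cpp
#include "cudamatrix/cu-matrix.h"
#include "cudamatrix/cu-vector.h"
#include "cudamatrix/cu-rand.h"

namespace kaldi {

// Fill 'mat' so that each row is (randomly, independently) all zeros or
// all ones, with probability 'one_prob' of being all ones.
void SetRowsRandomZeroOrOne(BaseFloat one_prob, CuMatrix<BaseFloat> *mat) {
  int32 num_rows = mat->NumRows();
  // 1-row matrix workaround, in case some of these ops are not
  // available on CuVector directly (as Dan's comment anticipates).
  CuMatrix<BaseFloat> mask_row(1, num_rows);
  CuRand<BaseFloat> rand;
  rand.RandUniform(&mask_row);       // entries uniform on (0, 1)
  mask_row.Add(one_prob - 1.0);      // entry > 0 with probability one_prob
  mask_row.ApplyHeaviside();         // map to exactly 0.0 or 1.0
  CuVector<BaseFloat> mask(num_rows);
  mask.CopyRowsFromMat(mask_row);    // flatten the 1-row matrix to a vector
  // Each column of 'mat' becomes a copy of 'mask', so row i is all mask(i).
  mat->CopyColsFromVec(mask);
}

}  // namespace kaldi
```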
[Note: to some extent this is a response to discussions that have been happening by email or on @vimalmanohar's repo.] The basic situation is that @GaofengCheng has been doing a lot of experiments investigating how to do dropout in BLSTMs, with different dropout schedules, and is getting some really nice improvements (around 1% absolute); I believe his best current setup is based on just putting conventional dropout after the rp_t component (the component that combines the 'r' and 'p' matrices in projected LSTMs). [BTW, @GaofengCheng, you might want to try putting it just on the 'r' or 'p' parts if you haven't tried that already... that may require a bit of messing around with dim-range components. It's possible to split things apart using dim-range nodes, and then append them back together using Append; see the config sketch below.]
I have been thinking about the best next steps to take with regard to this dropout-schedule stuff, and getting it merged to master in the nicest way. I think @vimalmanohar should be in charge of this, since he is taking the lead on nnet3 python-script maintenance and development. What I'm thinking is that we could use what we've learned from @GaofengCheng's experiments but (if Vimal feels it is best) modify the python code from a clean start, if that is more conducive to getting things done fast. [Also, I think @GaofengCheng was using the pre-xconfig scripts, which we shouldn't be messing with at this point.]
What I'm thinking, @vimalmanohar, is that we can give the various LSTM xconfig classes a string-valued config called 'dropout', defaulting to None, which you would set to 'rp' to do dropout as Gaofeng is currently recommending (i.e. on the output of the 'rp' component). We need to make sure this works in the new 'fast' LSTM component as well as the old one. The use of a string-valued config will mean this is extensible to any new setup that Gaofeng comes up with.
Since we were not seeing great results for the 'whole-frame' dropout, let's not consider merging any of that just yet; we'll merge it to master if it turns out to give a benefit in some setup.
@vijayaditya, you may want to chime in if you disagree with this plan.
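For the bracketed dim-range suggestion, a hypothetical raw nnet3 config fragment might look like the following (node names and dimensions are invented for illustration, and the dropout component itself is elided):

```
# split a hypothetical 512-dim node 'lstm1.rp' into its 'r' and 'p' halves
dim-range-node name=lstm1_r input-node=lstm1.rp dim-offset=0 dim=256
dim-range-node name=lstm1_p input-node=lstm1.rp dim-offset=256 dim=256
# ... apply a dropout component to lstm1_r, producing a node lstm1_r_dropout ...
# downstream layers then re-join the halves with an Append descriptor:
#   input=Append(lstm1_r_dropout, lstm1_p)
```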
@danpovey I added the dropout on the input of 'rp', i.e. before the LSTM projection... but I can try it on the output of rp right now and see the effect (this may be better than on the input of rp, because the dropout will act directly on the LSTM gates)... @vimalmanohar as for the dropout placement, you can refer to lstm.py in vimalmanohar#8
Oh OK, so I guess the dropout is on 'm_t', because that's where 'rp' gets its input projection from (and I think m_t is not used anywhere else). In the proposed scheme, this could be accomplished by setting 'dropout=m', and of course writing the appropriate code.
@danpovey yes... the input of the LSTM dropout is m_t.
Seems reasonable. @GaofengCheng can add this to his PR after he tests out the fast LSTM component. I can help with the xconfig modifications if needed.
How can I add a dropout module in a TDNN script?
local/nnet3/run_tdnn3.sh fails with an xconfig error when adding a new layer to a TDNN model. I was following:
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7q.sh
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7p.sh
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7o.sh
The log:

```
local/nnet3/run_tdnn3.sh: creating neural net configs
tree-info exp/tri5a_sp_ali/tree
steps/nnet3/xconfig_to_configs.py --xconfig-file exp/nnet3/tdnn_sp_2/configs/network.xconfig --config-dir exp/nnet3/tdnn_sp_2/configs/
ERROR:root:***Exception caught while parsing the following xconfig line:
*** relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.004 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim- continuous=true dim=850
Traceback (most recent call last):
  File "steps/nnet3/xconfig_to_configs.py", line 333, in <module>
    main()
  File "steps/nnet3/xconfig_to_configs.py", line 323, in main
    all_layers = xparser.read_xconfig_file(args.xconfig_file, existing_layers)
  File "steps/libs/nnet3/xconfig/parser.py", line 189, in read_xconfig_file
    this_layer = xconfig_line_to_object(line, existing_layers)
  File "steps/libs/nnet3/xconfig/parser.py", line 96, in xconfig_line_to_object
    return config_to_layer[first_token](first_token, key_to_value, prev_layers)
  File "steps/libs/nnet3/xconfig/basic_layers.py", line 706, in __init__
    XconfigLayerBase.__init__(self, first_token, key_to_value, prev_names)
  File "steps/libs/nnet3/xconfig/basic_layers.py", line 68, in __init__
    self.set_configs(key_to_value, all_layers)
  File "steps/libs/nnet3/xconfig/basic_layers.py", line 97, in set_configs
    "".format(key, value, self.layer_type, configs))
RuntimeError: Configuration value continuous=true was not expected in layer of type relu-batchnorm-dropout-layer; allowed configs with their defaults: self-repair-scale->1e-05 l2-regularize->"" add-log-stddev->False ng-linear-options->"" bias-stddev->"" bottleneck-dim->-1 dropout-per-dim->False dim->-1 max-change->0.75 ng-affine-options->"" learning-rate-factor->"" dropout-per-dim-continuous->False input->"[-1]" dropout-proportion->0.5 target-rms->1.0
```
Looks like you added a space between dropout-per-dim- and continuous.
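That is, the line should read (option name unbroken):

```
relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.004 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true dim=850
```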
Thank you, the previous problem has been solved. However, there is a problem with decoding.
I suggest cd'ing to src/ and doing "make depend -j 10" and "make -j 10" to minimize the chance of compilation errors, and trying again. If that doesn't work, get it in gdb and show me a stack trace: gdb --args (program) (args), then "r", then "bt" when it crashes. E.g.:

```
gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/n.....
(gdb) r
...
(gdb) bt
```
That did not solve the problem, and I don't know how to use gdb:

```
yuyin@yuyin-Super-Server:~/kaldi-trunk1/egs/aishell/s5$ gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz"
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
[GDB startup banner omitted]
nnet3-latgen-faster: No such file or directory.
(gdb)
```
When in gdb, type 'run' and when/if it crashes, type 'bt' and paste the output of that command -- that is what Dan is looking for.
y.
yuyin@yuyin-Super-Server:~/kaldi-trunk1$ g++ -g -o nnet3-latgen-faster nnet3-latgen-faster.cc
Does GDB support shell scripts?
I think you are confusing g++ and gdb.
I know, Dan, but I don't know how to use gdb.
```
yuyin@yuyin-Super-Server: (gdb) r --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz"
[ Stack-Trace: ]
ERROR (nnet3-latgen-faster[5.5.88~3-8e30f]:Input():kaldi-io.cc:756) Error opening input stream exp/nnet3/tdnn_sp_2/final.mdl
[ Stack-Trace: ]
```
Is the trained model file final.mdl wrong?
You are probably running it from a different directory.
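A quick way to check this (the directory is an assumption based on the prompt in the earlier log; final.mdl is referenced by a relative path, so it only resolves from the experiment directory):

```
cd ~/kaldi-trunk1/egs/aishell/s5      # where the decode command was originally run from
ls -l exp/nnet3/tdnn_sp_2/final.mdl   # the file kaldi-io failed to open
```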
Please get someone local to help you. We are busy and we don't have time to deal with people who don't know basic things like how to use a debugger, and there must be people in your lab who know this stuff. |
Thank you. The problem has been solved; the previous model had not been updated.
I want to ask which papers describe the dropout algorithms used in Kaldi.
There are different forms available. If you are asking about the one used in the TDNN-F scripts, which is continuous and shared across time, look at my publications page; it may be described in the paper on factorized TDNNs with Gaofeng Cheng as a co-author. There is also more conventional dropout.
Dan
Yes, about TDNN. |
Recently, @GaofengCheng has been doing some interesting experiments with dropout and BLSTMs, and getting nice improvements. He was using a dropout schedule in which you start with zero dropout, ramp up to 0.2, and then go back to zero at the very end.
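For illustration only, such a schedule could be expressed as a piecewise-linear string of value@data-fraction points, in the style the training scripts eventually adopted; the exact option name and syntax here are assumptions and should be checked against steps/nnet3/train*.py:

```
# hypothetical: dropout 0 until 20% of training, ramping up to 0.2
# by the halfway point, then back down to 0 at the end
--trainer.dropout-schedule='0,0@0.20,0.2@0.50,0'
```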
I have been thinking about the best and most flexible way to support general dropout schedules in the training scripts. @vimalmanohar, since you are now the 'owner' of the python training scripts, it would be best if you take this on.
Here is my proposal.
Firstly, the --set-dropout-proportion (or whatever it is) option to nnet3*-copy is (or should be) deprecated. The way I want to do this is by adding an option to the '--edits-config' file. See ReadEditConfig() in nnet-utils.h. The option should have the following documentation in the comment there: [...] The documentation for the python-training-script option would read something like the following: [...]
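As a sketch only (the directive name 'set-dropout-proportion' and its keys are assumptions following this proposal, not a confirmed interface), a line in the file passed via --edits-config might read:

```
# set the dropout proportion to 0.1 for all components matching 'lstm*'
set-dropout-proportion name=lstm* proportion=0.1
```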
I suggest turning this into a command-line option to nnet3-copy or nnet3-am-copy that looks like the following, to avoid having to create lots of little config files: [...]
The double quotes are just a bit of paranoia, to avoid bash globbing in case a file like 'name=lstmX' exists. And of course this approach avoids some directory I/O.
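Concretely, the command-line form being proposed might look like this (hedged: an inline '--edits' counterpart to '--edits-config' is part of the proposal here, not necessarily the final interface):

```
# hypothetical: pass the edit string inline, quoted to avoid bash globbing
nnet3-copy --edits="set-dropout-proportion name=lstm* proportion=0.1" \
  exp/nnet3/tdnn/final.raw exp/nnet3/tdnn/final_dropout.raw
```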
I'd be OK with placing the parsing of the option in the inner part of the python code, even if this means it's done multiple times, if that helps keep the code structure clean; I don't think the time taken is significant in the overall scheme of things.