Dropout schedule in nnet3 training scripts #1247
@danpovey @vimalmanohar I think adding this function into the existing dropout is interesting:
@danpovey could you give me some guidance on how to set a random matrix by row in kaldi? .... I saw a cumatrix function ApplyHeaviside; could you tell me the function names you would use for realizing this, and I can do it myself.
I assume what you want is a matrix where each row is randomly all zeros or all ones.
I would first set a random vector, with dimension == the NumCols() of the matrix, to random zeros and ones using a combination of SetRandUniform(), Add() and ApplyHeaviside(). (You can create a matrix with 1 row if the relevant functions are not available in class CuVector.) Then use CopyColsFromVec() to copy it to the matrix.
Dan
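To make that recipe concrete, here is a minimal C++ sketch (not from the thread; the exact set of helpers available on CuVector vs. CuMatrix varies across Kaldi versions, so treat these calls as assumptions to check against cu-matrix.h, cu-vector.h and cu-rand.h). Note that for each row to come out constant, the mask needs one entry per row of the target matrix:

```cpp
#include "cudamatrix/cu-matrix.h"
#include "cudamatrix/cu-vector.h"
#include "cudamatrix/cu-rand.h"

namespace kaldi {

// Fill 'mat' so that each row is (randomly, independently) all zeros or
// all ones, with probability 'one_prob' of being all ones.
void SetRowsRandomZeroOrOne(BaseFloat one_prob, CuMatrix<BaseFloat> *mat) {
  int32 num_rows = mat->NumRows();
  // 1-row matrix workaround, in case some of these ops are not
  // available on CuVector directly (as Dan's comment anticipates).
  CuMatrix<BaseFloat> mask_row(1, num_rows);
  CuRand<BaseFloat> rand;
  rand.RandUniform(&mask_row);       // entries uniform on (0, 1)
  mask_row.Add(one_prob - 1.0);      // entry > 0 with probability one_prob
  mask_row.ApplyHeaviside();         // map to exactly 0.0 or 1.0
  CuVector<BaseFloat> mask(num_rows);
  mask.CopyRowsFromMat(mask_row);    // flatten the 1-row matrix to a vector
  // Each column of 'mat' becomes a copy of 'mask', so row i is all mask(i).
  mat->CopyColsFromVec(mask);
}

}  // namespace kaldi
```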
[Note: to some extent this is a response to discussions that have been happening by email or on @vimalmanohar's repo.] The basic situation is that @GaofengCheng has been doing a lot of experiments investigating how to do dropout in BLSTMs, with different dropout schedules, and is getting some really nice improvements (around 1% absolute); I believe his best current setup is based on just putting conventional dropout after the rp_t component (the component that combines the 'r' and 'p' matrices in projected LSTMs). [BTW, @GaofengCheng, you might want to try putting it just on the 'r' or 'p' parts if you haven't tried that already... that may require a bit of messing around with dim-range components. It's possible to split things apart using dim-range nodes, and then append them back together using Append; see the config sketch below.]
I have been thinking about the best next steps to take with regard to this dropout-schedule stuff, and getting it merged to master in the nicest way. I think @vimalmanohar should be in charge of this, since he is taking the lead on nnet3 python-script maintenance and development. What I'm thinking is that we could use what we've learned from @GaofengCheng's experiments but (if Vimal feels it is best) modify the python code from a clean start, if that is more conducive to getting things done fast. [Also, I think @GaofengCheng was using the pre-xconfig scripts, which we shouldn't be messing with at this point.]
What I'm thinking, @vimalmanohar, is that we can give the various LSTM xconfig classes a string-valued config called 'dropout', defaulting to None, which you would set to 'rp' to do dropout as Gaofeng is currently recommending (i.e. on the output of the 'rp' component). We need to make sure this works in the new 'fast' LSTM component as well as the old one. The use of a string-valued config will mean this is extensible to any new setup that Gaofeng comes up with.
Since we were not seeing great results for the 'whole-frame' dropout, let's not consider merging any of that just yet; we'll merge it to master if it turns out to give a benefit in some setup.
@vijayaditya, you may want to chime in if you disagree with this plan.
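For the bracketed dim-range suggestion, a hypothetical raw nnet3 config fragment might look like the following (node names and dimensions are invented for illustration, and the dropout component itself is elided):

```
# split a hypothetical 512-dim node 'lstm1.rp' into its 'r' and 'p' halves
dim-range-node name=lstm1_r input-node=lstm1.rp dim-offset=0 dim=256
dim-range-node name=lstm1_p input-node=lstm1.rp dim-offset=256 dim=256
# ... apply a dropout component to lstm1_r, producing a node lstm1_r_dropout ...
# downstream layers then re-join the halves with an Append descriptor:
#   input=Append(lstm1_r_dropout, lstm1_p)
```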
@danpovey I added the dropout on the input of 'rp', i.e. before the LSTM projection... but I can try it on the output of rp right now and see the effect (this may be better than on the input of rp, because the dropout will act directly on the LSTM gates)... @vimalmanohar as for the dropout placement, you can refer to lstm.py in vimalmanohar#8
Oh OK, so I guess the dropout is on 'm_t', because that's where 'rp' gets its input projection from (and I think m_t is not used anywhere else). In the proposed scheme, this could be accomplished by setting 'dropout=m', and of course writing the appropriate code.
@danpovey yes... the input of the LSTM dropout is m_t.
Seems reasonable. @GaofengCheng can add this to his PR after he tests out the fast LSTM component. I can help with the xconfig modifications if needed.
How can I add a dropout module in a TDNN script?
local/nnet3/run_tdnn3.sh fails with an xconfig error when adding a new layer to a TDNN model. I was following:
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7q.sh
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7p.sh
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/chain/tuning/run_tdnn_7o.sh
The log:

```
local/nnet3/run_tdnn3.sh: creating neural net configs
tree-info exp/tri5a_sp_ali/tree
steps/nnet3/xconfig_to_configs.py --xconfig-file exp/nnet3/tdnn_sp_2/configs/network.xconfig --config-dir exp/nnet3/tdnn_sp_2/configs/
ERROR:root:***Exception caught while parsing the following xconfig line:
*** relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.004 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim- continuous=true dim=850
Traceback (most recent call last):
  File "steps/nnet3/xconfig_to_configs.py", line 333, in <module>
    main()
  File "steps/nnet3/xconfig_to_configs.py", line 323, in main
    all_layers = xparser.read_xconfig_file(args.xconfig_file, existing_layers)
  File "steps/libs/nnet3/xconfig/parser.py", line 189, in read_xconfig_file
    this_layer = xconfig_line_to_object(line, existing_layers)
  File "steps/libs/nnet3/xconfig/parser.py", line 96, in xconfig_line_to_object
    return config_to_layer[first_token](first_token, key_to_value, prev_layers)
  File "steps/libs/nnet3/xconfig/basic_layers.py", line 706, in __init__
    XconfigLayerBase.__init__(self, first_token, key_to_value, prev_names)
  File "steps/libs/nnet3/xconfig/basic_layers.py", line 68, in __init__
    self.set_configs(key_to_value, all_layers)
  File "steps/libs/nnet3/xconfig/basic_layers.py", line 97, in set_configs
    "".format(key, value, self.layer_type, configs))
RuntimeError: Configuration value continuous=true was not expected in layer of type relu-batchnorm-dropout-layer; allowed configs with their defaults: self-repair-scale->1e-05 l2-regularize->"" add-log-stddev->False ng-linear-options->"" bias-stddev->"" bottleneck-dim->-1 dropout-per-dim->False dim->-1 max-change->0.75 ng-affine-options->"" learning-rate-factor->"" dropout-per-dim-continuous->False input->"[-1]" dropout-proportion->0.5 target-rms->1.0
```
Looks like you added a space between dropout-per-dim- and continuous.
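That is, the line should read (option name unbroken):

```
relu-batchnorm-dropout-layer name=tdnn1 l2-regularize=0.004 dropout-proportion=0.0 dropout-per-dim=true dropout-per-dim-continuous=true dim=850
```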
Thank you, the previous problem has been solved. However, there is a problem with decoding.
I suggest cd'ing to src/ and doing "make depend -j 10" and "make -j 10" to minimize the chance of compilation errors, and trying again. If that doesn't work, get it in gdb and show me a stack trace: gdb --args (program) (args), then "r", then "bt" when it crashes. E.g.:

```
gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/n.....
(gdb) r
...
(gdb) bt
```
That did not solve the problem, and I don't know how to use gdb:

```
yuyin@yuyin-Super-Server:~/kaldi-trunk1/egs/aishell/s5$ gdb --args nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz"
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
[GDB startup banner omitted]
nnet3-latgen-faster: No such file or directory.
(gdb)
```
When in gdb, type 'run' and when/if it crashes, type 'bt' and paste the output of that command -- that is what Dan is looking for.
y.
yuyin@yuyin-Super-Server:~/kaldi-trunk1$ g++ -g -o nnet3-latgen-faster nnet3-latgen-faster.cc
Does GDB support shell scripts?
I think you are confusing g++ and gdb.
I know, Dan, but I don't know how to use gdb.
```
yuyin@yuyin-Super-Server: (gdb) r --online-ivectors=scp:exp/nnet3/ivectors_dev/ivector_online.scp --online-ivector-period=10 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=0.1 --allow-partial=true --word-symbol-table=exp/tri5a/graph/words.txt exp/nnet3/tdnn_sp_2/final.mdl exp/tri5a/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split40/17/utt2spk scp:data/dev_hires/split40/17/cmvn.scp scp:data/dev_hires/split40/17/feats.scp ark:- |" "ark:|gzip -c >exp/nnet3/tdnn_sp_2/decode_dev/lat.17.gz"
[ Stack-Trace: ]
ERROR (nnet3-latgen-faster[5.5.88~3-8e30f]:Input():kaldi-io.cc:756) Error opening input stream exp/nnet3/tdnn_sp_2/final.mdl
[ Stack-Trace: ]
```
Is the trained model file final.mdl wrong?
You are probably running it from a different directory.
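A quick way to check this (the directory is an assumption based on the prompt in the earlier log; final.mdl is referenced by a relative path, so it only resolves from the experiment directory):

```
cd ~/kaldi-trunk1/egs/aishell/s5      # where the decode command was originally run from
ls -l exp/nnet3/tdnn_sp_2/final.mdl   # the file kaldi-io failed to open
```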
Please get someone local to help you. We are busy and we don't have time to deal with people who don't know basic things like how to use a debugger, and there must be people in your lab who know this stuff. |
Thank you. The problem has been solved; the previous model had not been updated.
I want to ask which papers describe the dropout algorithms used in Kaldi.
There are different forms available. If you are asking about the one used in the TDNN-F scripts, which is continuous and shared across time, look at my publications page; it may be described in the paper on factorized TDNNs with Gaofeng Cheng as a co-author. There is also more conventional dropout.
Dan
Yes, about TDNN. |
Recently, @GaofengCheng has been doing some interesting experiments with dropout and BLSTMs, and getting nice improvements. He was using a dropout schedule in which you start with zero dropout, ramp up to 0.2, and then go back to zero at the very end.
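For illustration only, such a schedule could be expressed as a piecewise-linear string of value@data-fraction points, in the style the training scripts eventually adopted; the exact option name and syntax here are assumptions and should be checked against steps/nnet3/train*.py:

```
# hypothetical: dropout 0 until 20% of training, ramping up to 0.2
# by the halfway point, then back down to 0 at the end
--trainer.dropout-schedule='0,0@0.20,0.2@0.50,0'
```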
I have been thinking about the best and most flexible way to support general dropout schedules in the training scripts. @vimalmanohar, since you are now the 'owner' of the python training scripts, it would be best if you take this on.
Here is my proposal.
Firstly, the --set-dropout-proportion (or whatever it is) option to nnet3*-copy is (or should be) deprecated. The way I want to do this is by adding an option to the '--edits-config' file. See ReadEditConfig() in nnet-utils.h. The option should have the following documentation in the comment there: [...] The documentation for the python-training-script option would read something like the following: [...]
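As a sketch only (the directive name 'set-dropout-proportion' and its keys are assumptions following this proposal, not a confirmed interface), a line in the file passed via --edits-config might read:

```
# set the dropout proportion to 0.1 for all components matching 'lstm*'
set-dropout-proportion name=lstm* proportion=0.1
```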
I suggest turning this into a command-line option to nnet3-copy or nnet3-am-copy that looks like the following, to avoid having to create lots of little config files: [...]
The double quotes are just a bit of paranoia, to avoid bash globbing in case a file like 'name=lstmX' exists. And of course this approach avoids some directory I/O.
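Concretely, the command-line form being proposed might look like this (hedged: an inline '--edits' counterpart to '--edits-config' is part of the proposal here, not necessarily the final interface):

```
# hypothetical: pass the edit string inline, quoted to avoid bash globbing
nnet3-copy --edits="set-dropout-proportion name=lstm* proportion=0.1" \
  exp/nnet3/tdnn/final.raw exp/nnet3/tdnn/final_dropout.raw
```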
I'd be OK with placing the parsing of the option in the inner part of the python code, even if this means it's done multiple times, if that helps keep the code structure clean; I don't think the time taken is significant in the overall scheme of things.