Adding dropout schedule option to nnet3 #1248
@GaofengCheng @pegahgh Please review this PR and test this in your experiments.
Thanks.
I will test it now.
@vimalmanohar Thanks Vimal. I'm preparing to PR tensorflow+kaldi right now; after that I will continue my experiments based on your PR and review it.
logger.info("On iteration {0}, learning rate is {1}" | ||
"{dropout_info}{shrink_info}.".format( | ||
iter, learning_rate(iter, num_jobs, |
change this to `learning_rate`!
```python
            dropout_proportions, raw_model_string)
    dropout_info_str = ', {0}'.format(", ".join(dropout_info))
```

```python
    shrink_info_str = ' and shrink value is {0}'.format(shrinkage_value)
```
unrelated, but can we just not print the shrink info if it's 1.0?
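A minimal sketch of that suggestion (the variable names are taken from the snippet above; this is an illustration, not the PR's actual code):

```python
# Only mention the shrink value in the log line when shrinkage actually
# applies; with shrinkage_value == 1.0 the message stays quiet.
shrinkage_value = 1.0
shrink_info_str = ('' if shrinkage_value == 1.0
                   else ' and shrink value is {0}'.format(shrinkage_value))
```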
I am testing the new dropout schedule method on one of my old models with a dropout component; it is still running and the training looks fine.
```python
@@ -530,6 +731,30 @@ def __init__(self):
                                     Note: we implemented it in such a way that it
                                     doesn't increase the effective learning
                                     rate.""")
        self.parser.add_argument("--trainer.dropout-schedule", type=str,
                                 dest='dropout_schedule', default='',
```
The default value here had better be None: when steps/nnet3/chain/train.py runs, it assumes the default is None:

```python
if args.dropout_schedule is not None:
    dropout_schedule = common_train_lib.parse_dropout_option(
        num_archives_to_process, args.dropout_schedule)
```

With the default set to the empty string rather than None, this check will conflict.
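A runnable sketch of the suggested fix (argparse only; in the PR the option lives in common.py):

```python
import argparse

# With default=None (instead of ''), the guard in train.py
# ("if args.dropout_schedule is not None:") correctly skips parsing
# when the option was never supplied.
parser = argparse.ArgumentParser()
parser.add_argument("--trainer.dropout-schedule", type=str,
                    dest='dropout_schedule', default=None)
args = parser.parse_args([])
assert args.dropout_schedule is None
```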
I fixed this.
@vimalmanohar In the old kaldi version, we'd remove dropout components before max_combine_iter. Do you do this under the xconfigs, or do you keep dropout all the way to the end of training?
We don't do that... it is possible, but I'd have to think about how to do it the right way. It's not really important, since the dropout scale is zero by the end of training in the setups we envisage.
@danpovey I'm testing on TDNN+LSTM AMI SDM, hoping to get good results.
@vijayaditya @danpovey
AMI SDM1 TDNN+LSTM with frame dropout, ***@***.******@***.******@***.***,0. I have revised some scripts to simulate my earlier experiments; results as follows:

```
%WER 36.7 | 14655 94484 | 66.8 17.7 15.6 3.5 36.7 64.2 | 0.642 | exp/sdm1/chain_cleaned/tdnn_lstm1i_4epoch_dp_test21_sp_bi_ihmali_ld5/decode_dev/ascore_10/dev_hires_o4.ctm.filt.sys
%WER 39.9 | 14069 89978 | 63.9 21.0 15.1 3.8 39.9 63.5 | 0.630 | exp/sdm1/chain_cleaned/tdnn_lstm1i_4epoch_dp_test21_sp_bi_ihmali_ld5/decode_eval/ascore_9/eval_hires_o4.ctm.filt.sys

exp/sdm1/chain_cleaned/tdnn_lstm1i_4epoch_dp_test21_sp_bi_ihmali_ld5: num-iters=87 nj=2..12 num-params=43.4M dim=40+100->3741 combine=-0.15->-0.13 xent:train/valid[57,86,final]=(-2.33,-1.55,-1.54/-2.52,-2.11,-2.10) logprob:train/valid[57,86,final]=(-0.218,-0.123,-0.118/-0.276,-0.248,-0.246)
```

baseline: `37.6 40.9` from RESULTS_SDM

Nice results. Any idea how it compares to the element-wise dropout from before?
Vijay
"order of data fractions.", value_x_pair) | ||
raise ValueError | ||
|
||
dropout_values.append(num_archives, float(dropout_proportion)) |
@vimalmanohar there should be `dropout_values.append((num_archives, float(dropout_proportion)))`
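A toy snippet (not the PR's code) showing the difference: `list.append()` takes exactly one argument, so the two-argument call raises TypeError, while the fixed line appends a single tuple:

```python
dropout_values = []
num_archives, dropout_proportion = 100, '0.3'
# dropout_values.append(num_archives, float(dropout_proportion))  # TypeError
dropout_values.append((num_archives, float(dropout_proportion)))  # [(100, 0.3)]
```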
I'd like to get the parts of this dropout stuff that we know we'll want checked in, while leaving the uncertain parts till later.
```python
        value_x_pair = parts[i].split('@')
        if len(value_x_pair) == 1:
            # Dropout proportion at half of training
            dropout_proportion = float(value_x_pair)
```
change this into `dropout_proportion = float(value_x_pair[0])` to avoid a crash
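A toy demonstration of the crash being described (hypothetical input value):

```python
value_x_pair = '0.2'.split('@')    # ['0.2'] -- split() always returns a list
# float(value_x_pair) raises TypeError, since float() does not accept a list;
# indexing the single element avoids that:
dropout_proportion = float(value_x_pair[0])    # 0.2
```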
```python
    in the option.

    Arguments:
        dropout_option: may have different rules for different component-name
```
You are starting with the most complex case here and not describing the simplest, most common case, where there are no component names. This documentation probably requires an example.
This is about the python code for the dropout...
Right now it's rather unclear, and I think the organization could be improved.
In the top-level training function you know how many archives are to be processed, and you know how many archives have been processed. You could define a function like this:

```python
def get_dropout_edit_string(proportion_processed,
                            dropout_opt):
    """This function returns a command that will (as part
    of a pipe) edit a raw nnet3 model to set the
    dropout values appropriately. If dropout_opt is
    empty it will return the empty string. Otherwise
    it will return a string like:
    "| nnet3-copy --edits='...'", where the ... contains
    one or more commands of type set-dropout-proportion
    that set the dropout proportion for different components.
    Please see documentation for --trainer.dropout-schedule for more info."""
```

Then all the top-level trainer code has to do is to call `get_dropout_edit_string(num_archives_processed * 1.0 / num_archives_to_process, args.dropout_schedule)` and pass the resulting string into train_one_iteration.
That substantially simplifies the code that has to interact with the dropout schedule, and it means that there are no hard-to-explain interfaces (which your code currently has). The script has to parse the option each time, but that doesn't matter; speed is not a concern here.
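A minimal sketch of how such a function might look. This is an illustration, not the code that was merged: the `_get_dropout_proportions` stub stands in for the module's real schedule parser, and the sketch assumes nnet3-copy's `--edits` accepts semicolon-separated `set-dropout-proportion` commands, as described above:

```python
def _get_dropout_proportions(dropout_opt, proportion_processed):
    # Stand-in for the real parser: returns (component-name-pattern,
    # proportion) pairs.  Here we pretend the schedule resolves to a
    # single global proportion of 0.3 regardless of the inputs.
    return [('*', 0.3)]

def get_dropout_edit_string(proportion_processed, dropout_opt):
    """Returns '' if dropout_opt is empty, else a pipe stage like
    "| nnet3-copy --edits='set-dropout-proportion name=* proportion=0.3' - -"
    """
    if dropout_opt == '':
        return ''
    edits = ['set-dropout-proportion name={0} proportion={1}'.format(name, prop)
             for name, prop in _get_dropout_proportions(dropout_opt,
                                                        proportion_processed)]
    return "| nnet3-copy --edits='{0}' - -".format('; '.join(edits))

print(get_dropout_edit_string(0.5, '0.0,0.5,0.0'))
```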
@vimalmanohar, I'm sure you're busy, but don't lose track of this issue... also, there was another comment in another thread needing your attention; search your email for "deprecated". And of course there are some comments to address in the PR on discriminative training [RE deleting models]... I know this is in addition to your work on diarization etc.
get_dropout_edit_string will have to go through what these functions do anyway, i.e. call all these functions to parse the pattern-function pairs, parse each function for each pattern, and then get the dropout value for a particular fraction. It can be simplified a little, maybe, if the functions are modified to use the data proportions directly instead of requiring a num_archives argument.
Yes, the only real simplification in terms of the quantity of code is to use the proportions directly. But the point is that you are no longer exposing a complex and hard-to-document interface to the rest of the program, and there is no longer a need to document so clearly what all those quantities are, since they are segregated behind a single simpler interface. They then behave more like the internal workings of a function. They should still be documented to some extent, of course.
Vimal, I see that you've changed the code, but it still isn't the way I was asking for it to be. What I was asking for is that you just pass around the dropout option itself, and have an interface that says: "here's the dropout option (maybe an empty string) and here's the proportion of the data that we have seen; give us the command to edit the nnet (or maybe an empty string)". Then the interface is just one function taking a string and a float and returning one string; as far as the rest of the program is concerned, there is nothing else that needs to be documented.
... also, I notice that this PR contains an example script, and its naming is not right (it may also depend on other stuff that's not in this PR, like per-frame dropout). I'd prefer to merge your changes RE dropout-proportion first and worry about Gaofeng's changes later.
Actually, I think it would be helpful to put all the internal functions needed for this in a separate module, say dropout_schedule.py, and then have just the top-level function be imported [and hence re-exported, I imagine] by common_lib.py, e.g. `from dropout_schedule import get_dropout_command`. This will stop common_lib.py from blowing up in size too much.
I made the suggested changes.
```python
@@ -17,12 +17,14 @@
import shutil

import libs.common as common_lib
import libs.nnet3.train.dropout_schedule as dropout_schedule
from dropout_schedule import *
```
I think it would be better if you just imported get_dropout_edit_string, because that's the only function we need from there; if you just import the one function, it's clear that it is the only real interface. You could rename all the others with underscores at the start of their names (assuming they really are internal to the module, and assuming that's what the Google style guide recommends in such circumstances).
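A small sketch of the suggested style (the module path here is assumed from this thread and may differ in the merged code):

```python
# Import only the public entry point; everything else inside
# dropout_schedule.py keeps a leading underscore to mark it as internal.
from libs.nnet3.train.dropout_schedule import get_dropout_edit_string
```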
Thanks for doing this, I know you have many things to do... just some small comments to address and then we'll be good to go.
"""Returns dropout proportions based on the dropout_schedule for the | ||
fraction of data seen at this stage of training. | ||
Returns None if dropout_schedule is None. | ||
|
Can you please give a couple of examples of what this function might return for different inputs, covering the different types of input? e.g. (and this will be wrong):

```python
_get_dropout_proportions('0.0,0.5,0.0', 0.75) = [ ('*', 0.75) ]
_get_dropout_proportions('*=0.0,0.5,0.0,lstm.*=0.0,0.3@0.75,0.0', 0.75) = \
    [ ('*', 0.75), ('lstm.*', 0.3) ]
```

IMO it's always a good idea for this type of code to give such examples; it will make maintenance much easier. Please give examples for the other functions in this module too, and remember to cover trivial cases such as where the input is the empty string. There may be situations where 3 or 4 examples are needed to demonstrate a function's range of behavior (but of course we'll assume the reader is smart enough to extrapolate).
... actually, here's an idea (this is similar to something I did in the xconfig code). How about having a function called _self_test() that will actually test all of these examples, e.g.

```python
def _self_test():
    assert _get_dropout_proportions('*=0.0,0.5,0.0,lstm.*=0.0,0.3@0.75,0.0', 0.75) == \
        [ ('*', 0.75), ('lstm.*', 0.3) ]
```

and have it called directly from __main__, so we can check that it works. Then the documentation for the function can just say 'see _self_test() for examples'. That way we will have confidence that the examples are actually correct.
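A runnable toy version of that pattern, using a stand-in interpolation helper rather than the module's real `_get_dropout_proportions` (so the asserted values here belong to the toy, not to the PR):

```python
def _interpolate_dropout(schedule, data_fraction):
    """schedule: sorted (data_fraction, dropout) breakpoints covering [0, 1];
    linearly interpolate between the two surrounding breakpoints."""
    for (f0, d0), (f1, d1) in zip(schedule, schedule[1:]):
        if f0 <= data_fraction <= f1:
            return d0 + (data_fraction - f0) * (d1 - d0) / (f1 - f0)
    raise ValueError("data_fraction must be in [0, 1]")

def _self_test():
    # '0.0,0.5,0.0' read as breakpoints at data fractions 0.0, 0.5 and 1.0
    schedule = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]
    assert _interpolate_dropout(schedule, 0.25) == 0.25
    assert _interpolate_dropout(schedule, 0.75) == 0.25

if __name__ == "__main__":
    _self_test()   # runs only as a standalone script, not on import
```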
Ok, I added self_test. Should it be called every time the module is imported, or only when run?
```diff
-  if (dropout > 0)
-    SetDropoutProportion(dropout, &nnet);
+  if (dropout >= 0)
+    KALDI_ERR << "--dropout option is deprecated. "
```
You can just remove the dropout option from the code; it's not used in any example scripts. Only people here at JHU were probably using it and they can easily ask around or check the git history.
Only when run from the command line as a standalone script.
This PR is a work in progress implementing a dropout schedule in nnet3 training. See issue #1247.
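For reference, a toy parse of the two schedule-string shapes that appear in this thread, the plain form `'0.0,0.5,0.0'` and the per-component form `'lstm.*=0.0,0.3@0.75,0.0'`; this only illustrates the `@` syntax, not the PR's real parser:

```python
option = 'lstm.*=0.0,0.3@0.75,0.0'
name, schedule = option.split('=')
for piece in schedule.split(','):
    value_x_pair = piece.split('@')
    proportion = float(value_x_pair[0])
    # a value with no explicit '@fraction' applies at a default position
    data_fraction = float(value_x_pair[1]) if len(value_x_pair) == 2 else None
    print(name, proportion, data_fraction)
```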