
Adding dropout schedule option to nnet3 #1248

Merged
merged 32 commits into kaldi-asr:master on Jan 23, 2017

Conversation

vimalmanohar
Contributor

This PR is a work in progress implementing a dropout schedule in nnet3 training. See issue #1247.

@vimalmanohar force-pushed the dropout_schedule branch 3 times, most recently from 9571832 to d055533 on December 6, 2016 at 19:06
@vimalmanohar
Contributor Author

@GaofengCheng @pegahgh Please review this PR and test this in your experiments.

@pegahgh
Contributor

pegahgh commented Dec 7, 2016 via email

@GaofengCheng
Contributor

@vimalmanohar Thanks Vimal. I'm preparing a tensorflow+kaldi PR right now; after that I will continue my experiments based on your PR and review it.

Conflicts:
	egs/wsj/s5/steps/libs/nnet3/train/common.py

logger.info("On iteration {0}, learning rate is {1}"
"{dropout_info}{shrink_info}.".format(
iter, learning_rate(iter, num_jobs,
Contributor

change this to learning_rate!

dropout_proportions, raw_model_string)
dropout_info_str = ', {0}'.format(", ".join(dropout_info))

shrink_info_str = ' and shrink value is {0}'.format(shrinkage_value)
Contributor

unrelated, but can we just not print the shrink info if it's 1.0?
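One way to do that (a minimal sketch using the variable names from the snippet above; the surrounding training code is assumed):

    # Only mention the shrink value in the log message when shrinkage
    # is actually being applied (i.e. the value is not 1.0).
    shrink_info_str = ''
    if shrinkage_value != 1.0:
        shrink_info_str = ' and shrink value is {0}'.format(shrinkage_value)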

@pegahgh
Contributor

pegahgh commented Dec 9, 2016

I am testing the new dropout schedule method on one of my old models with a dropout component; it is still running and the training looks fine.

@@ -530,6 +731,30 @@ def __init__(self):
Note: we implemented it in such a way that it
doesn't increase the effective learning
rate.""")
self.parser.add_argument("--trainer.dropout-schedule", type=str,
dest='dropout_schedule', default='',
Contributor

The default value here had better be None. When steps/nnet3/chain/train.py is run, it expects the dropout default to be None:

    if args.dropout_schedule is not None:
        dropout_schedule = common_train_lib.parse_dropout_option(
            num_archives_to_process, args.dropout_schedule)

An empty string here will conflict with that.
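For reference, a minimal sketch of the suggested change, i.e. defaulting the option to None rather than the empty string (based on the add_argument call quoted above; any help text is omitted here):

    self.parser.add_argument("--trainer.dropout-schedule", type=str,
                             dest='dropout_schedule', default=None)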

Contributor Author

I fixed this.

@GaofengCheng
Contributor

@vimalmanohar In the old kaldi version we'd remove dropout components before the max_combine_iter. Do you do this under the xconfigs, or do you keep dropout all the way to the end of training?

@danpovey
Contributor

danpovey commented Dec 12, 2016 via email

@GaofengCheng
Contributor

@danpovey I'm testing on TDNN+LSTM AMI sdm, hoping to get good results

@GaofengCheng
Contributor

@vijayaditya @danpovey
AMI SDM1 TDNN+LSTM with frame dropout, schedule `0,0@0.20,0.5@0.50,0@0.50,0`. I have revised some scripts to simulate my earlier experiments; results are as follows:

%WER 36.7 | 14655 94484 | 66.8 17.7 15.6 3.5 36.7 64.2 | 0.642 | exp/sdm1/chain_cleaned/tdnn_lstm1i_4epoch_dp_test21_sp_bi_ihmali_ld5/decode_dev/ascore_10/dev_hires_o4.ctm.filt.sys
%WER 39.9 | 14069 89978 | 63.9 21.0 15.1 3.8 39.9 63.5 | 0.630 | exp/sdm1/chain_cleaned/tdnn_lstm1i_4epoch_dp_test21_sp_bi_ihmali_ld5/decode_eval/ascore_9/eval_hires_o4.ctm.filt.sys
exp/sdm1/chain_cleaned/tdnn_lstm1i_4epoch_dp_test21_sp_bi_ihmali_ld5: num-iters=87 nj=2..12 num-params=43.4M dim=40+100->3741 combine=-0.15->-0.13 xent:train/valid[57,86,final]=(-2.33,-1.55,-1.54/-2.52,-2.11,-2.10) logprob:train/valid[57,86,final]=(-0.218,-0.123,-0.118/-0.276,-0.248,-0.246)

baseline: `37.6 40.9` from RESULTS_SDM

@vijayaditya
Contributor

vijayaditya commented Dec 14, 2016 via email

"order of data fractions.", value_x_pair)
raise ValueError

dropout_values.append(num_archives, float(dropout_proportion))
Contributor

@vimalmanohar This should be dropout_values.append((num_archives, float(dropout_proportion))); list.append takes a single argument, so the pair needs to be a tuple.

@danpovey
Contributor

danpovey commented Jan 8, 2017

I'd like to get the parts of this dropout stuff that we know we'll want checked in, while leaving the uncertain parts till later.
Vimal, this means that I want to check in your code and script changes that allow a dropout schedule to be set (but not the example script or the changes to lstm.py).
@GaofengCheng, can you please submit a separate PR with just your changes to the DropoutComponent, that enable the per-frame dropout? Let's not delay on this.

value_x_pair = parts[i].split('@')
if len(value_x_pair) == 1:
# Dropout proportion at half of training
dropout_proportion = float(value_x_pair)
Contributor

Change this to dropout_proportion = float(value_x_pair[0]) to avoid a crash; float() cannot convert a list.

in the option.

Arguments:
dropout_option: may have different rules for different component-name
Contributor

You are starting with the most complex case here and not describing the simplest, most common case where there are no component names.
This documentation probably needs an example.
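For illustration, a docstring along these lines might address this (the example schedule strings are taken from elsewhere in this thread; the wording is hypothetical and the exact semantics are defined by the parsing code):

    def parse_dropout_option(num_archives_to_process, dropout_option):
        """Parses the dropout schedule option.

        Arguments:
            dropout_option: in the simple, common case, a comma-separated
                list of dropout proportions, each optionally followed by
                '@<data-fraction>', e.g. '0.0,0.5,0.0' or
                '0,0@0.20,0.5@0.50,0@0.50,0'.  The more complex form gives
                a separate schedule per component-name pattern, e.g.
                '*=0.0,0.5,0.0,lstm.*=0.0,0.3@0.75,0.0'.
        """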

Contributor

@danpovey left a comment

This is about the python code for the dropout...
Right now it's rather unclear and I think the organization could be improved.

In the top-level training function you know how many archives are to be processed, and you know how many archives have been processed.
You could define a function like this:

def get_dropout_edit_string(proportion_processed, dropout_opt):
    """This function returns a command that will (as part of a pipe)
    edit a raw nnet3 model to set the dropout values appropriately.
    If dropout_opt is empty it will return the empty string.
    Otherwise it will return a string like:
    "| nnet3-copy --edits='...'", where the ... contains one or more
    commands of type set-dropout-proportion that set the dropout
    proportion for different components.
    Please see documentation for --trainer.dropout-schedule for more info."""

Then all the top-level trainer code has to do is to call get_dropout_edit_string(num_archives_processed * 1.0 / num_archives_to_process, args.dropout_schedule)
and pass the resulting string into train_one_iteration.
That substantially simplifies the code that has to interact with the dropout schedule, and it means that there are no hard-to-explain interfaces (which your code currently has).
The script has to parse it each time but it doesn't matter, speed is not a concern here.
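For concreteness, a minimal sketch of that interface (this assumes a helper _get_dropout_proportions returning a list of (component-name-pattern, proportion) pairs, as discussed further down this thread; the exact nnet3-copy edit syntax should be double-checked against the nnet3 documentation):

    def get_dropout_edit_string(proportion_processed, dropout_opt):
        """Returns a pipe command such as "| nnet3-copy --edits='...' - -"
        that sets the dropout proportions of a raw nnet3 model, or the
        empty string if dropout_opt is empty."""
        if dropout_opt == '':
            return ''
        edits = []
        for component_name, proportion in _get_dropout_proportions(
                dropout_opt, proportion_processed):
            edits.append("set-dropout-proportion name={0} proportion={1}"
                         .format(component_name, proportion))
        return "| nnet3-copy --edits='{0}' - -".format(";".join(edits))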

@danpovey
Contributor

@vimalmanohar, I'm sure you're busy but don't lose track of this issue... also there was another comment in another thread needing your attention, search your email for "deprecated"; and of course some comments to address in the PR on discriminative training [RE deleting models]... I know this is in addition to your work on diarization etc....
Re the issue here about refactoring the dropout, it may be possible to reuse most of your existing code, just refactor a little.

@vimalmanohar
Contributor Author

vimalmanohar commented Jan 17, 2017 via email

@danpovey
Contributor

danpovey commented Jan 17, 2017 via email

@danpovey
Contributor

danpovey commented Jan 21, 2017

Vimal, I see that you've changed the code, but it still isn't the way I was asking for it to be.
The problem is that you are still exposing to the rest of the program this "dropout_proportions" quantity, which you are not documenting. And just documenting it won't completely solve the issue, which is that the way you implemented it has a large, messy interface rather than a small, simple one. You may be concerned about efficiency, but efficiency is not an issue here.

What I was asking for is that you just pass around the dropout option itself, and have an interface that says "here's the dropout option (maybe an empty string) and here's the proportion of data that we saw; give us the command to edit the nnet (or maybe an empty string)". Then the interface is just one function taking a string and a float, and returning one string; and as far as the rest of the program is concerned there is nothing else that needs to be documented.
You can even use your current functions if you want, just make them internal and don't expose them to the rest of the program. [But in any case you should always document the return-type of functions, and provide examples.]

@danpovey
Contributor

... also, I notice that this PR contains an example script, and its naming is not right (and it may depend on other stuff that's not in this PR, like per-frame dropout). I'd prefer to merge your changes RE dropout-proportion first, and later worry about Gaofeng's changes.

@danpovey
Contributor

Actually, I think it would be helpful to put all the internal functions needed for this in a separate module, say dropout_schedule.py, and then have just the top-level function be imported [and hence re-exported, I imagine] by common_lib.py, e.g.

from dropout_schedule import get_dropout_command

this will stop common_lib.py from blowing up in size too much.

Contributor Author

@vimalmanohar left a comment

I made the suggested changes.

@@ -17,12 +17,14 @@
import shutil

import libs.common as common_lib
import libs.nnet3.train.dropout_schedule as dropout_schedule
from dropout_schedule import *
Contributor

I think it would be better if you just imported get_dropout_edit_string, because that's the only function we need from there, and if you just import the one function it's clear that that's the only one that's the real interface. You could rename all the others with underscores at the start of their names (assuming they really are internal to the module and assuming that's what the Google style guide recommends in such circumstances).
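For illustration, the suggested arrangement might look like this (the module path is taken from the diff above; the underscore-prefixed helper names are hypothetical):

    # In the training scripts, import only the public entry point:
    from libs.nnet3.train.dropout_schedule import get_dropout_edit_string

    # Inside dropout_schedule.py, the remaining helpers stay module-private:
    #   def _parse_dropout_option(...): ...
    #   def _get_dropout_proportions(...): ...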

Contributor

@danpovey left a comment

Thanks for doing this, I know you have many things to do... just some small comments to address and then we'll be good to go.

"""Returns dropout proportions based on the dropout_schedule for the
fraction of data seen at this stage of training.
Returns None if dropout_schedule is None.

Contributor

Can you please give a couple of examples of what this function might return for different inputs, covering different types of input? e.g. (and this will be wrong):

e.g.:
 _get_dropout_proportions('0.0,0.5,0.0', 0.75) = [ ('*', 0.75) ]
 _get_dropout_proportions('*=0.0,0.5,0.0,lstm.*=0.0,0.3@0.75,0.0', 0.75) = \
          [ ('*', 0.75), ('lstm.*', 0.3) ]

IMO it's always a good idea for this type of code to give such examples; it will
make maintenance much easier. Please give examples for other functions in this
module; and remember to cover trivial cases such as where the input is the
empty string; there may be situations where 3 or 4 examples are needed to
demonstrate the function's range of behavior (but of course we'll
assume the reader is smart enough to extrapolate things).

Contributor

@danpovey Jan 21, 2017

... actually, here's an idea (this is similar to something I did in the xconfig code),
How about having a function called _self_test(), that will actually test all of these examples, e.g.

def _self_test():
    assert _get_dropout_proportions('*=0.0,0.5,0.0,lstm.*=0.0,0.3@0.75,0.0', 0.75) == \
        [('*', 0.75), ('lstm.*', 0.3)]

and have it called directly from __main__ so we can check that it works.
Then the documentation for the function can just say 'see _self_test() for examples'.
That way we will have confidence that the examples are actually correct.
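A minimal sketch of the module tail this describes, so that the checks run only when the file is executed directly rather than on import (the assert mirrors the example above):

    def _self_test():
        # The expected output should match what _get_dropout_proportions
        # actually returns; see the examples discussed in this thread.
        assert _get_dropout_proportions(
            '*=0.0,0.5,0.0,lstm.*=0.0,0.3@0.75,0.0', 0.75) == \
            [('*', 0.75), ('lstm.*', 0.3)]


    if __name__ == '__main__':
        _self_test()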

Contributor Author

Ok, I added self_test. Should it be called every time the module is imported or only when run?

if (dropout > 0)
SetDropoutProportion(dropout, &nnet);
if (dropout >= 0)
KALDI_ERR << "--dropout option is deprecated. "
Contributor

You can just remove the dropout option from the code; it's not used in any example scripts. Only people here at JHU were probably using it and they can easily ask around or check the git history.

@danpovey
Contributor

danpovey commented Jan 23, 2017 via email

@danpovey changed the title from "WIP: Adding dropout schedule option to nnet3" to "Adding dropout schedule option to nnet3" on Jan 23, 2017
@danpovey merged commit 0440417 into kaldi-asr:master on Jan 23, 2017