WIP Sequence training of nnet3 models #3
Conversation
    determine the "pinch points".
 */
void SplitDiscriminativeExample(
    const std::string &name,
If there are multiple supervisions, then only the "name" object will be considered to identify the pinch points.
This is a difference from nnet2. The Excise function also has the same issue.
Actually, I know I told you we should support multiple supervision objects,
but on second thoughts, I think it makes sense to allow just one (and of
course store its name). In a multilingual setup, an utterance corresponds
to just one language.
In addition, after thinking about this a bit, I think we're going to have
to make some substantial changes from the 'nnet2' way of doing the
discriminative training. In nnet2 we split the lattices on pinch points,
and I think there was some way of padding them and stitching examples
together. But if we have recurrent architectures that see infinite
context, stitching examples together won't fly, and even padding at the
ends isn't quite right. Also, there is a big cost to using variable-length
egs, because of how the compilation works. So we will need to rely more on
fixed-length egs extracted from the lattice without regard to where the
pinch points lie. This is what I do in the 'chain' models. I use
fixed-length egs (1.5 seconds by default), and discard training utterances
shorter than this. (we can append training data at the data-dir level if
we're concerned about losing too many short utterances; @tomkocse already
wrote a script for this).
So we have to extract fixed-length egs from the lattices. The edge effects
can be handled by using the 'forward' and 'backward' scores of the cut
points as the initial and final-probs. [you can of course renormalize
somehow so the best cost is zero.] Initial-probs can be simulated using
arc probabilities. In order to know which frames the acoustic scores
correspond to, the decoder will have to dump in the non-compact lattice
format (--determinize=false), and because this takes up a lot of disk, we
can eventually consider integrating the decoding with the initial phase of
egs-dumping. But for now probably best to just dump the lattices without
determinization.
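The renormalization mentioned above ("so the best cost is zero") can be sketched minimally. This is an illustration only: plain doubles stand in for lattice forward/backward scores, and `RenormalizeCosts` is a made-up name, not a Kaldi function.

```cpp
#include <algorithm>
#include <vector>

// Shift a vector of costs (e.g. alpha scores at a cut point, viewed as
// negated log-probs) so the best (lowest) cost becomes exactly zero.
// The relative differences between paths are preserved.
std::vector<double> RenormalizeCosts(std::vector<double> costs) {
  double best = *std::min_element(costs.begin(), costs.end());
  for (double &c : costs)
    c -= best;  // best cost is now 0.0; others stay non-negative
  return costs;
}
```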
The initial splitting up of lattices can decide on random fixed-length
pieces of lattice; use the 'SplitIntoRanges' function from the 'chain'
branch. The lattice splitting-up code will be similar to class
SupervisionSplitter in the 'chain' branch, except with more attention to
the initial and final costs. (To do this, in addition to computing the
lattice state times, you'll want to compute the lattice alpha and beta
scores).
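A simplified, self-contained stand-in for the kind of fixed-length splitting described above (the real `SplitIntoRanges` in the 'chain' branch differs in details; this sketch only shows the two ideas discussed: discard utterances shorter than one range, and spread the leftover frames between ranges):

```cpp
#include <vector>

// Hypothetical sketch: choose non-overlapping fixed-length range starts
// covering an utterance of num_frames frames. Utterances shorter than one
// range are discarded (returns an empty vector), matching the policy of
// dropping training utterances shorter than the eg length.
std::vector<int> SplitIntoFixedRanges(int num_frames, int frames_per_range) {
  std::vector<int> range_starts;
  if (num_frames < frames_per_range)
    return range_starts;  // too short: discard this utterance.
  int num_ranges = num_frames / frames_per_range;
  // Distribute the leftover frames between ranges so the uncovered
  // frames don't all bunch up at the end of the utterance.
  int extra = num_frames - num_ranges * frames_per_range;
  for (int i = 0; i < num_ranges; i++) {
    int offset = (num_ranges > 1) ? (extra * i) / (num_ranges - 1) : 0;
    range_starts.push_back(i * frames_per_range + offset);
  }
  return range_starts;
}
```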
Obviously this is a slightly bigger project than we thought, now. If you
don't have time, feel free to reconsider.
Dan
I have some time to work on this project.
If we are not looking at pinch points and are creating only fixed-length segments, then it should not be too difficult to support multiple supervision objects. This might be useful in some situations, such as training with MMI and CE objectives together. Microsoft did this in one of their papers to fix issues like the large number of deletions and insertions that we usually get.
If we use lattice forward and backward scores, would we need to update these scores during some of the training iterations since they change when the model gets updated?
Vimal
Also, we would need different forward and backward scores for the different objectives, right? So each supervision object would be specific to a particular objective because the forward scores for MMI would be different from those for sMBR and MPE.
No, we won't be updating the scores. This is a hassle to do and will
hardly change the results.
Just compute the MMI-type scores. The extra MPE-type scores will be set to
zero. The scores won't be specific to the objective.
Dan
I moved some of the code that's common to chain and sequence training to chain/chain-utils.cc.
Yes. Actually, later on we could investigate making them nonzero, but I
There might be an issue when splitting lattices. Since a new state is added to accommodate the initial weights, the length of a path in the lattice will be one more than the number of frames. This is a problem because we would have to add a dummy entry to the alignment; otherwise the functions in lattice-functions.cc would not work. At what stage must this be handled? Should there be a variable in the supervision object to identify whether it has undergone splitting, in which case a dummy can be added to the alignment when necessary?
Ok, I think it might work. I can do RmEpsilon after the lattice is split.
OK, but before doing that, verify that it's even necessary, and let me know.
Epsilons are supported in lattices. The discriminative training functions use LatticeStateTimes to get the frame index for a state. So the path length in the lattice must match the number of frames in the alignment.
Yes, but LatticeStateTimes doesn't count epsilons when measuring the path length.
Ok, I just checked LatticeStateTimes. It's fine: it does not count the epsilon arcs, so I don't need to do RmEpsilon.
I added all the discriminative training code from nnet2, including the semi-supervised training stuff. I am now going to write the scripts to test it out.
Great, thanks!
I wrote the scripts and code. I have some questions about the implementation:
      examples.back() = cur_eg;

      bool minibatch_ready =
          static_cast<int32>(examples.size()) >= minibatch_size;
minibatch_size is measured in terms of number of examples rather than the number of output frames. This is the same as in the chain code. Is there a reason why this is preferred? What should be the minibatch_size if the examples are 1.5s long? The default in chain code was 64.
Generally we like the number of sequences in the minibatch to be a power of
two, and preferably a multiple of 64 (for reasons relating to NVidia board
architecture). This is easier to ensure if we set it absolutely, not as a
number of frames.
Dan
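The merge loop under discussion, with minibatch_size counted in sequences rather than frames, amounts to something like the following (a notional sketch: `Eg` and `MergeLoop` are made-up stand-ins for the Kaldi reader/writer loop, and the handling of the final partial minibatch is a choice, not the project's policy):

```cpp
#include <string>
#include <vector>

struct Eg { std::string name; };  // stand-in for NnetDiscriminativeExample

// Buffer egs until minibatch_size sequences are collected, then emit a
// minibatch. minibatch_size is a count of sequences (preferably a power of
// two and a multiple of 64), not a count of frames. Leftover egs at the end
// form a final, smaller minibatch here; real scripts might discard them.
// Returns the number of minibatches emitted.
int MergeLoop(const std::vector<Eg> &input, int minibatch_size,
              std::vector<std::vector<Eg>> *minibatches) {
  std::vector<Eg> buffer;
  for (const Eg &eg : input) {
    buffer.push_back(eg);
    if (static_cast<int>(buffer.size()) >= minibatch_size) {
      minibatches->push_back(buffer);
      buffer.clear();
    }
  }
  if (!buffer.empty())
    minibatches->push_back(buffer);  // final partial minibatch
  return static_cast<int>(minibatches->size());
}
```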