
Re-use all weights when re-training on same dataset #538

Merged
merged 1 commit into NVIDIA:master on Jan 22, 2016

Conversation

gheinrich
Contributor

Current behaviour:
When a classification model is re-trained, the last InnerProduct layer is renamed to allow training on another dataset (with a possibly different number of classes). This implies that the weights of that last layer are discarded.

Problem:
If the model is re-trained on the same dataset, it initially performs badly because it needs to re-learn the weights of its last InnerProduct layer.

Solution:
Refrain from renaming the last InnerProduct layer when the model is re-trained on the same dataset.

Note:
This applies to Caffe only.

closes #533
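The renaming logic described above can be sketched roughly as follows. This is a minimal illustration, not DIGITS internals: the layer-dict shape, the `use_same_dataset` flag, and the `_retrain` suffix are all hypothetical names.

```python
def maybe_rename_last_classifier(layers, use_same_dataset):
    """Rename the last InnerProduct layer so Caffe re-initializes its
    weights -- unless re-training on the same dataset, in which case
    the pretrained weights should be kept."""
    if use_same_dataset:
        # Same layer name => Caffe copies the pretrained weights over
        return layers
    inner_products = [l for l in layers if l['type'] == 'InnerProduct']
    if inner_products:
        # A new name means Caffe finds no matching blob and starts fresh
        inner_products[-1]['name'] += '_retrain'
    return layers
```

The key point is that Caffe matches pretrained weights to layers by name, so keeping or changing the name is what decides whether weights are re-used.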

@@ -48,7 +48,7 @@ def get_network_from_desc(self, network_desc):
         """
         raise NotImplementedError('Please implement me')

-    def get_network_from_previous(self, previous_network):
+    def get_network_from_previous(self, previous_network, useSameDataset):
Member


Seems weird to have previous_network (with an underscore) right next to useSameDataset (using camelCase).

Contributor Author


oh yes, I'll change that

@lukeyeager
Member

Looks like it works to me. Thanks for the fix!

@gheinrich
Contributor Author

Updated with underscore naming convention

lukeyeager added a commit that referenced this pull request Jan 22, 2016
Re-use all weights when re-training on same dataset
@lukeyeager lukeyeager merged commit dc9f827 into NVIDIA:master Jan 22, 2016
@mfernezir

Hi!

@gheinrich @lukeyeager there are still a couple of issues with this solution:

1.) There's another use case where one would not want to lose weights from the last layer: continuing training on a larger dataset after pretraining and experimenting on a smaller sample from the same training distribution of images. To be able to do that for my current task, I manually redefined the use_same_dataset variable to True in two Python files changed in this commit.

2.) Not all networks have only one inner product classification layer. In my case, I'm using GoogLeNet which has 3 outputs. At the moment, what happens is that the final classification layer gets erased when loading another dataset but two earlier ones don't.

Perhaps it would be better to always preserve pretrained weights and leave it to the user to manually rename layers that he wants trained from scratch?
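A fix along the lines of point 2 would have to treat every classifier head uniformly, not only the last InnerProduct layer. A hedged sketch of that idea (the layer-dict shape and the `_retrain` suffix are illustrative assumptions, not actual DIGITS code):

```python
def rename_all_classifiers(layers, use_same_dataset):
    """Rename *every* InnerProduct layer when switching datasets, so
    GoogLeNet-style networks with three auxiliary classifiers have all
    of their output heads re-initialized consistently."""
    if use_same_dataset:
        return layers  # keep all names => all pretrained weights re-used
    for layer in layers:
        if layer['type'] == 'InnerProduct':
            layer['name'] += '_retrain'
    return layers
```

This avoids the half-way state described above, where the final classifier is erased but the two auxiliary ones keep stale weights.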

@lukeyeager
Member

Good point @mfernezir - we don't have a good general-purpose solution yet. It's a tricky problem because we'd like to satisfy a bunch of conditions:

  • Automatically rename and resize output layer[s] to match the number of classes
  • Handle multiple output layers
  • Handle output layer types which are not InnerProduct layers
  • Use a similar approach for both Torch and Caffe

Since you can't have variables in a .prototxt file, I've considered adding some keywords to Caffe .prototxt files which can be search/replaced before processing with Caffe, like this:

inner_product_param {
  num_output: NUM_CLASSES
}

But I'd want a solution that enables Caffe to use the original .prototxt file without alteration, too. Maybe something in a comment like this?

inner_product_param {
  #replace with NUM_CLASSES#
  num_output: 1000
}
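The comment-marker idea could be implemented as a small search/replace pass over the .prototxt text before handing it to Caffe. A minimal sketch, assuming the exact marker text from the example above (the function name is made up):

```python
import re

# Match the marker comment plus the num_output line it annotates;
# group 1 keeps everything except the old numeric value.
_MARKER = re.compile(r'(#replace with NUM_CLASSES#\s*\n\s*num_output:\s*)\d+')

def substitute_num_classes(prototxt_text, num_classes):
    """Rewrite annotated num_output values. Unannotated lines are left
    untouched, so Caffe can still parse the original file unaltered
    when no substitution is wanted."""
    return _MARKER.sub(lambda m: m.group(1) + str(num_classes),
                       prototxt_text)
```

Because the marker lives in a comment, the same file remains a valid, unmodified network definition for plain Caffe, which is the dual-use property described above.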

@mfernezir

I'm not sure what would be best. In any case, when design objectives conflict, I think transparency and predictable behaviour should take precedence over automation and convenience. One simple option would be the following:

  1. Don't bother with automatically accommodating a different number of classes. It's not hard to add num_output wherever it is needed, if one knows that's required. Comments in the standard network definitions and documenting the requirement would suffice.
  2. Emphasize the Caffe requirement to rename all layers that are to be retrained when loading weights from a previous model. There is already a disclaimer in place that one should understand how fine-tuning is done in Caffe before using the feature; it is more troublesome to unexpectedly have some weights preserved and others erased.

@gheinrich gheinrich deleted the dev/bug-533 branch May 24, 2016 11:21
Successfully merging this pull request may close these issues.

Loading the weights from old model and continuing training apparently broken
3 participants