
Re-use all weights when re-training on same dataset #538

Merged
merged 1 commit into NVIDIA:master on Jan 22, 2016

Conversation

gheinrich
Contributor

Current behaviour:
When a classification model is re-trained, the last InnerProduct layer is renamed to allow training on another dataset (with a possibly different number of classes). This implies that the weights of that last layer are discarded.

Problem:
If the model is re-trained on the same dataset, it initially performs badly because it needs to re-learn the weights of its last InnerProduct layer.

Solution:
Refrain from renaming the last InnerProduct layer when the model is re-trained on the same dataset.

Note:
This applies to Caffe only.

closes #533
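The renaming logic described above can be sketched roughly as follows. This is a minimal illustration, not DIGITS internals: the layer-dict shape, the `use_same_dataset` flag, and the `_retrain` suffix are all hypothetical names.

```python
def maybe_rename_last_classifier(layers, use_same_dataset):
    """Rename the last InnerProduct layer so Caffe re-initializes its
    weights -- unless re-training on the same dataset, in which case
    the pretrained weights should be kept."""
    if use_same_dataset:
        # Same layer name => Caffe copies the pretrained weights over
        return layers
    inner_products = [l for l in layers if l['type'] == 'InnerProduct']
    if inner_products:
        # A new name means Caffe finds no matching blob and starts fresh
        inner_products[-1]['name'] += '_retrain'
    return layers
```

The key point is that Caffe matches pretrained weights to layers by name, so keeping or changing the name is what decides whether weights are re-used.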

@@ -48,7 +48,7 @@ def get_network_from_desc(self, network_desc):
         """
         raise NotImplementedError('Please implement me')

-    def get_network_from_previous(self, previous_network):
+    def get_network_from_previous(self, previous_network, useSameDataset):
Member


Seems weird to have previous_network (with an underscore) right next to useSameDataset (using camelCase).

Contributor Author


oh yes, I'll change that

@lukeyeager
Member

Looks like it works to me. Thanks for the fix!

@gheinrich
Contributor Author

Updated with underscore naming convention

lukeyeager added a commit that referenced this pull request Jan 22, 2016
Re-use all weights when re-training on same dataset
@lukeyeager lukeyeager merged commit dc9f827 into NVIDIA:master Jan 22, 2016
@mfernezir

Hi!

@gheinrich @lukeyeager there are still a couple of issues with this solution:

1.) There's another use case where one would not want to lose weights from the last layer: continuing training on a larger dataset after pretraining and experimenting on a smaller sample from the same training distribution of images. To be able to do that for my current task, I manually redefined the use_same_dataset variable to True in two Python files changed in this commit.

2.) Not all networks have only one inner product classification layer. In my case, I'm using GoogLeNet which has 3 outputs. At the moment, what happens is that the final classification layer gets erased when loading another dataset but two earlier ones don't.

Perhaps it would be better to always preserve pretrained weights and leave it to the user to manually rename layers that he wants trained from scratch?
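A fix along the lines of point 2 would have to treat every classifier head uniformly, not only the last InnerProduct layer. A hedged sketch of that idea (the layer-dict shape and the `_retrain` suffix are illustrative assumptions, not actual DIGITS code):

```python
def rename_all_classifiers(layers, use_same_dataset):
    """Rename *every* InnerProduct layer when switching datasets, so
    GoogLeNet-style networks with three auxiliary classifiers have all
    of their output heads re-initialized consistently."""
    if use_same_dataset:
        return layers  # keep all names => all pretrained weights re-used
    for layer in layers:
        if layer['type'] == 'InnerProduct':
            layer['name'] += '_retrain'
    return layers
```

This avoids the half-way state described above, where the final classifier is erased but the two auxiliary ones keep stale weights.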

@lukeyeager
Member

Good point @mfernezir - we don't have a good general-purpose solution yet. It's a tricky problem because we'd like to satisfy a bunch of conditions:

  • Automatically rename and resize output layer[s] to match the number of classes
  • Handle multiple output layers
  • Handle output layer types which are not InnerProduct layers
  • Use a similar approach for both Torch and Caffe

Since you can't have variables in a .prototxt file, I've considered adding some keywords to Caffe .prototxt files which can be search/replaced before processing with Caffe, like this:

inner_product_param {
  num_output: NUM_CLASSES
}

But I'd want a solution that enables Caffe to use the original .prototxt file without alteration, too. Maybe something in a comment like this?

inner_product_param {
  #replace with NUM_CLASSES#
  num_output: 1000
}
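The comment-marker idea could be implemented as a small search/replace pass over the .prototxt text before handing it to Caffe. A minimal sketch, assuming the exact marker text from the example above (the function name is made up):

```python
import re

# Match the marker comment plus the num_output line it annotates;
# group 1 keeps everything except the old numeric value.
_MARKER = re.compile(r'(#replace with NUM_CLASSES#\s*\n\s*num_output:\s*)\d+')

def substitute_num_classes(prototxt_text, num_classes):
    """Rewrite annotated num_output values. Unannotated lines are left
    untouched, so Caffe can still parse the original file unaltered
    when no substitution is wanted."""
    return _MARKER.sub(lambda m: m.group(1) + str(num_classes),
                       prototxt_text)
```

Because the marker lives in a comment, the same file remains a valid, unmodified network definition for plain Caffe, which is the dual-use property described above.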

@mfernezir

I'm not sure what would be best. In any case, when design objectives conflict, I think transparency and predictable behaviour should take precedence over automation and convenience. One simple option would be the following:

  1. Don't bother with automatically accommodating a different number of classes. It's not hard to add num_output wherever it is needed, if one knows that's required. Comments in the standard network definitions and documenting the requirement would suffice.
  2. Emphasize the Caffe requirement to rename all layers that are to be retrained when loading weights from a previous model. There is already a disclaimer in place that one should understand how fine-tuning is done in Caffe before using the feature; it is more troublesome to unexpectedly have some weights preserved and others erased.

@gheinrich gheinrich deleted the dev/bug-533 branch May 24, 2016 11:21
Successfully merging this pull request may close these issues.

Loading the weights from old model and continuing training apparently broken
3 participants