Re-use all weights when re-training on same dataset #538
Conversation
```diff
@@ -48,7 +48,7 @@ def get_network_from_desc(self, network_desc):
         """
         raise NotImplementedError('Please implement me')

-    def get_network_from_previous(self, previous_network):
+    def get_network_from_previous(self, previous_network, useSameDataset):
```
Seems weird to have `previous_network` (with an underscore) right next to `useSameDataset` (using camelCase).
oh yes, I'll change that
Looks like it works to me. Thanks for the fix!
Current behaviour: When a classification model is being re-trained, the last `InnerProduct` layer is renamed to allow training on another dataset (with a possibly different number of classes). This implies that the weights of that last layer are discarded.

Problem: If the model is being re-trained on the same dataset, then initially the model will perform badly, as it needs to re-learn the weights of its last `InnerProduct` layer.

Solution: Refrain from renaming the last `InnerProduct` layer if the model is being re-trained on the same dataset.

Note: This applies to Caffe only.

closes NVIDIA#533
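For readers following along, the mechanism behind this is that Caffe copies pretrained weights by matching layer names: an unrenamed layer keeps its weights, while a renamed layer is re-initialised. Below is a minimal sketch of the kind of logic involved, assuming pycaffe's protobuf bindings are available; the function name and structure are illustrative, not the actual DIGITS implementation.

```python
from google.protobuf import text_format
from caffe.proto import caffe_pb2


def adapt_previous_network(prototxt_text, use_same_dataset):
    """Sketch: rename the last InnerProduct layer unless the dataset is unchanged.

    Caffe matches pretrained weights to layers by name, so leaving the name
    alone preserves the weights, while renaming forces a fresh initialisation
    (needed when the number of classes may have changed).
    """
    network = caffe_pb2.NetParameter()
    text_format.Merge(prototxt_text, network)

    if not use_same_dataset:
        # Walk the layers backwards and rename only the last InnerProduct layer.
        for layer in reversed(network.layer):
            if layer.type == 'InnerProduct':
                layer.name = '%s_retrain' % layer.name
                break

    return text_format.MessageToString(network)
```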
Force-pushed from 7507224 to 95b9254
Updated with underscore naming convention
Hi! @gheinrich @lukeyeager there are still a couple of issues with this solution:

1. There's another use case where one would not want to lose weights from the last layer: continuing training on a larger dataset after pretraining and experimenting on a smaller sample from the same training distribution of images. To be able to do that for my current task, I manually redefined the `use_same_dataset` variable to `True` in the two Python files changed in this commit.
2. Not all networks have only one inner product classification layer. In my case, I'm using GoogLeNet, which has 3 outputs. At the moment, the final classification layer gets erased when loading another dataset, but the two earlier ones don't.

Perhaps it would be better to always preserve pretrained weights and leave it to the user to manually rename the layers they want trained from scratch?
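As background for that suggestion: because Caffe matches pretrained weights by layer name, a user can force any single layer to train from scratch simply by renaming it in the .prototxt. A hypothetical edit to one of GoogLeNet's three classifier heads might look like this (layer and blob names follow the BVLC reference model; the `_retrain` suffix is made up for illustration):

```
layer {
  # Renamed from "loss3/classifier": Caffe matches pretrained weights by
  # layer name, so the new name makes this layer train from scratch.
  name: "loss3/classifier_retrain"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/classifier"
  inner_product_param {
    num_output: 1000  # set to the number of classes in the new dataset
  }
}
```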
Good point @mfernezir - we don't have a good general-purpose solution yet. It's a tricky problem because we'd like to satisfy a bunch of conditions:
Since you can't have variables in a .prototxt file, I've considered adding some keywords to Caffe .prototxt files which can be search/replaced before processing with Caffe, like this:

```
inner_product_param {
  num_output: NUM_CLASSES
}
```

But I'd want a solution that enables Caffe to use the original .prototxt file without alteration, too. Maybe something in a comment like this?

```
inner_product_param {
  #replace with NUM_CLASSES#
  num_output: 1000
}
```
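As a rough illustration of this idea, the substitution could be a small text pre-processing pass run before the .prototxt is handed to Caffe. This is only a sketch of the proposal, not existing DIGITS code; the function name and the `NUM_CLASSES` keyword handling are hypothetical.

```python
import re


def fill_network_template(prototxt_text, num_classes):
    """Replace placeholder keywords with values from the current dataset.

    Hypothetical pre-processing step: the returned text is what would be
    handed to Caffe; the template itself stays untouched on disk.
    """
    # Comment-marker variant: a marker placed above the line to rewrite, so
    # the original file remains a valid .prototxt that plain Caffe can still
    # consume. Handled first so the keyword inside the marker is not touched
    # by the plain replacement below:
    #     #replace with NUM_CLASSES#
    #     num_output: 1000
    text = re.sub(
        r'#replace with NUM_CLASSES#\s*\n(\s*)num_output:\s*\d+',
        lambda m: '%snum_output: %d' % (m.group(1), num_classes),
        prototxt_text)

    # Bare-keyword variant: NUM_CLASSES stands directly in for the value.
    return text.replace('NUM_CLASSES', str(num_classes))
```

Calling something like `fill_network_template(template_text, 10)` would then yield a network description whose `num_output` matches a 10-class dataset, while the unmodified template remains usable with stock Caffe.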
I'm not sure what would be best. In any case, I think that between these conflicting design objectives, transparency and getting the expected behavior should matter more than automation and convenience. One simple option would be the following: