make auxiliary heads in pretrained models optional #828
Conversation
Codecov Report

@@            Coverage Diff             @@
##           master     #828      +/-   ##
==========================================
- Coverage   52.96%   52.65%   -0.31%
==========================================
  Files          35       35
  Lines        3389     3405      +16
  Branches      538      543       +5
==========================================
- Hits         1795     1793       -2
- Misses       1464     1480      +16
- Partials      130      132       +2

Continue to review the full report at Codecov.
I think this story around aux-logits is not great.
Given that we only load the model weights after having defined the model structure, I think that we should unconditionally have the branches.
The forward should then decide whether to use the branches or not (which in general should depend only on whether the model is in training mode).
Thoughts?
    original_aux_logits = kwargs['aux_logits']
    kwargs['aux_logits'] = True
else:
    original_aux_logits = True
This won't work if the model was trained without aux_logits, because load_state_dict will not match, right?
Yes, I assumed that the pretrained models all have aux_logits and that those heads are trained, but based on #821 it seems that they are actually not trained (though still included in the pretrained models).
So doesn't it make sense to disable the aux heads in the pretrained models and update the pretrained .pth files?
It breaks backward compatibility though, and maybe it makes sense to set strict to False in load_state_dict?
edit: I see it still breaks backward compatibility
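For reference, a sketch of the strict=False idea; checkpoint_url is a placeholder for the real pretrained .pth location, and note that strict=False also silences genuinely missing keys, not just the unexpected aux ones.

```python
import torch.utils.model_zoo as model_zoo
from torchvision.models.inception import Inception3

checkpoint_url = 'https://example.com/inception_v3.pth'  # placeholder URL

# Build the model without the aux head and ignore checkpoint keys that have
# no destination in the module.
model = Inception3(aux_logits=False)
state_dict = model_zoo.load_url(checkpoint_url)
model.load_state_dict(state_dict, strict=False)  # AuxLogits.* keys are simply dropped
```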
So, one of my ideas was to completely disable .train(True) mode, and not have the aux branches at all.
Did you train a checkpoint on top of inception v3 without the aux heads?
Completely disabling train mode (for pretrained models) is a bit harsh, don't you think?
I guess most of the time people freeze the net, train just the fc layer, and then unfreeze the whole thing with a small lr; disabling train would break that code (a rough sketch of that workflow follows).
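A rough sketch of that fine-tuning workflow, with example values for the number of classes and learning rates:

```python
import torch
from torchvision import models

num_classes = 10  # example value
model = models.inception_v3(pretrained=True)

# 1) freeze the backbone and train only a new classifier head
for p in model.parameters():
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

# 2) later, unfreeze everything and continue with a small learning rate
for p in model.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```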
No no, by reading #821 I thought that maybe the aux paths are not trained at all in the .pth file and that's the reason they are discarded, am I mistaken?
@Separius only the aux heads of GoogLeNet don't have pretrained weights; the aux branch of Inception v3, however, is trained.
In this case, I think we should leave GoogLeNet as is, and then this is good to be merged?
@TheCodez so I guess we have two options then: either delete the aux branch after loading it in the inception model (like my code does), or do not allow aux_logits=False in the load_pretrained function.
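A rough sketch of the first option, close in spirit to the diff excerpt above; the checkpoint URL is a placeholder and details are illustrative rather than the exact final code.

```python
import torch.utils.model_zoo as model_zoo
from torchvision.models.inception import Inception3

pretrained_url = 'https://example.com/inception_v3.pth'  # placeholder URL

def inception_v3(pretrained=False, **kwargs):
    if pretrained:
        if 'aux_logits' in kwargs:
            original_aux_logits = kwargs['aux_logits']
            kwargs['aux_logits'] = True   # build the aux branch so the checkpoint keys match
        else:
            original_aux_logits = True
        model = Inception3(**kwargs)
        model.load_state_dict(model_zoo.load_url(pretrained_url))
        if not original_aux_logits:
            model.aux_logits = False
            del model.AuxLogits           # drop the branch the caller did not ask for
        return model
    return Inception3(**kwargs)
```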
LGTM. @TheCodez do you have any further thoughts?
My question is: why should we delete the aux branch for inception v3 but not for GoogLeNet?
OK, based on our discussion, here is what I propose: both pretrained models should accept aux_logits=False and drop the aux branches after loading the weights. It won't break BC, it won't take extra space for the unused parameters, and more importantly it allows fine-tuning either with or without the auxiliary heads. What do you think, @TheCodez @fmassa? @TheCodez regarding breaking existing models, right now
@Separius your reasoning seems good 👍
It is important to document that they aren't pretrained.
Yeah, my idea of always creating the aux branch for inception v3 as well would cause BC issues with fine-tuned models.
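A sketch of how that caveat could also be surfaced at load time for GoogLeNet; the default handling, warning text, and checkpoint URL are illustrative, not necessarily what was merged.

```python
import warnings
import torch.utils.model_zoo as model_zoo
from torchvision.models.googlenet import GoogLeNet

googlenet_url = 'https://example.com/googlenet.pth'  # placeholder URL

def googlenet(pretrained=False, **kwargs):
    if pretrained:
        original_aux_logits = kwargs.get('aux_logits', False)  # assume aux heads off by default
        if original_aux_logits:
            warnings.warn('auxiliary heads in the pretrained GoogLeNet model are NOT '
                          'pretrained, so make sure to train them before relying on them')
        kwargs['aux_logits'] = True   # build them so the checkpoint keys still match
        model = GoogLeNet(**kwargs)
        model.load_state_dict(model_zoo.load_url(googlenet_url))
        if not original_aux_logits:
            model.aux_logits = False
            del model.aux1, model.aux2   # GoogLeNet's two aux classifiers
        return model
    return GoogLeNet(**kwargs)
```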
@TheCodez I also changed the ordering of the returned values from googlenet when aux is on, to make it consistent with inception. I know it's a breaking change, but I believe googlenet is not in the latest torchvision release, so we are safe to change it, right? We could also return a namedtuple.
@Separius I think that’s fine.
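If the suggestion is a namedtuple return, a minimal illustration could look like the following; the field names are hypothetical.

```python
from collections import namedtuple

GoogLeNetOutputs = namedtuple('GoogLeNetOutputs', ['logits', 'aux_logits2', 'aux_logits1'])

# Inside GoogLeNet.forward, when self.training and self.aux_logits:
#     return GoogLeNetOutputs(x, aux2, aux1)
# Callers can then use out.logits instead of relying on tuple ordering,
# which sidesteps the concern about reordering the returned values.
```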
Should I also add some tests to the test suite?
@Separius let's not download pre-trained models in the tests. This can make the tests flaky due to connection errors.
LGTM, thanks!
It is related to pytorch/pytorch#18668.