
make auxiliary heads in pretrained models optional #828

Merged · 5 commits merged into master on Apr 4, 2019

Conversation

@Separius (Contributor) commented Apr 1, 2019:

It is related to pytorch/pytorch#18668.

@codecov-io commented Apr 1, 2019:

Codecov Report

Merging #828 into master will decrease coverage by 0.3%.
The diff coverage is 15.78%.


@@            Coverage Diff             @@
##           master     #828      +/-   ##
==========================================
- Coverage   52.96%   52.65%   -0.31%     
==========================================
  Files          35       35              
  Lines        3389     3405      +16     
  Branches      538      543       +5     
==========================================
- Hits         1795     1793       -2     
- Misses       1464     1480      +16     
- Partials      130      132       +2
Impacted Files                         Coverage Δ
torchvision/models/inception.py        86.54% <0%>  (-2.81%) ⬇️
torchvision/models/googlenet.py        73.52% <25%> (-4.43%) ⬇️
torchvision/transforms/transforms.py   82.84% <0%>  (-0.68%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Separius changed the title from "make auxilary heads in pretrained models optional" to "make auxiliary heads in pretrained models optional" on Apr 1, 2019
@fmassa (Member) left a comment:

I think this story around aux-logits is not great.

Given that we only load the model weights after having defined the model structure, I think that we should unconditionally have the branches.

The forward should then decide whether to use the branches or not (which in general should depend only on whether the model is in training mode).

Thoughts?
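
A minimal sketch of that idea (class and attribute names here are illustrative, not the torchvision implementation): always construct the auxiliary branch, and let forward gate it on self.training and an aux_logits flag.

    import torch.nn as nn

    class NetWithAux(nn.Module):
        """Illustrative model: the aux branch always exists; forward decides whether to use it."""

        def __init__(self, num_classes=1000, aux_logits=True):
            super(NetWithAux, self).__init__()
            self.aux_logits = aux_logits
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            self.aux = nn.Linear(16, num_classes)  # auxiliary head is always constructed
            self.fc = nn.Linear(16, num_classes)

        def forward(self, x):
            x = self.features(x).flatten(1)
            # Use the auxiliary branch only in training mode and only when requested.
            if self.training and self.aux_logits:
                return self.fc(x), self.aux(x)
            return self.fc(x)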

    if 'aux_logits' in kwargs:
        original_aux_logits = kwargs['aux_logits']
        kwargs['aux_logits'] = True
    else:
        original_aux_logits = True
Member:
This won't work if the model was trained without the aux_logits, because load_state_dict will not match, right?

@Separius (Contributor, Author) commented Apr 1, 2019:

Yes, I assumed that the pretrained models all have aux_logits and that those heads are trained, but based on #821 it seems they are actually not trained (though still included in the pretrained models).

So doesn't it make sense to disable the aux heads in the pretrained models and update the pretrained .pth files?

It breaks backward compatibility though; maybe it makes sense to set strict to False in load_state_dict?

Edit: I see it still breaks backward compatibility.
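
For reference, a minimal sketch of the strict=False idea (illustrative only, not what this PR ends up doing; the checkpoint path is hypothetical):

    import torch
    from torchvision.models import Inception3

    # Build the model without aux heads, then load a checkpoint that still contains
    # the AuxLogits.* weights. strict=False makes load_state_dict skip the keys that
    # don't match instead of raising an error.
    model = Inception3(aux_logits=False)
    state_dict = torch.load("inception_v3_google.pth", map_location="cpu")  # hypothetical local path
    model.load_state_dict(state_dict, strict=False)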

Member:

So, one of my ideas was to completely disable .train(True) mode, and not have the aux branches at all.
Did you train a checkpoint on top of inception v3 without the aux heads?

@Separius (Contributor, Author):

Completely disabling train mode (for pretrained models) is a bit harsh, don't you think?
I guess most of the time people freeze the net, train just the fc layer, and then unfreeze the whole thing with a small lr (sketched below); disabling train would break that code.

No, from reading #821 I thought that maybe the aux paths are not trained at all in the .pth file and that's the reason they are discarded. Am I mistaken?
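
For context, a minimal sketch of the freeze-then-finetune workflow mentioned above (purely illustrative; the hyperparameters and two-phase structure are assumptions):

    import torch
    import torchvision

    model = torchvision.models.inception_v3(pretrained=True)

    # Phase 1: freeze the backbone and train only the final fc layer.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
    # ... train for a few epochs ...

    # Phase 2: unfreeze everything and continue with a small learning rate.
    for p in model.parameters():
        p.requires_grad = True
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
    # ... keep training ...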

Contributor:

@Separius only the aux heads of GoogLeNet don't have pretrained weights; the aux branch of Inception v3, however, is trained.

Member:

In this case, I think we should leave GoogleNet as is, and then this is good to be merged?

@Separius (Contributor, Author):

@TheCodez so I guess we have two options then: either delete the aux branch after loading it in the inception model (like my code does; see the sketch below), or do not allow aux_logits=False in the pretrained loading function.
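
A sketch of the first option, following the diff excerpt shown earlier (the wrapper name inception_v3_pretrained is illustrative):

    import torch.utils.model_zoo as model_zoo
    from torchvision.models.inception import Inception3, model_urls

    def inception_v3_pretrained(**kwargs):
        # Always build the aux head so the checkpoint keys match, then drop it
        # again if the caller asked for aux_logits=False.
        if 'aux_logits' in kwargs:
            original_aux_logits = kwargs['aux_logits']
            kwargs['aux_logits'] = True
        else:
            original_aux_logits = True
        model = Inception3(**kwargs)
        model.load_state_dict(model_zoo.load_url(model_urls['inception_v3_google']))
        if not original_aux_logits:
            model.aux_logits = False
            del model.AuxLogits  # remove the now-unused auxiliary branch
        return model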

@fmassa (Member) left a comment:

LGTM. @TheCodez do you have any further thoughts?

@TheCodez (Contributor) commented Apr 2, 2019:

My question is why should we delete the aux branch for inception v3 but not for GoogleNet?
I agree with @fmassa that we should always create those branches and then just disable them if not desired.

@Separius your code with the lines del model.AuxLogits and if aux_logits: removed should be fine, if I'm not mistaken?
But we should double check that to make sure we don't break any existing models.

@Separius (Contributor, Author) commented Apr 3, 2019:

OK, based on our discussion, here is what I propose:

Both pretrained models should accept aux_logits=True (and False), and both models should treat it the same way (right now, when aux_logits is False, inception does not instantiate those heads but googlenet does), but we must warn the user that using aux_logits=True in googlenet is not recommended (because those heads are not pretrained).

It won't break BC, it won't take extra space for unused parameters, and more importantly it allows fine-tuning either with or without the auxiliary heads.

What do you think, @TheCodez @fmassa?

@TheCodez regarding breaking existing models: right now loading pretrained inception with aux_logits=True fails, so I think it's safe to say that we won't break anything.
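
A rough sketch of the warning part of this proposal (a hypothetical wrapper, not the actual torchvision code):

    import warnings
    import torchvision

    def googlenet_with_warning(pretrained=False, aux_logits=False, **kwargs):
        # Warn when pretrained weights are requested together with aux heads,
        # since the googlenet aux heads carry no pretrained weights.
        if pretrained and aux_logits:
            warnings.warn('auxiliary heads in the pretrained googlenet model are not '
                          'pretrained; train them before relying on their outputs')
        return torchvision.models.googlenet(pretrained=pretrained, aux_logits=aux_logits, **kwargs)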

@TheCodez (Contributor) commented Apr 3, 2019:

@Separius your reasoning seems good 👍

> but we must warn the user that using aux_logits=True in googlenet is not recommended (because those heads are not pretrained).

It is important to document that they aren't pretrained.

> @TheCodez regarding breaking existing models: right now loading pretrained inception with aux_logits=True fails, so I think it's safe to say that we won't break anything.

Yeah, my idea of always creating the aux branch for inception v3 as well would cause BC issues with finetuned models.
So it's probably a good idea to be consistent between googlenet and inception.

@Separius (Contributor, Author) commented Apr 3, 2019:

@TheCodez I also changed the ordering of the values returned from googlenet when aux is on, to make it consistent with inception. I know it's a breaking change, but I believe googlenet is not in the latest torchvision release, so we are safe to change it, right?

We could also return a namedtuple in both models when aux is on, but I don't know what to name them. GooglenetAllOutputs? I don't think so.
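
A minimal sketch of the namedtuple idea (using the name suggested just below; the field names are illustrative):

    from collections import namedtuple

    # A named return type so callers can access the three outputs by name while
    # still being able to unpack them positionally like a plain tuple.
    GoogLeNetOutputs = namedtuple('GoogLeNetOutputs', ['logits', 'aux_logits2', 'aux_logits1'])

    out = GoogLeNetOutputs(logits=0.1, aux_logits2=0.2, aux_logits1=0.3)
    print(out.logits)          # access by name
    logits, aux2, aux1 = out   # positional unpacking still works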

@TheCodez (Contributor) commented Apr 3, 2019:

@Separius I think that's fine.
What about 'GoogLeNetOutputs'?

@Separius (Contributor, Author) commented Apr 4, 2019:

Should I also add some tests to test_models.py?
I mean, is it OK for the tests to download pretrained models? (I live in Iran and downloading all the pretrained models takes a lot of time; that's why I'm asking. :))

@fmassa (Member) commented Apr 4, 2019:

@Separius let's not download pre-trained models in the tests. This can make the tests flaky due to connection errors.

@fmassa (Member) left a comment:

LGTM, thanks!
