Add GoogLeNet (Inception v1) #678
Conversation
@fmassa what do you think?
The implementation looks great, thanks!
But we need some pre-trained weights before this can be merged.
I can launch some trainings to see if I manage to obtain the reported results.
Do you know what hyperparameters to use to launch training?
From my understanding, reproducing the results of GoogLeNet was kind of hard, though.
Those are the parameters used by the Berkeley Vision GoogLeNet.
Awesome, thanks! I'll try running the imagenet examples with your model and I'll report back once training is over.
@fmassa I just noticed, I missed a padding in
I have also added a batch normalized GoogLeNet version which could be trained using:
I'm not entirely sure which learning-rate decay strategy would be best suited: maybe the default one from the imagenet script, or the decay from the BatchNorm paper that is six times faster than GoogLeNet's. Also, for the original GoogLeNet, maybe the poly decay policy used by the Berkeley Vision GoogLeNet would be better suited than the one found in the paper.
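For reference, the Caffe-style "poly" decay policy mentioned above can be sketched in plain Python. The `power=0.5` exponent and the 2,400,000-iteration horizon match the Berkeley Vision fast-solver settings discussed later in this thread; treat them as illustrative values, not the PR's exact configuration:

```python
# Sketch of the Caffe "poly" learning-rate policy:
#   lr = base_lr * (1 - iter / max_iter) ** power
# power=0.5 and max_iter=2,400,000 are illustrative (fast-solver-style) values.
def poly_lr(base_lr, it, max_iter, power=0.5):
    return base_lr * (1.0 - it / max_iter) ** power

print(poly_lr(0.01, 0, 2_400_000))          # starts at base_lr: 0.01
print(poly_lr(0.01, 2_400_000, 2_400_000))  # decays to 0.0 at the end
```

The schedule decays smoothly to zero over the full run, unlike the step decay in the original paper.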
Hi, I just got back from holidays. The training that I launched didn't learn anything.
There is a difference between
I can try running more training jobs, if you have the hyper-parameters to try.
@fmassa the batch normalized version should be easier to train:
For the regular GoogLeNet:
I have updated the training script, adding some more data augmentation (ColorJitter and LightingNoise), which is reduced in strength for the batch normalized version. Hopefully it succeeds this time.
I launched another training with googlenet using your newer commits. Let's see what it gives.
@TheCodez It doesn't look like it's learning anything on 8 GPUs with the defaults that you sent; it has already run through 5 epochs. I'll let it run overnight, but this seems to align with my expectation that reproducing their results might not be easy.
The only obvious differences I notice to the Berkeley Vision version are:
They needed 60 epochs (2,400,000 iterations) to achieve 68.7% top-1 accuracy (31.3% error) and 88.9% top-5 accuracy (11.1% error) using the fast solver. In theory we should get better results because of the data augmentations. Other than that, my only ideas are starting with a different learning rate, or trying the batch normalized version first to see if that works better.
Hi @TheCodez , Here are my thoughts:
I believe the fact that it is not training at all is probably a combination of weight initialization and data transformation (are their images in 0-1 or 0-255?). Can you double-check that against existing implementations that are known to work?
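The 0-1 vs 0-255 question raised above is easy to sanity-check: ImageNet-style mean/std normalization assumes inputs already scaled to [0, 1], so feeding it 0-255 pixels produces values hundreds of standard deviations out of range, which can stall training exactly as described. A minimal sketch (the mean/std constants are the standard ImageNet red-channel values; the single-pixel framing is illustrative):

```python
# ImageNet normalization assumes pixel values scaled to [0, 1].
IMAGENET_MEAN, IMAGENET_STD = 0.485, 0.229  # standard red-channel values

def normalize(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    return (pixel - mean) / std

# A mid-gray pixel in [0, 1] normalizes to a small, well-behaved value...
print(normalize(0.5))
# ...but the same mid-gray pixel left in [0, 255] blows up to hundreds,
# a common cause of a network that doesn't learn at all:
print(normalize(128.0))
```

Checking one batch of the input pipeline against a known-good implementation catches this class of bug quickly.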
Codecov Report
@@ Coverage Diff @@
## master #678 +/- ##
=========================================
- Coverage 40.99% 39.9% -1.09%
=========================================
Files 29 30 +1
Lines 2747 2874 +127
Branches 432 445 +13
=========================================
+ Hits 1126 1147 +21
- Misses 1542 1648 +106
Partials 79 79
Continue to review full report at Codecov.
@fmassa I changed the weight initialization to
But shouldn't this only matter when the model is already pretrained? I also updated my training script to get rid of the line I wasn't sure about: TheCodez/examples@dde173d
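A common choice for a GoogLeNet-style weight initialization is a truncated normal (torchvision's later implementation uses `scipy.stats.truncnorm`); whether that is exactly the change referenced above is an assumption here. As an illustration, a truncated normal can be drawn with simple rejection sampling using only the stdlib:

```python
import random

def trunc_normal(mean=0.0, std=0.01, a=-2.0, b=2.0):
    """Draw one sample from N(mean, std) truncated to [mean + a*std, mean + b*std]
    by rejection sampling. Illustrative stand-in for scipy.stats.truncnorm."""
    while True:
        x = random.gauss(mean, std)
        if mean + a * std <= x <= mean + b * std:
            return x

random.seed(0)
samples = [trunc_normal(std=0.01) for _ in range(1000)]
# Every sample lies within two standard deviations of the mean:
print(all(-0.02 <= s <= 0.02 for s in samples))  # True
```

Truncation avoids the occasional large outlier weight that an unbounded normal can produce, which matters for a deep network trained from scratch.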
Force-pushed from 8bc54e2 to 0bfb10e
@fmassa could you please test again, if you have time?
Hi @TheCodez , sorry for the delay. I haven't had the chance to try it again since last time. Given that GoogLeNet seems to be hard to train from scratch, I think I'll end up providing only pre-trained versions (by reusing the original weights), without enforcing that they have training code associated with them. In that case it would make sense to put it in torchhub, but I still need to figure out the details of how to do that.
@fmassa my ideas would be:
What do you think?
@TheCodez I'm currently leaning towards adapting the weights from TensorFlow, and providing it on the Hub. |
@fmassa I'll be trying to convert the weights from tensorflow. |
@fmassa I converted the weights. I haven't done an extensive evaluation on the ImageNet validation set; instead I just tested some images, and all of them were classified correctly. The problem I'm currently having is that in TensorFlow the ImageNet dataset has 1001 classes, which means I have to subtract 1 from the prediction to get the correct classification. @colesbury how did you convert the 1001 classes to 1000 in the InceptionV3 model?
@TheCodez I think we can remove the first row of the classifier weights, which should remove the first class and should be fine, I think?
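Dropping TF's extra "background" class amounts to slicing off the first row of the final layer's weight matrix (and the first bias entry), after which no "subtract 1" fixup is needed. A minimal sketch with plain lists (the shapes are illustrative; GoogLeNet's real classifier is 1001 x 1024 before conversion):

```python
# Hypothetical converted classifier: 1001 output classes x num_features,
# where row 0 is TF's extra "background" class.
num_features = 4  # illustrative; the real layer has 1024 input features
w = [[float(cls)] * num_features for cls in range(1001)]  # 1001 rows
b = [float(cls) for cls in range(1001)]                   # 1001 entries

# Remove the background class: keep rows/entries 1..1000.
w_1000 = w[1:]
b_1000 = b[1:]

print(len(w_1000), len(b_1000))  # 1000 1000
# Row 0 of the sliced matrix is TF's old class 1, so predicted indices
# now line up with the 1000-class ImageNet labels directly.
```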
@fmassa I've updated the code to match the structure required for the TensorFlow weights, and added the input normalization used for the Inception v3 model. Removing the first row did the trick. Weights are currently hosted here: https://github.com/TheCodez/vision/releases/tag/1.0
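For reference, TF-style Inception models expect inputs in [-1, 1] rather than ImageNet-mean/std-normalized tensors; torchvision's Inception code offers a per-channel conversion for this (its `transform_input` option). A sketch of the basic mapping, using the standard red-channel mean/std as illustrative constants:

```python
def to_tf_range(x01):
    """Map a pixel from [0, 1] to the TF Inception input range [-1, 1]."""
    return (x01 - 0.5) / 0.5

def imagenet_to_tf(x_norm, mean=0.485, std=0.229):
    """Undo ImageNet normalization back to [0, 1], then rescale to [-1, 1].
    Mirrors the per-channel conversion torchvision's Inception code performs;
    the constants here are the red-channel mean/std."""
    return to_tf_range(x_norm * std + mean)

print(to_tf_range(0.0), to_tf_range(1.0))  # -1.0 1.0
```

Using the wrong input range is the same class of bug as the 0-1 vs 0-255 issue earlier in the thread, so it is worth verifying when porting TF weights.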
@TheCodez thanks for the update! I'm evaluating the pre-trained model on ImageNet to compare the results and I'll let you know.
I got
which seems about right, thanks! I'm uploading the pre-trained weights, and then I'll update the download path and the documentation.
torchvision/models/googlenet.py
Outdated
self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)
@fmassa one thing to note here is that TensorFlow uses 2x2 pooling here instead of 3x3. I don't know if that has a positive impact on the accuracy, but it would mean diverging further from the paper definition.
What was the reason you used 3x3 pooling? Was it to make everything work out, given the differences between TF and PyTorch?
This didn't cause problems during my conversion process so I probably just missed it. Should I change it? In that case it might be a good idea to add a note that the implementation differs from the paper.
Let me see if it makes a difference for the performance, and I'll let you know.
Accuracy is 1 point better using the 2x2 pooling, with
Acc@1 69.778 Acc@5 89.530
so I'll be changing it. Thanks for the heads-up!
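One reason the 2x2 and 3x3 variants are interchangeable here: at the 14x14 feature map where maxpool4 sits (the standard GoogLeNet size at that stage; an assumption for this sketch), a 2x2/stride-2 pool and a 3x3/stride-2 pool with ceil_mode=True both produce 7x7 output, so only the pooled values differ, not the shapes. The arithmetic, following PyTorch's unpadded-pool output formula:

```python
import math

def pool_out(size, kernel, stride, ceil_mode=False):
    """Output size of an unpadded max pool, matching PyTorch's formula:
    floor_or_ceil((size - kernel) / stride) + 1."""
    f = math.ceil if ceil_mode else math.floor
    return f((size - kernel) / stride) + 1

# TF-style 2x2 stride-2 pool vs. the paper's 3x3 stride-2 with ceil mode:
print(pool_out(14, 2, 2))                  # 7
print(pool_out(14, 3, 2, ceil_mode=True))  # 7
# Without ceil mode, 3x3/stride-2 would lose a row/column:
print(pool_out(14, 3, 2))                  # 6
```

This is presumably why ceil_mode=True was needed in the first place: it keeps the 3x3 variant shape-compatible with TF's 2x2 'SAME' pooling.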
Thanks a lot @TheCodez !
@fmassa Thank you for all the help and guidance 👍
Thanks for the PR! Keep up the amazing work!
This adds the GoogLeNet (Inception v1) model, including the auxiliary classifiers found in the paper. Related to Issue #537
I have updated the example project on my branch to add support for training the GoogLeNet. Sadly I don't have the computing resources available to train the model on ImageNet.