
Add SSD512 with ResNet50 backbone #3760

Closed
wants to merge 25 commits into from

Conversation

Contributor

@datumbox datumbox commented May 3, 2021

This PR adds a newer backbone to the standard SSD algorithm, which might be more appropriate for real-world applications.

The proposed ResNet50 backbone follows an approach similar to the one in the paper and is 1.4 mAP more accurate than the equivalent SSD512-VGG16 described in the paper, while being 45% faster.


Trained using the code committed at 2f1f578. The current best pre-trained model was trained with:

python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
     --dataset coco --model ssd512_resnet50 --epochs 120\
     --lr-steps 80 110 --aspect-ratio-group-factor 3 --lr 0.002 --batch-size 4\
     --weight-decay 0.0005 --data-augmentation ssd

Submitted batch job 40937540 (killed midway)
Submitted batch job 40985712 (resumed)

Accuracy metrics:

Epoch 118:
0: IoU metric: bbox
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.302
0:  Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.510
0:  Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.311
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.136
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.359
0:  Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.433
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.275
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.415
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.440
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.224
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.502
0:  Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.599

Validated with:

python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py\
   --dataset coco --model ssd512_resnet50 --pretrained --test-only

Speed benchmark:
1.23 sec per image on CPU

@datumbox datumbox marked this pull request as draft May 3, 2021 09:13
@datumbox datumbox changed the title [WIP] Test experimental backbones Add SSD512 with ResNet50 backbone May 6, 2021
@datumbox datumbox marked this pull request as ready for review May 6, 2021 11:22
@datumbox datumbox requested a review from fmassa May 6, 2021 17:27
@datumbox datumbox mentioned this pull request May 10, 2021
13 tasks
@datumbox datumbox changed the title Add SSD512 with ResNet50 backbone Add SSD512 with ResNet50 backbone & SSD cleanup May 11, 2021
Member

@fmassa fmassa left a comment

Thanks a lot for the PR Vasilis!

I've left a few comments, let me know what you think

Comment on lines +577 to +586
self.features = nn.Sequential(
    backbone.conv1,
    backbone.bn1,
    backbone.relu,
    backbone.maxpool,
    backbone.layer1,
    backbone.layer2,
    backbone.layer3,
    backbone.layer4,
)
Member

Any particular reason why you didn't use the IntermediateLayerGetter?

Also, for the future, I think we will want to unify the way we extract features so that we rely on the FX-based feature extractor, which will be more generic.
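To illustrate the suggestion above, here is a minimal sketch of extracting the same `layer4` features with torchvision's `IntermediateLayerGetter` instead of a hand-built `nn.Sequential`. This is a hypothetical example, assuming the current `torchvision.models._utils` location of the helper; it is not code from this PR.

```python
import torch
import torchvision
from torchvision.models._utils import IntermediateLayerGetter

# Build an untrained ResNet50 and keep everything up to layer4,
# exposing its output under the key "0".
backbone = torchvision.models.resnet50()
body = IntermediateLayerGetter(backbone, return_layers={"layer4": "0"})

x = torch.rand(1, 3, 512, 512)
out = body(x)
print(out["0"].shape)  # torch.Size([1, 2048, 16, 16]) -- stride-32 features
```

`IntermediateLayerGetter` also makes it easy to return several intermediate maps at once (e.g. `{"layer2": "0", "layer3": "1", "layer4": "2"}`), which is how the FPN-based detectors consume a backbone.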

Comment on lines +589 to +591
for m in self.features[-1][0].modules():
    if hasattr(m, 'stride'):
        m.stride = 1
Member

Wouldn't we instead want to pass dilation=[False, False, True] to the ResNet? Just replacing the stride of the last layer without adding dilation means that the features from the last block are not really acting the way they were initially trained to.
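As a hedged sketch of this suggestion: torchvision's ResNet constructor accepts a `replace_stride_with_dilation` flag that converts the stride of a stage into dilation, preserving both the spatial resolution and the effective receptive field (this is not code from the PR, just an illustration of the proposed alternative).

```python
import torch
import torchvision
from torch import nn

# Convert layer4's stride-2 into dilation-2 instead of just zeroing strides.
backbone = torchvision.models.resnet50(
    replace_stride_with_dilation=[False, False, True]
)

# Keep everything except avgpool and fc.
features = nn.Sequential(*list(backbone.children())[:-2])

x = torch.rand(1, 3, 512, 512)
feats = features(x)
# layer4 keeps the stride-16 resolution (32x32 for a 512 input)
# while its 3x3 convs use dilation=2 to preserve the receptive field.
print(feats.shape)  # torch.Size([1, 2048, 32, 32])
```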

Comment on lines +595 to +601
nn.Sequential(
    nn.Conv2d(backbone_out_channels, 256, kernel_size=1, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 512, kernel_size=3, padding=1, stride=2, bias=False),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
Member

nit for the future: it might be good to refactor this into a Block class or something like that, which inherits from Sequential so that we keep the same names for the modules.
Something like

class ExtraBlock(nn.Sequential):
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__(nn.Conv2d(...), ...)
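A fleshed-out version of this sketch might look like the following. The layer arguments are taken from the extra-layer snippet above; the `ExtraBlock` name and signature are only the reviewer's proposal, not part of the PR.

```python
import torch
from torch import nn

class ExtraBlock(nn.Sequential):
    """One SSD extra feature block: 1x1 reduce -> 3x3 stride-2 expand.

    Inheriting from nn.Sequential keeps the numeric submodule names
    (0, 1, 2, ...) stable, so state_dict keys don't change.
    """
    def __init__(self, in_channels: int, mid_channels: int, out_channels: int):
        super().__init__(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=3,
                      padding=1, stride=2, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

block = ExtraBlock(2048, 256, 512)
out = block(torch.rand(1, 2048, 16, 16))
print(out.shape)  # torch.Size([1, 512, 8, 8]) -- stride-2 halves the map
```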

Model                      train time (s/it)  test time (s/it)  memory (GB)
SSD VGG16                  0.2093             0.0744            1.5
SSDlite MobileNetV3-Large  0.1773             0.0906            1.5
SSD300 VGG16               0.2093             0.0744            1.5
SSD512 ResNet-50           0.2316             0.0772            3.0
Member

For the future: we need to change those tables as they are misleading for now -- the test time column for the SSD models is for a batch size of 4 per GPU, while for Faster R-CNN it was for a batch size of 2.

Contributor Author

Maybe do a back-of-the-envelope estimation to bring them to comparable batch sizes?
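Such an estimation could be as simple as dividing each per-iteration test time by its batch size to get a per-image figure. The numbers below are the test-time values from the table in this thread; the batch sizes are the ones mentioned in the review comment (4 for SSD, 2 for the R-CNN models) -- this is an illustrative sketch, not a measured benchmark.

```python
def per_image_time(time_per_iter: float, batch_size: int) -> float:
    """Normalize a per-iteration time to a per-image time."""
    return time_per_iter / batch_size

# SSD models were tested with a batch size of 4 per GPU.
ssd300_vgg16 = per_image_time(0.0744, 4)      # 0.0186 s/image
ssd512_resnet50 = per_image_time(0.0772, 4)   # 0.0193 s/image

print(f"SSD300 VGG16:     {ssd300_vgg16:.4f} s/image")
print(f"SSD512 ResNet-50: {ssd512_resnet50:.4f} s/image")
```

A Faster R-CNN row measured at batch size 2 would be divided by 2 instead, after which the per-image columns are directly comparable.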


def forward(self, x: Tensor) -> Dict[str, Tensor]:
    # Undo the 0-1 scaling of toTensor. Necessary for some backbones.
    if self.rescaling:
Member
Thanks for changing this!

@datumbox
Contributor Author

We ran thorough benchmarks and discussed with @fmassa whether we should add this backbone to TorchVision. For the sake of transparency, I'm going to write here why we decided not to merge this.

Originally, the motivation behind adding ResNet50 as a backbone was that it's a newer backbone that is likely to be more useful than VGG for people in industry. After training the backbone we got an mAP of 30.2 (without hyperparameter tuning), which is 1.4 mAP higher than the SSD512+VGG16 reported in the SSD paper. Moreover, the model is about 45% faster on CPU than the VGG16 equivalent, so it looked like a great candidate.

On the other hand, one of the key drawbacks of this backbone is that it's not a canonical implementation of a model backed by a paper. Rather, it's inspired by the SSD paper and closely resembles other implementations (for example, check NVIDIA's). Though this should not necessarily be a blocker for adding a model to the library, we do require a new model to bring a significant improvement in at least one area.

To answer the question of whether this model has a unique characteristic that none of the other models in the library has, we benchmarked it against the pre-trained FasterRCNN+ResNet50+FPN. To make the comparison fair, we restricted the input of all images to 512x512. Here are the results:

- SSD512 + ResNet50:
    - speed cpu: 1.1137 sec per image
    - speed gpu: model_time: 0.0393 - 0.0672 s/it
    - mAP: 0.302
- FasterRCNN + ResNet50 + FPN with fixed_size=(512, 512):
    - speed cpu: 1.8098 sec per image
    - speed gpu: model_time: 0.0283 - 0.0431 s/it
    - mAP: 0.301

As we can see, both techniques have equivalent mAPs. We should note that we can definitely push the accuracy of both models higher by tuning them; FasterRCNN in particular could improve just by training on the specific fixed_size. Looking at CPU speed, the SSD model is clearly faster, but that's not the case when we look at GPU speed. Given that the SSD512+ResNet50 backbone is not a model one would run on mobile (SSDlite is a much better candidate for that) and that it's slower than the FasterRCNN equivalent on GPU, there is a much less compelling case for including it.

If someone stumbles upon this PR in the future and wants to give the ResNet50 backbone a try, they will be able to do so by copying the necessary classes. I will also leave the pre-trained weights online so that people can download them if they want to.

I will close this and cherry-pick the improvements made to the remaining files in a separate PR.

@datumbox datumbox closed this May 12, 2021
@datumbox datumbox deleted the models/ssd_resnet branch May 12, 2021 15:41
@datumbox datumbox restored the models/ssd_resnet branch May 19, 2021 10:09
@datumbox datumbox reopened this May 19, 2021
@datumbox datumbox closed this May 19, 2021
@datumbox datumbox deleted the models/ssd_resnet branch May 19, 2021 10:13
@datumbox datumbox changed the title Add SSD512 with ResNet50 backbone & SSD cleanup Add SSD512 with ResNet50 backbone May 19, 2021