Add SSD512 with ResNet50 backbone #3760
Conversation
Thanks a lot for the PR, Vasilis!
I've left a few comments; let me know what you think.
```python
self.features = nn.Sequential(
    backbone.conv1,
    backbone.bn1,
    backbone.relu,
    backbone.maxpool,
    backbone.layer1,
    backbone.layer2,
    backbone.layer3,
    backbone.layer4,
)
```
Any particular reason why you didn't use the `IntermediateLayerGetter`?
Also, for the future, I think we will want to unify the way we extract features so that we rely on the FX-based feature extractor, which will be more generic.
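For reference, a minimal sketch of what that could look like (it relies on torchvision's private `IntermediateLayerGetter` helper, so treat this as an illustration rather than stable API):

```python
import torch
from torchvision.models import resnet50
from torchvision.models._utils import IntermediateLayerGetter

backbone = resnet50(pretrained=True)
# Keep everything up to layer4 and expose its activation under the key 'out'.
features = IntermediateLayerGetter(backbone, return_layers={'layer4': 'out'})
out = features(torch.rand(1, 3, 512, 512))['out']  # shape: [1, 2048, 16, 16]
```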
```python
for m in self.features[-1][0].modules():
    if hasattr(m, 'stride'):
        m.stride = 1
```
Wouldn't we want to instead pass `replace_stride_with_dilation=[False, False, True]` when constructing the ResNet? Just replacing the stride of the last layer without adding dilation means that the features from the last block are not really acting the way they were initially trained to.
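A sketch of that suggestion, using the `replace_stride_with_dilation` argument of torchvision's ResNet constructor:

```python
import torch
from torchvision.models import resnet50

# Swap layer4's stride-2 convolutions for dilated ones at construction time,
# instead of patching the strides after the fact.
backbone = resnet50(pretrained=True,
                    replace_stride_with_dilation=[False, False, True])
features = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4,
)
# The output stride is now 16 instead of 32, and layer4 uses dilation 2,
# so its filters keep the receptive field they were trained with.
print(features(torch.rand(1, 3, 512, 512)).shape)  # torch.Size([1, 2048, 32, 32])
```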
```python
nn.Sequential(
    nn.Conv2d(backbone_out_channels, 256, kernel_size=1, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 512, kernel_size=3, padding=1, stride=2, bias=False),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
)
```
Nit for the future: might be good to refactor this into a `Block` class or something like that, which inherits from `Sequential`, so that we keep the same names for the modules. Something like:
```python
class ExtraBlock(nn.Sequential):
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__(nn.Conv2d(...), ...)
```
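Filling in the ellipsis with the conv/bn/relu pattern from the hunk above, a complete sketch might look like this (the exact parameterization is an assumption):

```python
from torch import nn

class ExtraBlock(nn.Sequential):
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=3,
                      padding=1, stride=2, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
```

Since the modules are passed positionally, every `ExtraBlock` gets the same child names (`0` through `5`), which keeps state-dict keys consistent across blocks.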
| Network                   | train time (s/it) | test time (s/it) | memory (GB) |
| ------------------------- | ----------------- | ---------------- | ----------- |
| SSD VGG16                 | 0.2093            | 0.0744           | 1.5         |
| SSDlite MobileNetV3-Large | 0.1773            | 0.0906           | 1.5         |
| SSD300 VGG16              | 0.2093            | 0.0744           | 1.5         |
| SSD512 ResNet-50          | 0.2316            | 0.0772           | 3.0         |
For the future: we need to change those tables as they are misleading for now -- the test time column for the SSD models is for a batch size of 4 per GPU, while for Faster R-CNN it was for a batch size of 2.
Maybe do a back-of-the-envelope estimation to bring them to comparable batch sizes?
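For instance, assuming the reported test times are seconds per iteration, the per-image numbers fall out directly:

```python
# Per-image test time = (seconds / iteration) / (images / iteration).
ssd300_vgg16 = 0.0744 / 4      # ~0.0186 s/image at the SSD batch size of 4
ssd512_resnet50 = 0.0772 / 4   # ~0.0193 s/image
# The Faster R-CNN rows (not in this hunk) would be divided by 2 instead.
```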
torchvision/models/detection/ssd.py
```python
def forward(self, x: Tensor) -> Dict[str, Tensor]:
    # Undo the 0-1 scaling of ToTensor. Necessary for some backbones.
    if self.rescaling:
```
Thanks for changing this!
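The hunk above is truncated by the diff view; a self-contained sketch of what the rescaling branch presumably does (the class name and the 255 multiplier are assumptions, for backbones that expect [0, 255] inputs):

```python
import torch
from torch import Tensor

class RescalingExtractor(torch.nn.Module):  # hypothetical wrapper
    def __init__(self, features: torch.nn.Module, rescaling: bool):
        super().__init__()
        self.features = features
        self.rescaling = rescaling

    def forward(self, x: Tensor) -> Tensor:
        # Undo the 0-1 scaling of ToTensor: some backbones (e.g. Caffe-style
        # VGG weights) were trained on images in the [0, 255] range.
        if self.rescaling:
            x = x * 255
        return self.features(x)
```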
We ran thorough benchmarks and discussed with @fmassa whether we should add this backbone to TorchVision. For the sake of transparency, I'm going to write here why we decided not to merge this.

Originally, the motivation behind adding ResNet50 as a backbone was that it's a newer backbone that is likely to be more useful than VGG for people in industry. After training the backbone we got an mAP of 30.2 (without hyperparameter tuning), which is 1.4 mAP higher than the SSD512+VGG16 reported in the SSD paper. Moreover, the model is about 45% faster on CPU than the VGG16 equivalent, so it looked like a great candidate.

On the other hand, one of the key drawbacks of this backbone is that it's not a canonical implementation of a model backed by a paper. Rather, it's inspired by the SSD paper and closely resembles other implementations (for example, check NVIDIA's). Though this should not necessarily be a blocker for adding a model to the library, we do require a new model to bring a significant improvement in at least one area. To answer the question of whether this model has a unique characteristic that none of the other models in the library has, we benchmarked it against the pre-trained FasterRCNN+ResNet50+FPN. To make the comparison fair, we restricted the input of all images to 512x512. Here are the results:
As we can see, both techniques have equivalent mAPs. We should note that we can definitely push the accuracy of both models higher by tuning them, and especially for FasterRCNN we can improve it just by training at the specific fixed size. Looking at CPU speed, the SSD model is clearly faster, but that's not the case when we look at GPU speed.

Given that SSD512+ResNet50 is not a model that one will run on mobile (SSDlite is a much better candidate for that) and that it's slower than the FasterRCNN equivalent on GPU, there is a much less compelling argument for including it. If someone stumbles upon this PR in the future and wants to give the ResNet50 backbone a try, they will be able to do so by copying the necessary classes. I will also leave the pre-trained weights online so that people can download them if they want to.

I will close this and cherry-pick the improvements made to the remaining files in a separate PR.
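For anyone repeating the comparison, this is roughly how the pre-trained Faster R-CNN can be restricted to 512x512 inputs (a sketch; `min_size`/`max_size` are forwarded to the model's internal resize transform):

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Cap the internal resize so no image side exceeds 512 during evaluation.
model = fasterrcnn_resnet50_fpn(pretrained=True, min_size=512, max_size=512)
model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 480, 640)])
```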
This PR adds a newer backbone to the standard SSD algorithm, which might be more appropriate for real-world applications.
The proposed ResNet50 backbone follows a similar approach to the one in the paper and is 1.4 mAP more accurate than the equivalent SSD512-VGG16 described in the paper, while being 45% faster on CPU.
Trained using the code committed at 2f1f578. The current best pre-trained model was trained with:
Submitted batch job 40937540 (killed midway)
Submitted batch job 40985712 (resumed)
Accuracy metrics:
Validated with:
Speed benchmark:
1.23 sec per image on CPU