Post-paper Detection Optimizations #5444

Merged
merged 54 commits into from
Apr 5, 2022

datumbox
Contributor

@datumbox datumbox commented Feb 19, 2022

Related to #5307 and #2263. Touches upon requests recorded at #4932 and #5325.

Our target is to improve the existing RetinaNet, FasterRCNN and MaskRCNN architectures using post-paper optimizations:

This PR contains commits that were later split into separate PRs to ease review. It currently contains the following changes on top of the main branch:

  • Adds support for norm_layer in all Detection heads.
  • Adds a new extensible FastRCNNConvFCHead, which follows a strategy similar to the existing MaskRCNNHeads.
  • Replaces Conv-Norm-BNs with Conv2dNormActivation where possible; we use PyTorch core's _load_from_state_dict() approach to maintain BC.
  • Extends RPNHead to support heavier heads.
  • Adds an experimental private _box_loss utility which allows for training models with different box losses. No plans to make this public; the entire API is kept private so that we can review our Detection API as a whole and examine the best way to support different Transforms and Losses in Detection models.
  • Adds 3 new variants for RetinaNet and *RCNN based on follow-up papers.
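
A minimal sketch of the versioned key remapping that the Conv2dNormActivation bullet refers to, using a plain dict in place of a real state dict. The key pattern (conv.weight/conv.bias becoming conv.0.0.weight/conv.0.0.bias) and the version threshold of 2 are taken from the snippet quoted later in this thread; the function name remap_legacy_keys, the "head." prefix, and the values are hypothetical. The actual logic lives in the modules' _load_from_state_dict() overrides and may differ in detail.

```python
def remap_legacy_keys(state_dict, prefix, version):
    """Rename pre-v2 conv keys to the nested Conv2dNormActivation layout.

    Hypothetical stand-in for a module's _load_from_state_dict() override;
    `version` plays the role of local_metadata.get("version", None).
    """
    if version is None or version < 2:
        for kind in ("weight", "bias"):
            old_key = f"{prefix}conv.{kind}"
            new_key = f"{prefix}conv.0.0.{kind}"
            # Move the tensor to its new name so old checkpoints still load.
            state_dict[new_key] = state_dict.pop(old_key)
    return state_dict


# A checkpoint saved with the old layout (illustrative prefix and values).
old_checkpoint = {"head.conv.weight": [1.0], "head.conv.bias": [0.0]}
remapped = remap_legacy_keys(dict(old_checkpoint), "head.", version=1)
print(sorted(remapped))
```

Gating on the stored version means checkpoints written by the new code (version 2 or later in this sketch) pass through untouched.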

@facebook-github-bot

facebook-github-bot commented Feb 19, 2022

💊 CI failures summary and remediations

As of commit 6488c41 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


@datumbox datumbox changed the title Enhance Detection Recipe [WIP] Enhance Detection Recipe Feb 19, 2022
@datumbox datumbox marked this pull request as draft February 19, 2022 11:38
@datumbox datumbox force-pushed the references/detection_recipe branch from 2bc1e81 to bcf0afc Compare March 8, 2022 00:04
@datumbox datumbox changed the title [WIP] Enhance Detection Recipe [WIP] Post-paper Detection Optimizations Apr 1, 2022
@datumbox datumbox requested a review from fmassa April 1, 2022 11:24
@datumbox datumbox marked this pull request as ready for review April 1, 2022 11:33
Member

@fmassa fmassa left a comment

LGTM!

I've left just a minor comment regarding naming, but it's just for discussion and there is no need to act on it

torchvision/models/detection/faster_rcnn.py (outdated, resolved)
torchvision/models/detection/_utils.py (resolved)
@datumbox datumbox changed the title [WIP] Post-paper Detection Optimizations Post-paper Detection Optimizations Apr 5, 2022
@datumbox datumbox merged commit 08cc9a7 into pytorch:main Apr 5, 2022
@datumbox datumbox deleted the references/detection_recipe branch April 5, 2022 17:49
@d4l3k

d4l3k commented Apr 14, 2022

@datumbox I'm getting errors when trying to load a model using FPN as a frozen layer.

Traceback (most recent call last):
  File "/mnt/ext/openape/apedepth/train.py", line 85, in <module>
    model.load_state_dict(state_dict, strict=False)
  File "/home/rice/venvs/openape/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1566, in load_state_dict
    load(self)
  File "/home/rice/venvs/openape/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1564, in load
    load(child, prefix + name + '.')
  File "/home/rice/venvs/openape/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1564, in load
    load(child, prefix + name + '.')
  File "/home/rice/venvs/openape/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1564, in load
    load(child, prefix + name + '.')
  File "/home/rice/venvs/openape/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1560, in load
    module._load_from_state_dict(
  File "/home/rice/venvs/openape/lib/python3.10/site-packages/torchvision/ops/feature_pyramid_network.py", line 131, in _load_from_state_dict
    state_dict[new_key] = state_dict.pop(old_key)
KeyError: 'semantic.backbone.fpn.inner_blocks.0.weight'

This seems to be a regression since PT 1.11 stable and I can no longer save/load my model. Excluding the semantic weights when calling torch.load doesn't help either.

My model has

        self.semantic = freeze(models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True))

@d4l3k

d4l3k commented Apr 14, 2022

Adding an if old_key in state_dict check in a few spots seems to fix it.

        version = local_metadata.get("version", None)

        if version is None or version < 2:
            for type in ["weight", "bias"]:
                old_key = f"{prefix}conv.{type}"
                new_key = f"{prefix}conv.0.0.{type}"
+                if old_key in state_dict:
                    state_dict[new_key] = state_dict.pop(old_key)
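
A minimal sketch of one way the unguarded pop can raise this kind of KeyError, and what the guard changes. It assumes a state dict that is already in the new key layout but carries no version metadata (so version is None); whether that is what happened in the report above is not established. The function name remap, the guarded flag, and the "fpn." prefix are hypothetical.

```python
def remap(state_dict, prefix, version, guarded):
    """Version-gated key rename, with an optional membership guard."""
    if version is None or version < 2:
        for kind in ("weight", "bias"):
            old_key = f"{prefix}conv.{kind}"
            new_key = f"{prefix}conv.0.0.{kind}"
            if guarded and old_key not in state_dict:
                continue  # nothing to rename; leave the dict as it is
            # dict.pop() without a default raises KeyError on a missing key.
            state_dict[new_key] = state_dict.pop(old_key)
    return state_dict


# Assumed input: new-layout keys, but no version information attached.
new_layout = {"fpn.conv.0.0.weight": [1.0], "fpn.conv.0.0.bias": [0.0]}

try:
    remap(dict(new_layout), "fpn.", None, guarded=False)
    unguarded_error = None
except KeyError as exc:
    unguarded_error = exc.args[0]  # name of the missing old key

guarded_result = remap(dict(new_layout), "fpn.", None, guarded=True)
print(unguarded_error, sorted(guarded_result))
```

In this sketch the guard makes loading succeed, but it also skips silently when the old keys are absent for any other reason, so it does not explain why the version check took the legacy path in the first place.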

@datumbox
Contributor Author

@d4l3k Thanks for the heads up.

> This seems to be a regression since PT 1.11 stable and I can no longer save/load my model.

If this is true, it's probably not related to this PR. To reduce any confusion, could you please open a separate issue and provide a way to reproduce the problem?

> Adding an if old_key in state_dict check in a few spots seems to fix it.

This might indicate that the old structure has been modified and is not compatible with v1. The proposed patch is not a solution; it is more likely to mask the issue. If you provide a way to reproduce the issue, we can help you investigate.

facebook-github-bot pushed a commit that referenced this pull request May 5, 2022
Summary:
* Use frozen BN only if pre-trained.

* Add LSJ and the ability to train from scratch.

* Fixing formatter

* Adding `--opt` and `--norm-weight-decay` support in Detection.

* Fix error message

* Make ScaleJitter proportional.

* Adding more norm layers in split_normalization_params.

* Add FixedSizeCrop

* Temporary fix for fill values on PIL

* Fix the bug on fill.

* Add RandomShortestSize.

* Skip resize when an augmentation method is used.

* multiscale in [480, 800]

* Add missing star

* Add new RetinaNet variant.

* Add tests.

* Update expected file for old retina

* Fixing tests

* Add FrozenBN to retinav2

* Fix network initialization issues

* Adding BN support in MaskRCNNHeads and FPN

* Adding support of FasterRCNNHeads

* Introduce norm_layers in backbone utils.

* Bigger RPN head + 2x rcnn v2 models.

* Adding gIoU support to retinanet

* Fix assert

* Add back nesterov momentum

* Rename and extend `FastRCNNConvFCHead` to support arbitrary FCs

* Fix linter

(Note: this ignores all push blocking failures!)

Reviewed By: jdsgomes, NicolasHug

Differential Revision: D36095683

fbshipit-source-id: 9105524308694ac8830ed12ba40286bb75c4aa8d
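
The summary above mentions gIoU support for RetinaNet. As a rough illustration of the quantity involved, here is a pure-Python sketch of generalized IoU for two axis-aligned boxes in (x1, y1, x2, y2) form, with the loss taken as 1 - gIoU. The function names are hypothetical and this is not the code added by the PR, which operates on batched tensors.

```python
def generalized_iou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # Penalize the part of the enclosing box not covered by the union.
    return iou - (hull - union) / hull


def giou_loss(box_a, box_b):
    """Loss in [0, 2): 0 for identical boxes, larger as boxes move apart."""
    return 1.0 - generalized_iou(box_a, box_b)


print(giou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0)))
```

Unlike plain IoU, the value keeps changing for non-overlapping boxes, which is why it can still provide a training signal when a predicted box misses its target entirely.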