Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RT-DETR cache for generate_anchors #31671

Merged
merged 9 commits into from
Jul 3, 2024

Conversation

qubvel
Copy link
Member

@qubvel qubvel commented Jun 27, 2024

What does this PR do?

For RT-DETR model:

  1. Fix lru_cache for the generate_anchors method, so even if we go with dynamic anchors generation we only generate it once per image of the same size.
  2. Fix anchors dtype: make anchors generated always in float32, and then convert to the desired type. Otherwise, a difference might be observed between dynamic and static anchors inference in float16/bfloat16 (test is added).

Who can review?

cc @SangbumChoi

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@SangbumChoi SangbumChoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling this!
(Did you also finetuned RT-DETR with bfloat16? Is it okay?)

@qubvel
Copy link
Member Author

qubvel commented Jun 28, 2024

Did you also finetuned RT-DETR with bfloat16? Is it okay?

No, I didn't. I tried bfloat16 only for inference 🙂

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

@@ -1656,7 +1656,10 @@ def unfreeze_backbone(self):
param.requires_grad_(True)

@lru_cache(maxsize=32)
def generate_anchors(self, spatial_shapes=None, grid_size=0.05, dtype=torch.float32, device="cpu"):
def generate_anchors(self, spatial_shapes=None, grid_size=0.05):
# We always generate anchors in float32 to preserve original model code
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update this to reflect the true reason: preserving equivalence between dynamic and static anchor inference. We don't really care about whether our code matches the original model's, just that equivalent logic is used

@qubvel qubvel merged commit b975216 into huggingface:main Jul 3, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants