Fix rpn memory leak and dataType errors. #1657

Merged
merged 1 commit into master on Dec 11, 2019
Conversation

tengerye
Contributor

  1. The current version causes a memory leak, which can easily be reproduced using the repository here. The reason is that rpn._cache is kept alongside the model, and the number of tensors it holds keeps growing as proposals are generated (see the sketch after this list);

  2. While fixing the first problem, I also found some data type errors in the development version and fixed them.
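For context, here is a minimal sketch of the leaking pattern described in item 1. The class and method names are illustrative, not torchvision's actual internals: an unbounded dict keyed by input size keeps one tensor per size it has ever seen, so memory grows whenever image sizes vary during training.

```python
import torch

class AnchorCache:
    """Unbounded cache keyed by input size -- the pattern that leaks."""

    def __init__(self):
        self._cache = {}  # (grid_size, stride) -> anchor tensor

    def anchors_for(self, grid_size, stride):
        key = (grid_size, stride)
        if key not in self._cache:
            # Every unseen size adds an entry that is never evicted.
            self._cache[key] = torch.zeros(grid_size * grid_size, 4)
        return self._cache[key]

cache = AnchorCache()
for size in range(50, 80):   # variable image sizes, as seen during training
    cache.anchors_for(size, 16)
print(len(cache._cache))     # 30 cached tensors, and the count only grows
```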


codecov-io commented Dec 11, 2019

Codecov Report

Merging #1657 into master will increase coverage by 0.01%.
The diff coverage is 20%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1657      +/-   ##
==========================================
+ Coverage      66%   66.02%   +0.01%     
==========================================
  Files          92       92              
  Lines        7330     7331       +1     
  Branches     1107     1107              
==========================================
+ Hits         4838     4840       +2     
+ Misses       2176     2175       -1     
  Partials      316      316
Impacted Files                             Coverage Δ
torchvision/models/detection/_utils.py     46.25% <0%> (ø) ⬆️
torchvision/models/detection/rpn.py        81.57% <33.33%> (+0.52%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 333af7a...12d79ac. Read the comment docs.

@fmassa fmassa (Member) left a comment


Thanks for the PR and the fixes!

I have some comments regarding completely dropping the cache, but we can move on with this for now and optimize it again later.

@@ -163,6 +163,8 @@ def forward(self, image_list, feature_maps):
anchors_in_image.append(anchors_per_feature_map)
anchors.append(anchors_in_image)
anchors = [torch.cat(anchors_per_image) for anchors_per_image in anchors]
# Clear the cache in case that memory leaks.
self._cache.clear()

I think this is a bit too drastic: the size of the cache is in general fairly small (less than 1MB per key), and we can get some speedup if we cache the most used sizes. This is effectively removing the caching altogether.

Maybe for a follow-up PR: What about using some variant of functools.lru_cache instead of manually keeping a dictionary for the cache?

This way, we could potentially keep the 32 most recently used sizes cached.
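For illustration, one possible variant of that idea is a small LRU cache bounded to 32 entries. This is only a sketch: `BoundedAnchorCache`, its `get` method, and the `compute_fn` callback are hypothetical names, not part of torchvision.

```python
from collections import OrderedDict

class BoundedAnchorCache:
    """LRU-style cache that keeps at most `maxsize` recently used entries."""

    def __init__(self, maxsize=32):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def get(self, key, compute_fn):
        if key in self._cache:
            self._cache.move_to_end(key)      # mark as most recently used
            return self._cache[key]
        value = compute_fn()                  # e.g. the anchor computation
        self._cache[key] = value
        if len(self._cache) > self.maxsize:
            self._cache.popitem(last=False)   # evict the least recently used
        return value
```

With something like this, the forward pass could look anchors up via `cache.get(key, lambda: self.grid_anchors(grid_sizes, strides))` (assuming the existing anchor-computation method keeps its current shape), preserving the speedup for recurring sizes while bounding memory.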
