
Memory leak in forward pass on FasterRCNN with varying aspect ratio #1689

Closed

MichelHalmes opened this issue Dec 18, 2019 · 2 comments

MichelHalmes commented Dec 18, 2019

Thank you for PyTorch!

My object detector runs out of CPU memory after a few iterations.
I traced the issue down to the fact that I use varying aspect ratios at the input.
(My dataset has varying aspect ratios and I want to train the network to handle this properly. Since my training is distributed, I use a batch size of 1, which makes this possible.)

The memory keeps increasing on the forward pass. Training with varying sizes but a constant aspect ratio, or putting the model in evaluation mode, solves the issue (i.e. memory stays constant at 1171 MB). After a few iterations the memory oscillates, but the trend is still upwards. (A padding-based stopgap is sketched after the output below.)

I'm not blocked by this, but I still wanted to know whether this is expected behavior and whether there is a quick fix for it.

The code below illustrates the issue:

import os
import gc
import random

import torch
import psutil
from torchvision.models.detection import fasterrcnn_resnet50_fpn

process = psutil.Process(os.getpid())
model = fasterrcnn_resnet50_fpn(pretrained=True)
# model.eval() # Gives no issues

for i in range(10):
    size = random.randint(700, 900)
    images = torch.rand([1, 3, size, 800])  # varying height, fixed width -> varying aspect ratio
    # images = torch.rand([1, 3, size, size]) # Gives no issues (constant aspect ratio)
    targets = [{'boxes': torch.tensor([[10., 20., 30., 40.]]), 'labels': torch.tensor([1])}]
    model(images, targets)  # training-mode forward; the returned loss dict is discarded
    gc.collect()
    print("Current memory: ", process.memory_info()[0] / float(2**20))  # RSS in MiB

Output:

Current memory:  1301.67578125
Current memory:  2059.98046875
Current memory:  2895.0234375
Current memory:  3668.140625
Current memory:  3556.203125
Current memory:  4434.28515625
Current memory:  4428.390625
Current memory:  4432.43359375
Current memory:  3522.76953125
Current memory:  3351.46484375
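
For anyone hitting this before a fix lands: since a constant aspect ratio keeps memory flat (see above), one possible stopgap is to zero-pad each image on the right or bottom edge to a fixed aspect ratio before the forward pass. This is only a sketch (pad_to_aspect_ratio and target_ratio are made-up names, and it is untested against this leak); padding only the right/bottom edges leaves existing box coordinates valid.

import torch
import torch.nn.functional as F

def pad_to_aspect_ratio(image, target_ratio=1.0):
    # image: [C, H, W]. Zero-pad the right or bottom edge so that W / H == target_ratio.
    # Padding only the right/bottom edges leaves existing box coordinates unchanged.
    _, h, w = image.shape
    if w / h < target_ratio:
        pad = (0, int(round(h * target_ratio)) - w, 0, 0)  # (left, right, top, bottom)
    else:
        pad = (0, 0, 0, int(round(w / target_ratio)) - h)
    return F.pad(image, pad)

# Example: a 3x700x800 image padded to a square before the forward pass
image = torch.rand(3, 700, 800)
padded = pad_to_aspect_ratio(image, target_ratio=1.0)
print(padded.shape)  # torch.Size([3, 800, 800])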

Environment

Output of collect_env.py

Collecting environment information...
PyTorch version: 1.3.1
Is debug build: No
CUDA used to build PyTorch: None

OS: Mac OSX 10.14.6
GCC version: Could not collect
CMake version: Could not collect

Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.3.1
[pip3] torchvision==0.4.2
fmassa (Member) commented Dec 19, 2019

Hi,

I believe this is fixed in #1657

Can you try with the latest PyTorch / torchvision nightlies and report back if this hasn't been fixed?
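
For reference, a quick way to confirm which builds are actually picked up after installing the nightlies (just a convenience snippet, not part of the fix):

import torch
import torchvision

# Print the versions that are actually being imported
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)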

MichelHalmes (Author) commented:

Indeed, it's fixed on the nightly. 👍
Much appreciated!
