
Memory leak in forward pass on FasterRCNN with varying aspect ratio #1689

Closed

MichelHalmes opened this issue Dec 18, 2019 · 2 comments

MichelHalmes commented Dec 18, 2019

Thank you for PyTorch!

My object detector runs out of CPU memory after a few iterations.
I traced the issue down to the fact that I use varying aspect ratios at the input.
(My dataset has varying aspect ratios and I want to train the network to handle this properly. Since my training is distributed, I use a batch size of 1, which makes this possible.)

The memory keeps increasing on the forward pass. Training with varying sizes but a constant aspect ratio, or putting the model in evaluation mode, solves the issue (i.e. memory stays constant at 1171 MB). After a few iterations the memory oscillates, but the trend is still upwards. (A padding-based stopgap is sketched after the output below.)

I'm not blocked by this, but I still wanted to know whether this is expected behavior and whether there is a quick fix for it.

The code below illustrates the issue:

import os
import gc
import random

import torch
import psutil
from torchvision.models.detection import fasterrcnn_resnet50_fpn

process = psutil.Process(os.getpid())
model = fasterrcnn_resnet50_fpn(pretrained=True)
# model.eval() # Gives no issues

for i in range(10):
    size = random.randint(700, 900)
    images = torch.rand([1, 3, size, 800])  # varying height, fixed width -> varying aspect ratio
    # images = torch.rand([1, 3, size, size]) # Gives no issues (constant aspect ratio)
    targets = [{'boxes': torch.tensor([[10., 20., 30., 40.]]), 'labels': torch.tensor([1])}]
    model(images, targets)  # training-mode forward; the returned loss dict is discarded
    gc.collect()
    print("Current memory: ", process.memory_info()[0] / float(2**20))  # RSS in MiB

Output:

Current memory:  1301.67578125
Current memory:  2059.98046875
Current memory:  2895.0234375
Current memory:  3668.140625
Current memory:  3556.203125
Current memory:  4434.28515625
Current memory:  4428.390625
Current memory:  4432.43359375
Current memory:  3522.76953125
Current memory:  3351.46484375
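
For anyone hitting this before a fix lands: since a constant aspect ratio keeps memory flat (see above), one possible stopgap is to zero-pad each image on the right or bottom edge to a fixed aspect ratio before the forward pass. This is only a sketch (pad_to_aspect_ratio and target_ratio are made-up names, and it is untested against this leak); padding only the right/bottom edges leaves existing box coordinates valid.

import torch
import torch.nn.functional as F

def pad_to_aspect_ratio(image, target_ratio=1.0):
    # image: [C, H, W]. Zero-pad the right or bottom edge so that W / H == target_ratio.
    # Padding only the right/bottom edges leaves existing box coordinates unchanged.
    _, h, w = image.shape
    if w / h < target_ratio:
        pad = (0, int(round(h * target_ratio)) - w, 0, 0)  # (left, right, top, bottom)
    else:
        pad = (0, 0, 0, int(round(w / target_ratio)) - h)
    return F.pad(image, pad)

# Example: a 3x700x800 image padded to a square before the forward pass
image = torch.rand(3, 700, 800)
padded = pad_to_aspect_ratio(image, target_ratio=1.0)
print(padded.shape)  # torch.Size([3, 800, 800])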

Environment

Output of collect_env.py

Collecting environment information...
PyTorch version: 1.3.1
Is debug build: No
CUDA used to build PyTorch: None

OS: Mac OSX 10.14.6
GCC version: Could not collect
CMake version: Could not collect

Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.17.4
[pip3] torch==1.3.1
[pip3] torchvision==0.4.2
fmassa (Member) commented Dec 19, 2019

Hi,

I believe this is fixed in #1657

Can you try with the latest PyTorch / torchvision nightlies and report back if this hasn't been fixed?
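
For reference, a quick way to confirm which builds are actually picked up after installing the nightlies (just a convenience snippet, not part of the fix):

import torch
import torchvision

# Print the versions that are actually being imported
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)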

MichelHalmes (Author) commented:

Indeed, it's fixed on the nightly. 👍
Much appreciated!
