Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak when subscribing to /videofile/image_raw topic #15

Open
aviogit opened this issue Apr 11, 2018 · 28 comments
Open

Memory leak when subscribing to /videofile/image_raw topic #15

aviogit opened this issue Apr 11, 2018 · 28 comments

Comments

@aviogit
Copy link
Contributor

aviogit commented Apr 11, 2018

Houston, we have another problem.

There is a memory leak when subscribing to /videofile/image_raw topic. Valgrind sees the memory as always reachable (so technically speaking it's not a true leak), but the process ends up using several gigs of memory. When one stops image_view, memory is released correctly.

If you want to take a look at it, you can clone this branch:

https://github.com/aviogit/video_stream_opencv/tree/memory_leak_when_subscribing_to_slash_videofile_slash_image_raw

I found an excellent piece of code by @BloodAxe here: https://computer-vision-talks.com/hacking-opencv-for-fun-and-profit/

and now I can even see precise numbers. I'll try to understand why it happens, cv_bridge::CvImage::toImageMsg() does a deep copy, so I actually see no reason for this to happen.

http://docs.ros.org/kinetic/api/cv_bridge/html/c++/cv__bridge_8cpp_source.html

I'll try to look into this further...

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

I think that the problem is somewhat simpler than what I believed originally. In the launchfile I left the buffer_queue_size parameter at it's default value of 1000 and I'm using a full HD video that produces 1920x1080 RGB images on a Odroid XU4 with 2Gb of RAM. This means 6 megs per image @ 25fps = 140 megs per second and an OOM kill after maybe 15 seconds on average vs. 40 seconds capacity of framesQueue.

There is just one point which leaves me somewhat perplexed: why all this happens only if I subscribe to the /videofile/image_raw topic? I'm pretty sure this behavior is triggered by ROS that skips useless stuff if nobody is listening on the topic. But I'm still not understanding it fully, so I don't know how to solve this.

@awesomebytes
Copy link
Member

As an initial guess, lower the buffer_queue_size of course.

Dunno why it happens only when there is a subscriber... if it's a problem of the thread that reads from the videofile, it should happen all the time until the buffer is full. And stop there. With or without subscribers.

The queue_size of the publisher is 1 https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L118 . So is not related to that.

My mind is somewhere else right now, but, may this line have anything to do?

https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L104

It still makes no sense that it happens only when there is a subscriber tho.

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

It's so weird... If I run it without any subscriber, I see this:

framesQueue.push(frame.clone()) 12442567/18663367
framesQueue.push(frame.clone()) 12442567/18663367
framesQueue.push(frame.clone()) 12442567/18663367
framesQueue.push(frame.clone()) 18663367/18663367
framesQueue.push(frame.clone()) 18663367/18663367
framesQueue.push(frame.clone()) 18663367/18663367
framesQueue.push(frame.clone()) 18663367/18663367
framesQueue.push(frame.clone()) 18663367/18663367
framesQueue.push(frame.clone()) 18663367/18663367
framesQueue.push(frame.clone()) 12442567/18663367
framesQueue.push(frame.clone()) 12442567/18663367
framesQueue.push(frame.clone()) 12442567/18663367

Oscillating between 12442567/18663367 and 18663367/18663367 for, say, ten seconds, then it stays at 12442567/18663367.

If I run it with an image_view as subscriber, I see this:

framesQueue.push(frame.clone()) 99533767/105754567
framesQueue.push(frame.clone()) 99533767/105754567
framesQueue.push(frame.clone()) 99533767/105754567
framesQueue.push(frame.clone()) 105754567/111975367
framesQueue.push(frame.clone()) 105754567/111975367
framesQueue.push(frame.clone()) 105754567/111975367
framesQueue.push(frame.clone()) 111975367/111975367
framesQueue.push(frame.clone()) 111975367/111975367

and at the end this:

framesQueue.push(frame.clone()) 1735604167/1741824967
framesQueue.push(frame.clone()) 1735604167/1741824967
framesQueue.push(frame.clone()) 1741824967/1741824967
framesQueue.push(frame.clone()) 1741824967/1748045767
framesQueue.push(frame.clone()) 1741824967/1748045767
framesQueue.push(frame.clone()) 1748045767/1754266567
framesQueue.push(frame.clone()) 1748045767/1754266567
framesQueue.push(frame.clone()) 1716941767/1716941767
framesQueue.push(frame.clone()) 1710720967/1710720967
framesQueue.push(frame.clone()) 1710720967/1710720967
framesQueue.push(frame.clone()) 1710720967/1710720967
framesQueue.push(frame.clone()) 1710720967/1716941767

(This time the OOM killer has been lenient with it.)

So no frame drops at all because of queue full (yes, I forgot to instrument the other framesQueue.pull(), the one that normally pops out frames to convert them into ROS messages).

I think we can say that producer and consumer here works perfectly well together, but the queue is filled anyway up to 1000.

Don't worry about this too much, for the moment I can solve it setting queue size to 10 and maybe adding a ROS_WARN if someone sets the queue so that the memory usage can reach, say, 512 megabytes? Just a curiosity of mine :)

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

This is what happens adding a few more couts and one MEASURE_MEMORY() around framesQueue.pull(frame):

Queue before framesQueue.pull(frame): 278
framesQueue.pull(frame) 1748045767/1741824967
Queue after framesQueue.pull(frame): 277
framesQueue.push(frame.clone()) 1741824967/1741824967
Queue before framesQueue.pull(frame): 278
framesQueue.pull(frame) 1748045767/1741824967
Queue after framesQueue.pull(frame): 277
framesQueue.push(frame.clone()) 1741824967/1741824967
Queue before framesQueue.pull(frame): 278
framesQueue.pull(frame) 1748045767/1741824967
Queue after framesQueue.pull(frame): 277
framesQueue.push(frame.clone()) 1741824967/1741824967
framesQueue.push(frame.clone()) 1741824967/1748045767
Queue before framesQueue.pull(frame): 279
framesQueue.pull(frame) 1748045767/1741824967
Queue after framesQueue.pull(frame): 278
framesQueue.push(frame.clone()) 1741824967/1748045767
framesQueue.push(frame.clone()) 1748045767/1754266567
[videofile/videofile_stream-1] process has died [pid 13121, exit code -9, cmd /mnt/data/workspace/ros/devel/lib/video_stream_opencv/video_stream camera:=image_raw __name:=videofile_stream __log:=/home/pelan/.ros/log/25c39d52-3cc9-11e8-845e-001e0631df58/videofile-videofile_stream-1.log].
log file: /home/pelan/.ros/log/25c39d52-3cc9-11e8-845e-001e0631df58/videofile-videofile_stream-1*.log
all processes on machine have died, roslaunch will exit
shutting down processing monitor...
... shutting down processing monitor complete
done

I'm starting to believe that boost::sync_queue::pull() is not behaving as expected...

@awesomebytes
Copy link
Member

Can you commit the version with more couts?

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Done. I think that the producer here is just a little bit faster than the consumer. If I put a double camera_fps_rate.sleep() in do_capture(), the queue size reaches 1 at maximum.

@awesomebytes
Copy link
Member

awesomebytes commented Apr 11, 2018

I get:

framesQueue.push(frame.clone())	68/68
framesQueue.push(frame.clone())	68/68
framesQueue.pull(frame)	68/68
framesQueue.pull(frame)	68/68
framesQueue.push(frame.clone())	68/68
framesQueue.push(frame.clone())	68/68

With a 3200 x 1800 30fps video.

I get the same with my webcam at 30 or 60fps.

On a laptop, 4 cores Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Eheh because you're using a monster-PC. Try it on a Raspberry Pi :)

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Mmm no, there's something wrong. 68 bytes always makes absolutely no sense...

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Are you compiling with -O2 or -O3?

@awesomebytes
Copy link
Member

I've pulled your last version, tried with the video again (my laptop is dying btw) and now I get:

Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 1

framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone())	68/68

Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 0

framesQueue.push(frame.clone())	68/68

Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone())	68/68
framesQueue.push(frame.clone())	68/68
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 3
framesQueue.pull(frame)	68/68
Queue after framesQueue.pull(frame): 2
framesQueue.push(frame.clone())	68/68
Queue before framesQueue.pull(frame): 3

@awesomebytes
Copy link
Member

I pulled your branch, I'm using your CMakeLists.txt: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g3 -gdwarf-2 -O0 -std=c++11")

@awesomebytes
Copy link
Member

Also, I get that same 68 with our without subscribers.

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Mmm very strange, let me try it on my desktop.

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 18663623/18663623
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 24884423/18663623
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 18663623/18663623
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 24884423/18663623
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 18663623/18663623
Queue before framesQueue.pull(frame): 1
framesQueue.pull(frame) 24884423/18663623
Queue after framesQueue.pull(frame): 0
framesQueue.push(frame.clone()) 18663623/18663623

This is what I get on my desktop. Values are stable but != 68 :)

@awesomebytes
Copy link
Member

I'm on a Ubuntu 14.04, 64bits, ROS indigo. Opencv 2.4.8

@awesomebytes
Copy link
Member

Just tried on a Ubuntu 16.04, 64b, ROS Kinetic, on a VM. Opencv3 from kinetic.
And the numbers are more similar to yours... 11029703/8272583
And so (cant copy paste).

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Mmm, OpenCV 2.4.8 seems to have fastMalloc() as well: https://github.com/opencv/opencv/blob/2.4/modules/core/src/alloc.cpp

So I don't know where is the problem. I'm on Ubuntu 16.04, 64bits, Kinetic with OpenCV 3.3.1 (I checked with cv::getBuildInformation() because Ubuntu still installs 2.4.9.1 while ROS pulls in its own package).

@aviogit
Copy link
Contributor Author

aviogit commented Apr 11, 2018

Ok, maybe I'm starting to understand:

[ INFO] [1523450336.860524732]: Setting camera FPS to: 30
[ INFO] [1523450336.860832980]: Camera reports FPS: 25
[ INFO] [1523450336.864597742]: Setting buffer size for capturing frames to: 1000
[ INFO] [1523450336.868807250]: Throttling to fps: 30

this is the problem IMHO. My video is 25 fps but in the launchfile I ask for 30 fps, so maybe frames are produced at 30 fps and consumed at 25?

@awesomebytes
Copy link
Member

You are trying to consume at 30fps (Throttling fps). You are producing at 30fps (setting camera fps sets set_camera_fps which is the only used for the sleep in the producer thread).

The reporting value is just a print, is not used.

@Axel13fr
Copy link
Contributor

Axel13fr commented Jun 21, 2018

Hi,
I could reproduce the growing memory problem as well (Kinetic, Ubu 16.04 64 Bits). When I set fps parameter < Camera Reports FPS, I get the issue. In my example, I had fps = 10 and my camera reported 25fps. Setting fps to 25 solved the issue. I have a proper core i7 with 8GB RAM and even a buffer queue size of 5 didn't stop the leak.

This issue still is strange to me as it appears to be a recent problem: I have used this node for several months with fps = 10 and a 25 fps camera without any issues until recently.

@Axel13fr
Copy link
Contributor

So using the same parameters (fps=10, set_camera_fps=25) on an older version (before implementing a thread using a synch queue), I did not get this memory growing problem.

(Older version used: https://github.com/ros-drivers/video_stream_opencv/blob/89b0224b18b6e67860d4ad5c36c993fe7534e199/src/video_stream.cpp).

@Axel13fr
Copy link
Contributor

Axel13fr commented Jun 21, 2018

So here is my conclusion: the problem comes from Boost. Proof: check my merge request where I basically reimplemented a thread safe queue with mutex using plain std classes.

The code does exacly the same thing, except that locks are done once per consum/produce cycle. It fixes as well a possible edge case where the user sets a queue size of 1 and 2 calls of pull can happen on after the other from the 2 different threads, leading the second pull to return either the previous picture (consumer pulls last) or an empty image (producer pulls last).

I'm a bit surprised that this was implemented with experimental boost features...

@awesomebytes
Copy link
Member

awesomebytes commented Jun 21, 2018 via email

@Axel13fr
Copy link
Contributor

@awesomebytes looking forward to a melodic release :-)

@awesomebytes
Copy link
Member

@Axel13fr re-release on all distros coming soon (it's PR-ed).

@Axel13fr
Copy link
Contributor

Thanks a lot !

@lchop
Copy link

lchop commented Sep 3, 2019

Hello !
Thanks for your work ! I have a memory leak using your package streaming an rtsp stream.
I have played the stream all night and in the morning the node had crash because of lack of memory. I think it is related to this issue: #15
This issue is not solved ?
I am running on ROS kinetic. I have a camera streaming at 10 fps and I have set the fps param of your node at 30. I think that is the reason leading to the memory leak. Any updates ?
Thank you !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants