Memory leak when subscribing to /videofile/image_raw topic #15
I think the problem is somewhat simpler than what I believed originally. In the launch file I left the buffer_queue_size parameter at its default value of 1000, and I'm using a full HD video that produces 1920x1080 RGB images on an Odroid XU4 with 2 GB of RAM. That means roughly 6 MB per image, which at 25 fps is about 150 MB per second, and an OOM kill after maybe 15 seconds on average, versus the 40 seconds of video the framesQueue can hold.

There is just one point which leaves me somewhat perplexed: why does all this happen only if I subscribe to the /videofile/image_raw topic? I'm pretty sure this behavior is triggered by ROS skipping useless work when nobody is listening on the topic. But I still don't understand it fully, so I don't know how to solve it.
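Roughly, the numbers work out like this (a quick back-of-the-envelope sketch; the constants are just the values from this setup, not read from the node):

```cpp
#include <cstdio>

int main() {
    // Values from this setup (full HD BGR8 video, default buffer_queue_size).
    const double bytes_per_frame = 1920.0 * 1080.0 * 3.0;  // ~6.2 MB per frame
    const double fps             = 25.0;
    const int    queue_size      = 1000;                   // default buffer_queue_size

    const double mb = 1024.0 * 1024.0;
    std::printf("per frame:  %.1f MB\n", bytes_per_frame / mb);
    std::printf("fill rate:  %.1f MB/s\n", bytes_per_frame * fps / mb);
    std::printf("full queue: %.1f GB (vs. 2 GB of RAM on the XU4)\n",
                bytes_per_frame * queue_size / (mb * 1024.0));
    return 0;
}
```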
As an initial guess, lower the buffer_queue_size, of course. Dunno why it happens only when there is a subscriber... if it were a problem in the thread that reads from the video file, it should happen all the time until the buffer is full, and stop there, with or without subscribers. The queue_size of the publisher is 1 (https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L118), so it's not related to that. My mind is somewhere else right now, but might this line have anything to do with it? https://github.com/ros-drivers/video_stream_opencv/blob/master/src/video_stream.cpp#L104 It still makes no sense that it happens only when there is a subscriber, though.
It's so weird... If I run it without any subscriber, I see this:
framesQueue.push(frame.clone()) 12442567/18663367
oscillating between 12442567/18663367 and 18663367/18663367 for, say, ten seconds, then it stays at 12442567/18663367.

If I run it with image_view as a subscriber, I see this:
framesQueue.push(frame.clone()) 99533767/105754567
and at the end this:
framesQueue.push(frame.clone()) 1735604167/1741824967
(This time the OOM killer has been lenient with it.)

So no frame drops at all because of a full queue (yes, I forgot to instrument the other side, the framesQueue.pull() that normally pops out frames to convert them into ROS messages). I think we can say that the producer and consumer work perfectly well together here, but the queue fills up to 1000 anyway. Don't worry about this too much; for the moment I can solve it by setting the queue size to 10, and maybe adding a ROS_WARN if someone sets the queue size so large that memory usage could reach, say, 512 megabytes? Just a curiosity of mine :)
This is what happens after adding a few more couts and one MEASURE_MEMORY() around framesQueue.pull(frame):
Queue before framesQueue.pull(frame): 278
I'm starting to believe that boost::sync_queue::pull() is not behaving as expected...
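The instrumentation described here amounts to something like this (a rough sketch, not the exact code from the branch; it assumes boost::sync_queue exposes size(), and the branch additionally wraps the pull in its MEASURE_MEMORY() macro):

```cpp
// Sketch of the extra couts around the consumer-side pull.
std::cout << "Queue before framesQueue.pull(frame): "
          << framesQueue.size() << std::endl;
framesQueue.pull(frame);   // blocking pull from the synchronized queue
std::cout << "Queue after framesQueue.pull(frame): "
          << framesQueue.size() << std::endl;
```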
Can you commit the version with more couts?
Done. I think that the producer here is just a little bit faster than the consumer. If I put a double camera_fps_rate.sleep() in do_capture(), the queue size reaches 1 at maximum.
I get:
With a 3200 x 1800, 30 fps video. I get the same with my webcam at 30 or 60 fps. This is on a laptop with a 4-core Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz.
Eheh, because you're using a monster PC. Try it on a Raspberry Pi :)
Mmm no, there's something wrong. 68 bytes always makes absolutely no sense...
Are you compiling with -O2 or -O3?
I've pulled your latest version, tried with the video again (my laptop is dying, btw) and now I get:
I pulled your branch; I'm using your CMakeLists.txt:
Also, I get that same 68 with or without subscribers.
Mmm, very strange, let me try it on my desktop.
Queue after framesQueue.pull(frame): 0
This is what I get on my desktop. Values are stable but != 68 :)
I'm on Ubuntu 14.04, 64-bit, ROS Indigo, OpenCV 2.4.8.
Just tried on Ubuntu 16.04, 64-bit, ROS Kinetic, in a VM, with OpenCV 3 from Kinetic.
Mmm, OpenCV 2.4.8 seems to have fastMalloc() as well: https://github.com/opencv/opencv/blob/2.4/modules/core/src/alloc.cpp so I don't know where the problem is. I'm on Ubuntu 16.04, 64-bit, Kinetic, with OpenCV 3.3.1 (I checked with cv::getBuildInformation() because Ubuntu still installs 2.4.9.1 while ROS pulls in its own package).
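For reference, the version check mentioned above is just something like this (a trivial sketch):

```cpp
#include <iostream>
#include <opencv2/core/core.hpp>      // cv::getBuildInformation()
#include <opencv2/core/version.hpp>   // CV_VERSION

int main() {
    std::cout << "Compiled against OpenCV " << CV_VERSION << std::endl;
    std::cout << cv::getBuildInformation() << std::endl;  // full build report
    return 0;
}
```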
Ok, maybe I'm starting to understand:
[ INFO] [1523450336.860524732]: Setting camera FPS to: 30
This is the problem, IMHO. My video is 25 fps but in the launch file I ask for 30 fps, so maybe frames are produced at 30 fps and consumed at 25?
You are trying to consume at 30 fps (the throttling fps). You are producing at 30 fps (the camera fps parameter sets set_camera_fps, which is the only thing used for the sleep in the producer thread). The reported value is just a print; it is not used.
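To illustrate, the interaction between the two rates boils down to something like this (a simplified sketch, not the actual node code; the function names here are made up):

```cpp
#include <ros/ros.h>

// Producer thread: grabs frames and sleeps at set_camera_fps (e.g. 30 Hz).
void producer_sketch(double set_camera_fps) {
    ros::Rate camera_fps_rate(set_camera_fps);
    while (ros::ok()) {
        // cap >> frame; framesQueue.push(frame.clone());
        camera_fps_rate.sleep();
    }
}

// Consumer/publisher loop: pulls frames and publishes at the throttled
// fps parameter (e.g. 25 Hz).
void consumer_sketch(double fps) {
    ros::Rate publish_rate(fps);
    while (ros::ok()) {
        // framesQueue.pull(frame); pub.publish(...);
        publish_rate.sleep();
    }
}
```

Whenever the producer rate is higher than the consumer rate, the queue grows by the difference until it reaches buffer_queue_size (or the OOM killer steps in).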
Hi, this issue is still strange to me, as it appears to be a recent problem: I have used this node for several months with fps = 10 and a 25 fps camera without any issues until recently.
So using the same parameters (fps=10, set_camera_fps=25) on an older version (before the thread with a sync queue was introduced), I did not get this memory-growth problem. (Older version used: https://github.com/ros-drivers/video_stream_opencv/blob/89b0224b18b6e67860d4ad5c36c993fe7534e199/src/video_stream.cpp.)
So here is my conclusion: the problem comes from Boost. Proof: check my merge request, where I basically reimplemented a thread-safe queue with a mutex using plain std classes. The code does exactly the same thing, except that locks are taken once per consume/produce cycle. It also fixes a possible edge case where the user sets a queue size of 1 and two calls to pull can happen one after the other from the two different threads, leading the second pull to return either the previous picture (consumer pulls last) or an empty image (producer pulls last). I'm a bit surprised that this was implemented with experimental Boost features...
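For reference, the approach boils down to something like this (a minimal sketch using plain std classes; an illustration of the idea, not the code from the merge request):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

#include <opencv2/core/core.hpp>

// Minimal bounded, thread-safe frame queue guarded by a single mutex.
class FrameQueue {
public:
    explicit FrameQueue(size_t max_size) : max_size_(max_size) {}

    // Producer: push a frame, dropping the oldest one if the queue is full.
    void push(const cv::Mat& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= max_size_)
            queue_.pop_front();
        queue_.push_back(frame.clone());
        not_empty_.notify_one();
    }

    // Consumer: block until a frame is available, then take it.
    cv::Mat pull() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        cv::Mat frame = queue_.front();
        queue_.pop_front();
        return frame;
    }

private:
    const size_t max_size_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::deque<cv::Mat> queue_;
};
```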
Thank you very much for your work!! I'm currently busy (in the RoboCup contest) but I'll take a look, try it and act accordingly as soon as I can. Thank you again!
@awesomebytes looking forward to a Melodic release :-)
@Axel13fr re-release on all distros coming soon (it's PR-ed).
Thanks a lot!
Hello!
Houston, we have another problem.
There is a memory leak when subscribing to the /videofile/image_raw topic. Valgrind sees the memory as still reachable (so technically speaking it's not a true leak), but the process ends up using several gigabytes of memory. When one stops image_view, the memory is released correctly.
If you want to take a look at it, you can clone this branch:
https://github.com/aviogit/video_stream_opencv/tree/memory_leak_when_subscribing_to_slash_videofile_slash_image_raw
I found an excellent piece of code by @BloodAxe here: https://computer-vision-talks.com/hacking-opencv-for-fun-and-profit/
and now I can even see precise numbers. I'll try to understand why it happens: cv_bridge::CvImage::toImageMsg() does a deep copy, so I actually see no reason for this to happen.
http://docs.ros.org/kinetic/api/cv_bridge/html/c++/cv__bridge_8cpp_source.html
I'll try to look into this further...
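To cross-check the numbers from the allocation hooks, one can also watch the process footprint directly; here is a minimal Linux-only sketch (generic code, not from the linked branch or article):

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Return the resident set size (VmRSS) of the current process in kB,
// or -1 if /proc/self/status could not be parsed (Linux only).
long current_rss_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0)
            return std::stol(line.substr(6));
    }
    return -1;
}

// Example: print the RSS once per published frame.
// std::cout << "RSS: " << current_rss_kb() << " kB" << std::endl;
```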