Understanding sharp mem usage with a disabled cache #429
Hello, I wonder if what we're seeing here is the effect of memory fragmentation? Given this is Linux, are you able to try with a different memory allocator, e.g. jemalloc loaded via LD_PRELOAD? It's probably also worth testing with file-based output via `toFile`.
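For reference, file-based output looks roughly like this (a minimal sketch; the file names and dimensions are placeholders, not part of the original test):

```js
const sharp = require('sharp');

// Write the result straight to a file instead of materialising an output
// Buffer inside the node process.
sharp('input.jpg')
  .resize(320, 240)
  .toFile('output.jpg', function (err, info) {
    if (err) throw err;
    console.log('wrote %d bytes', info.size);
  });
```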
Thanks for looking into it :) With jemalloc it's still basically the same trend: https://gist.github.com/papandreou/ef7543f1b79f7d8857c84d0632ec143c -- it ends up with an RSS of 428 MB.

With file-based input, on the other hand: https://gist.github.com/papandreou/b71e0f336bf709e743c0889a3db96cb6 -- it's much more stable and ends up at 59 MB after 500 iterations.

Switching back to buffer-based input, but only reading the file once instead of at the start of every iteration, shows the same low overall memory footprint as the file-based input: https://gist.github.com/papandreou/818a6ffd729697ed6359f9abbdc7100d

It seems like we're getting close, but I don't know exactly what to conclude. Either node.js has a memory fragmentation problem, or sharp is leaking buffers but it's less visible when there's only a single, big buffer in play?
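The three input variants being compared look roughly like this (a sketch with a placeholder file name and dimensions, not the actual test script):

```js
const fs = require('fs');
const sharp = require('sharp');

function resize(input) {
  // Input may be a file path or a Buffer; returns a Promise of the output Buffer.
  return sharp(input).resize(1024, 768).toBuffer();
}

// 1. Buffer-based input, re-reading the file at the start of every iteration.
function bufferPerIteration() {
  return resize(fs.readFileSync('input.jpg'));
}

// 2. File-based input: hand sharp the path and let libvips read the file.
function fileInput() {
  return resize('input.jpg');
}

// 3. Buffer-based input, but the file is read only once up front.
const shared = fs.readFileSync('input.jpg');
function singleBuffer() {
  return resize(shared);
}
```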
Reading the file once, but creating a copy of the buffer before each iteration (a fresh Buffer instance per run), brings the high memory usage back.
Seems like it does depend on the node.js version:

- 0.10.45: [...]
- 0.12.14: [...]
- io.js 2.5.0: [...]
- io.js 3.0.0: [...]
- io.js 3.3.1: [...]
- 4.4.4: [...]
- 5.11.1: [...]
So the problem got introduced with io.js 3.0.0. I guess that makes the Buffer-on-top-of-Uint8Array reimplementation the most likely culprit: https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_IOJS.md#2015-08-04-version-300-rvagg -- or maybe sharp hasn't correctly adapted to that change.
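(For context, that reimplementation means a Buffer really is a Uint8Array from io.js 3.0 onwards, which is easy to verify:)

```js
// On io.js >= 3.0 / node.js >= 4 a Buffer is a Uint8Array subclass backed by
// an ArrayBuffer; on node 0.10/0.12 both of these checks print false.
const buf = new Buffer('hello');
console.log(buf instanceof Uint8Array);
console.log(Object.getPrototypeOf(Buffer.prototype) === Uint8Array.prototype);
```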
Thanks for running the extra tests on the various versions of Node/io.js. The implementation of Buffer within Node has quite an influence on the results! The GCs that occur due to "external memory allocation limit reached" appear to reduce the RSS by a much greater value when using jemalloc, which suggests this is at least partly a memory fragmentation problem.

sharp uses Nan::NewBuffer to create the Buffer object, providing a custom callback to guarantee we free the memory in the same shared library as it was malloc'd (to keep Windows happy). You can track the GC of sharp-created Buffer objects by adding some printf-style "debugging" to the FreeCallback to show when backing data is freed.
For completeness -- it seems like node.js 6.2.0 improves on this quite a bit:

```
example before iteration #1   { rss: 52207616, heapTotal: 12578816, heapUsed: 7073760 }
example before iteration #2   { rss: 69861376, heapTotal: 12578816, heapUsed: 8019312 }
[...]
example before iteration #100 { rss: 117493760, heapTotal: 10481664, heapUsed: 6657384 }
```
Thank you for this thorough testing @papandreou! Looking through recent Node commits, I think the change in V8 API used when cleaning up Buffer objects as part of nodejs/node#5204 might have improved things.
I guess that fully explains the symptoms originally observed by @janaz. It might make sense to discourage the use of io.js 3.x and node.js 4.x + 5.x in the README. It's of course unfortunate that the current node.js LTS series has memory fragmentation issues, so it might also make sense to ping the node.js team about this, in case they aren't already aware.

The really funny thing is that the issue I'm having occurs with node.js 0.10, so it's also back to the drawing board for me :)
nodejs/node#5204 has the [...]
Ok, I think I've found another problem that occurs with node.js 0.10 as well -- the memory utilization pattern changes drastically when I change the test script to use stream-based input (which my actual app does).

With Buffer-based input: http://pastebin.com/Ny0xyD4m

This is the difference between the two test runs: papandreou/sharp-mem-usage@5c97369

This looks rather serious as well, any ideas?
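The difference between the two variants is roughly this (a sketch with placeholder names; the exact scripts are in the linked commit):

```js
const fs = require('fs');
const sharp = require('sharp');

// Buffer-based input: read the whole file, then hand the Buffer to sharp.
function resizeFromBuffer(callback) {
  const input = fs.readFileSync('input.jpg');
  sharp(input).resize(1024, 768).toBuffer(callback);
}

// Stream-based input: sharp() with no arguments is a Duplex stream,
// so a ReadStream can be piped straight into it.
function resizeFromStream(callback) {
  const transformer = sharp().resize(1024, 768);
  fs.createReadStream('input.jpg').pipe(transformer);
  transformer.toBuffer(callback);
}
```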
Using Stream-based input usually results in the creation of a lot of small-but-temporary Buffer objects, which might exacerbate any memory fragmentation. Is this causing the process to crash? Have you tried experimenting with V8's "new space" ("semi space" in recent versions) and "old space" heap settings?
Please can you test the effects of commit 6e31318 on the stream-concat-429 branch, which should reduce the number of memory allocations when using Stream-based input.
No, it's not causing the process to crash. That goes for all the test runs I've done and the service I have in production. But both end up having a very high memory footprint, and garbage collections don't bring down the RSS substantially.
No, I haven't tried that yet. Will do that and let you know.
It's a big improvement compared to the stream-based input test I did yesterday: http://pastebin.com/KH2vUazn

Still, that RSS is not coming down :/
That last test, and the ones from yesterday, were all performed on node.js 0.10.45, by the way. |
I eliminated the use of fs.ReadStream, while still using the stream-based API against the stream-concat-429 branch. The RSS growth rate decreases, but it still isn't coming down after a GC, and ends up at 312 MB after 100 iterations: http://pastebin.com/dFizUr8k

I'm reading the image file once, but creating a new readable stream from that buffer for each iteration.
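Presumably something along these lines (my sketch of the described setup, with placeholder names): the file is read once, and each iteration wraps the same Buffer in a fresh in-memory stream instead of an fs.ReadStream.

```js
const fs = require('fs');
const stream = require('stream');
const sharp = require('sharp');

const input = fs.readFileSync('input.jpg'); // read from disk only once

function iteration(callback) {
  // A throw-away stream per iteration, fed from the shared Buffer,
  // so no fs.ReadStream is involved.
  const source = new stream.PassThrough();
  source.end(input);

  const transformer = sharp().resize(1024, 768);
  source.pipe(transformer);
  transformer.toBuffer(callback);
}
```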
Hmm, the symptoms actually persist with Buffer-based input as long as I create a new Buffer instance for each iteration: http://pastebin.com/1UKExJ16
High RSS isn't necessarily a bad sign. It could mean the C runtime has yet to release some memory blocks back to the OS, perhaps because they are still partially in use. Are you able to try a tool like smem to split the reported RSS into unique vs proportion-of-shared?
You're probably using this already, but vips has some simple things to report memory usage: vips_tracked_get_allocs(), vips_tracked_get_mem() and vips_tracked_get_mem_highwater(). I expect sharp exposes these (can't remember). You can get the current number of allocations, total allocated space, and peak memory use. This only includes memory directly allocated by vips for pixel buffers (usually the great bulk of vips memuse).

These stats would show how much of the RSS use you are seeing is due to fragmentation or brk() not handing stuff back to the host OS. Of course it would be no help at all with the tricky question of working out why references are being kept longer than they should :-(

If you build vips with debugging on, you can get a table of all objects and their counts. These tables are only useful for very small cases though.
Tweaking the V8 heap settings [...]

An interesting observation: if I keep creating a new Buffer instance for each iteration, but pass the original Buffer to sharp each time, the RSS grows at approximately the same rate, but the mark-sweep GCs do reclaim the memory: http://pastebin.com/JzB0K0iP

That seems to indicate that sharp or libvips does have a hand in it not being released, right?
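If I read that correctly, the experiment looks something like this (a sketch under that assumption, with placeholder names): a copy of the Buffer is allocated every iteration, but only the original is ever handed to sharp.

```js
const fs = require('fs');
const sharp = require('sharp');

const original = fs.readFileSync('input.jpg');

function iteration(callback) {
  // Allocate a fresh copy each iteration to keep the allocation pattern
  // comparable with the earlier tests...
  const copy = Buffer.concat([original]);

  // ...but pass the *original* Buffer to sharp, not the copy.
  sharp(original).resize(1024, 768).toBuffer(function (err, output) {
    callback(err, output, copy.length); // keep `copy` reachable until the resize completes
  });
}
```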
Here's the output of [...]

I don't know how to interpret that?
If I hack in some printfs that report the [...], before the first iteration it reports:

[...]

and from then on it always comes out as:

[...]
Picking up on a previous piece of advice: if I add logging to the FreeCallback [...]
It sounds like it's the V8 GC not freeing stuff as quickly as you'd think it might. I've observed similar things with the Ruby and Python bindings.

Ruby has a generational mark-sweep GC, rather like V8 (I think?), and it's tricky to get it to do the right thing. Executing a long-running program in constant memory requires some manual GC triggers. Python has a very annoying reference counting system (annoying in that it's difficult for extensions to get right) combined with generational mark-sweep to break cycles. But because of the ref counting, Python will usually free things as soon as you'd expect it to. It's much simpler to get predictable memory behaviour.
The RSS output (in pages) from [...]. If we take the [...]
The use of 2x semi-spaces for the "new" generation in more recent versions of V8 means a slightly faster scavenger strategy can be used for short-lived objects. Mark+sweep is still used for the single "old" generation. I'm unsure which version of Node introduced the semi-space concept.

If you know you're going to have lots of short-lived Buffers, even though their backing store is not on the V8 heap, perhaps increasing the new/semi space might allow these Buffer objects to be scavenged before they reach the "old" generation.

Commit 331926d adds the change to improve perf (and reduce allocations) for Stream-based input, thanks for confirming you too see an improvement.
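For example (flag names differ between V8/node versions and the sizes here are arbitrary, so treat this as a sketch rather than a recommendation):

```js
// Enlarge the young generation so short-lived Buffer wrappers can be
// scavenged instead of being promoted to the old generation, e.g.:
//
//   node --max_new_space_size=16384 test.js   # older V8, size in kB
//   node --max_semi_space_size=16 test.js     # newer V8, size in MB
//
// Running with --expose-gc also allows forcing a full mark-sweep from
// inside the script, which helps when checking whether RSS ever drops:
if (global.gc) {
  global.gc();
}
console.log(process.memoryUsage());
```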
Sorry that I didn't make clear that ever since I went back to test with 0.10.45, I've run [...]. And yeah, that discrepancy puzzled me as well.
I no longer believe this problem is sharp's fault, as I've seen similar behavior in reduced cases where only JavaScript and [...] are involved.

Thanks for all the help, @lovell & @jcupitt! I'm very happy that this project is in such good hands.
We've found two memory-gotchas in libvips in the last couple of weeks. The changes will be in 8.4 and ought to make at least a small difference in this case.
I finally managed to update that application from node.js 0.10.45 to 6.3.0 in production. The RSS behavior is much better, but there's still a leak. The colored lines depict the RSS of each worker process in the app; the bottom greenish-yellow line is the RSS of a master process.

It looks like node.js 6.3 manages memory more efficiently, but it's just the "amplitude" that was lowered as a result of the upgrade. The leak pattern still looks the same. The JavaScript heap size is stable, so I'm still suspecting node.js itself.

@jcupitt, it would be nice to try out those memory-related fixes as well. When is 8.4 due? Any chance they can be backported to 8.3?
8.4 ought to be out within a week; the last thing has been merged and it's just final tweaks, updating all the docs and updating the various builds.
@jcupitt That sounds great -- thanks!
Update: I've recently updated from node.js 6.3.0 to 7.8.0, and that has changed the memory footprint and RSS pattern dramatically. I assume that has to do with the improvements mentioned in https://v8project.blogspot.dk/2016/09/v8-release-54.html

Also, I've found specifying a soft memory limit using Linux cgroups to be effective.
I'm seeing some weird memory usage patterns on a production system that does on-demand image processing after we switched to sharp recently. I found #349 and #260 which seemed very similar, and was happy that the matter seemed to be concluded to everyone's satisfaction.
As far as I can tell, there are still problems, though. I changed @janaz's test setup (thank you very much for putting that online) to use the latest sharp master commit and the latest libvips master commit, disable sharp's cache via `sharp.cache(false)`, and run more iterations of the test: https://github.com/papandreou/sharp-mem-usage/tree/fiddleAround

A `make tests` run in my fork yielded this output: https://gist.github.com/papandreou/27fde30202fe778f44dbaff9d7a39f63

The RSS starts at 33 MB, rises to 42 MB after the first iteration, and adds about 10 MB per iteration for a bit, slows down, plateaus at 275 MB for about 30 iterations, then a full mark-sweep garbage collection of the JavaScript heap brings it down to 177 MB. The pattern repeats itself a few times, and then it starts a steady increase and approaches 500 MB near the end of the 500 iterations.

There is something going on :). I don't know exactly how to interpret the results, but I would expect the RSS to come down to about the initial level after a full mark-sweep GC.

The `outfit` branch shows the same trend.
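For reference, the shape of the test is roughly the following (a simplified sketch of the linked repo, with a placeholder file name and dimensions):

```js
const fs = require('fs');
const sharp = require('sharp');

sharp.cache(false); // disable sharp's operation cache for the test

const input = fs.readFileSync('input.jpg');

function iterate(i) {
  if (i > 500) return;
  console.log('example before iteration #' + i, process.memoryUsage());
  sharp(input)
    .resize(1024, 768)
    .toBuffer(function (err) {
      if (err) throw err;
      iterate(i + 1);
    });
}

iterate(1);
```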