Understanding sharp mem usage with a disabled cache #429

Closed
papandreou opened this issue May 17, 2016 · 31 comments

@papandreou
Contributor

I'm seeing some weird memory usage patterns on a production system that does on-demand image processing after we switched to sharp recently. I found #349 and #260 which seemed very similar, and was happy that the matter seemed to be concluded to everyone's satisfaction.

As far as I can tell, there are still problems, though. I changed @janaz's test setup (thank you very much for putting that online) to use the latest sharp master commit, the latest libvips master commit, disable sharp's cache via sharp.cache(false), and run more iterations of the test: https://github.com/papandreou/sharp-mem-usage/tree/fiddleAround

A make tests run in my fork yielded this output: https://gist.github.com/papandreou/27fde30202fe778f44dbaff9d7a39f63

The RSS starts at 33 MB, rises to 42 MB after the first iteration, and adds about 10 MB per iteration for a while before slowing down and plateauing at 275 MB for about 30 iterations. A full mark-sweep garbage collection of the JavaScript heap then brings it down to 177 MB. The pattern repeats itself a few times, after which the RSS starts a steady climb and approaches 500 MB near the end of the 500 iterations.

... There is something going on :). I don't know exactly how to interpret the results, but I would expect the RSS to come down to about the initial level after a full mark-sweep GC.

The outfit branch shows the same trend.
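
For reference, the core of the test loop looks roughly like this (a simplified sketch; the file name, dimensions and iteration count are illustrative, the real script lives in the fork linked above):

```js
var fs = require('fs');
var sharp = require('sharp');

sharp.cache(false); // disable sharp's operation cache, as in the test setup

var i = 0;

function iterate() {
  if (++i > 500) return;
  console.log('example before iteration #' + i, process.memoryUsage());
  var input = fs.readFileSync('input.jpg'); // a fresh Buffer every iteration
  sharp(input)
    .resize(320, 240)
    .toBuffer(function (err, output) {
      if (err) throw err;
      setImmediate(iterate);
    });
}

iterate();
```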

@lovell
Owner

lovell commented May 17, 2016

Hello, I wonder if what we're seeing here is the effect of memory fragmentation?

Given this is Linux, are you able to try with a different memory allocator? If so, add jemalloc to the installed packages then use LD_PRELOAD=/path/to/lib/libjemalloc.so.1 node --trace-gc test to run the tests within the container.

It's probably also worth testing with file-based output via toFile instead of toBuffer to see if this alters the memory behaviour. This will help confirm/deny the involvement of Node Buffer objects.
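
Something along these lines (an illustrative sketch; the input and output paths are placeholders):

```js
var sharp = require('sharp');
var input = require('fs').readFileSync('input.jpg'); // illustrative input

// Buffer-based output: the resized image ends up in a Node Buffer
sharp(input).resize(320, 240).toBuffer(function (err, outputBuffer) {
  // ...
});

// File-based output: libvips writes straight to disk,
// so no output Buffer object is created on the JS side
sharp(input).resize(320, 240).toFile('/tmp/output.jpg', function (err, info) {
  // ...
});
```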

@papandreou
Contributor Author

Thanks for looking into it :)

With jemalloc it's still basically the same trend: https://gist.github.com/papandreou/ef7543f1b79f7d8857c84d0632ec143c -- it ends up with an RSS of 428 MB.

With file-based input, on the other hand: https://gist.github.com/papandreou/b71e0f336bf709e743c0889a3db96cb6 -- it's much more stable and ends up at 59 MB after 500 iterations.

Switching back to buffer-based input, but reading the file only once instead of at the start of every iteration, shows the same low overall memory footprint as the file-based input: https://gist.github.com/papandreou/818a6ffd729697ed6359f9abbdc7100d

It seems like we're getting close, but I don't know exactly what to conclude? Either node.js has a memory fragmentation problem, or sharp is leaking buffers, but it's less visible when there's only a single, big buffer in play?

@papandreou
Contributor Author

Reading the file once, but creating a copy of the buffer before each iteration (image => Buffer.concat([image])) makes the RSS climb again.
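
That is, something like this (illustrative):

```js
var fs = require('fs');
var sharp = require('sharp');

var image = fs.readFileSync('input.jpg'); // read the file once, up front

function iterate(done) {
  // Buffer.concat of a single-element array forces a fresh backing
  // allocation per iteration, and that alone makes the RSS climb again
  var copy = Buffer.concat([image]);
  sharp(copy).resize(320, 240).toBuffer(done);
}
```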

@papandreou
Contributor Author

papandreou commented May 17, 2016

Seems like it does depend on the node.js version:

0.10.45:

example before iteration #1 { rss: 35733504, heapTotal: 10324992, heapUsed: 5621720 }
example before iteration #2 { rss: 48336896, heapTotal: 11344896, heapUsed: 5808192 }
[...]
example before iteration #100 { rss: 60399616, heapTotal: 11344896, heapUsed: 5716080 }

0.12.14:

example before iteration #1 { rss: 44343296, heapTotal: 12835584, heapUsed: 6608768 }
example before iteration #2 { rss: 54784000, heapTotal: 12835584, heapUsed: 6851904 }
[...]
example before iteration #100 { rss: 63778816, heapTotal: 17029888, heapUsed: 6777864 }

io.js 2.5.0:

example before iteration #1 { rss: 43737088, heapTotal: 14423040, heapUsed: 6252816 }
example before iteration #2 { rss: 56307712, heapTotal: 14423040, heapUsed: 6496480 }
[...]
example before iteration #100 { rss: 66166784, heapTotal: 18550784, heapUsed: 7190832 }

io.js 3.0.0:

example before iteration #1 { rss: 49037312, heapTotal: 11327232, heapUsed: 6800992 }
[...]
example before iteration #99 { rss: 481263616, heapTotal: 17518848, heapUsed: 6402736 }
example before iteration #100 { rss: 485507072, heapTotal: 17518848, heapUsed: 6423008 }

io.js 3.3.1:

example before iteration #1 { rss: 50425856, heapTotal: 11327232, heapUsed: 6976832 }
example before iteration #2 { rss: 67518464, heapTotal: 11327232, heapUsed: 7215952 }
[...]
example before iteration #100 { rss: 486883328, heapTotal: 17518848, heapUsed: 6503456 }

4.4.4:

example before iteration #1 { rss: 51679232, heapTotal: 11536480, heapUsed: 6891464 }
example before iteration #2 { rss: 68751360, heapTotal: 11536480, heapUsed: 7790792 }
[...]
example before iteration #100 { rss: 230236160, heapTotal: 9489248, heapUsed: 6491544 }

5.11.1:

example before iteration #1 { rss: 51593216, heapTotal: 10619424, heapUsed: 6905552 }
example before iteration #2 { rss: 69070848, heapTotal: 11651360, heapUsed: 7795328 }
[...]
example before iteration #100 { rss: 230211584, heapTotal: 10624032, heapUsed: 6702000 }

So the problem was introduced with io.js 3.0.0. I guess that makes the Buffer-on-top-of-Uint8Array reimplementation the most likely culprit: https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_IOJS.md#2015-08-04-version-300-rvagg -- or maybe sharp hasn't correctly adapted to that change.

@lovell
Owner

lovell commented May 17, 2016

Thanks for running the extra tests on the various versions of Node/io.js. The implementation of Buffer within Node has quite an influence on the results!

The GCs that occur due to "external memory allocation limit reached" appear to reduce the RSS by a much greater value when using jemalloc, which suggests this is at least partly a memory fragmentation problem.

sharp uses Nan::NewBuffer to create the Buffer object, providing a custom callback to guarantee we free the memory in the same shared library as it was malloc'd (to keep Windows happy).

You can track the GC of sharp-created Buffer objects by adding some printf-style "debugging" to the FreeCallback to show when backing data is freed.

@papandreou
Contributor Author

For completeness -- it seems like node.js 6.2.0 improves on this quite a bit:

6.2.0

example before iteration #1 { rss: 52207616, heapTotal: 12578816, heapUsed: 7073760 }
example before iteration #2 { rss: 69861376, heapTotal: 12578816, heapUsed: 8019312 }
[...]
example before iteration #100 { rss: 117493760, heapTotal: 10481664, heapUsed: 6657384 }

@lovell
Owner

lovell commented May 18, 2016

Thank you for this thorough testing @papandreou !

Looking through recent Node commits, I think the change in V8 API used when cleaning up Buffer objects as part of nodejs/node#5204 might have improved things.

@papandreou
Contributor Author

I guess that fully explains the symptoms originally observed by @janaz. It might make sense to discourage the use of io.js 3.x and node.js 4.x + 5.x in the README. It's of course unfortunate that the current node.js LTS series has memory fragmentation issues, so it might also make sense to ping them about it, in case they aren't already aware.

The really funny thing is that the issue I'm having occurs with node.js 0.10, so it's also back to the drawing board for me :)

@lovell
Owner

lovell commented May 23, 2016

nodejs/node#5204 has the dont-land-on-v4.x label, which I suspect means that the version of V8 in Node v4 doesn't support the API being used, or at least that the change is considered major enough to break other things. Not sure what we can do here, other than your most excellent suggestion of documenting it :)

@papandreou
Contributor Author

papandreou commented May 23, 2016

Ok, I think I've found another problem that occurs with node.js 0.10 as well -- the memory utilization pattern changes drastically when I change the test script to use stream-based input (which my actual app does):

With Buffer-based input: http://pastebin.com/Ny0xyD4m
With stream-based input: http://pastebin.com/dhYwM4kK

This is the difference between the two test runs: papandreou/sharp-mem-usage@5c97369

This looks rather serious as well, any ideas?
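
The gist of the change is this (a simplified sketch of the two variants; the actual diff is in the commit above):

```js
var fs = require('fs');
var sharp = require('sharp');

// Buffer-based input: the whole file arrives as a single Buffer
var input = fs.readFileSync('input.jpg');
sharp(input).resize(320, 240).toBuffer(function (err, output) { /* ... */ });

// Stream-based input: an fs.ReadStream is piped into a sharp transform
// stream, so the input arrives as many small chunks
fs.createReadStream('input.jpg')
  .pipe(sharp().resize(320, 240))
  .toBuffer(function (err, output) { /* ... */ });
```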

@papandreou papandreou reopened this May 23, 2016
@lovell
Owner

lovell commented May 23, 2016

Using Stream-based input usually results in the creation of a lot of small-but-temporary Buffer objects, which might exacerbate any memory fragmentation.

Is this causing the process to crash?

Have you tried experimenting with V8's "new space" ("semi space" in recent versions) and "old space" heap settings?

@lovell
Owner

lovell commented May 23, 2016

Please can you test the effects of commit 6e31318 on the stream-concat-429 branch, which should reduce the number of memory allocations when using Stream-based input.

@papandreou
Contributor Author

> Is this causing the process to crash?

No, it's not causing the process to crash. That goes for all the test runs I've done and the service I have in production. But both end up having a very high memory footprint, and garbage collections don't bring down the RSS substantially.

> Have you tried experimenting with V8's "new space" ("semi space" in recent versions) and "old space" heap settings?

No, I haven't tried that yet. Will do that and let you know.

> Please can you test the effects of commit 6e31318 on the stream-concat-429 branch, which should reduce the number of memory allocations when using Stream-based input.

It's a big improvement compared to the stream-based input test I did yesterday: http://pastebin.com/KH2vUazn

Still, that RSS is not coming down :/

@papandreou
Contributor Author

That last test, and the ones from yesterday, were all performed on node.js 0.10.45, by the way.

@papandreou
Contributor Author

I eliminated the use of fs.ReadStream, while still using the stream-based API against the stream-concat-429 branch. The RSS growth rate decreases, but it still isn't coming down after a GC and ends up at 312 MB after 100 iterations: http://pastebin.com/dFizUr8k

I'm reading the image file once, but creating a new Buffer instance with it for each iteration. Could that be important?
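
Roughly what the loop does now (illustrative; the real script differs slightly):

```js
var fs = require('fs');
var stream = require('stream');
var sharp = require('sharp');

var image = fs.readFileSync('input.jpg'); // read the file once, outside the loop

function iterate(done) {
  // No fs.ReadStream involved: push a fresh Buffer instance through a
  // PassThrough into sharp's stream interface
  var input = new stream.PassThrough();
  input.pipe(sharp().resize(320, 240)).toBuffer(done);
  input.end(new Buffer(image)); // new Buffer instance per iteration
}
```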

@papandreou
Contributor Author

Hmm, the symptoms actually persist with Buffer-based input as long as I create a new Buffer instance for each iteration: http://pastebin.com/1UKExJ16

@lovell
Owner

lovell commented May 24, 2016

High RSS isn't necessarily a bad sign. It could mean the C runtime has yet to release some memory blocks back to the OS, perhaps because they are still partially in use.

http://stackoverflow.com/questions/1421491/does-calling-free-or-delete-ever-release-memory-back-to-the-system

Are you able to try a tool like smem to split the reported RSS into unique vs proportion-of-shared?

@jcupitt
Contributor

jcupitt commented May 24, 2016

You're probably using this already, but vips has some simple things to report memory usage:

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/libvips-memory.html#vips-tracked-get-mem

I expect sharp exposes these (can't remember). You can get current number of allocations, total allocated space, and peak memory use. This only includes memory directly allocated by vips for pixel buffers (usually the great bulk of vips memuse).

These stats would show how much of the RSS use you are seeing is due to fragmentation or brk() not handing stuff back to the host OS.

Of course it would be no help at all with the tricky question of working out why references are being kept longer than they should :-( If you build vips with debugging on, you can get a table of all objects and their counts. These tables are only useful for very small cases though.

@papandreou
Contributor Author

Tweaking --max-old-space-size does not seem to have any effect. I suspect that's because the JavaScript heap usage is so low.

An interesting observation: If I keep creating a new Buffer instance for each iteration, but pass the original Buffer to sharp each time, the RSS grows at approximately the same rate, but the mark-sweep GCs do reclaim the memory: http://pastebin.com/JzB0K0iP

That seems to indicate that sharp or libvips does have a hand in it not being released, right?
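
To spell out the two variants (illustrative):

```js
var fs = require('fs');
var sharp = require('sharp');

var image = fs.readFileSync('input.jpg'); // read once

function iterate(done) {
  var copy = new Buffer(image); // fresh Buffer instance every iteration

  // Variant A: pass the fresh copy to sharp
  // -> RSS keeps growing and mark-sweep GCs don't bring it back down
  sharp(copy).resize(320, 240).toBuffer(done);

  // Variant B: still allocate the copy, but pass the original Buffer to sharp
  // -> RSS grows at a similar rate, but mark-sweep GCs do reclaim the memory
  // sharp(image).resize(320, 240).toBuffer(done);
}
```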

@papandreou
Contributor Author

Here's the output of smem mixed in with a test run: https://gist.github.com/papandreou/5db6e17c86390eef98333f8cba7925de

I don't know how to interpret that?

@papandreou
Contributor Author

If I hack in some printfs that report the vips_tracked_get_* values before processing each image, it seems like vips does not account for the "lost memory": https://gist.github.com/papandreou/f4f8a717dba0396c857e3cf5e8ff36a6

Before the first iteration it reports:

vips_tracked_get_mem 0
vips_tracked_get_mem_highwater 0
vips_tracked_get_allocs 0
vips_tracked_get_files 0

and from then on it always comes out as:

vips_tracked_get_mem 0
vips_tracked_get_mem_highwater 8484536
vips_tracked_get_allocs 0
vips_tracked_get_files 0

@papandreou
Contributor Author

Picking up on a previous piece of advice: If I add logging to the FreeCallback and where Nan::NewBuffer is used, it looks like every buffer allocated like this does get freed eventually: https://gist.github.com/papandreou/acf92947128fb02d9dc6afe745131835

@jcupitt
Contributor

jcupitt commented May 24, 2016

It sounds like it's the V8 GC not freeing stuff as quickly as you'd think it might.

I've observed similar things with the Ruby and Python bindings. Ruby has a generational mark-sweep GC, rather like V8 (I think?), and it's tricky to get it to do the right thing. Executing a long-running program in constant memory requires some manual GC triggers.

Python has a very annoying reference counting system (annoying in that it's difficult for extensions to get right) combined with generational mark-sweep to break cycles. But because of the ref counting, Python will usually free things as soon as you'd expect it to. It's much simpler to get predictable memory behaviour.

@lovell
Owner

lovell commented May 24, 2016

The RSS output (in pages) from smem seems to be somewhat inconsistent with the RSS output (in bytes) from Node. I wonder if this is a docker thing?

If we take the smem stats to be relative, rather than absolute, it looks like USS (non-shared memory) is regularly dropping to almost half its previous value, presumably as a result of GC occurring, so memory is definitely being freed but not necessarily being returned to the OS.

example before iteration #99 { rss: 287756288, heapTotal: 11344896, heapUsed: 5172176 }
  PID User     Command                         Swap      USS      PSS      RSS 
 6993 andreas  node --trace-gc test.js            0   753512   753556   757048 
example before iteration #100 { rss: 291610624, heapTotal: 11344896, heapUsed: 5234072 }
  PID User     Command                         Swap      USS      PSS      RSS 
 6993 andreas  node --trace-gc test.js            0   476560   476604   480096 

The use of 2x semi-spaces for "new" generation in more recent versions of V8 means a slightly faster scavenger strategy can be used for short-lived objects. Mark+sweep is still used for the single "old" generation. I'm unsure which version of Node introduced the semi-space concept.

If you know you're going to have lots of short-lived Buffers, even though their backing store is not on the V8 heap, perhaps increasing the new/semi space might allow these Buffer objects to be scavenged before they reach "old" generation.

Commit 331926d adds the change to improve perf (and reduce allocations) for Stream-based input, thanks for confirming you too see an improvement.

@papandreou
Contributor Author

> The RSS output (in pages) from smem seems to be somewhat inconsistent with the RSS output (in bytes) from Node. I wonder if this is a docker thing?

Sorry, I didn't make it clear: ever since I went back to testing with 0.10.45, I've been running test.js directly on my local machine without Docker. It turned out to be easier to monkey-patch sharp and libvips that way.

And yeah, that discrepancy puzzled me as well.

@papandreou
Contributor Author

papandreou commented Jun 14, 2016

I no longer believe this problem is sharp's fault, as I've seen similar behavior in reduced cases where only JavaScript and Buffers are in the picture. I've shifted my focus to getting the application upgraded to node.js 6, as that appears to be a big improvement in this regard -- it actually releases memory to the OS again :). Also, it seems that others have noticed similar symptoms, and it's being worked on in V8 and 6.x.
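
A reduced case of the kind I mean looks roughly like this (a sketch, not my exact repro): no sharp or libvips involved, just Buffers being allocated and dropped in a loop.

```js
// Allocate and drop a largish Buffer per iteration; on the affected Node
// versions the RSS keeps climbing even though the JS heap stays flat
var i = 0;

function iterate() {
  if (++i > 500) return;
  console.log('before iteration #' + i, process.memoryUsage());
  var chunk = new Buffer(8 * 1024 * 1024); // size is illustrative
  chunk.fill(0); // touch the memory so it's actually committed
  setImmediate(iterate);
}

iterate();
```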

Thanks for all the help, @lovell & @jcupitt! I'm very happy that this project is in such good hands.

@jcupitt
Contributor

jcupitt commented Jun 14, 2016

We've found two memory-gotchas in libvips in the last couple of weeks. The changes will be in 8.4 and ought to make at least a small difference in this case.

jcupitt/libvips@1909b31

https://github.com/jcupitt/libvips/issues/466

@papandreou
Contributor Author

papandreou commented Sep 13, 2016

I finally managed to update that application from node.js 0.10.45 to 6.3.0 in production. The RSS behavior is much better, but there's still a leak:

[graph: annotated RSS over time for the app's worker and master processes]

The colored lines depict the RSS of each worker process in the app. The bottom greenish-yellow line is the RSS of a master process. It looks like node.js 6.3 manages memory more efficiently, but it's just the "amplitude" that was lowered as a result of the upgrade. The leak pattern still looks the same.

The JavaScript heap size is stable, so I'm still suspecting node.js itself.

@jcupitt, it would be nice to try out those memory-related fixes as well. When is 8.4 due? Any chance they can be backported to 8.3?

@jcupitt
Contributor

jcupitt commented Sep 13, 2016

8.4 ought to be out within a week; the last thing has been merged and what's left is final tweaks, updating all the docs and updating the various builds.

@papandreou
Contributor Author

@jcupitt That sounds great -- thanks!

@papandreou
Contributor Author

Update: I've recently updated from node.js 6.3.0 to 7.8.0, and that has changed the memory footprint and RSS pattern dramatically. I assume that has to do with the improvements mentioned in https://v8project.blogspot.dk/2016/09/v8-release-54.html

Also, I've found specifying a soft memory limit using Linux cgroups to be effective.
