Understanding sharp mem usage with a disabled cache #429

Closed
papandreou opened this issue May 17, 2016 · 31 comments

@papandreou
Contributor

I'm seeing some weird memory usage patterns on a production system that does on-demand image processing after we switched to sharp recently. I found #349 and #260 which seemed very similar, and was happy that the matter seemed to be concluded to everyone's satisfaction.

As far as I can tell, there are still problems, though. I changed @janaz's test setup (thank you very much for putting that online) to use the latest sharp master commit, the latest libvips master commit, disable sharp's cache via sharp.cache(false), and run more iterations of the test: https://github.com/papandreou/sharp-mem-usage/tree/fiddleAround

A make tests run in my fork yielded this output: https://gist.github.com/papandreou/27fde30202fe778f44dbaff9d7a39f63

The RSS starts at 33 MB, rises to 42 MB after the first iteration, and adds about 10 MB per iteration for a while before slowing down and plateauing at 275 MB for about 30 iterations. A full mark-sweep garbage collection of the JavaScript heap then brings it down to 177 MB. The pattern repeats itself a few times, after which the RSS starts a steady climb and approaches 500 MB near the end of the 500 iterations.

... There is something going on :). I don't know exactly how to interpret the results, but I would expect the RSS to come down to about the initial level after a full mark-sweep GC.

The outfit branch shows the same trend.
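
For reference, the core of the test loop looks roughly like this (a simplified sketch; the file name, dimensions and iteration count are illustrative, the real script lives in the fork linked above):

```js
var fs = require('fs');
var sharp = require('sharp');

sharp.cache(false); // disable sharp's operation cache, as in the test setup

var i = 0;

function iterate() {
  if (++i > 500) return;
  console.log('example before iteration #' + i, process.memoryUsage());
  var input = fs.readFileSync('input.jpg'); // a fresh Buffer every iteration
  sharp(input)
    .resize(320, 240)
    .toBuffer(function (err, output) {
      if (err) throw err;
      setImmediate(iterate);
    });
}

iterate();
```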

@lovell
Owner

lovell commented May 17, 2016

Hello, I wonder if what we're seeing here is the effect of memory fragmentation?

Given this is Linux, are you able to try with a different memory allocator? If so, add jemalloc to the installed packages then use LD_PRELOAD=/path/to/lib/libjemalloc.so.1 node --trace-gc test to run the tests within the container.

It's probably also worth testing with file-based output via toFile instead of toBuffer to see if this alters the memory behaviour. This will help confirm/deny the involvement of Node Buffer objects.
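
Something along these lines (an illustrative sketch; the input and output paths are placeholders):

```js
var sharp = require('sharp');
var input = require('fs').readFileSync('input.jpg'); // illustrative input

// Buffer-based output: the resized image ends up in a Node Buffer
sharp(input).resize(320, 240).toBuffer(function (err, outputBuffer) {
  // ...
});

// File-based output: libvips writes straight to disk,
// so no output Buffer object is created on the JS side
sharp(input).resize(320, 240).toFile('/tmp/output.jpg', function (err, info) {
  // ...
});
```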

@papandreou
Contributor Author

Thanks for looking into it :)

With jemalloc it's still basically the same trend: https://gist.github.com/papandreou/ef7543f1b79f7d8857c84d0632ec143c -- it ends up with an RSS of 428 MB.

With file-based input, on the other hand: https://gist.github.com/papandreou/b71e0f336bf709e743c0889a3db96cb6 -- it's much more stable and ends up at 59 MB after 500 iterations.

Switching back to buffer-based input, but reading the file only once instead of at the start of every iteration, shows the same low overall memory footprint as the file-based input: https://gist.github.com/papandreou/818a6ffd729697ed6359f9abbdc7100d

It seems like we're getting close, but I don't know exactly what to conclude? Either node.js has a memory fragmentation problem, or sharp is leaking buffers, but it's less visible when there's only a single, big buffer in play?

@papandreou
Contributor Author

Reading the file once, but creating a copy of the buffer before each iteration (image => Buffer.concat([image])) makes the RSS climb again.
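
That is, something like this (illustrative):

```js
var fs = require('fs');
var sharp = require('sharp');

var image = fs.readFileSync('input.jpg'); // read the file once, up front

function iterate(done) {
  // Buffer.concat of a single-element array forces a fresh backing
  // allocation per iteration, and that alone makes the RSS climb again
  var copy = Buffer.concat([image]);
  sharp(copy).resize(320, 240).toBuffer(done);
}
```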

@papandreou
Contributor Author

papandreou commented May 17, 2016

Seems like it does depend on the node.js version:

0.10.45:

example before iteration #1 { rss: 35733504, heapTotal: 10324992, heapUsed: 5621720 }
example before iteration #2 { rss: 48336896, heapTotal: 11344896, heapUsed: 5808192 }
[...]
example before iteration #100 { rss: 60399616, heapTotal: 11344896, heapUsed: 5716080 }

0.12.14:

example before iteration #1 { rss: 44343296, heapTotal: 12835584, heapUsed: 6608768 }
example before iteration #2 { rss: 54784000, heapTotal: 12835584, heapUsed: 6851904 }
[...]
example before iteration #100 { rss: 63778816, heapTotal: 17029888, heapUsed: 6777864 }

io.js 2.5.0:

example before iteration #1 { rss: 43737088, heapTotal: 14423040, heapUsed: 6252816 }
example before iteration #2 { rss: 56307712, heapTotal: 14423040, heapUsed: 6496480 }
[...]
example before iteration #100 { rss: 66166784, heapTotal: 18550784, heapUsed: 7190832 }

io.js 3.0.0:

example before iteration #1 { rss: 49037312, heapTotal: 11327232, heapUsed: 6800992 }
[...]
example before iteration #99 { rss: 481263616, heapTotal: 17518848, heapUsed: 6402736 }
example before iteration #100 { rss: 485507072, heapTotal: 17518848, heapUsed: 6423008 }

io.js 3.3.1:

example before iteration #1 { rss: 50425856, heapTotal: 11327232, heapUsed: 6976832 }
example before iteration #2 { rss: 67518464, heapTotal: 11327232, heapUsed: 7215952 }
[...]
example before iteration #100 { rss: 486883328, heapTotal: 17518848, heapUsed: 6503456 }

4.4.4:

example before iteration #1 { rss: 51679232, heapTotal: 11536480, heapUsed: 6891464 }
example before iteration #2 { rss: 68751360, heapTotal: 11536480, heapUsed: 7790792 }
[...]
example before iteration #100 { rss: 230236160, heapTotal: 9489248, heapUsed: 6491544 }

5.11.1:

example before iteration #1 { rss: 51593216, heapTotal: 10619424, heapUsed: 6905552 }
example before iteration #2 { rss: 69070848, heapTotal: 11651360, heapUsed: 7795328 }
[...]
example before iteration #100 { rss: 230211584, heapTotal: 10624032, heapUsed: 6702000 }

So the problem was introduced with io.js 3.0.0. I guess that makes the Buffer-on-top-of-Uint8Array reimplementation the most likely culprit: https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_IOJS.md#2015-08-04-version-300-rvagg -- or maybe sharp hasn't correctly adapted to that change.

@lovell
Owner

lovell commented May 17, 2016

Thanks for running the extra tests on the various versions of Node/io.js. The implementation of Buffer within Node has quite an influence on the results!

The GCs that occur due to "external memory allocation limit reached" appear to reduce the RSS by a much greater value when using jemalloc, which suggests this is at least partly a memory fragmentation problem.

sharp uses Nan::NewBuffer to create the Buffer object, providing a custom callback to guarantee we free the memory in the same shared library as it was malloc'd (to keep Windows happy).

You can track the GC of sharp-created Buffer objects by adding some printf-style "debugging" to the FreeCallback to show when backing data is freed.

@papandreou
Contributor Author

For completeness -- it seems like node.js 6.2.0 improves on this quite a bit:

6.2.0

example before iteration #1 { rss: 52207616, heapTotal: 12578816, heapUsed: 7073760 }
example before iteration #2 { rss: 69861376, heapTotal: 12578816, heapUsed: 8019312 }
[...]
example before iteration #100 { rss: 117493760, heapTotal: 10481664, heapUsed: 6657384 }

@lovell
Owner

lovell commented May 18, 2016

Thank you for this thorough testing @papandreou !

Looking through recent Node commits, I think the change in V8 API used when cleaning up Buffer objects as part of nodejs/node#5204 might have improved things.

@papandreou
Contributor Author

I guess that fully explains the symptoms originally observed by @janaz. It might make sense to discourage the use of io.js 3.x and node.js 4.x + 5.x in the README. It's of course unfortunate that the current node.js LTS series has memory fragmentation issues, so it might also make sense to ping them about it, in case they aren't already aware.

The really funny thing is that the issue I'm having occurs with node.js 0.10, so it's also back to the drawing board for me :)

@lovell
Owner

lovell commented May 23, 2016

nodejs/node#5204 has the dont-land-on-v4.x label, which I suspect means that the version of V8 in Node v4 doesn't support the API being used, or at least that the change is considered major enough to break other things. Not sure what we can do here, other than your most excellent suggestion of documenting it :)

@papandreou
Contributor Author

papandreou commented May 23, 2016

Ok, I think I've found another problem that occurs with node.js 0.10 as well -- the memory utilization pattern changes drastically when I change the test script to use stream-based input (which my actual app does):

With Buffer-based input: http://pastebin.com/Ny0xyD4m
With stream-based input: http://pastebin.com/dhYwM4kK

This is the difference between the two test runs: papandreou/sharp-mem-usage@5c97369

This looks rather serious as well, any ideas?
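
The gist of the change is this (a simplified sketch of the two variants; the actual diff is in the commit above):

```js
var fs = require('fs');
var sharp = require('sharp');

// Buffer-based input: the whole file arrives as a single Buffer
var input = fs.readFileSync('input.jpg');
sharp(input).resize(320, 240).toBuffer(function (err, output) { /* ... */ });

// Stream-based input: an fs.ReadStream is piped into a sharp transform
// stream, so the input arrives as many small chunks
fs.createReadStream('input.jpg')
  .pipe(sharp().resize(320, 240))
  .toBuffer(function (err, output) { /* ... */ });
```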

@papandreou papandreou reopened this May 23, 2016
@lovell
Owner

lovell commented May 23, 2016

Using Stream-based input usually results in the creation of a lot of small-but-temporary Buffer objects, which might exacerbate any memory fragmentation.

Is this causing the process to crash?

Have you tried experimenting with V8's "new space" ("semi space" in recent versions) and "old space" heap settings?

@lovell
Owner

lovell commented May 23, 2016

Please can you test the effects of commit 6e31318 on the stream-concat-429 branch, which should reduce the number of memory allocations when using Stream-based input.

@papandreou
Contributor Author

> Is this causing the process to crash?

No, it's not causing the process to crash. That goes for all the test runs I've done and the service I have in production. But both end up having a very high memory footprint, and garbage collections don't bring down the RSS substantially.

> Have you tried experimenting with V8's "new space" ("semi space" in recent versions) and "old space" heap settings?

No, I haven't tried that yet. Will do that and let you know.

> Please can you test the effects of commit 6e31318 on the stream-concat-429 branch, which should reduce the number of memory allocations when using Stream-based input.

It's a big improvement compared to the stream-based input test I did yesterday: http://pastebin.com/KH2vUazn

Still, that RSS is not coming down :/

@papandreou
Contributor Author

That last test, and the ones from yesterday, were all performed on node.js 0.10.45, by the way.

@papandreou
Contributor Author

I eliminated the use of fs.ReadStream, while still using the stream-based API against the stream-concat-429 branch. The RSS growth rate decreases, but it still isn't coming down after a GC and ends up at 312 MB after 100 iterations: http://pastebin.com/dFizUr8k

I'm reading the image file once, but creating a new Buffer instance with it for each iteration. Could that be important?
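
Roughly what the loop does now (illustrative; the real script differs slightly):

```js
var fs = require('fs');
var stream = require('stream');
var sharp = require('sharp');

var image = fs.readFileSync('input.jpg'); // read the file once, outside the loop

function iterate(done) {
  // No fs.ReadStream involved: push a fresh Buffer instance through a
  // PassThrough into sharp's stream interface
  var input = new stream.PassThrough();
  input.pipe(sharp().resize(320, 240)).toBuffer(done);
  input.end(new Buffer(image)); // new Buffer instance per iteration
}
```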

@papandreou
Contributor Author

Hmm, the symptoms actually persist with Buffer-based input as long as I create a new Buffer instance for each iteration: http://pastebin.com/1UKExJ16

@lovell
Owner

lovell commented May 24, 2016

High RSS isn't necessarily a bad sign. It could mean the C runtime has yet to release some memory blocks back to the OS, perhaps because they are still partially in use.

http://stackoverflow.com/questions/1421491/does-calling-free-or-delete-ever-release-memory-back-to-the-system

Are you able to try a tool like smem to split the reported RSS into unique vs proportion-of-shared?

@jcupitt
Contributor

jcupitt commented May 24, 2016

You're probably using this already, but vips has some simple things to report memory usage:

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/libvips-memory.html#vips-tracked-get-mem

I expect sharp exposes these (can't remember). You can get current number of allocations, total allocated space, and peak memory use. This only includes memory directly allocated by vips for pixel buffers (usually the great bulk of vips memuse).

These stats would show how much of the RSS use you are seeing is due to fragmentation or brk() not handing stuff back to the host OS.

Of course it would be no help at all with the tricky question of working out why references are being kept longer than they should :-( If you build vips with debugging on, you can get a table of all objects and their counts. These tables are only useful for very small cases though.

@papandreou
Contributor Author

Tweaking --max-old-space-size does not seem to have any effect. I suspect that's because the JavaScript heap usage is so low.

An interesting observation: If I keep creating a new Buffer instance for each iteration, but pass the original Buffer to sharp each time, the RSS grows at approximately the same rate, but the mark-sweep GCs do reclaim the memory: http://pastebin.com/JzB0K0iP

That seems to indicate that sharp or libvips does have a hand in it not being released, right?
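
To spell out the two variants (illustrative):

```js
var fs = require('fs');
var sharp = require('sharp');

var image = fs.readFileSync('input.jpg'); // read once

function iterate(done) {
  var copy = new Buffer(image); // fresh Buffer instance every iteration

  // Variant A: pass the fresh copy to sharp
  // -> RSS keeps growing and mark-sweep GCs don't bring it back down
  sharp(copy).resize(320, 240).toBuffer(done);

  // Variant B: still allocate the copy, but pass the original Buffer to sharp
  // -> RSS grows at a similar rate, but mark-sweep GCs do reclaim the memory
  // sharp(image).resize(320, 240).toBuffer(done);
}
```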

@papandreou
Contributor Author

Here's the output of smem mixed in with a test run: https://gist.github.com/papandreou/5db6e17c86390eef98333f8cba7925de

I don't know how to interpret that?

@papandreou
Contributor Author

If I hack in some printfs that report the vips_tracked_get_* values before processing each image, it seems like vips does not account for the "lost memory": https://gist.github.com/papandreou/f4f8a717dba0396c857e3cf5e8ff36a6

Before the first iteration it reports:

vips_tracked_get_mem 0
vips_tracked_get_mem_highwater 0
vips_tracked_get_allocs 0
vips_tracked_get_files 0

and from then on it always comes out as:

vips_tracked_get_mem 0
vips_tracked_get_mem_highwater 8484536
vips_tracked_get_allocs 0
vips_tracked_get_files 0

@papandreou
Contributor Author

Picking up on a previous piece of advice: If I add logging to the FreeCallback and where Nan::NewBuffer is used, it looks like every buffer allocated like this does get freed eventually: https://gist.github.com/papandreou/acf92947128fb02d9dc6afe745131835

@jcupitt
Contributor

jcupitt commented May 24, 2016

It sounds like it's the V8 GC not freeing stuff as quickly as you'd think it might.

I've observed similar things with the Ruby and Python bindings. Ruby has a generational mark-sweep GC, rather like V8 (I think?), and it's tricky to get it to do the right thing. Executing a long-running program in constant memory requires some manual GC triggers.

Python has a very annoying reference counting system (annoying in that it's difficult for extensions to get right) combined with generational mark-sweep to break cycles. But because of the ref counting, Python will usually free things as soon as you'd expect it to. It's much simpler to get predictable memory behaviour.

@lovell
Owner

lovell commented May 24, 2016

The RSS output (in pages) from smem seems to be somewhat inconsistent with the RSS output (in bytes) from Node. I wonder if this is a docker thing?

If we take the smem stats to be relative, rather than absolute, it looks like USS (non-shared memory) is regularly dropping to almost half its previous value, presumably as a result of GC occurring, so memory is definitely being freed but not necessarily being returned to the OS.

example before iteration #99 { rss: 287756288, heapTotal: 11344896, heapUsed: 5172176 }
  PID User     Command                         Swap      USS      PSS      RSS 
 6993 andreas  node --trace-gc test.js            0   753512   753556   757048 
example before iteration #100 { rss: 291610624, heapTotal: 11344896, heapUsed: 5234072 }
  PID User     Command                         Swap      USS      PSS      RSS 
 6993 andreas  node --trace-gc test.js            0   476560   476604   480096 

The use of 2x semi-spaces for "new" generation in more recent versions of V8 means a slightly faster scavenger strategy can be used for short-lived objects. Mark+sweep is still used for the single "old" generation. I'm unsure which version of Node introduced the semi-space concept.

If you know you're going to have lots of short-lived Buffers, even though their backing store is not on the V8 heap, perhaps increasing the new/semi space might allow these Buffer objects to be scavenged before they reach "old" generation.

Commit 331926d adds the change to improve perf (and reduce allocations) for Stream-based input, thanks for confirming you too see an improvement.

@papandreou
Contributor Author

> The RSS output (in pages) from smem seems to be somewhat inconsistent with the RSS output (in bytes) from Node. I wonder if this is a docker thing?

Sorry, I didn't make it clear: ever since I went back to testing with 0.10.45, I've been running test.js directly on my local machine without Docker. It turned out to be easier to monkey-patch sharp and libvips that way.

And yeah, that discrepancy puzzled me as well.

@papandreou
Contributor Author

papandreou commented Jun 14, 2016

I no longer believe this problem is sharp's fault, as I've seen similar behavior in reduced cases where only JavaScript and Buffers are in the picture. I've shifted my focus to getting the application upgraded to node.js 6, as that appears to be a big improvement in this regard -- it actually releases memory to the OS again :). Also, it seems that others have noticed similar symptoms, and it's being worked on in V8 and 6.x.
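
A reduced case of the kind I mean looks roughly like this (a sketch, not my exact repro): no sharp or libvips involved, just Buffers being allocated and dropped in a loop.

```js
// Allocate and drop a largish Buffer per iteration; on the affected Node
// versions the RSS keeps climbing even though the JS heap stays flat
var i = 0;

function iterate() {
  if (++i > 500) return;
  console.log('before iteration #' + i, process.memoryUsage());
  var chunk = new Buffer(8 * 1024 * 1024); // size is illustrative
  chunk.fill(0); // touch the memory so it's actually committed
  setImmediate(iterate);
}

iterate();
```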

Thanks for all the help, @lovell & @jcupitt! I'm very happy that this project is in such good hands.

@jcupitt
Contributor

jcupitt commented Jun 14, 2016

We've found two memory-gotchas in libvips in the last couple of weeks. The changes will be in 8.4 and ought to make at least a small difference in this case.

jcupitt/libvips@1909b31

https://github.com/jcupitt/libvips/issues/466

@papandreou
Contributor Author

papandreou commented Sep 13, 2016

I finally managed to update that application from node.js 0.10.45 to 6.3.0 in production. The RSS behavior is much better, but there's still a leak:

[graph: annotated RSS over time for the app's worker and master processes]

The colored lines depict the RSS of each worker process in the app. The bottom greenish-yellow line is the RSS of a master process. It looks like node.js 6.3 manages memory more efficiently, but it's just the "amplitude" that was lowered as a result of the upgrade. The leak pattern still looks the same.

The JavaScript heap size is stable, so I'm still suspecting node.js itself.

@jcupitt, it would be nice to try out those memory-related fixes as well. When is 8.4 due? Any chance they can be backported to 8.3?

@jcupitt
Contributor

jcupitt commented Sep 13, 2016

8.4 ought to be out within a week; the last thing has been merged and what's left is final tweaks, updating all the docs and updating the various builds.

@papandreou
Contributor Author

@jcupitt That sounds great -- thanks!

@papandreou
Contributor Author

Update: I've recently updated from node.js 6.3.0 to 7.8.0, and that has changed the memory footprint and RSS pattern dramatically. I assume that has to do with the improvements mentioned in https://v8project.blogspot.dk/2016/09/v8-release-54.html

Also, I've found specifying a soft memory limit using Linux cgroups to be effective.
