Why don't we use jemalloc? #21973

Closed
ChALkeR opened this issue Jul 25, 2018 · 10 comments
Labels
discuss: Issues opened for discussions and feedbacks.
memory: Issues and PRs related to the memory management or memory footprint.
wontfix: Issues that will not be fixed.

Comments

@ChALkeR
Member

ChALkeR commented Jul 25, 2018

Note: while switching to jemalloc might be suboptimal, it could still be useful to gather information about the positive and negative implications of using jemalloc. Let's do that in this issue. I am not advocating switching to jemalloc (yet), but imo it is worth investigating.

In some situations that I observed, it consumes slightly more memory (~5%), but in other cases it reduces memory usage by orders of magnitude (basically, the subset of cases where glibc behaves significantly suboptimally).

In some situations jemalloc consumes significantly more memory, though it appears to be faster.

Testcase 1 (based on #21967):

'use strict';

const bs = 4 * 1024 * 1024; // 4 MiB
const retained = [];
let i = 0, flag = false;

function tick() {
  i++;
  if (i % 1000 === 0) {
    console.log(`RSS [${i}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
  }
  retained.push(Buffer.allocUnsafe(bs));
  if (i === 5000) {
    console.log('Clearing retained and enabling alloc');
    retained.length = 0;
    flag = true;
  }
  if (flag) Buffer.alloc(bs); // Buffer.alloc(bs - 10) seems to be fine here
  if (i < 10000) setImmediate(tick);
}

tick();

Atm (with Node.js v10.7.0) it produces the following results:

RSS [1000]: 35.140625 MiB
RSS [2000]: 40.24609375 MiB
RSS [3000]: 45.25390625 MiB
RSS [4000]: 49.27734375 MiB
RSS [5000]: 53.29296875 MiB
Clearing retained and enabling alloc
RSS [6000]: 993.45703125 MiB
RSS [7000]: 2233.32421875 MiB
RSS [8000]: 3499.56640625 MiB
RSS [9000]: 4792.9140625 MiB
RSS [10000]: 5997.30859375 MiB

I traced that down to C++ malloc() behavior (testcase in #21967 (comment)).

This is what happens just with LD_PRELOAD=/usr/lib/libjemalloc.so:

RSS [1000]: 36.640625 MiB
RSS [2000]: 42.62109375 MiB
RSS [3000]: 48.21875 MiB
RSS [4000]: 52.38671875 MiB
RSS [5000]: 56.7890625 MiB
Clearing retained and enabling alloc
RSS [6000]: 58.8828125 MiB
RSS [7000]: 62.77734375 MiB
RSS [8000]: 66.9140625 MiB
RSS [9000]: 71.2421875 MiB
RSS [10000]: 75.453125 MiB
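
For anyone who wants to repeat the LD_PRELOAD run above from a script rather than from the shell, here is a minimal sketch. The helper names (`withPreload`, `runWithJemalloc`) and the script file name are hypothetical; the library path is the one used above and will differ between distributions.

```javascript
'use strict';

const { spawnSync } = require('child_process');

// Return a copy of `env` with `libPath` placed first in LD_PRELOAD.
// Existing LD_PRELOAD entries are kept after it (colon-separated list).
function withPreload(env, libPath) {
  const existing = env.LD_PRELOAD ? `:${env.LD_PRELOAD}` : '';
  return { ...env, LD_PRELOAD: `${libPath}${existing}` };
}

// Run a Node.js script in a child process with the allocator preloaded.
// `script` is a hypothetical file name such as 'testcase-1.js'.
function runWithJemalloc(script, libPath = '/usr/lib/libjemalloc.so') {
  return spawnSync(process.execPath, [script], {
    env: withPreload(process.env, libPath),
    stdio: 'inherit',
  });
}
```

Calling `runWithJemalloc('testcase-1.js')` should behave like `LD_PRELOAD=/usr/lib/libjemalloc.so node testcase-1.js`, assuming the library exists at that path.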

Testcase 2:

const arr = [];
for (let i = 0; i < 1e4; i++) arr.push(Buffer.alloc(1e5));
console.log(`RSS: ${process.memoryUsage().rss / 1024 / 1024} MiB`);

Normal: 697 MiB, jemalloc: 37 MiB.

Testcase 3 (jemalloc consumes more memory):

function foo() {
  let a;
  for (let i = 0; i < 5; i++) a = Buffer.alloc(1e8, 1);
  console.log(`RSS: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}
const start = process.hrtime();
foo(); foo();
gc(); gc(); // gc() is only defined when node is run with --expose-gc
console.log(`RSS: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
foo();
const time = process.hrtime(start);
console.log('Time:', time[0] + time[1] * 1e-9);

Normal:

RSS: 506.8125 MiB
RSS: 507.98046875 MiB
RSS: 31.2421875 MiB
RSS: 507.8828125 MiB
Time: 0.774285919

jemalloc:

RSS: 507.87890625 MiB
RSS: 604.40625 MiB
RSS: 604.625 MiB
RSS: 604.625 MiB
Time: 0.349531212

Testcase 4:

'use strict';

let i = 0;
function tick() {
  const a = Buffer.alloc(1e7);
  if (i++ >= 1e4) return;
  setImmediate(tick);
}

tick();

Measured with /usr/bin/time -f '%M KiB, %e seconds' node testcase-4.js.
Normal (1e4 * 1e7): 129 576 KiB, 11.41 seconds.
jemalloc (1e4 * 1e7): 34 928 KiB, 3.33 seconds.
Normal (1e5 * 1e6): 109 196 KiB, 11.51 seconds.
jemalloc (1e5 * 1e6): 35 636 KiB, 4.13 seconds.
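
Where /usr/bin/time is not available, a similar figure can be collected from inside the process. This is only a sketch, and it assumes Node.js 12.6 or later for `process.resourceUsage()`, which is newer than the v10.7.0 used above. `formatKiB` is a hypothetical helper that mimics the spacing of the numbers in this comment.

```javascript
'use strict';

// Format a KiB count like the results above, e.g. 129576 -> '129 576 KiB'.
function formatKiB(kib) {
  return `${String(Math.round(kib)).replace(/\B(?=(\d{3})+$)/g, ' ')} KiB`;
}

// Print peak RSS and elapsed time when the process exits.
const started = process.hrtime();
process.on('exit', () => {
  const [s, ns] = process.hrtime(started);
  // maxRSS is reported in kilobytes (assumes Node.js >= 12.6).
  const peak = process.resourceUsage().maxRSS;
  console.log(`${formatKiB(peak)}, ${(s + ns * 1e-9).toFixed(2)} seconds`);
});
```

The elapsed time here starts when the snippet is evaluated, so it excludes process startup and will read slightly lower than the `%e` value from /usr/bin/time.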

Testcase 5 (like 4, but now we fill the buffer with 1-s):

'use strict';

let i = 0;
function tick() {
  const a = Buffer.alloc(1e7, 1);
  if (i++ >= 1e4) return;
  setImmediate(tick);
}

tick();

Normal (1e4 * 1e7): 139 060 KiB, 14.38 seconds.
jemalloc (1e4 * 1e7): 170 308 KiB, 10.65 seconds.
Normal (1e5 * 1e6): 105 120 KiB, 12.25 seconds.
jemalloc (1e5 * 1e6): 112 548 KiB, 10.99 seconds.

Testcase 6 (from #8871, where @bnoordhuis mentioned jemalloc):

const zlib = require('zlib');
const payload = Buffer.from(JSON.stringify({ some: 'data' }));
for (let i = 0; i < 30000; ++i) zlib.deflate(payload, () => {});

No improvement in this case: jemalloc uses about 5% more memory and the run time is essentially the same.
Normal: 3 022 496 KiB, 5.88 seconds.
jemalloc: 3 179 716 KiB, 5.91 seconds.

/cc @addaleax @bnoordhuis @mscdex

@ChALkeR added the discuss and memory labels Jul 25, 2018
@mscdex
Contributor

mscdex commented Jul 25, 2018

Related: nodejs/node-v0.x-archive#5339

@mcollina
Member

I've tested this briefly. I'm seeing slight performance improvements (2.5%-5%) but higher RSS usage (100 MB -> 140 MB) with some HTTP usage.

I would like to test it with some heavy HTTP applications with very high memory usage first, to see if the additional 40 MB of RSS is a fixed cost or a percentage.

This is not a buffer-heavy use case.
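
To make the "fixed cost or a percentage" question concrete, the sketch below applies both readings to the single measurement above. The 1000 MB baseline is hypothetical and is not a measurement from this thread; it only shows how far apart the two models would land for a larger application.

```javascript
'use strict';

// The one measurement from the comment above: RSS in MB without and with jemalloc.
const baseline = 100;
const withJemalloc = 140;

// Model 1: jemalloc adds a constant amount regardless of application size.
const fixedOverhead = withJemalloc - baseline;
// Model 2: jemalloc multiplies RSS by a constant factor.
const ratio = withJemalloc / baseline;

// Predicted jemalloc RSS (MB) for a larger baseline under each model.
function predict(baselineMb) {
  return {
    ifFixed: baselineMb + fixedOverhead,
    ifProportional: baselineMb * ratio,
  };
}

// Hypothetical 1000 MB application, not a measurement from this thread.
console.log(predict(1000));
```

A second measurement on a larger application would be needed to tell which model, if either, is closer to reality.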

@lpinca
Member

lpinca commented Jul 29, 2018

Also related: #17007

@Trott
Member

Trott commented Oct 26, 2018

Should this remain open? Or is this a conversation that has run its course and the issue can be closed?

@ChALkeR
Member Author

ChALkeR commented Oct 26, 2018

@mcollina Any updates re: http performance?

Also, just in case: which jemalloc version did you use?
It should be the latest one, which is currently 5.1.0.

@Trott I believe that we need some data on how this affects something closer to real-world applications, not just microbenchmarks.

@mcollina
Member

I was not able to do any further testing.

@Fishrock123
Contributor

As a point of reference, Rust just finished removing all of jemalloc, saying that the system allocators tend to be better.

@xacrimon

@Fishrock123 Quite incorrect there. Jemalloc was removed as the default because it forced binary bloat on projects that didn't care much about it. Jemalloc is often more performant.

@lovell
Contributor

lovell commented Jun 24, 2019

Part of the reason for the removal of jemalloc as the only runtime allocator in Rust was to provide support for a wider range of target platforms, architectures and tooling. It now allows an optional, configurable global_allocator that works with crates such as jemallocator.

In my experience the reduced heap fragmentation in long-running, multi-threaded, glibc-based Linux processes is jemalloc's greatest benefit. Native Node.js modules using the libuv threadpool and worker threads would be good examples of where it might help some users some of the time.

The ability to inject jemalloc via LD_PRELOAD on Linux already provides a runtime integration on the platform it benefits most, from which one could infer it doesn't need to live in Node.js itself. Perhaps an addition to the documentation about how to do this would be appropriate?

@bnoordhuis
Member

I think the consensus is we're not going to make this change? And as @lovell points out, you can already use jemalloc through LD_PRELOAD. See also the conclusion in #17007.

I'll go ahead and close this out. If someone wants to document the LD_PRELOAD approach, please open a pull request. I don't have suggestions on where to add it exactly, though.
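
As a starting point for anyone documenting the LD_PRELOAD approach, here is a rough sketch of how a script could check on Linux whether a jemalloc shared library was actually loaded into the process. The function names are hypothetical, and the check relies on the library file name containing `libjemalloc`, which is an assumption about how the distribution names it.

```javascript
'use strict';

const fs = require('fs');

// True if any mapped file in a /proc/<pid>/maps listing looks like a jemalloc library.
function mapsMentionJemalloc(mapsText) {
  return mapsText.split('\n').some((line) => /libjemalloc[^/]*\.so/.test(line));
}

// Linux-only: inspect this process's own memory mappings.
function isJemallocPreloaded() {
  try {
    return mapsMentionJemalloc(fs.readFileSync('/proc/self/maps', 'utf8'));
  } catch (err) {
    return false; // no /proc (e.g. macOS, Windows): cannot tell this way
  }
}

console.log(`jemalloc loaded: ${isJemallocPreloaded()}`);
```

This only detects a dynamically loaded library; a binary with jemalloc statically linked in would not show up in the mappings under that name.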
