String vs Buffer memory usage + performance #4506

znewsham · 2024-11-21T18:52:33Z

Node.js Version

v18-v22

NPM Version

v10.8.2

Operating System

Linux zacknewsham-xps 6.8.0-48-generic #48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

buffer, string_decoder, v8

Description

I'm trying to understand why my application has high baseline memory usage. While doing this I discovered something I can't explain: strings seem to cost more than 10x as much memory per character as the equivalent Buffer. Some overhead is expected (~2x, given the UTF-16 nature of JS strings), but not on this scale.
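To make the "per character" comparison concrete, the sketch below uses only the standard `String.prototype.length` and Node's `Buffer.byteLength`. `.length` counts UTF-16 code units, while `Buffer.from(value, "utf-8")` stores UTF-8 bytes. For the ASCII-only alphabet used by makeid the two counts are identical, so the Buffer holds one byte per character. As an aside, and as an assumption about V8 internals rather than something measured here, V8 can also keep strings whose characters all fit in one byte in a one-byte internal representation, so ~2x is an upper bound for this data rather than a given. The sample strings are illustrative only.

```javascript
// Compare UTF-16 code units (what .length reports) with UTF-8 bytes (what Buffer.from stores).
const samples = {
  ascii: "ABCxyz0123",          // same kind of alphabet as makeid()
  nonAscii: "h\u00e9llo\u2192", // "héllo→": é is 2 bytes and → is 3 bytes in UTF-8
};

for (const [name, s] of Object.entries(samples)) {
  const utf16Units = s.length;
  const utf8Bytes = Buffer.byteLength(s, "utf8");
  console.log(`${name}: ${utf16Units} UTF-16 code units, ${utf8Bytes} UTF-8 bytes`);
}
```

For the reproduction's keys and values only the `ascii` case applies.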

A secondary question is why populating a string->string map is so much slower (~8x) than populating a map that converts each string to a Buffer before storing it.

Below is a minimal reproduction. The commented-out lines in test() let you toggle between the string->string map and the string->buffer map.

I run it with --expose-gc so that I can call gc() before taking a heap snapshot at the end. The total string data stored is (17 + 1000) * 100,000 characters, so the absolute minimum memory usage would be around 100MB (a trivial C++ implementation of the same takes 114MB).

With the string->string map, memory usage is around 3.2GB and the setup time (populating the map) is ~11s. With the string->buffer map, memory usage is 280MB and setup takes ~1.3s. The difference in the reported "time" (the lookup loop) is fully explained by the cost of decoding the Buffer back into a string on every get.
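The lower bound and the measured figures can be put side by side with a little arithmetic. In this sketch the constants come from the reproduction and the RSS values from the Output section. The script divides `rss` by 1024 twice, so those values are MiB. RSS also includes Node's own baseline, so these ratios somewhat overstate the per-string overhead.

```javascript
// Parameters from the reproduction.
const keyCount = 100_000;
const keyLength = 17;
const valueLength = 1000;

// Lower bound: one byte per ASCII character, no per-entry or per-object overhead.
const payloadBytes = (keyLength + valueLength) * keyCount;
const payloadMiB = payloadBytes / 1024 / 1024;

// RSS reported in the Output section (already MiB: rss / 1024 / 1024).
const measuredMiB = { bufferMap: 279.30078125, map: 3195.6953125 };

console.log(`payload: ${payloadBytes} bytes (${payloadMiB.toFixed(1)} MiB)`);
for (const [name, mib] of Object.entries(measuredMiB)) {
  console.log(`${name}: ${(mib / payloadMiB).toFixed(1)}x the payload`);
}
console.log(`map vs bufferMap: ${(measuredMiB.map / measuredMiB.bufferMap).toFixed(1)}x`);
```

The last ratio, a little over 11x, corresponds to the ">10x" figure in the description.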

Minimal Reproduction

import { setTimeout } from "timers/promises";

const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// random alpha numeric strings of a specific length
function makeid(length) {
  let result = '';
  const charactersLength = characters.length;
  let counter = 0;
  while (counter < length) {
    result += characters.charAt(Math.floor(Math.random() * charactersLength));
    counter += 1;
  }
  return result;
}

// test setup - 100,000 keys 17 chars long, 1mn iterations, values are 1000 chars long
const keyCount = 100000;
const iterations = 1_000_000;
const keyLength = 17;
const valueLength = 1000;
const keys = new Array(keyCount).fill(0).map(() => makeid(keyLength));

function testMap(map) {
  const startSetup = performance.now();
  keys.forEach(key => map.set(key, makeid(valueLength)));
  const endSetup = performance.now();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const key = keys[Math.floor(Math.random() * keys.length)];
    const value = map.get(key);

    // v8 optimisation busting - without this the loop is 4x faster due to optimising out the get call
    globalThis.value = value;
  }
  const end = performance.now();
  return { time: end - start, setup: endSetup - startSetup };
}

// a naive implementation that keeps the API the same but converts values into buffers
class ConvertToBufferMap extends Map {
  set(key, value) {
    // return the map itself, as Map.prototype.set does, so calls can be chained
    return super.set(key, Buffer.from(value, "utf-8"));
  }
  get(key) {
    return super.get(key)?.toString("utf-8");
  }
}


async function test() {
  // const map = new Map();
  // console.log("map", testMap(map));
  const bufferMap = new ConvertToBufferMap();
  console.log("bufferMap", testMap(bufferMap));
  gc(); // only defined when node is started with --expose-gc
  console.log("Memory usage: ", process.memoryUsage().rss / 1024 / 1024);

  // pause to go get a heap snapshot or whatever
  await setTimeout(100000);
}

test();

Output

bufferMap { time: 705.9530600000003, setup: 1303.258812 }
Memory usage:  279.30078125

map { time: 83.8109829999994, setup: 10450.127824000001 }
Memory usage:  3195.6953125

Before You Submit

I have looked for issues that already exist before submitting this
My issue follows the guidelines in the README file, and follows the 'How to ask a good question' guide at https://stackoverflow.com/help/how-to-ask

znewsham · 2024-11-21T19:31:32Z

As is so often the case, 10 minutes after asking the question I figured it out :( (at least partially). The problem is the test setup: it looks like makeid leaks memory. I knew the incremental string building would allocate a lot of memory, but I thought the call to gc would clean it up; evidently not. Additionally, it seems that any conversion (e.g., to a Buffer) releases that accumulated memory.
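One plausible explanation for both the memory use and the setup time is offered here as an assumption about V8 internals, not something confirmed in this thread. Appending with `+=` does not copy the string. Once the result is long enough, V8 allocates a small "cons string" node that points at the two halves. A 1000-character value built one character at a time is therefore a chain of roughly a thousand small heap objects. The string->string map keeps that chain reachable, so gc() has nothing to reclaim. `Buffer.from`, by contrast, copies the characters out and lets the chain be collected. The sketch estimates that cost. `CONS_MIN_LENGTH` (13) and `CONS_NODE_BYTES` (32, for a 64-bit build without pointer compression) are assumed values and may differ between V8 versions and builds.

```javascript
// Rough estimate of heap retained by a string built one character at a time with `+=`.
// ASSUMPTIONS (V8 internals, may differ between versions/builds):
//   - results shorter than CONS_MIN_LENGTH are copied into a flat string (no node),
//   - every longer concatenation allocates exactly one cons-string node,
//   - each node costs CONS_NODE_BYTES on a 64-bit build without pointer compression.
const CONS_MIN_LENGTH = 13;
const CONS_NODE_BYTES = 32;

function estimateConsBytes(valueLength) {
  // One node per append that produces a length from CONS_MIN_LENGTH up to valueLength.
  const consNodes = Math.max(0, valueLength - CONS_MIN_LENGTH + 1);
  return consNodes * CONS_NODE_BYTES;
}

const perValue = estimateConsBytes(1000); // node overhead for one 1000-char value
const total = perValue * 100_000;         // all values held by the map
console.log(perValue, (total / 1024 / 1024).toFixed(0) + " MiB");
```

With the reproduction's numbers this comes to roughly 3 GiB for the values alone. That is the same order of magnitude as the ~3.2GB observed, and the estimate is only as good as the assumed constants.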

If I change makeid to build the string in an array of pre-allocated length and join it at the end, both the setup time and the memory usage drop below those of the Buffer version. I ran valgrind on it and it didn't report any lost bytes, so the memory does get cleaned up, just not by the gc call.
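Here is a minimal sketch of the change described above, assuming the rest of the reproduction stays the same. The exact code used in the thread isn't shown. The characters go into a pre-sized array and `join` is called once, so the string is produced in one step instead of through a thousand intermediate concatenations. In V8 that is expected to give a single flat string.

```javascript
const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Drop-in replacement for makeid(): same output format, different construction.
function makeid(length) {
  const chars = new Array(length); // pre-allocated to the final length
  for (let i = 0; i < length; i++) {
    chars[i] = characters.charAt(Math.floor(Math.random() * characters.length));
  }
  // A single join at the end instead of `result += ...` on every iteration.
  return chars.join('');
}

console.log(makeid(17));
```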
