Linux zacknewsham-xps 6.8.0-48-generic #48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
buffer, string_decoder, v8
Description
While trying to understand why my application has high baseline memory usage, I discovered something I can't explain: strings seem to cost >10x more memory per character than the equivalent buffer. Some overhead is expected (~2x, given the UTF-16 nature of JS strings), but not on this scale.
A secondary question is why setting up a String->String map is so much slower (~8x) than a map that converts each string to a buffer before storing it.
Below is a minimal reproduction - the commented-out lines in test() let you toggle between the string->string map and the string->buffer map.
I run it with --expose-gc just to get a valid heap snapshot at the end. The total string size stored is (17 + 1000) * 100,000 characters, so the absolute minimum memory usage would be around 100 MB (a trivial C++ implementation of the same takes 114 MB).
When running with the string->string map, the memory cost is around 3.2 GB and the setup time (to populate the map) is ~11s; with the string->buffer map, the memory cost is 280 MB and the setup time is ~1.3s. The "time" difference reported is completely explicable (the cost of decoding the buffer on each get).
Minimal Reproduction
import { setTimeout } from "timers/promises";

const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// random alphanumeric strings of a specific length
function makeid(length) {
  let result = '';
  const charactersLength = characters.length;
  let counter = 0;
  while (counter < length) {
    result += characters.charAt(Math.floor(Math.random() * charactersLength));
    counter += 1;
  }
  return result;
}

// test setup - 100,000 keys 17 chars long, 1 million iterations, values are 1000 chars long
const keyCount = 100000;
const iterations = 1_000_000;
const keyLength = 17;
const valueLength = 1000;
const keys = new Array(keyCount).fill(0).map(() => makeid(keyLength));

function testMap(map) {
  const startSetup = performance.now();
  keys.forEach(key => map.set(key, makeid(valueLength)));
  const endSetup = performance.now();

  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const key = keys[Math.floor(Math.random() * keys.length)];
    const value = map.get(key);
    // v8 optimisation busting - without this the loop is 4x faster due to optimising out the get call
    globalThis.value = value;
  }
  const end = performance.now();

  return { time: end - start, setup: endSetup - startSetup };
}

// a naive implementation that keeps the API the same but converts values into buffers
class ConvertToBufferMap extends Map {
  set(key, value) {
    super.set(key, Buffer.from(value, "utf-8"));
    return this; // preserve Map#set's contract of returning the map
  }
  get(key) {
    return super.get(key)?.toString("utf-8");
  }
}

async function test() {
  // const map = new Map();
  // console.log("map", testMap(map));
  const bufferMap = new ConvertToBufferMap();
  console.log("bufferMap", testMap(bufferMap));
  gc();
  console.log(process.memoryUsage().rss / 1024 / 1024);
  // pause to go get a heap snapshot or whatever
  await setTimeout(100000);
}
test();
As is so often the case, 10 minutes after asking the question I figured it out :( (at least partially). The problem is the test setup: it looks like makeid holds on to memory. I was aware it would allocate a lot during the incremental string building, but thought the call to gc() would clear that up - evidently not. Additionally, it seems that any conversion (e.g., to a buffer) releases that accumulated memory.
If I change makeid to use a pre-allocated array plus a join at the end, both the setup time and memory usage drop below those of the buffer variant. I ran valgrind on it and it showed no lost bytes, so the memory does get cleaned up - just not by the gc() call.
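For reference, a sketch of the makeid variant described above - fill a pre-sized array and join once at the end, so the final string does not keep alive the chain of intermediate concatenation results that `result +=` builds up:

```javascript
const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Build into a pre-sized array and join once, instead of growing a
// string with += in a loop.
function makeid(length) {
  const chars = new Array(length);
  for (let i = 0; i < length; i++) {
    chars[i] = characters.charAt(Math.floor(Math.random() * characters.length));
  }
  return chars.join('');
}
```

Dropping this in place of the original makeid is the only change needed; the rest of the reproduction is unmodified.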
Node.js Version
v18-v22
NPM Version
v10.8.2