-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retaining chunks while reading from fs stream appears to leak memory rapidly #21967
Comments
It might be caused by the fact that every n-th allocated pooled buffer chunk is retained in this scenario, effectively retaning the whole underlying buffers, whie needing only a small chunk of it actually. |
@ChALkeR Yes, looking at Buffer pooling would be my first guess, too. I’m not sure how they would be retained indefinitely, though, because you do reset |
@addaleax Shouldn't be caused by js-side buffer pooling as the second file has chunks larger than poolsize. Also, I was yet unable to remove the dependency on Updated testcase which removes the second file: 'use strict';
const fs = require('fs');
fs.writeFileSync('test-small.bin', Buffer.alloc(100));
const maxLength = 4 * 1024 * 1024; // 4 MiB
let built = 0, buf = [], length = 0;
function tick() {
built++;
if (built % 1000 === 0) {
console.log(`RSS [${built}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}
const stream = fs.createReadStream('./test-small.bin');
stream.on('data', function (data) {
//data = Buffer.from(data); // WARNING: uncommenting this line fixes things somehow
buf.push(data)
length += data.length
if (length >= maxLength) {
buf = []
length = 0
}
});
stream.once('end', tock);
}
function tock() {
Buffer.alloc(65536);
setImmediate(tick);
}
tick(); |
@ChALkeR I think it’s not about In particular, this line seems to be not ideal: node/lib/internal/fs/streams.js Line 186 in f5a2167
This adjusts the pool offset by what we planned to read, which could be the whole pool size, not what we actually read. |
I think it’s a good sign that this seems to “solve” the problem: diff --git a/lib/internal/fs/streams.js b/lib/internal/fs/streams.js
index f527b1de4b84..f2cda91c6873 100644
--- a/lib/internal/fs/streams.js
+++ b/lib/internal/fs/streams.js
@@ -171,6 +171,8 @@ ReadStream.prototype._read = function(n) {
this.emit('error', er);
} else {
let b = null;
+ if (start + toRead === thisPool.used)
+ thisPool.used += bytesRead - toRead;
if (bytesRead > 0) {
this.bytesRead += bytesRead;
b = thisPool.slice(start, start + bytesRead); It only works if different streams don’t concurrently use the same pool, though, so I’m not sure it’s something we’d want to go with. |
More details about the usecase: I needed to concatenate a large number of files into a small number of files with compression. Something like this schematic code: cat a*.txt | lz4 > a.txt.lz4
cat b*.txt | lz4 > b.txt.lz4 In Gzemnid, I created several lz4 encoder streams (using node-lz4), piped those to output files, and upon encoutering a file to read — piped it into one of those lz4 streams (depending on the file). Now, the thing is that node-lz4 uses block compression and retains everything thrown into it until a sufficient input size is reached (which is 4 MiB by default) — and that data effectively comes from |
@addaleax What is also interesting is how exactly does 'use strict';
const maxLength = 4 * 1024 * 1024; // 4 MiB
let built = 0, buf = [], length = 0;
function tick() {
built++;
if (built % 1000 === 0) {
console.log(`RSS [${built}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}
let data = Buffer.allocUnsafe(65536).slice(0, 128).fill(0);
//data = Buffer.from(data); // uncommenting this line fixes things
buf.push(data)
length += data.length
if (length >= maxLength) {
buf = []
length = 0
}
Buffer.alloc(65536); // commenting this line also significantly reduces memory usage
setImmediate(tick);
}
tick(); High memory usage is expected though. Probably something related to memory management. |
@addaleax I guess that this is what's going on (re: how
|
Testcase for that: 'use strict';
const bs = 1024 * 1024; // 1 MiB
const retained = [];
let i = 0, flag = false;
function tick() {
i++;
if (i % 1000 === 0) {
console.log(`RSS [${i}]: ${process.memoryUsage().rss / 1024 / 1024} MiB`);
}
const buf = Buffer.allocUnsafe(bs);
retained.push(buf);
if (i === 20000) {
console.log('Clearing retained and enabling alloc');
retained.length = 0;
flag = true;
}
if (flag) Buffer.alloc(bs); // Even Buffer.alloc(bs - 10) seems to be fine here
if (i < 40000) setImmediate(tick);
}
tick(); |
The above ( // Compile with -O0
#include <iostream>
#include <unistd.h>
using namespace std;
int main() {
const int bs = 1024 * 1024; // 1 MiB
const int count = 1000;
void * x[count];
for (int i = 0; i < count; i++) {
free(calloc(bs * 2, 1)); // Commenting out this line reduces memory usage
x[i] = malloc(bs);
}
cout << "Allocated" << endl;
cout << "Sleeping..." << endl;
sleep(20);
cout << "Freeing" << endl;
for (int i = 0; i < count; i++) {
free(x[i]);
}
cout << "Hello" << endl;
return 0;
} |
@addaleax Btw, testcases from #21967 (comment), #21967 (comment) and #21967 (comment) (but not the initial one) have significantly less memory usage when Why don't we use jemalloc, btw? |
@ChALkeR The question has come up a few times, but I don’t know that we’ve ever done a full investigation into the benefits/downsides. I don’t think the examples here are something that we should base a decision on; that the OS may lazily populate memory pages is not really something for Node to care about, imo. |
This hack should be reverted once nodejs/node#21967 gets included into a Node.js v8.x release. Fixes: #18
Linux yoga 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux
While trying to resolve Gzemnid memory problems at nodejs/Gzemnid#18, I eventually reduced those to the following testcase. It seems to me that it's not node-lz4 fault, but something is wrong on the Node.js side.
Testcase:
Adding
data = Buffer.from(data);
fixes things somehow, problems start when the exact same chunks from the stream are retained for some time and some larger file reading goes on.gc-ing manually does not help — this looks like a memory leak.
All that memory is allocated through
node::ArrayBufferAllocator::Allocate
./cc @addaleax @nodejs/buffer
The text was updated successfully, but these errors were encountered: