Memory leak when trying to import large data set via WriteStream #298
Quite possible; we even want to deprecate WriteStream (#199). However, it should respect backpressure and not have this kind of issue. My educated guess is that JSONStream puts everything in 'old mode' and then we lose backpressure somehow. Or worse, the JSON structure causes a massive stack inside JSONStream. @dominictarr, your help is needed here :). Can you please try to see if the same happens with a newline-delimited JSON file (you can parse it with split2)? I would recommend against having a single 500MB JSON document anyway. Use something simpler, like newline-delimited records, and everything will be easier. JSON is not built for big files.
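For illustration, a minimal sketch of what the newline-delimited approach could look like; the file name and the {id, lat, lon} record shape are assumptions, not taken from the original code:

```js
// Hypothetical sketch: newline-delimited JSON parsed line by line with split2,
// piped into levelup's WriteStream so that pipe() can handle backpressure.
var fs = require('fs'),
    split2 = require('split2'),
    through2 = require('through2'),
    levelup = require('level');

var db = levelup('db');

fs.createReadStream('output/nodes.ndjson', { encoding: 'utf8' })
  .pipe(split2(JSON.parse))                 // one JSON document per line
  .pipe(through2.obj(function (node, enc, next) {
    // Map each parsed node to the { key, value } objects WriteStream expects.
    this.push({ key: String(node.id), value: node.lat + ',' + node.lon });
    next();
  }))
  .pipe(db.createWriteStream());
```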
Thank you for the recommendations. Unfortunately the memory still goes through the roof (I am still getting out-of-memory errors) with the following:
var fs = require('fs'),
    JSONStream = require('JSONStream'),
    through2 = require('through2'),
    split2 = require('split2'),
    levelup = require('level');

var db = levelup('db');
var write_stream = db.createWriteStream();

fs.createReadStream('output/nodes.txt', { encoding: 'utf8' })
  .pipe(split2())
  .pipe(through2.obj(function (line, enc, next) {
    var parts = line.split(',');
    this.push({
      key: parts[0] + "",
      value: parts[1] + "," + parts[2]
    });
    next();
  }))
  .pipe(write_stream);
Indeed, it sounds like the backpressure is not working as expected, but I'm not sure whether levelup is trying to write too frequently or whether split2/JSONStream is simply too fast and levelup's buffer spills over before it gets a chance to write. [N.B. I'm still a n00b with streams, so my intuition might be horribly off :-)]
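For context, this is the backpressure contract being discussed, as a hand-rolled sketch of what pipe() does internally (read_stream here is a stand-in for whatever is upstream, not part of the code above):

```js
// When a writable's write() returns false, the producer should pause until
// 'drain' fires; pipe() wires this up automatically. If memory still grows,
// something in the pipeline is effectively ignoring this signal.
var ok = write_stream.write({ key: 'some-key', value: 'some-value' });
if (!ok) {
  read_stream.pause();
  write_stream.once('drain', function () {
    read_stream.resume();
  });
}
```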
I used to have this problem with old versions of levelup as well, but I hadn't seen it for quite some time. I wonder if object-mode backpressure signalling is the issue, given how large your keys are. Maybe try manually setting lower highWaterMarks to test. Also, you may want to check out https://github.com/maxogden/level-bulk-load, which attempts to auto-batch the writes; in my experience that is a significant improvement. Another optimization is to pre-sort your data by key in reverse order, though I wouldn't expect that to be necessary. @maxogden has also spent a lot of time working on bulk-loading LevelDB.
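To make the auto-batching idea concrete, here is a rough sketch (not the level-bulk-load API itself; the batch size of 1000 and the comma-separated input format are assumptions):

```js
// Buffer up to BATCH_SIZE entries and flush them with db.batch(), so LevelDB
// sees a few large writes instead of many tiny ones. Waiting for the batch
// callback before calling next() also gives us backpressure for free.
var fs = require('fs'),
    split2 = require('split2'),
    through2 = require('through2'),
    levelup = require('level');

var db = levelup('db');
var BATCH_SIZE = 1000;
var ops = [];

fs.createReadStream('output/nodes.txt', { encoding: 'utf8' })
  .pipe(split2())
  .pipe(through2.obj(function (line, enc, next) {
    var parts = line.split(',');
    ops.push({ type: 'put', key: parts[0], value: parts[1] + ',' + parts[2] });
    if (ops.length >= BATCH_SIZE) {
      db.batch(ops, next);   // don't read more until this batch is on disk
      ops = [];
    } else {
      next();
    }
  }, function (done) {
    // Flush whatever is left when the input ends.
    if (ops.length) db.batch(ops, done); else done();
  }));
```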
Let me suggest one trick I learned in the past: replace the plain next(); call with setImmediate(next); so the event loop gets a chance to drain between writes. Also, you should wait for your level to be open before starting the import.
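On the second point, a tiny sketch of waiting for the database to be open; the callback form is what the final code below ends up using, and the 'ready' (open) event is an alternative:

```js
var levelup = require('level');

// Option A: let levelup hand you the db once it is open.
levelup('db', function (err, db) {
  if (err) throw err;
  // safe to create the WriteStream and start piping here
});

// Option B: create it eagerly and wait for the 'ready' event.
var db = levelup('db');
db.on('ready', function () {
  // safe to start the import here
});
```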
Wow, with setImmediate the memory stays under control. Going with that approach. Keys are 64-bit integers, and replacing them with 32-bit integers for the purpose of this exercise did help, but not enough on its own to prevent the memory issue. I will also check out level-bulk-load.
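In case it helps, one way to keep integer keys small is to store them as fixed-width big-endian Buffers; this is only an illustration of the idea, not what was done above:

```js
// A 32-bit id becomes a 4-byte key that also sorts in numeric order.
// (new Buffer() is the Node 0.10-era API used elsewhere in this thread.)
function intKey(id) {
  var buf = new Buffer(4);
  buf.writeUInt32BE(id, 0);
  return buf;
}

// Usage, assuming binary key encoding:
// db.put(intKey(123456), 'lat,lon', { keyEncoding: 'binary' }, callback);
```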
If you want it to be faster, you can call setImmediate only every 100 (or more) samples. It's ugly, but it works.
Awesome 👍 For future reference, this is the final code that works well:
var fs = require('fs'),
    through2 = require('through2'),
    split2 = require('split2'),
    levelup = require('level');

var i = 0;

levelup('db', function (err, db) {
  if (err) throw err;

  var write_stream = db.createWriteStream();

  fs.createReadStream('output/nodes.txt', { encoding: 'utf8' })
    .pipe(split2())
    .pipe(through2.obj(function (line, enc, next) {
      var parts = line.split(',');
      this.push({
        key: parts[0],
        value: parts[1] + "," + parts[2]
      });
      // Yield to the event loop every 1000 lines to prevent the memory leak
      // See: https://github.com/rvagg/node-levelup/issues/298
      if (i++ > 999) {
        setImmediate(next);
        i = 0;
      } else {
        next();
      }
    }))
    .pipe(write_stream);
});
I am getting out of memory errors when I'm trying to import about 500MB worth of nodes from an OpenStreetMap JSON file. Here's my code, in case I am approaching this the wrong way:
This happens with level@0.18.0 (levelup@0.18.6 + leveldown@0.10.2), on Node v0.10.35 on OS X.