Memory leak in db.batch #171
Do you have a test example that reproduces this?
What I meant to say is that I've encountered a similar problem.
Test example: reading in the test dataset in the Norch package. It generates about 400000 entries (I think) from 21000 batch files of about 150-200 entries. Yes, in the past I have also managed to queue up way too many operations, leading to huge server hangs. Now, however, search-index reads in batches in serial (one after the other). This seems to be more stable, but leads to a big memory leak in db.batch.
I've found that large batches work pretty well - how many items do you have in your batches?
We need a simple test case for this so we can replicate the behavior and track it down, and we also need to know whether this is new behavior or has been present for a while. There was a LevelDOWN release this week that contains lots of new C++ which could possibly be leaky, but it'd be nice to know if this is recent behavior. Also note that LevelDB does have some behavior that can look like a leak when you're doing heavy writes. Initially it'll just climb right up and not look like it's going down, but it eventually comes right back down and settles there. I don't think I have a graph of this, but that'd probably be helpful to weed this kind of thing out.
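That balloon-then-settle pattern is easy to check for by sampling memory while the writes run; a minimal sketch (mine, not from the thread):

```js
// Log RSS and heap usage once a second during a heavy write run.
// A compaction "balloon" climbs and then settles; a leak only climbs.
setInterval(function () {
  var m = process.memoryUsage();
  console.log(new Date().toISOString(),
    'rss ' + (m.rss / 1048576).toFixed(1) + 'MB',
    'heap ' + (m.heapUsed / 1048576).toFixed(1) + 'MB');
}, 1000);
```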
@dominictarr It's just over 21000 batch files of maybe 1-200 entries each.
@rvagg OK, I will throw together a gist.
@rvagg See https://gist.github.com/fergiemcdowall/6239924 NOTE: this gist demonstrates the memory jump (about 3GB on my machine) when inserting many batches into LevelUP, although I can't replicate the out-of-memory error I am experiencing when inserting into LevelUP from Express.js; there the cleanup doesn't happen unless there is a stop and start of the application.
Added an adjusted version to the gist that gives some breathing space to V8 to let the callbacks come in. Increasing the total number of batches triggers the problem you're talking about:
So there's something there, but I'm not sure where, and whether it's a real memory leak on our end or not (it could plausibly be up the stack or down the stack!).
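For readers following along, the breathing-space idea amounts to something like this (my reconstruction, not the exact code in the gist): run the batches serially and yield to the event loop between them so callbacks and GC get a chance to run.

```js
var level = require('level');
var db = level('./breathing-db');

// batches is an array of arrays of {type, key, value} operations.
function insertBatches(batches, i, done) {
  if (i >= batches.length) return done();
  db.batch(batches[i], function (err) {
    if (err) return done(err);
    // Yield between batches instead of issuing the next one
    // synchronously from the callback.
    setImmediate(function () {
      insertBatches(batches, i + 1, done);
    });
  });
}

// usage: insertBatches(allBatches, 0, function (err) { /* ... */ });
```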
OK, thanks for the tips; that breathing space was a good idea.
@rvagg could you push the adjusted version to the gist? :)
See the comment in the gist.
Aha, got it, cheers!
If I run the same stuff against LevelDOWN directly I get exactly the same behaviour, so it's not in LevelUP. I'm still not entirely convinced this isn't a LevelDB thing. If you run leak-tester.js in the tests directory of leveldown and watch it over time, you'll see memory usage balloon and then settle back down again. It's odd behaviour, but I imagine if you push hard enough you could make that ballooning push you over Node's limit. Perhaps that's what's happening here? I've also opened a leakage branch of leveldown with some minor things I've found so far.
The GC plot makes for scary reading: https://gist.github.com/No9/fa818a9d63d22551a837 (see plot at bottom of page).
That's an odd flamegraph. For 20 minutes' worth of work there's a lot of string and module stuff going on in there, and I can't see much related to levelup/leveldown. Can you skip the use of lorem-ipsum, since it seems to be getting in the way, and see what it does with plain Buffers with
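The tail of that comment appears truncated; here is a sketch of what the suggestion could look like, with crypto.randomBytes standing in for whatever Buffer source was actually intended:

```js
var crypto = require('crypto');

// Build a batch of n ops with fixed-size Buffer keys and values, so no
// lorem-ipsum string generation shows up on the flamegraph.
function makeBatch(n, valueSize) {
  var ops = [];
  for (var i = 0; i < n; i++) {
    ops.push({
      type: 'put',
      key: crypto.randomBytes(16),
      value: crypto.randomBytes(valueSize)
    });
  }
  return ops;
}
```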
Yup, and I will also provide one that filters on doBatch so we can see if that sheds any light.
@rvagg here is the flamegraph you were looking for.
@dominictarr suggested compaction, and now we have a clearer flamegraph that could be worth some investigation. I'll run a trace on
And I'll try to get a correlation by timestamping the above and GC so we can see possible cause and effect. (Edit) Maybe compaction is causing demands on system resources that are forcing aggressive GC by V8?
As I said to @dominictarr, it could be compaction kicking in, but this might not be a problem at all. I'm not sure, but I think that
OK, so actual compactions over a 10-minute period look like the following (milliseconds):
The full range of data is available here in the funky GitHub TSV format. While nearly 3 seconds is not optimal, it isn't clear how this is impacting things. I think I am going to look at memory paging next, but I will also keep chasing the V8 .so, as analysis of the core dump could be handy.
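One way to line compactions up against GC is to poll LevelDB's own stats with timestamps. A rough sketch, assuming LevelDOWN's getProperty is reachable from the LevelUP instance (shown as db.db here, which may vary by version):

```js
// Sample LevelDB's internal stats every 5 seconds with a timestamp so
// compaction runs can later be correlated with GC events.
setInterval(function () {
  console.log('--- ' + new Date().toISOString());
  console.log(db.db.getProperty('leveldb.stats'));
}, 5000);
```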
@maxogden is reporting a leak trying to import via batch: https://github.com/maxogden/dat/blob/master/test/insert-junk-data.js
outputs this: https://gist.github.com/maxogden/0ddccdd28263391a2251. Ouch.
Did you try using the chained batch, e.g. `db.batch().put()...write()`?
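For reference, the two forms look like this in the LevelUP API; the chained form hands each operation to the native layer as it's declared rather than holding the whole array in JS:

```js
// Array form: every operation object stays alive in V8 until the
// entire batch is submitted at once.
db.batch([
  { type: 'put', key: 'k1', value: 'v1' },
  { type: 'put', key: 'k2', value: 'v2' }
], function (err) { if (err) throw err; });

// Chained form: operations accumulate in the underlying WriteBatch
// and are flushed on write().
db.batch()
  .put('k1', 'v1')
  .put('k2', 'v2')
  .del('k0')
  .write(function (err) { if (err) throw err; });
```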
I'm not so sure this is just a batch issue anymore. I tried loading data via looping single puts, and I eventually hit the same out-of-memory error. Puts with `{sync: true}` still leak, it just takes a lot longer to get there.

```js
var levelup = require('level');
var db = levelup('./leakydb', {valueEncoding: 'json'}); // modifying cache size here doesn't help

var putOne = function (i) {
  i = i + 1;
  // log a little message every 10,000 puts
  if (i % 10000 == 0) {
    console.log(i + " ops " + (new Date()).toISOString());
  }
  // if we're under 9,000,001 ops, do another put, otherwise stop
  if (i < 9000001) {
    var keyFriendlyI = i + 1000000; // add a million so its sort friendly
    var key = "aKeyLabel~" + keyFriendlyI;
    var value = {
      i: i,
      createdDate: new Date(),
      someText: "I thought that maybe single puts would make it further than the batch approach",
      moreText: "but that isn't necessarily true.",
      evenMoreText: "This hits a FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory."
    };
    // tried setting {sync: true} here and we still run out of memory
    // it just takes a lot longer to get there
    db.put(key, value, function (err) {
      if (err) {
        console.log(err);
      } else {
        putOne(i);
      }
    });
  } else {
    console.log("Finished!");
  }
};

putOne(0);
```
I concur; my initial testing suggests it's deeper than batch.
Hmm, so what happens if you run leveldown directly, without levelup?
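Bypassing LevelUP looks roughly like this (a sketch assuming the LevelDOWN 0.x callback API):

```js
var leveldown = require('leveldown');
var db = leveldown('./direct-db');

db.open(function (err) {
  if (err) throw err;
  // Same batch shape as LevelUP, but with no JS encoding/queueing
  // layer in between, which helps isolate where memory is held.
  db.batch([
    { type: 'put', key: 'k1', value: 'v1' },
    { type: 'put', key: 'k2', value: 'v2' }
  ], function (err) {
    if (err) throw err;
  });
});
```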
@tjfontaine I think we need some SmartOS magic to hunt down some kind of memory leak. It's likely in the C++ of LevelDOWN; perhaps it even has something to do with the way Node interacts with LevelDB. How's that blog post coming along about finding leaks? Some pointers would be nice.
I haven't finished the blog post, so here is a list of steps. Some are obvious; they are not meant to be patronizing, just for those who might also wander into this thread.
Catch me in IRC if there are more details you need or want an extra pair of eyes.
@maxogden I have leveldown-hyper fixed up and released; you'll have to let me know if there are any compile errors (or even warnings actually, it'd be nice to get rid of those too) on OSX. It's compiling on Linux, but it's a stab in the dark for me for OSX. Use level-hyper in place of level; it should be a drop-in replacement, so you don't have to mess around with plugging leveldown-hyper into levelup.
FWIW, I hit
@nornagon chained batch or array batch?
Array.
You definitely want chained batch for that -- switching to it here https://github.com/brycebaril/level-bufferstreams/blob/master/write.js#L41 made a vast improvement on memory.
We are getting the same error. It behaves differently on levelup 0.12 and 0.17. On 0.12 it grinds to a halt after writing around 1,200,000 records; the CPU skyrockets but memory usage stays steady, then it writes something more and stops again. It continues doing so, on and off, for each of the following 300,000 records. On 0.17 it keeps writing, but the memory usage explodes. In the heap I see my keys, values and callbacks passed to batch (I'm using level-writestream). It is writing to the database, as I can see it growing. As far as I can see, LevelDOWN is either not freeing some stuff, or it is calling the batch callback too early. Does anybody have any clue about this?
I think I made some serious progress. I was actually able to insert 12 million non-ordered pairs into levelup using only 300MB of RAM. I used my branch of level-ws (Level/level-ws#1) that uses the chainable batch instead of the array batch. So, I used regular streams. The same code segfaults on node v0.10.21, but it works perfectly on node v0.11.8.
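A rough sketch of that streaming approach, assuming the classic createWriteStream API that level-ws mirrors (write() returns false when the stream wants you to pause, and 'drain' signals it's ready again):

```js
var ws = db.createWriteStream();

function writeMany(n) {
  var i = 0;
  (function write() {
    var ok = true;
    while (i < n && ok) {
      // Respect backpressure so pending operations never pile up
      // in memory faster than they can be flushed.
      ok = ws.write({ key: 'key' + i, value: 'value' + i });
      i++;
    }
    if (i < n) ws.once('drain', write);
    else ws.end();
  })();
}

writeMany(12000000);
```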
This is the backtrace I am getting from gdb.
```cpp
static size_t ByteSize(const WriteBatch* batch) {
  return batch->rep_.size();
}
```

There are basically two things that can go wrong here. Either the batch parameter is NULL, or it's pointing to dead memory, i.e. someone has deleted the pointer.
I think it's NULL:
OK, it's not NULL, as it is asserted, so it's dead memory. I'm hitting this precondition:
It seems the `rep_` gets GCed after the write, but leveldb internals still need it.
From node-leveldown/deps/leveldb/leveldb-1.14.0/db/db_impl.cc:

```cpp
// REQUIRES: Writer list must be non-empty
// REQUIRES: First writer must have a non-NULL batch
WriteBatch* DBImpl::BuildBatchGroup(Writer** last_writer) {
  assert(!writers_.empty());
  Writer* first = writers_.front();
  WriteBatch* result = first->batch;
  assert(result != NULL);
  size_t size = WriteBatchInternal::ByteSize(first->batch);
  // ...
}
```

We need to make sure that's the case. Before the `ByteSize` call, add: `printf("batch pointer: %x\n", first->batch);`
Ok, never mind my last comment. I wrote it before I saw your comment :)
Do you know for sure that the asserts fire if something is wrong? Some projects only enable asserts in debug mode, for example.
Confirming it. The
While debugging my segfault issue, I think I found the cause of the memory footprint of batches. Given my scarce C++ skills I might be wrong, but check out Level/leveldown#70.
@tjfontaine managed to track down a missing
Until this makes it into a release, someone on this thread who can reproduce this problem could try patching the Node source and running against that. See the patch here: https://gist.github.com/tjfontaine/7394912 - basically you just need
Has anyone come up with a very simple script that will reproduce this problem reliably so the rest of us can dig?
Oh, and that dtrace script in the gist might be helpful, but currently we're (naughtily) not using
I am working to make the script more generic, but the way V8 inlines a bunch of things makes it difficult to get the probe point to work on handle creation. I'm hoping to have a good solution for it soon.
And FTR, the last mem leak we squashed in leveldown was a missing
I believe this is fixed now in LevelDOWN@0.10.0, which comes with LevelUP/Level@0.18.0. Can everyone who's had the problem try it out and report back so I can close this?
I tried this out and I think it's resolved! Thanks everyone! I loaded a 4 million document / 2.6 GB leveldb with level 0.18.0 and it finished successfully. My node process started at 160 MB and finished at 240 MB. With previous versions my process would hit 1.6 GB after loading the initial 10% of my documents, slow to a crawl and then die shortly after that.
Wow! So fast! So little memory! Works for me, so closing issue... Thanks for the great fix! F
Just to point out that I got an overall 25% increase in writing speed. I'm writing around 120,000 k-v pairs per second in LevelGraph.
I wrote details here if you want the gory innards of what's in leveldown@0.10: http://r.va.gg/2013/11/leveldown-v0.10-managing-gc-in-native-v8-programming.html
Hi
Maintainer of search-index and Norch here :)
There seems to be a memory leak in db.batch. When inserting thousands of batch files with db.batch the memory allocation of the parent app shoots up rather alarmingly, and does not come down again.
This issue has appeared in the last few weeks.
Any ways to manually free up memory or otherwise force garbage collection?
F
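On the question of manually forcing garbage collection: V8 only exposes a manual trigger when Node is started with --expose-gc, and even then it's a diagnostic aid rather than a fix. A minimal sketch:

```js
// Run with: node --expose-gc app.js
if (global.gc) {
  global.gc(); // force a full collection
  console.log(process.memoryUsage());
} else {
  console.log('start node with --expose-gc to enable global.gc()');
}
```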