
Chained batch should be persistent #620

Closed
vweevers opened this issue Apr 26, 2019 · 5 comments · Fixed by #621

@vweevers

A segfault can happen when a chained batch is GC-ed in between batch.write(callback) and BatchWriteWorker->DoExecute().

vweevers commented Apr 26, 2019

Wrote a test and ran it in v4 as well. Only v5 has the bug.

In v4 we do (leveldown/src/batch.cc, lines 137 to 139 at 0cb52da):

```cpp
// persist to prevent accidental GC
v8::Local<v8::Object> _this = info.This();
worker->SaveToPersistent("batch", _this);
```

peakji commented Apr 27, 2019

I've obtained a core dump file on segfault 11/SIGSEGV from a server running v5.0.0 (been waiting for a week!).

The backtrace shows the segmentation fault happened on a libuv worker-pool thread, but I'm not sure whether it was running BatchWriteWorker->DoExecute() at the time.

However, I found one unwritten batch, which means the software crashed while writing batches. Since the software performs no other operations during this phase, I think it is safe to say this PR fixed the segfault described in #601. 🎉

I'll deploy v5.0.3 today and see if it crashes.


Backtrace generated by lldb & llnode: [backtrace attached]

Objects: [objects listing attached]

@vweevers

@peakji that's good! Your analysis makes sense. I assume you're running parallel batches? The presence of that unwritten batch seems to indicate GC did not collect it yet, so the segfault must have been caused by another batch (perhaps those "Invalid values"?).

I also deployed 5.0.3 a few hours ago. Ended up using a chained batch. Low volume at the moment, increasing on Monday. In a week or so, our use cases combined should give us all the confidence we need 💯

@vweevers

I've since recorded 204M operations without any problems. 🎈

peakji commented May 18, 2019

Same here! I've been using v5.0.3 in production for two weeks (wrote ~400GB to disk), everything worked perfectly. 🎉
