Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End iterator on GC or db close #601

Closed
wants to merge 4 commits into from
Closed

End iterator on GC or db close #601

wants to merge 4 commits into from

Conversation

vweevers
Copy link
Member

Removing the need for the user to do iterator.end(), following the suggestion by @chjj in #597 (comment).

For early review; needs a CI run, some light refactoring and comments to clarify the order of things.

@vweevers
Copy link
Member Author

office_so-close

test/iterator-test.js Outdated Show resolved Hide resolved
binding.cc Outdated Show resolved Hide resolved
@vweevers
Copy link
Member Author

Gonna take a break from this for a few days. I'm close and the code is fairly clean now, but I feel like I'm stuck in a bike shed and need a breather to regain clarity of the big picture.

binding.cc Outdated Show resolved Hide resolved
@chjj
Copy link
Contributor

chjj commented Mar 22, 2019

Wow, great work! I've thought about this for a long time but never realized the complexity involved in actually implementing it. I'll try to play around with this and see if I can break it.

@vweevers
Copy link
Member Author

@chjj Don't dive too deep yet because I plan to simplify it (but haven't had much time)

@vweevers
Copy link
Member Author

Flowchart:

leveldown iterator flowchart

@vweevers
Copy link
Member Author

Gonna open it up for review (and @chjj go! kick the tires!) though there's some cleanup left (reverting the repeated tests and then squashing). I prefer to keep repeating tests when changes are requested.

@vweevers vweevers marked this pull request as ready for review March 23, 2019 17:01
@vweevers vweevers requested a review from peakji March 23, 2019 17:02
iterator.js Outdated Show resolved Hide resolved
iterator.js Outdated Show resolved Hide resolved
/**
* Worker class for ending an iterator
*/
struct EndWorker final : public BaseWorker {
Copy link
Member

@peakji peakji Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of ending on GC, but it will be nice if we keep the ability to manually end iterators as well.

One may not want to fully rely on GC on a busy server if there is a clean way to eagerly release an iterator (and the underlying snapshot and resources).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer either/or. Explicit resource management, or GC. Supporting both is not impossible (the EndWorker could do the same for a single iterator as what the CloseWorker does for all iterators now), but it raises the question of what we recommend as best practice - and whether modules that wrap iterators should end explicitly or not.

Would be nice to do some real world benchmarks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for benchmarks. I thought ending on GC was purely for better promise support as @chjj mentioned in #597 (comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to benchmark. Ending on GC will have some cost, but what cost is acceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my private project use iterators for prefix searching, with lots of concurrent seek next and (early) end. I would like to benchmark the new GC-version with my project (especially on write intensive workloads), but I'm pretty busy for the next few days...

Shall we release v5.0.0 first and work on this PR later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already released it :)

Copy link
Member

@peakji peakji Mar 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is about the underlying snapshot. I remember that holding snapshots might affect compaction of new keys after the SequenceNumber, so LevelDB suggests to release snapshots as soon as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll assign you to this PR, no rush.

binding.cc Show resolved Hide resolved
@vweevers vweevers added the semver-minor New features that are backward compatible label Mar 31, 2019
@peakji
Copy link
Member

peakji commented Apr 10, 2019

🤔 In order to benchmark the two implementations, I've routed some real world workload to a test server. I started with the original v5.0.0, but it crashed on segmentation fault (status=11/SEGV) after about an hour.

Now I'm trying to reproduce the issue in local environment, I think it should be top priority since v5.0.0+ is already on NPM. I'll open a new issue or PR after locating the cause, and go back to this PR later.

@vweevers
Copy link
Member Author

I think it should be top priority since v5.0.0+ is already on NPM.

👍

@vweevers
Copy link
Member Author

@peakji Can you share some details about the test you did? What kind of operations, open/close states, a stack trace if you have it, etc?

@peakji
Copy link
Member

peakji commented Apr 21, 2019

@vweevers It was a segment fault so no useful stack trace. I've enabled core dump yesterday but unfortunately it didn't break (till now), and I have a feeling that it will break tonight 😜. I'll do a back trace with lldb and share the result after I got the core dump file.

In addition, I can confirm that the issue is not related to Iterator. The test software has two repeating phases: upsert and iterate, and it broke during a upsert phase, so we can focus on checking get, put, del, and Batch. I guess we can start with looking for dereferencing NULL?

@vweevers
Copy link
Member Author

it broke during a upsert phase, so we can focus on checking get, put, del, and Batch. I guess we can start with looking for dereferencing NULL?

Does the software open and close the database during this phase? If not, that rules out an obvious suspect

@peakji
Copy link
Member

peakji commented Apr 21, 2019

Does the software open and close the database during this phase? If not, that rules out an obvious suspect

No, the software opens a database on start and uses it forever, and I'm sure it opened successfully.

@peakji
Copy link
Member

peakji commented Apr 22, 2019

The software is still up and running, no sign of segment fault, monitor info looks perfect 🙄. I know its kinda weird to wish your own software to crash, but nothing could be fixed by simply enabling core dump and reboot...

In fact what worries me this most is that the segfault is NOT related to Iterator, because it means that:

  1. We cannot expect this PR to fix the issue.

  2. If we share binding.cc -> rocksdb or others, the issue might come along as well.

@vweevers With your knowledge of the current code base, if you could review and confirm there is no obvious illegal memory access (e.g. out-of-bound, dereferencing NULL, ...), then I'd rather suspect the segfault was caused by the old node v10.4.1 on the server. Sorry that I can't be of any useful help without a dump file, still haven't got the time to learn N-API.

@vweevers
Copy link
Member Author

vweevers commented Apr 22, 2019

Will hunt for issues at the end of this week. Do your operations have a predefined order (e.g. a get followed by a del) or is it random? Are you using db.batch([]) (array form) or db.batch() (chained form)? Lastly, are you reading/writing buffer or string keys/values?

I will deploy leveldown@5 to a production system myself in the next 2 weeks, but that one only does db.batch([]) (array form) and db.iterator() operations.

@peakji
Copy link
Member

peakji commented Apr 22, 2019

Do your operations have a predefined order (e.g. a get followed by a del) or is it random?

Yes, its always get -> put and/or del -> batch(s). However, I have two queues running simultaneously, so it's actually random from the database perspective. The workflow is simple: first get some metadata, then update with put and/or remove stale metadata with del, then finally write data chunks using one or more batchs.

Are you using db.batch([]) (array form) or db.batch() (chained form)?

I am using the chained form, with mixed put and del operations. The average number of keys per batch is quite big (~hundreds of puts and a couple of dels), since my original intention was to benchmark the two Iterator implementations.

I will deploy leveldown@5 to a production system myself in the next 2 weeks, but that one only does db.batch([]) (array form) and db.iterator() operations.

I will keep core dump enabled for the next ... umm... forever! until it breaks!

@vweevers
Copy link
Member Author

Thanks! I made an edit to my questions just a minute ago before you answered:

Lastly, are you reading/writing buffer or string keys/values?

@peakji
Copy link
Member

peakji commented Apr 22, 2019

Lastly, are you reading/writing buffer or string keys/values?

I'm using buffers for both key and value in all the operations.

@vweevers
Copy link
Member Author

I personally need #32 more than I need this (in fact I don't need this at all).

@peakji Any objections to me doing #32 first (utilizing PriorityWorker for all operations), releasing that as semver-patch, and us getting back to this PR later?

@peakji
Copy link
Member

peakji commented Apr 23, 2019

releasing that as semver-patch, and us getting back to this PR later?

👍 I'll deploy the new patched version on a separated server.

@vweevers
Copy link
Member Author

Released 5.0.2

I'll rebase this branch later.

@vweevers
Copy link
Member Author

Rebased. I'll also see if I can move some of the smaller refactorings I did here into separate PRs.

@vweevers
Copy link
Member Author

vweevers commented Apr 26, 2019

@peakji Good news (or bad, depending how you look at it): I found a segfault. Happens when a chained batch is GC-ed in between batch.write(callback) and BatchWriteWorker->DoExecute().

Opened #620

@peakji
Copy link
Member

peakji commented Apr 26, 2019

@vweevers Absolutely good news!

This solved my doubts, in fact I thought it might be related to batch, but not about GC. I was creating a lot of batches in my code, but not all of them were being populated. I thought it was the empty batches which caused the memory error.

BTW, what do you think is the best practice for empty batches (created but unused)? Call write or just leave them to the collector?

@vweevers
Copy link
Member Author

vweevers commented Apr 26, 2019

BTW, what do you think is the best practice for empty batches (created but unused)? Call write or just leave them to the collector?

There's no benefit to calling write(), I think. Especially when #619 lands, which makes BatchWriteWorker->DoExecute() a noop on empty batches. And when #621 lands, calling write would merely delay GC (and unnecessarily acquire a thread in the worker pool).

@peakji
Copy link
Member

peakji commented Apr 26, 2019

Just reviewed the new PRs, nice work @vweevers ! These should make a new semver-patch?

I'm now working on a standalone benchmark script for this PR (finally!).

@vweevers
Copy link
Member Author

These should make a new semver-patch?

Yup! Will try to get that out today or tomorrow.

I'm now working on a standalone benchmark script for this PR (finally!).

Awesome!

@vweevers
Copy link
Member Author

(Be advised, GitHub shows the commits in the wrong order due to rebasing)

@vweevers

This comment has been minimized.

@vweevers
Copy link
Member Author

@peakji @chjj @ralphtheninja A summary so we're all on the same page:

Unless there are other reasons for wanting to end on GC, I propose we close this.

One possible reason is "safety" - i.e. when the user forgets to call end(). IMO that's hand-holding, not worth the added code complexity.

@vweevers vweevers removed their assignment May 18, 2019
@vweevers vweevers closed this May 18, 2019
@vweevers vweevers deleted the end-iterator-on-gc branch May 18, 2019 08:17
@peakji
Copy link
Member

peakji commented May 18, 2019

One possible reason is "safety" - i.e. when the user forgets to call end(). IMO that's hand-holding, not worth the added code complexity.

👍 not worth the added code complexity.

Someone forgetting to call end() on iterators may also forget to call end() on http requests, file streams, database connections, sockets ... 😛

@vweevers vweevers mentioned this pull request Sep 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-minor New features that are backward compatible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants