End iterator on GC or db close #601

vweevers · 2019-03-15T15:34:16Z

Removing the need for the user to do iterator.end(), following the suggestion by @chjj in #597 (comment).

For early review; needs a CI run, some light refactoring and comments to clarify the order of things.

vweevers · 2019-03-15T15:49:36Z

test/iterator-test.js

binding.cc

vweevers · 2019-03-16T15:51:52Z

Gonna take a break from this for a few days. I'm close and the code is fairly clean now, but I feel like I'm stuck in a bike shed and need a breather to regain clarity of the big picture.

binding.cc

chjj · 2019-03-22T16:58:58Z

Wow, great work! I've thought about this for a long time but never realized the complexity involved in actually implementing it. I'll try to play around with this and see if I can break it.

vweevers · 2019-03-22T17:01:47Z

@chjj Don't dive too deep yet because I plan to simplify it (but haven't had much time)

vweevers · 2019-03-23T16:57:28Z

Flowchart:

vweevers · 2019-03-23T17:01:07Z

Gonna open it up for review (and @chjj go! kick the tires!) though there's some cleanup left (reverting the repeated tests and then squashing). I prefer to keep repeating tests when changes are requested.

iterator.js

peakji · 2019-03-25T04:23:02Z

binding.cc

-/**
- * Worker class for ending an iterator
- */
-struct EndWorker final : public BaseWorker {


I like the idea of ending on GC, but it will be nice if we keep the ability to manually end iterators as well.

One may not want to fully rely on GC on a busy server if there is a clean way to eagerly release an iterator (and the underlying snapshot and resources).

I prefer either/or. Explicit resource management, or GC. Supporting both is not impossible (the EndWorker could do the same for a single iterator as what the CloseWorker does for all iterators now), but it raises the question of what we recommend as best practice - and whether modules that wrap iterators should end explicitly or not.

Would be nice to do some real world benchmarks.

👍 for benchmarks. I thought ending on GC was purely for better promise support as @chjj mentioned in #597 (comment).

I'm not sure what to benchmark. Ending on GC will have some cost, but what cost is acceptable?

One of my private project use iterators for prefix searching, with lots of concurrent seek next and (early) end. I would like to benchmark the new GC-version with my project (especially on write intensive workloads), but I'm pretty busy for the next few days...

Shall we release v5.0.0 first and work on this PR later?

Already released it :)

My main concern is about the underlying snapshot. I remember that holding snapshots might affect compaction of new keys after the SequenceNumber, so LevelDB suggests to release snapshots as soon as possible.

I'll assign you to this PR, no rush.

binding.cc

peakji · 2019-04-10T15:58:42Z

🤔 In order to benchmark the two implementations, I've routed some real world workload to a test server. I started with the original v5.0.0, but it crashed on segmentation fault (status=11/SEGV) after about an hour.

Now I'm trying to reproduce the issue in local environment, I think it should be top priority since v5.0.0+ is already on NPM. I'll open a new issue or PR after locating the cause, and go back to this PR later.

vweevers · 2019-04-10T16:06:07Z

I think it should be top priority since v5.0.0+ is already on NPM.

👍

vweevers · 2019-04-21T10:28:10Z

@peakji Can you share some details about the test you did? What kind of operations, open/close states, a stack trace if you have it, etc?

peakji · 2019-04-21T10:58:13Z

@vweevers It was a segment fault so no useful stack trace. I've enabled core dump yesterday but unfortunately it didn't break (till now), and I have a feeling that it will break tonight 😜. I'll do a back trace with lldb and share the result after I got the core dump file.

In addition, I can confirm that the issue is not related to Iterator. The test software has two repeating phases: upsert and iterate, and it broke during a upsert phase, so we can focus on checking get, put, del, and Batch. I guess we can start with looking for dereferencing NULL?

vweevers · 2019-04-21T11:02:29Z

it broke during a upsert phase, so we can focus on checking get, put, del, and Batch. I guess we can start with looking for dereferencing NULL?

Does the software open and close the database during this phase? If not, that rules out an obvious suspect

peakji · 2019-04-21T11:09:33Z

Does the software open and close the database during this phase? If not, that rules out an obvious suspect

No, the software opens a database on start and uses it forever, and I'm sure it opened successfully.

peakji · 2019-04-22T10:33:06Z

The software is still up and running, no sign of segment fault, monitor info looks perfect 🙄. I know its kinda weird to wish your own software to crash, but nothing could be fixed by simply enabling core dump and reboot...

In fact what worries me this most is that the segfault is NOT related to Iterator, because it means that:

We cannot expect this PR to fix the issue.
If we share binding.cc -> rocksdb or others, the issue might come along as well.

@vweevers With your knowledge of the current code base, if you could review and confirm there is no obvious illegal memory access (e.g. out-of-bound, dereferencing NULL, ...), then I'd rather suspect the segfault was caused by the old node v10.4.1 on the server. Sorry that I can't be of any useful help without a dump file, still haven't got the time to learn N-API.

vweevers · 2019-04-22T15:59:00Z

Will hunt for issues at the end of this week. Do your operations have a predefined order (e.g. a get followed by a del) or is it random? Are you using db.batch([]) (array form) or db.batch() (chained form)? Lastly, are you reading/writing buffer or string keys/values?

I will deploy leveldown@5 to a production system myself in the next 2 weeks, but that one only does db.batch([]) (array form) and db.iterator() operations.

peakji · 2019-04-22T17:14:56Z

Do your operations have a predefined order (e.g. a get followed by a del) or is it random?

Yes, its always get -> put and/or del -> batch(s). However, I have two queues running simultaneously, so it's actually random from the database perspective. The workflow is simple: first get some metadata, then update with put and/or remove stale metadata with del, then finally write data chunks using one or more batchs.

Are you using db.batch([]) (array form) or db.batch() (chained form)?

I am using the chained form, with mixed put and del operations. The average number of keys per batch is quite big (~hundreds of puts and a couple of dels), since my original intention was to benchmark the two Iterator implementations.

I will deploy leveldown@5 to a production system myself in the next 2 weeks, but that one only does db.batch([]) (array form) and db.iterator() operations.

I will keep core dump enabled for the next ... umm... forever! until it breaks!

vweevers · 2019-04-22T17:18:40Z

Thanks! I made an edit to my questions just a minute ago before you answered:

Lastly, are you reading/writing buffer or string keys/values?

peakji · 2019-04-22T17:29:38Z

Lastly, are you reading/writing buffer or string keys/values?

I'm using buffers for both key and value in all the operations.

vweevers · 2019-04-23T12:28:48Z

I personally need #32 more than I need this (in fact I don't need this at all).

@peakji Any objections to me doing #32 first (utilizing PriorityWorker for all operations), releasing that as semver-patch, and us getting back to this PR later?

peakji · 2019-04-23T12:51:22Z

releasing that as semver-patch, and us getting back to this PR later?

👍 I'll deploy the new patched version on a separated server.

vweevers · 2019-04-23T18:50:16Z

Released 5.0.2

I'll rebase this branch later.

vweevers · 2019-04-26T07:16:59Z

Rebased. I'll also see if I can move some of the smaller refactorings I did here into separate PRs.

vweevers · 2019-04-26T10:47:10Z

@peakji Good news (or bad, depending how you look at it): I found a segfault. Happens when a chained batch is GC-ed in between batch.write(callback) and BatchWriteWorker->DoExecute().

Opened #620

peakji · 2019-04-26T12:55:49Z

@vweevers Absolutely good news!

This solved my doubts, in fact I thought it might be related to batch, but not about GC. I was creating a lot of batches in my code, but not all of them were being populated. I thought it was the empty batches which caused the memory error.

BTW, what do you think is the best practice for empty batches (created but unused)? Call write or just leave them to the collector?

vweevers · 2019-04-26T13:05:30Z

BTW, what do you think is the best practice for empty batches (created but unused)? Call write or just leave them to the collector?

There's no benefit to calling write(), I think. Especially when #619 lands, which makes BatchWriteWorker->DoExecute() a noop on empty batches. And when #621 lands, calling write would merely delay GC (and unnecessarily acquire a thread in the worker pool).

peakji · 2019-04-26T14:13:37Z

Just reviewed the new PRs, nice work @vweevers ! These should make a new semver-patch?

I'm now working on a standalone benchmark script for this PR (finally!).

vweevers · 2019-04-26T14:20:05Z

These should make a new semver-patch?

Yup! Will try to get that out today or tomorrow.

I'm now working on a standalone benchmark script for this PR (finally!).

Awesome!

vweevers · 2019-04-26T15:50:26Z

(Be advised, GitHub shows the commits in the wrong order due to rebasing)

vweevers · 2019-05-18T07:29:32Z

@peakji @chjj @ralphtheninja A summary so we're all on the same page:

Ending iterators on GC degrades performance, because holding too many snapshots negatively affects compaction of new keys (Benchmarks: End iterator manually vs on GC #627)
Ending iterators automatically in for await..of, even when an error is thrown by the user in the body of the loop, is feasible (Proof of concept: asyncIterator abstract-leveldown#338)
That invalidates the original reason for exploring the idea of ending on GC.

Unless there are other reasons for wanting to end on GC, I propose we close this.

One possible reason is "safety" - i.e. when the user forgets to call end(). IMO that's hand-holding, not worth the added code complexity.

peakji · 2019-05-18T08:19:16Z

One possible reason is "safety" - i.e. when the user forgets to call end(). IMO that's hand-holding, not worth the added code complexity.

👍 not worth the added code complexity.

Someone forgetting to call end() on iterators may also forget to call end() on http requests, file streams, database connections, sockets ... 😛

vweevers commented Mar 15, 2019

View reviewed changes

test/iterator-test.js Outdated Show resolved Hide resolved

vweevers commented Mar 16, 2019

View reviewed changes

binding.cc Outdated Show resolved Hide resolved

vweevers commented Mar 17, 2019

View reviewed changes

binding.cc Outdated Show resolved Hide resolved

vweevers marked this pull request as ready for review March 23, 2019 17:01

vweevers requested a review from peakji March 23, 2019 17:02

peakji reviewed Mar 25, 2019

View reviewed changes

vweevers assigned peakji Mar 29, 2019

vweevers added the semver-minor New features that are backward compatible label Mar 31, 2019

vweevers self-assigned this Apr 23, 2019

vweevers force-pushed the end-iterator-on-gc branch from 1506628 to e546575 Compare April 26, 2019 07:13

vweevers force-pushed the end-iterator-on-gc branch from e546575 to cca5004 Compare April 26, 2019 08:00

This was referenced Apr 26, 2019

Refactor worker cleanup into BaseWorker::DoFinally() #617

Merged

Throw error in iterator_seek if iterator has ended #618

Merged

Skip writing empty (chained) batch and make callbacks asynchronous #619

Merged

vweevers force-pushed the end-iterator-on-gc branch from cca5004 to ad9e102 Compare April 26, 2019 15:45

This comment has been minimized.

Sign in to view

vweevers added 4 commits April 26, 2019 18:13

Temporarily repeat tests and run test-gc in travis

50fb59d

End iterator on GC or db close

33e36b1

Update comments in cleanup-hanging-iterators-test

0b8636f

Test accessing an iterator after db.close()

3406bba

vweevers force-pushed the end-iterator-on-gc branch from ad9e102 to 3406bba Compare April 26, 2019 16:18

This was referenced Apr 27, 2019

Chained batch should be persistent #620

Closed

Benchmarks: End iterator manually vs on GC #627

Closed

vweevers mentioned this pull request Apr 27, 2019

Proof of concept: asyncIterator Level/abstract-leveldown#338

Closed

vweevers removed their assignment May 18, 2019

vweevers closed this May 18, 2019

vweevers deleted the end-iterator-on-gc branch May 18, 2019 08:17

vweevers mentioned this pull request Sep 20, 2019

weak reference cleanup #667

Closed

End iterator on GC or db close #601

End iterator on GC or db close #601

Conversation

vweevers commented Mar 15, 2019

vweevers commented Mar 15, 2019

vweevers commented Mar 16, 2019

chjj commented Mar 22, 2019

vweevers commented Mar 22, 2019

vweevers commented Mar 23, 2019

vweevers commented Mar 23, 2019

peakji Mar 25, 2019 • edited Loading

Choose a reason for hiding this comment

vweevers Mar 25, 2019

Choose a reason for hiding this comment

peakji Mar 25, 2019

Choose a reason for hiding this comment

vweevers Mar 29, 2019

Choose a reason for hiding this comment

peakji Mar 29, 2019

Choose a reason for hiding this comment

vweevers Mar 29, 2019

Choose a reason for hiding this comment

peakji Mar 29, 2019 • edited Loading

Choose a reason for hiding this comment

vweevers Mar 29, 2019

Choose a reason for hiding this comment

peakji commented Apr 10, 2019

vweevers commented Apr 10, 2019

vweevers commented Apr 21, 2019

peakji commented Apr 21, 2019

vweevers commented Apr 21, 2019

peakji commented Apr 21, 2019

peakji commented Apr 22, 2019

vweevers commented Apr 22, 2019 • edited Loading

peakji commented Apr 22, 2019

vweevers commented Apr 22, 2019

peakji commented Apr 22, 2019

vweevers commented Apr 23, 2019

peakji commented Apr 23, 2019

vweevers commented Apr 23, 2019

vweevers commented Apr 26, 2019

vweevers commented Apr 26, 2019 • edited Loading

peakji commented Apr 26, 2019

vweevers commented Apr 26, 2019 • edited Loading

peakji commented Apr 26, 2019

vweevers commented Apr 26, 2019

vweevers commented Apr 26, 2019

This comment has been minimized.

vweevers commented May 18, 2019

peakji commented May 18, 2019

peakji Mar 25, 2019 •

edited

Loading

peakji Mar 29, 2019 •

edited

Loading

vweevers commented Apr 22, 2019 •

edited

Loading

vweevers commented Apr 26, 2019 •

edited

Loading

vweevers commented Apr 26, 2019 •

edited

Loading