
Benchmarks: End iterator manually vs on GC #627

Closed · peakji opened this issue Apr 27, 2019 · 4 comments

@peakji (Member) commented Apr 27, 2019

Original discussion: #601

Benchmark script and full outputs: https://gist.github.com/peakji/0fd6c1529951767697480e708806ae33


The benchmark script records the time taken by each epoch. An epoch comes in two flavors: sequential and parallel.

In sequential mode, we first insert pseudo-random (but deterministic) keys with a batch, then scan the keys using an iterator. In parallel mode we insert and scan at the same time, and unlike the sequential mode, the interleaving makes the run nondeterministic.

Between epochs, we do not clear the database.
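
For reference, a minimal sketch of one sequential epoch, assuming the leveldown callback API (open/batch/iterator/end); the key derivation and epoch size are placeholders, not the actual gist code:

```js
const leveldown = require('leveldown')

const db = leveldown('./bench-db')

// One sequential epoch: batch-insert deterministic pseudo-random keys,
// then scan the whole database with a single iterator.
function sequentialEpoch (n, callback) {
  const ops = []
  for (let i = 0; i < n; i++) {
    // Deterministic "pseudo-random" key; the real script uses a seeded PRNG.
    const key = String((i * 2654435761) % 2 ** 32)
    ops.push({ type: 'put', key, value: 'x' })
  }

  const start = process.hrtime.bigint()

  db.batch(ops, (err) => {
    if (err) return callback(err)

    const it = db.iterator()

    const next = () => it.next((err, key) => {
      if (err) return callback(err)
      if (key === undefined) {
        // Exhausted: end the iterator eagerly (the "master" flavor).
        return it.end((err) => {
          callback(err, process.hrtime.bigint() - start)
        })
      }
      next()
    })

    next()
  })
}

db.open((err) => {
  if (err) throw err
  sequentialEpoch(100000, (err, ns) => {
    if (err) throw err
    console.log(`epoch took ${ns} ns`)
  })
})
```

The gc flavor would simply skip the it.end() call and leave cleanup to the garbage collector.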


The results show that ending on GC is slower than ending eagerly on a busy server with a mixed workload. I guess the main cause is that holding snapshots alive may block compaction of keys written after the snapshot's SequenceNumber.

|                          | master-sequential | gc-sequential | master-parallel | gc-parallel |
| ------------------------ | ----------------- | ------------- | --------------- | ----------- |
| Total time (nanoseconds) | 90005813332       | 104395061841  | 81394247896     | 94485047913 |
@vweevers (Member) commented

> I guess the main cause is that holding snapshots might affect compression of new keys

To be sure: do you mean compaction or compression?

Maybe you can test this hypothesis by manually triggering GC with --expose-gc and global.gc()? That will make the test slower, so it isn't suitable for comparison against the tests above. Perhaps compare two different manual GC intervals instead. The expected outcome is that doing fewer GC cycles is slower.
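
For instance (a minimal sketch; the file name, interval values and stand-in workload are placeholders, not the actual benchmark):

```js
// Run with: node --expose-gc gc-interval.js 100
// Forces a full GC every N milliseconds while the workload runs; repeat
// with a different interval (e.g. 1000) and compare the total times.
if (typeof global.gc !== 'function') {
  throw new Error('re-run with --expose-gc')
}

const intervalMs = Number(process.argv[2]) || 100
const timer = setInterval(() => global.gc(), intervalMs)

// Stand-in workload; the real script would run the benchmark epochs here.
async function workload () {
  for (let i = 0; i < 1000; i++) {
    await new Promise((resolve) => setImmediate(resolve))
  }
}

const start = process.hrtime.bigint()
workload().then(() => {
  clearInterval(timer)
  console.log(`took ${process.hrtime.bigint() - start} ns at ${intervalMs} ms GC interval`)
})
```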

@vweevers (Member) commented

I'm leaning towards dropping the idea (of ending on GC) entirely.

@chjj To get back to your original assertion that it's impossible to catch errors, it's not:

```js
async function* test () {
  try {
    yield 'result'
  } finally {
    // End the iterator here
    console.log('end')
  }
}

async function main () {
  try {
    for await (let result of test()) {
      throw new Error('error')
    }
  } catch (err) {
    console.log(err.message)
  }
}

main()
```

```
$ node example.js
end
error
```
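
The same pattern would apply to leveldown itself. A minimal sketch, assuming the callback-style iterator API (entries is a hypothetical helper, not part of leveldown):

```js
const { promisify } = require('util')

// Hypothetical helper: expose a leveldown iterator as an async iterable
// that always ends the underlying iterator, even on break or throw.
async function * entries (db, options) {
  const it = db.iterator(options)

  const next = () => new Promise((resolve, reject) => {
    it.next((err, key, value) => {
      if (err) reject(err)
      else resolve([key, value])
    })
  })

  try {
    while (true) {
      const [key, value] = await next()
      if (key === undefined) break // iterator exhausted
      yield [key, value]
    }
  } finally {
    // Runs on normal completion, break and throw alike.
    await promisify(it.end.bind(it))()
  }
}
```

A consumer can then write for await (const [key, value] of entries(db)) and the iterator is ended deterministically, with no need to wait for GC.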

@peakji (Member, Author) commented Apr 28, 2019

> To be sure: do you mean compaction or compression?

My mistake, it should be compaction.

> Maybe you can test this hypothesis by manually triggering GC with --expose-gc and global.gc()?

The benchmark was designed to simulate a busy system, where Node.js often delays garbage collection until it exceeds the memory limit, or simply until it decides it's time. When we rely on GC to end iterators, this behavior keeps a lot of unnecessary snapshots alive. My point is that we cannot trust GC for this kind of task, unless we call global.gc() manually in production.

> Perhaps compare two different manual GC intervals.

Good point! I'll update the script and maybe add some batch.del() operations as well.
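
For example, a batch with interleaved deletes might look like this (a sketch; keys are placeholders and db is an open leveldown instance):

```js
// Mixed workload: interleave puts with deletes of previously inserted keys.
db.batch([
  { type: 'put', key: 'key-1', value: 'value-1' },
  { type: 'del', key: 'key-0' },
  { type: 'put', key: 'key-2', value: 'value-2' }
], (err) => {
  if (err) throw err
})
```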

@vweevers (Member) commented

After thinking about it some more, I agree with your reasoning and see no reason to confirm it in different ways (like comparing GC intervals).

Closing this; let's decide on next steps in #601.
