This repository has been archived by the owner on Dec 1, 2024. It is now read-only.

write stream redo using batch-write-stream #165

Closed
wants to merge 4 commits

Conversation

pgte
Contributor

@pgte pgte commented Jul 23, 2013

For discussion:

Rewrote write_stream based on batch-write-stream.

(Unfortunately I can't extend readable-stream.Writable or stream.Writable to make it write in batches)

Benchmarks

Running the existing stream benchmark locally, the writestream-rewrite branch yielded results very similar to the write-stream-optimization-benchmark-fix branch:

write-stream-optimization-benchmark-fix branch:

→ node stream-bench.js
--------------------------------------------------------------
Filled source! Took 15220ms, streaming to destination...
Reading took 7477ms,
Done! Took 7479ms, 49% of batch time

writestream-rewrite branch:

→ node stream-bench.js
--------------------------------------------------------------
Filled source! Took 15468ms, streaming to destination...
Read finished! Took 7946ms
Done! Took 7947ms, 51% of batch time

Still have to rerun a set of benchmarks I have and update the results page.

Magic values

All the magic values are in options:

These default values were empirically set: they were simply the values that yielded the best performance locally.
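For illustration, option handling along these lines would keep every magic value overridable. This is a hypothetical sketch: the option names and fallback values mirror the benchmark conf shown later in this thread, not necessarily the shipped defaults.

```javascript
// Hypothetical sketch: every magic value is read from `options`
// with an empirically-chosen fallback, so callers can tune them.
function writeStreamOptions (options) {
  options = options || {}
  return {
    // buffered writes before back-pressure kicks in
    highWaterMark: options.highWaterMark || 4000,
    // db.batch() calls allowed in flight at once
    maxConcurrentBatches: options.maxConcurrentBatches || 4
  }
}
```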

As said, I'll rerun some benchmarks to extract more significant numbers from this version.
My feeling is that if we want to optimize, as @dominictarr hinted, it must be done at run time, since the numbers we extract will only be relevant to the local system for the db.copy pattern.

@pgte pgte mentioned this pull request Jul 23, 2013
@pgte
Contributor Author

pgte commented Jul 23, 2013

Quick update: here are the results of the old benchmarks I had, now with the new code:

(backported this latest version to level-writestream to make it easy to run with and without my changes without switching branches):

Inserting 1M records in chunks of 1000 using a high-watermark of 4000 and a max of 4 concurrent batches:

pedroteixeira@Pedros-MacBook-Pro:~/projects/levelup (write-stream-optimisation)
→ git status
# On branch write-stream-optimisation
nothing to commit (working directory clean)
pedroteixeira@Pedros-MacBook-Pro:~/projects/levelup (write-stream-optimisation)
→ cd ../level-writestream
pedroteixeira@Pedros-MacBook-Pro:~/projects/level-writestream (master)
→ tests/benchmarks/old/run
Plain LevelUP...
conf: { max: 1000000,
  chunks: 1000,
  highWaterMark: 4000,
  maxConcurrentBatches: 4,
  wrap: false }
wrote 1000000 records in 18967 ms
Level-Writestream...
conf: { max: 1000000,
  chunks: 1000,
  highWaterMark: 4000,
  maxConcurrentBatches: 4,
  wrap: true }
wrote 1000000 records in 14543 ms

So the performance gains appear to be maintained in this version on my dev machine.

Would appreciate it if someone contributed their own run of these plain benchmarks.

@dominictarr
Contributor

hmm, so if we want to get rid of magic numbers,
we need to find inputs that cause them to fail badly.

then we can look for algorithms that support a range of inputs.

@pgte
Contributor Author

pgte commented Jul 24, 2013

Quick update on benchmarks: reran benchmark suite, here are the results:
http://metaduck.com/levelup/writestream/benchmarks/plot.html
The numbers are consistent between runs, and locally this version is generally faster than the write-stream-optimization branch.
Still don't have statistical significance numbers (I'd have to repeat the same experiment several times to infer that).

@pgte
Contributor Author

pgte commented Aug 7, 2013

Been thinking about this, and I think that, just like the socket.setNoDelay() API, we should let the user define the delay.

Anyway, this implementation looks faster for most cases I've run and passes all the tests, feel free to merge it.
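The setNoDelay() analogy could look something like this. A hypothetical sketch, not the PR's code: a batch is flushed when it fills up or when a user-defined delay expires, whichever comes first, so a delay of 0 behaves like `socket.setNoDelay(true)`.

```javascript
// Hypothetical sketch of a user-tunable flush delay. `flush` is
// called with each completed batch; `maxSize` and `delay` are the
// two knobs the user would set.
class DelayedBatcher {
  constructor (flush, delay, maxSize) {
    this.flush = flush
    this.delay = delay
    this.maxSize = maxSize
    this.buffer = []
    this.timer = null
  }

  push (op) {
    this.buffer.push(op)
    // Full batch: flush immediately.
    if (this.buffer.length >= this.maxSize) return this._drain()
    // Otherwise arm the delay timer once per pending batch.
    if (this.timer === null) {
      this.timer = setTimeout(() => this._drain(), this.delay)
    }
  }

  _drain () {
    if (this.timer !== null) clearTimeout(this.timer)
    this.timer = null
    const batch = this.buffer
    this.buffer = []
    this.flush(batch)
  }
}
```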

@rvagg
Member

rvagg commented Aug 7, 2013

I'd like to have a quick look at this before we release it but I'm travelling over the next 2 days. Will try and have a peek soon and then we can release and see how it goes in the wild!

@pgte pgte mentioned this pull request Aug 18, 2013
db.once('ready', ready)
this.once('finish', onFinish.bind(this))

BatchWriteStream.call(this, {
Member


might be best to just xtend(options, { objectMode: true }) here to save a few lines, unless we're trying to avoid passing something on?
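The suggestion above, sketched with `Object.assign` standing in for the `xtend` module (both produce a fresh merged object, with later arguments winning, so the forced flag can't be overridden by the caller):

```javascript
// Sketch of the review suggestion: merge the caller's options with
// the forced objectMode flag instead of copying fields one by one.
// Object.assign stands in for xtend here.
function streamOptions (options) {
  return Object.assign({}, options, { objectMode: true })
}
```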

@rvagg
Member

rvagg commented Aug 20, 2013

Fstream stuff can be removed from the tests. Including the functional tests and the dependencies that go along with them.

@juliangruber
Member

Tried running this and got a couple of errors in the tests, some stream errors and this:

/Users/julian/pro/node-levelup/test/read-stream-test.js:18
            , bigBlob    = Array.apply(null, Array(1024 * 100)).map(function () { return
                                 ^
          illegal access

@rvagg
Member

rvagg commented Aug 28, 2013

what what what???

Which version of Node is this? There was a problem with Arrays this size in the V8 that shipped with 0.11.3, but it's fixed before and after that.

@juliangruber
Member

Oh, I was using 0.11.6.

@mcollina
Member

mcollina commented Sep 2, 2013

I tried this WriteStream in LevelGraph, and it gave me a 30%+ improvement, so I'm definitely 👍 for this to get in.

@pgte
Contributor Author

pgte commented Nov 12, 2013

This no longer makes sense since writestream is getting pulled out.

@pgte pgte closed this Nov 12, 2013
@ralphtheninja ralphtheninja deleted the writestream-rewrite branch August 29, 2017 00:53