(DO NOT LAND YET) experiment: socket.read(buf) #6923
Conversation
@indutny What's the benchmark you used for this?
@indutny I'm not sure how it's possible to make it work like you describe.
What's the downside of implementing this as an option instead? Like:

```js
var stream = net.connect({
  reuseBuffers: true
})

stream.on('data', function (data) {
  // data is always a slice of the same buffer
})
```

That would make it easier to use with `.pipe`
@mafintosh there is not much value in this API, because you would always have to copy the buffer if you want to actually do something with it. The consumer needs a way to release the buffer.
Ideally, the API should be:

```js
socket.read(buffer, (err) => {});
```
@vkurchatkin you could pause the stream if you want to use the buffer asynchronously. Or use the classic `.read()` API, for that matter.
https://gist.github.com/indutny/1a004ef317fe62d923a87c084b7fd731
It quite works in this PR. This PR is just an experiment though; it isn't meant to be what the API will look like in the future. We can certainly make this API more explicit, and make it copy the data from internal buffers if needed.
While I agree that this API can be rather uncomfortable to use and perhaps a bit limiting, there are many ways it could be useful without copying: streaming parsers, piping stuff to another socket (it copies input internally anyway), etc.
I don't think that it quite fits.
I don't see the point; I think the idea is to read directly into the provided buffer.
I think what is required for achieving maximum performance is a way for the user to provide a buffer, and a guarantee that the buffer doesn't leak to other places. For example:

```js
var message = pool.alloc(10);

socket.readInto(message, 0, 1, (err, n) => {
  if (n < 1) return; // retry
  var size = message[0];
  if (size > 10) {
    pool.release(message);
    message = new Buffer(size);
    message[0] = size;
  }
  // read the rest
});
```
@vkurchatkin good point.
Basically, yes, I don't think it's needed elsewhere. |
@vkurchatkin how should it interop with the Streams API? |
Great work @indutny, but I am not sure how many users will benefit from this, as most of the interaction with streams is through pipes. HTTP and TLS in core would probably benefit a lot, and it might be good to have this as part of those, but I do not see how this can work for pipe in practice. Random ideas to make this user-friendly: instead of asking users to recycle their buffers, why don't we do it automatically for each socket? Can we detect when a buffer is about to be collected, and avoid that?
@mcollina HTTP and TLS already receive data directly from C++ with the minimal possible allocations. Though, with these APIs we may be able to migrate away from that.
I believe this won't help... Though, I haven't tried it. It looks like the costs of tracking are very likely going to be bigger than the costs of the same tracking done in C++ (V8 and node). |
Highly probable, but maybe slightly faster because of specialization. On the other hand, this might reduce the number of objects that need to be collected, thus reducing GC time. Plus, all the time spent in malloc/free will be gone. It might turn out to be faster anyway (albeit not 3x as fast; I'll be happy with 1.5x).
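The allocation difference being debated here can be sketched in a toy form. This is a hypothetical illustration, not code from this PR; `fillChunk` is a stand-in for the kernel writing into a read buffer:

```javascript
// Toy illustration: one Buffer allocation per read versus a single
// reused buffer. `fillChunk` is a hypothetical stand-in for the kernel
// delivering a chunk of socket data.
function fillChunk(buf) {
  buf.fill(97); // pretend the socket delivered a chunk of 'a's
  return buf.length;
}

// Today's path: a fresh Buffer per 'data' event.
let allocations = 0;
for (let i = 0; i < 1000; i++) {
  allocations++;
  fillChunk(Buffer.allocUnsafe(65536));
}

// With a read-into-buffer API: one allocation, reused for every read.
let reusedAllocations = 1;
const scratch = Buffer.allocUnsafe(65536);
for (let i = 0; i < 1000; i++) {
  fillChunk(scratch);
}

console.log(allocations, reusedAllocations); // 1000 1
```

The GC and malloc/free savings come from the second loop never creating new `Buffer` objects for V8 to track.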
@mcollina it won't remove those costs.
Technically yes (via the V8 weak APIs, similar to what I proposed with promises), although I wonder if the GC would like being used in this sort of way? Maybe it doesn't care, I'm not sure.
Also, the benchmark doesn't actually write or read data from the buffers, which probably makes it not very representative?
Probably it should disable the stream APIs altogether, so no interop at all. To be fair, it would be nice to have some integration with tls and also with http req/res, since they are basically socket proxies.
I've just chatted a bit with @Fishrock123 about the approach he is using elsewhere. I'm 👎 on adding a custom method to sockets; we need something that works for most streams out of the box.
Weak handles are rather expensive.
These are my thoughts too.
It sounds like it still needs this sort of API in order to implement this reliably.
I didn't really suggest adding it only to Sockets initially; that was @vkurchatkin's suggestion.
@indutny yes! I'm in favor of the optional argument. In my ideal world, buffer recycling can be a generic thing, and not just for streams: basically another module that stream implementations use. Something like:

```js
Buffer.recycler(oldBuf)
Buffer.fromRecycler(42)
```

We need to avoid leaking data that was previously written to the buffer. As it seems from the above discussion, this is an advanced API that only a few could use. I would rather shoot for something that benefits everyone, and as a last resort provide something for advanced users.
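A minimal sketch of how such a recycler might behave. The function names follow the suggestion above, and the zero-fill-on-recycle step is an assumption made to avoid leaking old data; none of this is a real Node API:

```javascript
// Hypothetical sketch of a generic buffer recycler (not a real Node API).
// A real implementation would also need to handle outstanding views safely.
const freeList = [];

function recycle(oldBuf) {
  oldBuf.fill(0); // avoid leaking previously written data
  freeList.push(oldBuf);
}

function fromRecycler(size) {
  // Reuse a recycled buffer when one is big enough, otherwise allocate.
  for (let i = 0; i < freeList.length; i++) {
    if (freeList[i].length >= size) {
      return freeList.splice(i, 1)[0].slice(0, size);
    }
  }
  return Buffer.alloc(size);
}

const a = fromRecycler(42);
a.write('secret');
recycle(a);

const b = fromRecycler(42);
console.log(b[0]); // 0: old contents were zeroed before reuse
```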
I'm not sure what you had in mind, but the cost of this would be far above the gains you'd get from recycling the Buffer. Don't forget that any number of buffers may have a view into your buffer, and your buffer may only be a view itself.
Not getting what you're saying. @indutny TBH I couldn't care less about streams, and it seems backwards if this couldn't be taken advantage of elsewhere.
I think something similar to the above would be the best API for improving throughput. I've already gotten the outbound writes on my machine to around 52 Gb/s.
Avoiding situations where an attacker can make the application leak some data that was previously written into the socket.
The problem with this approach is that the Buffer must be either consumed or copied synchronously, or it will be overwritten. This limits the applicability of this optimization.
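The synchronous-consumption constraint can be shown without a socket. Here `incoming` stands in for a stream's single reused internal buffer; this is an assumption for illustration, not how the PR is implemented:

```javascript
// Sketch: why a reused buffer must be consumed or copied synchronously.
// `incoming` plays the role of the stream's one reused internal buffer.
const incoming = Buffer.alloc(5);

incoming.write('hello');
const kept = incoming;                // just a reference, no copy
const copied = Buffer.from(incoming); // a real copy, safe to keep around

incoming.write('later'); // the "next read" overwrites the same memory

console.log(kept.toString());   // 'later': the reference was clobbered
console.log(copied.toString()); // 'hello': the copy survived
```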
Reusing old Buffers automatically isn't an option. Trust me, I've tried, and there's too much overhead. The other option is to allow the user to pass in a Buffer, or an array of Buffers, to consume the stream. @indutny It may be worth your time to look at #1671 (comment). You can see there that Buffer pressure on the GC is alleviated.
This is similar to the WHATWG Streams' "bring your own buffer" (BYOB) reader, which became part of the standard. For my part, I would offer a few cautions here.
What I mean is that if processing is asynchronous, this needs to be a user-driven process.
I agree that doing `memcpy()` is cheaper than creating buffers (or using persistent handles). I'm missing where this memcpy should happen.
Totally understand.
Really really 👍 for me.
Overall, I think this discussion is heading in a positive direction, but I'm not so keen on adding it as an additional parameter to the existing API.

{brainstorming on}

One off-the-wall idea (and I'm just brainstorming out loud here, not sure if this makes sense) would be to explicitly put the buffer pool management into the hands of the users, as suggested by @trevnorris (#6923 (comment)). For instance:

```js
const myBufPool = new BufferPool(10, 512); // optimize for up to 10 instances, 512 bytes each

stream.readInto(myBufPool, (err, buf, len) => {
  // len == the amount of data actually written into buf
  // consume buf
  buf.release(); // explicitly zeroes and releases the buffer back into the pool; also released at gc
});
```

If we have an efficient means of ref counting the number of views that exist on a pooled buffer, then this could work. I'm not sure yet how this could potentially be used in a pipe.

{brainstorming off}
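A minimal sketch of what such a `BufferPool` could look like. Everything here is hypothetical: the view ref counting is omitted, and release happens via the pool rather than the buffer for simplicity:

```javascript
// Hypothetical sketch of the BufferPool idea (not a real Node API).
// Ref counting of views is omitted; release() zeroes and returns the buffer.
class BufferPool {
  constructor(count, size) {
    this.size = size;
    this.free = [];
    for (let i = 0; i < count; i++) this.free.push(Buffer.alloc(size));
  }
  acquire() {
    // Fall back to a fresh allocation when the pool is exhausted.
    return this.free.pop() || Buffer.alloc(this.size);
  }
  release(buf) {
    buf.fill(0); // avoid leaking previously written data
    this.free.push(buf);
  }
}

const pool = new BufferPool(10, 512);
const buf = pool.acquire();
buf.write('payload');
pool.release(buf);
console.log(pool.free.length); // 10: the buffer went back into the pool
```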
Not that we shouldn't brainstorm it, but I'm really not convinced that any pool implementation would be faster.
I'm 👍 on adding this.
Ok, so given the feedback, would it be correct to conclude that everyone here is fine with introducing the following?

```js
stream.readInto(buf, offset, length, callback);
stream._read(n, buf);
```
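For illustration, here is how a consumer loop over such a `readInto(buf, offset, length, callback)` shape might look. The in-memory source and its end-of-data convention (`n === 0`) are assumptions made for this sketch, not this PR's implementation:

```javascript
// Toy in-memory source with a readInto(buf, offset, length, cb) shape,
// to illustrate the consumption pattern over a single reused buffer.
function makeSource(data) {
  let pos = 0;
  return {
    readInto(buf, offset, length, cb) {
      const n = data.copy(buf, offset, pos, Math.min(pos + length, data.length));
      pos += n;
      process.nextTick(cb, null, n); // n === 0 signals end of data
    }
  };
}

const src = makeSource(Buffer.from('hello world'));
const scratch = Buffer.alloc(4); // one reused buffer for all reads
const chunks = [];

(function loop() {
  src.readInto(scratch, 0, scratch.length, (err, n) => {
    if (err) throw err;
    if (n === 0) {
      console.log(chunks.join('')); // 'hello world'
      return;
    }
    chunks.push(scratch.toString('utf8', 0, n)); // consume synchronously
    loop();
  });
})();
```

Note that each chunk is consumed (here, stringified) synchronously inside the callback, since the next `readInto` overwrites `scratch`.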
@indutny yes. I propose that both be flagged as "experimental" and possibly not released in node v6.
+1 to experimental, and to holding off on putting this into v6 for now, but this is good stuff.
OK, I'll file an eps then.
See: nodejs/node-eps#27
@indutny by "pool" I simply meant that the user tracks their own Buffer instances and is allowed to pass them back multiple times to be written into. Not that we keep a pool of allocated memory on the native side and grab from that.
+1's will be removed, please use GitHub reactions (- Fishrock123)
Raw numbers to convince
Throughput before this patch:
Throughput with this patch:
Benchmarks code
Rationale

Lots of time is spent in `malloc`/`free` calls and V8's GC to manage `Buffer` instances when reading data from the socket. However, many use cases could be rewritten in such a way that the `Buffer` will be allocated only once and reused. The current API doesn't allow this, but with a slight modification to `stream.Readable` it could.

In the case of this `.read(buf)` call, one more argument will be passed to the `_read` function.

Intention of this PR
This PR contains an experimental implementation of this feature, and I would like to ask @nodejs/collaborators, @nodejs/streams, and @nodejs/ctc to weigh in and provide some feedback on it.
Thanks!
If we reach some preliminary consensus on this here, I'll move the discussion over to https://github.com/nodejs/node-eps