(DO NOT LAND YET) experiment: socket.read(buf) #6923
Conversation
@indutny What's the benchmark you used for this?
@indutny I'm not sure how it's possible to make it work like you describe.
What's the downside of implementing this as an option instead? Like:

```js
var stream = net.connect({
  reuseBuffers: true
})

stream.on('data', function (data) {
  // data is always a slice of the same buffer
})
```

That would make it easier to use with `.pipe`
@mafintosh there is not much value in this API, because you would always have to copy the buffer if you want to actually do something with it. The consumer needs a way to release the buffer.
Ideally, the API should be:

```js
socket.read(buffer, (err) => {});
```
@vkurchatkin you could pause the stream if you want to use the buffer asynchronously. Or use the classic `.read()` API, for that matter.
https://gist.github.com/indutny/1a004ef317fe62d923a87c084b7fd731
It quite works in this PR. This PR is just an experiment though; it isn't meant to be what the API will look like in the future. We can certainly make this API more explicit, and make it copy the data from internal buffers if needed.
While I agree that this API can be rather uncomfortable to use and perhaps a bit limiting, there are many ways it could be useful without copying: streaming parsers, piping stuff to another socket (it copies input internally anyway), etc.
I don't think that it quite fits.
I don't see the point; I think the idea is to read directly into the provided buffer.
I think what is required for achieving maximum performance is a way for the user to provide a buffer, and a guarantee that the buffer doesn't leak to other places. For example:

```js
var message = pool.alloc(10);

socket.readInto(message, 0, 1, (err, n) => {
  if (n < 1) return; // retry
  var size = message[0];
  if (size > 10) {
    pool.release(message);
    message = new Buffer(size);
    message[0] = size;
  }
  // read the rest
});
```
@vkurchatkin good point.
Basically, yes, I don't think it's needed elsewhere. |
@vkurchatkin how should it interop with the Streams API? |
Great work @indutny, but I am not sure how many users will benefit from this, as most of the interaction with streams is through pipes. HTTP and TLS in core would probably benefit a lot, and it might be good to have this as part of those, but I do not see how this can work for pipe in practice. Random ideas to make this user-friendly: instead of asking users to recycle their buffers, why don't we do it automatically for each socket? Can we detect when a buffer is about to be collected, and avoid that?
@mcollina HTTP and TLS already receive data directly from C++ with the minimal possible allocations. Though, with these APIs we may be able to migrate away from that.
I believe this won't help... Though, I haven't tried it. It looks like the costs of tracking are very likely going to be bigger than the costs of the same tracking done in C++ (V8 and node). |
Highly probable, but maybe slightly faster because of specialization. On the other hand, this might reduce the number of objects that need to be collected, thus reducing GC time. Plus, all the time spent in malloc/free will be gone. It might turn out to be faster anyway (albeit not 3x as fast; I'll be happy with 1.5x).
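The allocation difference being debated here can be sketched in a toy form. This is a hypothetical illustration, not code from this PR; `fillChunk` is a stand-in for the kernel writing into a read buffer:

```javascript
// Toy illustration: one Buffer allocation per read versus a single
// reused buffer. `fillChunk` is a hypothetical stand-in for the kernel
// delivering a chunk of socket data.
function fillChunk(buf) {
  buf.fill(97); // pretend the socket delivered a chunk of 'a's
  return buf.length;
}

// Today's path: a fresh Buffer per 'data' event.
let allocations = 0;
for (let i = 0; i < 1000; i++) {
  allocations++;
  fillChunk(Buffer.allocUnsafe(65536));
}

// With a read-into-buffer API: one allocation, reused for every read.
let reusedAllocations = 1;
const scratch = Buffer.allocUnsafe(65536);
for (let i = 0; i < 1000; i++) {
  fillChunk(scratch);
}

console.log(allocations, reusedAllocations); // 1000 1
```

The GC and malloc/free savings come from the second loop never creating new `Buffer` objects for V8 to track.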
@mcollina it won't remove those costs.
Technically yes (via the V8 weak APIs, similar to what I proposed with promises), although I wonder if the GC would like being used in this sort of way? Maybe it doesn't care, I'm not sure.
Also, the benchmark doesn't actually write or read data from the buffers, which probably makes it not very representative?
Probably it should disable the stream APIs altogether, so no interop at all. To be fair, it would be nice to have some integration with tls and also with http req/res, since they are basically socket proxies.
I've just chatted a bit with @Fishrock123 about the approach he is using elsewhere. I'm 👎 on adding a custom method to sockets; we need something that works for most streams out of the box.
Weak handles are rather expensive.
These are my thoughts too.
It sounds like it still needs this sort of API in order to implement this reliably.
I didn't really suggest adding it only to Sockets initially; that was @vkurchatkin's suggestion.
@indutny yes! I'm in favor of the optional argument. In my ideal world, buffer recycling can be a generic thing, and not just for streams: basically another module that stream implementations use. Something like:

```js
Buffer.recycler(oldBuf)
Buffer.fromRecycler(42)
```

We need to avoid leaking data that was previously written to the buffer. As it seems from the above discussion, this is an advanced API that only a few could use. I would rather shoot for something that benefits everyone, and as a last resort provide something for advanced users.
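A minimal sketch of how such a recycler might behave. The function names follow the suggestion above, and the zero-fill-on-recycle step is an assumption made to avoid leaking old data; none of this is a real Node API:

```javascript
// Hypothetical sketch of a generic buffer recycler (not a real Node API).
// A real implementation would also need to handle outstanding views safely.
const freeList = [];

function recycle(oldBuf) {
  oldBuf.fill(0); // avoid leaking previously written data
  freeList.push(oldBuf);
}

function fromRecycler(size) {
  // Reuse a recycled buffer when one is big enough, otherwise allocate.
  for (let i = 0; i < freeList.length; i++) {
    if (freeList[i].length >= size) {
      return freeList.splice(i, 1)[0].slice(0, size);
    }
  }
  return Buffer.alloc(size);
}

const a = fromRecycler(42);
a.write('secret');
recycle(a);

const b = fromRecycler(42);
console.log(b[0]); // 0: old contents were zeroed before reuse
```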
I'm not sure what you had in mind, but the cost of this would be far above the gains you'd get from recycling the Buffer. Don't forget that any number of buffers may have a view into your buffer, and your buffer may only be a view itself.
Not getting what you're saying. @indutny TBH I couldn't care less about streams, and it seems backwards if this couldn't be taken advantage of elsewhere.
I think something similar to the above would be the best API for improving throughput. I've already gotten the outbound writes on my machine to around 52 Gb/s.
Avoiding situations where an attacker can make the application leak some data that was previously written into the socket.
The problem with this approach is that the Buffer must be either consumed or copied synchronously, or it will be overwritten. This limits the applicability of this optimization.
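The synchronous-consumption constraint can be shown without a socket. Here `incoming` stands in for a stream's single reused internal buffer; this is an assumption for illustration, not how the PR is implemented:

```javascript
// Sketch: why a reused buffer must be consumed or copied synchronously.
// `incoming` plays the role of the stream's one reused internal buffer.
const incoming = Buffer.alloc(5);

incoming.write('hello');
const kept = incoming;                // just a reference, no copy
const copied = Buffer.from(incoming); // a real copy, safe to keep around

incoming.write('later'); // the "next read" overwrites the same memory

console.log(kept.toString());   // 'later': the reference was clobbered
console.log(copied.toString()); // 'hello': the copy survived
```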
Reusing old Buffers automatically isn't an option. Trust me, I've tried, and there's too much overhead. The other option is to allow the user to pass in a Buffer, or an array of Buffers, to consume the stream. @indutny It may be worth your time to look at #1671 (comment). You can see there that Buffer pressure on the GC is alleviated.
This is similar to the WHATWG Streams' "bring your own buffer" (BYOB) reader, which became part of the standard. For my part, I would offer a few cautions here.
What I mean is that if processing is asynchronous, this needs to be a user-driven process.
I agree that doing `memcpy()` is cheaper than creating buffers (or using persistent handles). I'm missing where this memcpy should happen.
Totally understand.
Really really 👍 for me.
Overall, I think this discussion is heading in a positive direction, but I'm not so keen on adding it as an additional parameter to the existing API.

{brainstorming on}

One off-the-wall idea (and I'm just brainstorming out loud here, not sure if this makes sense) would be to explicitly put the buffer pool management into the hands of the users, as suggested by @trevnorris (#6923 (comment)). For instance:

```js
const myBufPool = new BufferPool(10, 512); // optimize for up to 10 instances, 512 bytes each

stream.readInto(myBufPool, (err, buf, len) => {
  // len == the amount of data actually written into buf
  // consume buf
  buf.release(); // explicitly zeroes and releases the buffer back into the pool; also released at gc
});
```

If we have an efficient means of ref counting the number of views that exist on a pooled buffer, then this could work. I'm not sure yet how this could potentially be used in a pipe.

{brainstorming off}
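A minimal sketch of what such a `BufferPool` could look like. Everything here is hypothetical: the view ref counting is omitted, and release happens via the pool rather than the buffer for simplicity:

```javascript
// Hypothetical sketch of the BufferPool idea (not a real Node API).
// Ref counting of views is omitted; release() zeroes and returns the buffer.
class BufferPool {
  constructor(count, size) {
    this.size = size;
    this.free = [];
    for (let i = 0; i < count; i++) this.free.push(Buffer.alloc(size));
  }
  acquire() {
    // Fall back to a fresh allocation when the pool is exhausted.
    return this.free.pop() || Buffer.alloc(this.size);
  }
  release(buf) {
    buf.fill(0); // avoid leaking previously written data
    this.free.push(buf);
  }
}

const pool = new BufferPool(10, 512);
const buf = pool.acquire();
buf.write('payload');
pool.release(buf);
console.log(pool.free.length); // 10: the buffer went back into the pool
```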
Not that we shouldn't brainstorm it, but I'm really not convinced that any pool implementation would be faster.
I'm 👍 on adding this.
Ok, so given the feedback, would it be correct to conclude that everyone here is fine with introducing the following?

```js
stream.readInto(buf, offset, length, callback);
stream._read(n, buf);
```
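For illustration, here is how a consumer loop over such a `readInto(buf, offset, length, callback)` shape might look. The in-memory source and its end-of-data convention (`n === 0`) are assumptions made for this sketch, not this PR's implementation:

```javascript
// Toy in-memory source with a readInto(buf, offset, length, cb) shape,
// to illustrate the consumption pattern over a single reused buffer.
function makeSource(data) {
  let pos = 0;
  return {
    readInto(buf, offset, length, cb) {
      const n = data.copy(buf, offset, pos, Math.min(pos + length, data.length));
      pos += n;
      process.nextTick(cb, null, n); // n === 0 signals end of data
    }
  };
}

const src = makeSource(Buffer.from('hello world'));
const scratch = Buffer.alloc(4); // one reused buffer for all reads
const chunks = [];

(function loop() {
  src.readInto(scratch, 0, scratch.length, (err, n) => {
    if (err) throw err;
    if (n === 0) {
      console.log(chunks.join('')); // 'hello world'
      return;
    }
    chunks.push(scratch.toString('utf8', 0, n)); // consume synchronously
    loop();
  });
})();
```

Note that each chunk is consumed (here, stringified) synchronously inside the callback, since the next `readInto` overwrites `scratch`.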
@indutny yes. I propose that both be flagged as "experimental" and possibly not released in node v6.
+1 to experimental, and to holding off on putting this into v6 for now, but this is good stuff.
OK, I'll file an eps then.
See: nodejs/node-eps#27
@indutny by "pool" I simply meant that the user tracks their own Buffer instances and is allowed to pass them back multiple times to be written into. Not that we keep a pool of allocated memory on the native side and grab from that.
+1's will be removed, please use GitHub reactions (- Fishrock123)
Raw numbers to convince
Throughput before this patch:
Throughput with this patch:
Benchmarks code
Rationale

Lots of time is spent in `malloc`/`free` calls and V8's GC to manage `Buffer` instances when reading data from the socket. However, many use cases could be rewritten in such a way that the `Buffer` will be allocated only once and reused. The current API doesn't allow this, but with a slight modification to `stream.Readable` it could.

In the case of this `.read(buf)` call, one more argument will be passed to the `_read` function.

Intention of this PR
This PR contains an experimental implementation of this feature, and I would like to ask @nodejs/collaborators, @nodejs/streams, and @nodejs/ctc to weigh in and provide some feedback on it.
Thanks!
If we reach some preliminary consensus on this here, I'll move the discussion over to https://github.com/nodejs/node-eps