Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: allow reading data into a static buffer #25436

Merged
merged 1 commit into from
Aug 23, 2019

Conversation

mscdex
Copy link
Contributor

@mscdex mscdex commented Jan 10, 2019

This is more or less the changes I described in nodejs/node-eps#27 with some updates since the C++ side had changed a fair amount. I thought I would put this out there to see if there was still interest in merging a feature like this (even if it's not this PR as-is).

This implementation is opt-in, only for net (although it could obviously be adapted for other places where dynamic buffers are used), and it goes the simple and easy route and completely bypasses the data streaming feature of sockets (although 'end' and other such non-data-related events are still emitted as usual).

Having this kind of power is obviously not for all use cases, but for network protocol implementations it can make things a lot faster because the connection data is typically parsed synchronously and any raw data that needs to be passed on to end users could simply be copied.

Here are some example benchmark results (recvbuflen=0 indicates old/current behavior of allocating a new buffer for each chunk read):

net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=256: 1.071279628253138
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=256: 1.1356590323233846
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=256: 1.129328387002154
net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=32768: 19.848600557880395
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=32768: 28.77196840127128
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=32768: 28.674568832052028
net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=131072: 23.3627596199997
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=131072: 36.92736185280725
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=131072: 39.21447761809467
net/net-s2c.js dur=5 recvbuflen=0 type="buf" sendchunklen=16777216: 21.913559134252168
net/net-s2c.js dur=5 recvbuflen=65536 type="buf" sendchunklen=16777216: 35.96126656747849
net/net-s2c.js dur=5 recvbuflen=1048576 type="buf" sendchunklen=16777216: 36.709655230358784

I did not add documentation or tests yet because API is not set in stone.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@mscdex mscdex added net Issues and PRs related to the net subsystem. performance Issues and PRs related to the performance of Node.js. labels Jan 10, 2019
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Jan 10, 2019
@mscdex mscdex changed the title lib,src: allow reading data into a static buffer net: allow reading data into a static buffer Jan 10, 2019
@mscdex
Copy link
Contributor Author

mscdex commented Jan 10, 2019

@addaleax
Copy link
Member

I like the feature idea. Regarding the implementation:

  • I don’t think passing the buffer along the constructors is a good idea, given the complexity that it introduces to our code. I know it’s an extra call to C++, but adding a single method would probably be much cleaner.
  • It’s also kind of odd to stream_buf as a member of StreamBase, and add if/else cases everywhere that handle the cases of having/not having a custom buffer each time – it would, imo, be cleaner to create a class similar to EmitToJSStreamListener (but with custom a OnStreamAlloc method), and attach an instance of it to the stream when calling the C++ method.

@mscdex
Copy link
Contributor Author

mscdex commented Jan 12, 2019

I'm open to suggestions (especially specific code changes), this is very much just a proof of concept to show the potential, significant performance gains.

@addaleax
Copy link
Member

@mscdex Since you’re asking for specific code changes: addaleax@d8b783f is about what I’d have in mind (applied on top of this PR). It also brings the combined diff nicely down from +225/−66 to +139/−45.

@mscdex
Copy link
Contributor Author

mscdex commented Jan 13, 2019

@addaleax Merged, thanks.

@mscdex
Copy link
Contributor Author

mscdex commented Jan 21, 2019

thoughts @nodejs/collaborators ?

@addaleax
Copy link
Member

addaleax commented Jan 21, 2019

@mscdex The code LGTM here, but I’d also like other people to weigh in more. I assume you don’t want to do unnecessary work in case this doesn’t get merged, but I also guess for a lot of people docs and tests help; so, tl;dr on the API here:


This PR adds an option to net.Socket instances (easy to extend to e.g. HTTP/2) where instead of using the traditional stream.Readable interface, a user can provide a fixed buffer and a callback which is called when data has been read from the socket~~, with no backpressure handling or similar features~~.

Usage looks something like this:

const socket = net.connect({
  host: 'example.com',
  port: 80,
  onread: {
    buffer: Buffer.allocUnsafe(65536),
    callback(bytesRead, buffer) {
      // Do something with the first `bytesRead` bytes of `buffer`.
      // The `buffer` will always be the same object, so all data
      // needs to be handled synchronously here.
    }
  }

@sam-github
Copy link
Contributor

There has been some talk of creating a node API layer that is closer to uv, so more efficient code can be written on top of it. Is that where this is going?

cf http://docs.libuv.org/en/v1.x/stream.html#c.uv_read_start

The onread use of a buffer seems to make unnecessary the uv alloc/read callback pair, that seems reasonable to me, it reduces roundtrip time in that only one C++ to js callback is needed. I guess the prices is the need for sync data handling. Maybe that's OK, as long as the data is immediately passed to a protocol parser, or unzipper, or something of the sort, though it wouldn't work for a simple pipe of data to fs.write(), for example (AFAICT).

re: lack of backpressure support, at the uv layer the caller has to explicitly start/stop data flow to exert backpressure. Isn't there a need for a readStart/Stop API as uv has? Or does the handle have that already? There is the Readable.pause() method, but I think the idea here is to bypass the streams API.

@mscdex
Copy link
Contributor Author

mscdex commented Jan 21, 2019

re: lack of backpressure support, at the uv layer the caller has to explicitly start/stop data flow to exert backpressure. Isn't there a need for a readStart/Stop API as uv has? Or does the handle have that already? There is the Readable.pause() method, but I think the idea here is to bypass the streams API.

The onread callback can explicitly return false to stop reading from the socket, so there is still backpressure support, it's just at the OS/C++ layer and not at the JS streams layer. So if you return false from the callback, you can call socket.resume() or socket.read() to resume reading from the socket.

@benjamingr
Copy link
Member

I like the idea, we had quite a long discussion about it in denoland/deno#387

@mscdex
Copy link
Contributor Author

mscdex commented Jan 28, 2019

ping @nodejs/collaborators ?

@sam-github
Copy link
Contributor

@mscdex Unanswered questions in #25436 (comment)

@mscdex
Copy link
Contributor Author

mscdex commented Jan 28, 2019

@sam-github I answered how starting and stopping is handled in this PR in #25436 (comment). Were you looking for some other information?

@sam-github
Copy link
Contributor

My apologies @mscdex, that comment exactly answers my question, I don't know how I missed it.

@jasnell
Copy link
Member

jasnell commented Jan 29, 2019

Ping @nodejs/streams to review ... in particular, @mcollina, I'd love your thoughts on this.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to add this to other areas of core? Specifically, are you planning to add it to TLS?

What will happen if pause is called within callback?

I think this API could be a bit limiting to what it could be achieved. What I would like to see is more a situation where the same buffer moves between the producer and the consumer, like this:

let oldBuffer = null
const socket = net.connect({
  host: 'example.com',
  port: 80,
  onread: {
    buffer() { 
     return oldBuffer || Buffer.allocUnsafe(65536)
    },
    callback(bytesRead, buffer) {
      destination.write(buffer, (err) => {
        oldBuffer = buffer
      })
    }
  }

lib/net.js Show resolved Hide resolved
@mscdex
Copy link
Contributor Author

mscdex commented Feb 1, 2019

Are you planning to add this to other areas of core? Specifically, are you planning to add it to TLS?

Probably.

What will happen if pause is called within callback?

With the PR as it currently is, node will not read from the socket.

I think this API could be a bit limiting to what it could be achieved. What I would like to see is more a situation where the same buffer moves between the producer and the consumer, like this:

I don't understand what exactly is expected with the suggested implementation. Is buffer() being called from C++ land for every socket read? Is it only being called once inside the net.Socket() constructor? If it's the former, then what would you suggest for avoiding any performance hit in doing so?

Also, due to the asynchronous nature of destination.write() and the synchronous nature of callback(), when oldBuffer is set the same Buffer will be written to destination and oldBuffer's contents could change multiple times while that write is occurring, which goes against what people expect of streams, unless you meant something else by the example?

@mcollina
Copy link
Member

mcollina commented Feb 2, 2019

I don't understand what exactly is expected with the suggested implementation. Is buffer() being called from C++ land for every socket read? Is it only being called once inside the net.Socket() constructor? If it's the former, then what would you suggest for avoiding any performance hit in doing so?

The key idea is to provide a mechanism to reuse multiple buffers, not just one. Whenever there is a need for a read, C++ can call buffer to get one. We can maintain a list of allocated buffers, that will flow back into the pool after being read. This mechanism is driven by the user. This should answer your second question as well.
I'm suggesting an alternative implementation that could broaden the scope, I'm not referring specifically to this specific implementation.

@mscdex
Copy link
Contributor Author

mscdex commented Feb 4, 2019

Whenever there is a need for a read, C++ can call buffer to get one.

That would probably cause a noticeable performance hit, unless you were willing to compromise a bit and instead have the function that the C++ side calls when the socket is read from return the next Buffer to use (by calling buffer()). That would avoid any extra round trips between JS and C++ but the semantics would be different (asking for Buffer for next read vs asking for Buffer for current read) if that matters.

@mcollina
Copy link
Member

mcollina commented Feb 4, 2019

That would probably cause a noticeable performance hit, unless you were willing to compromise a bit and instead have the function that the C++ side calls when the socket is read from return the next Buffer to use (by calling buffer()). That would avoid any extra round trips between JS and C++ but the semantics would be different (asking for Buffer for next read vs asking for Buffer for current read) if that matters.

+1, that's what I meant, sorry for making it complicated.

BridgeAR pushed a commit that referenced this pull request Sep 3, 2019
Co-Authored-By: Anna Henningsen <anna@addaleax.net>

PR-URL: #25436
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@BridgeAR BridgeAR mentioned this pull request Sep 3, 2019
BridgeAR added a commit that referenced this pull request Sep 3, 2019
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    #29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    #29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    #29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    #21387
* net:
  * Allow reading data into a static buffer (Brian White)
    #25436

PR-URL: #29429
BridgeAR added a commit that referenced this pull request Sep 4, 2019
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    #29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    #29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    #29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    #21387
* net:
  * Allow reading data into a static buffer (Brian White)
    #25436

PR-URL: #29429
BridgeAR added a commit that referenced this pull request Sep 4, 2019
Notable changes:

* deps:
  * Update npm to 6.10.3 (isaacs)
    #29023
* fs:
  * Add recursive option to rmdir() (cjihrig)
    #29168
  * Allow passing true to emitClose option (Giorgos Ntemiris)
    #29212
  * Add \*timeNs properties to BigInt Stats objects (Joyee Cheung)
    #21387
* net:
  * Allow reading data into a static buffer (Brian White)
    #25436

PR-URL: #29429
@jorangreef
Copy link
Contributor

Thanks @mscdex this is excellent. Will this also work for constructing net servers or is it limited to net clients?

@mscdex
Copy link
Contributor Author

mscdex commented Sep 10, 2019

@jorangreef Only clients at the moment.

puzpuzpuz added a commit to puzpuzpuz/node that referenced this pull request Oct 22, 2020
puzpuzpuz added a commit that referenced this pull request Oct 28, 2020
Refs: #25436

PR-URL: #35753
Refs: #25436
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
targos pushed a commit that referenced this pull request Nov 3, 2020
Refs: #25436

PR-URL: #35753
Refs: #25436
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
targos pushed a commit that referenced this pull request Aug 4, 2021
Refs: #25436

PR-URL: #35753
Refs: #25436
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
BethGriggs pushed a commit that referenced this pull request Aug 12, 2021
Refs: #25436

PR-URL: #35753
Refs: #25436
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
MylesBorins pushed a commit that referenced this pull request Aug 31, 2021
Refs: #25436

PR-URL: #35753
Refs: #25436
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
foxxyz pushed a commit to foxxyz/node that referenced this pull request Oct 18, 2021
Refs: nodejs#25436

PR-URL: nodejs#35753
Refs: nodejs#25436
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. net Issues and PRs related to the net subsystem. performance Issues and PRs related to the performance of Node.js. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.