[WIP] Rough new C++ streams API #16414

jasnell · 2017-10-23T16:28:47Z

[DO NOT MERGE... This is a work in progress concept]

Consider everything in this mutable at this point... this is just a concept up for discussion

@mcollina @Fishrock123 @trevnorris @addaleax

This is an extremely rough WIP C/C++ level concept for the "new" pull-stream API we discussed.

@addaleax ... for context, this was something that @mcollina , @Fishrock123 , @trevnorris and I began brainstorming over dinner one night in Vancouver.

Look at the doc/io_wip.md to get a sense of how it works. This is extremely rough at this point but should cover most of the use cases. There is still a ton that would need to be worked through on this.

Please ask questions in the comments here and be as brutal as you'd like. If we want to move forward on something like this we need to make sure we get it right.

addaleax · 2017-10-23T16:39:29Z

One question upfront: This looks like it’s a mix between a C and a C++ API, which decreases readability a lot … can we pick one? 😄 (I’d prefer C++ – if we want something like public bindings for N-API, we can always do that the way N-API currently works.)

jasnell · 2017-10-23T16:41:32Z

can we pick one?

awww ;-) ... yeah, we certainly can. I went with this simply because it was easy to hack together and I didn't want to bikeshed on stuff too much just yet. But yes, it would be better to go with one or the other.

addaleax · 2017-10-23T17:05:57Z

src/node_io.h

+  IO_ERROR_EOF = -768,
+  // Source will not copy data into caller provided buffer
+  IO_ERROR_MUST_NO_COPY = -1024,
+};


Are errors distinguished any further than this? I think you’d need at least a way to forward all errno errors…

So far, not yet. This is just a rough error reporting mechanism right now, it would need to be significantly improved :-)

addaleax · 2017-10-23T17:07:19Z

doc/io_wip.md

+io_pull_set_pull_cb(&pull, AfterPull);
+
+int status = mySource->Pull(&pull, &buffer.len, &buffer, 1,
+                            IO_PULL_FLAG_MUST_COPY);


Is the idea here that pull is copied into memory managed by the source object?

pull is just an async handle, really. Used to maintain context in the callback.

The IO_PULL_FLAG_MUST_COPY flag tells the Source that it must copy its data into the provided buffer rather than providing pointers to its own buffers in the callback.

addaleax · 2017-10-23T17:09:20Z

doc/io_wip.md

+
+### Example 5: Binding a Source and Sink
+
+Binding is roughly the equivalent of `pipe()` in Readable, with the notable


I think I’d prefer to call this ‘piping’ then?

And just to be clear, I understand correctly that only one binding/piping operation is happening for any given source, right?

I think I'd prefer to call this 'piping' then?

I considered that, but given that it is pull based rather than push based, I wanted to avoid accidentally conflating the two.

... only one binding/piping operation is happening for any given source, right?

That's the idea but we need to decide that for certain. When we discussed this over dinner in Vancouver, the idea was that data flow would be either One-Source-to-One-Sink, or One-Sink-Many-Sources, in order to keep things as simple as possible.

addaleax · 2017-10-23T17:16:32Z

src/node_io.h

+  IO_PULL_FLAG_NONE = 0x0,
+  // Callback must be invoked synchronously (also sets the
+  // IO_PULL_FLAG_MUST_COPY and IO_PULL_FLAG_STRICT_LENGTH flags)
+  IO_PULL_FLAG_SYNC = 0xD,


Why does this imply IO_PULL_FLAG_MUST_COPY?

Because the only way for the Source to provide pointers to its own buffers is via the callback, which is not being used here. This is telling the Source: fill these buffers now and return immediately when you do.
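To illustrate how one composite flag can imply others, here is a minimal sketch. Only `IO_PULL_FLAG_NONE = 0x0`, `IO_PULL_FLAG_PEEK = 0x2`, and `IO_PULL_FLAG_SYNC = 0xD` appear in the diff excerpts in this thread. The individual values for the sync bit, `MUST_COPY`, and `STRICT_LENGTH` are assumptions, chosen only so that they OR together to `0xD`. The `sketch` namespace, the `k`-prefixed names, and the `HasFlag` helper are hypothetical.

```cpp
#include <cstdint>

namespace sketch {

// Values marked "assumed" are NOT from the PR; they are picked so that
// kSyncBit | kMustCopy | kStrictLength == 0xD, matching IO_PULL_FLAG_SYNC.
enum PullFlags : std::uint32_t {
  kNone = 0x0,          // IO_PULL_FLAG_NONE (from the diff)
  kSyncBit = 0x1,       // assumed
  kPeek = 0x2,          // IO_PULL_FLAG_PEEK (from the diff)
  kMustCopy = 0x4,      // assumed
  kStrictLength = 0x8,  // assumed
  kSync = kSyncBit | kMustCopy | kStrictLength  // 0xD (from the diff)
};

// True when every bit of `wanted` is set in `flags`.
constexpr bool HasFlag(std::uint32_t flags, std::uint32_t wanted) {
  return (flags & wanted) == wanted;
}

static_assert(kSync == 0xD, "composite must match the value in the diff");

}  // namespace sketch
```

With a layout like this, a Source that only checks `HasFlag(flags, kMustCopy)` would behave correctly for a sync pull without needing a separate code path for it.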

So maybe Pull could yield its own buffers synchronously, if there are any? This sounds like something that would otherwise create unnecessary memcpy()s.

That's actually the intent. The two approaches are:

1. caller allocates the buffer, calls pull, Source memcpy's data into it

2. caller calls pull, Source provides pointers to its own buffers without memcpy

Currently, for option 2, the Source must use the callback to deliver its own buffer pointers.

We could change this, however, so that Pull can yield buffers back to allow Source to provide its own pointers without requiring the callback.
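The two approaches can be sketched side by side. This is a minimal illustration and not the PR's API: `Buf` stands in for `io_buf_t`, and `StringSource`, `PullCopy`, `PullNoCopy`, and `PullCallback` are hypothetical names. For simplicity, the callback in this sketch fires synchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for io_buf_t: a pointer plus a length.
struct Buf {
  char* base;
  std::size_t len;
};

// The callback receives the buffer that actually holds the data.
using PullCallback = std::function<void(const Buf& data)>;

class StringSource {
 public:
  explicit StringSource(std::string data) : data_(std::move(data)) {}

  // Option 1: the caller owns `dest`; the Source memcpy()s into it and
  // returns the number of bytes copied.
  std::size_t PullCopy(Buf* dest) {
    assert(dest != nullptr && dest->base != nullptr);
    std::size_t n = Remaining() < dest->len ? Remaining() : dest->len;
    std::memcpy(dest->base, data_.data() + offset_, n);
    offset_ += n;
    return n;
  }

  // Option 2: no caller buffer; the Source hands out a pointer into its own
  // storage through the callback, so no memcpy() happens. The pointer is
  // only valid while the Source is alive and unmodified.
  void PullNoCopy(const PullCallback& cb) {
    Buf view{&data_[0] + offset_, Remaining()};
    offset_ = data_.size();
    cb(view);
  }

 private:
  std::size_t Remaining() const { return data_.size() - offset_; }

  std::string data_;
  std::size_t offset_ = 0;
};

}  // namespace sketch
```

The trade-off shows up in the lifetime comment on `PullNoCopy`: avoiding the copy means the caller has to respect whatever validity rules the Source places on its buffers.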

Btw; one of my (and others’) pain points with streams is that they delay a lot of events, whether necessary or not.

It might be nice if we could drop this requirement, and calls would be allowed to call their callback synchronously or asynchronously, whichever fits more nicely?

Yep, I agree. With the callbacks here, that's exactly what I'm allowing for unless the code calling Pull explicitly says that it needs sync processing. Unless that flag is set, the Source is largely free to invoke the callback whenever it wants, sync or async, once the data is available. Unless the code calling Pull sets that sync flag, it must not make any assumptions at all about when the callback is going to be invoked.

(hopefully I didn't misunderstand your comment there tho :-) ...)

addaleax · 2017-10-23T17:17:17Z

src/node_io.h

+  // Peek at what data is available, do not actually read.
+  IO_PULL_FLAG_PEEK = 0x2,
+  // Puller considers the length given to be strict,
+  // Source must not overflow


overflow? i.e. return more data than requested?

Yes. There are a couple of options here:

1. The caller allocates its own io_buf_t with a specific size and tells the Source that it must fill those (this implies that overflowing is not allowed).

2. The caller allocates its own io_buf_t with a specific size and tells the Source to provide data up to length and no more. The Source may choose to fill the provided io_buf_t, or provide pointers to its own allocated buffers. The caller may or may not allow the Source-provided buffers to be larger than what it asked for.
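One way to read the "must not overflow" rule is as a clamp on how many bytes a single pull may deliver. The sketch below is an interpretation and not code from the PR: `DeliverableLength` is a hypothetical helper, and its `strict` parameter stands in for `IO_PULL_FLAG_STRICT_LENGTH`.

```cpp
#include <cstddef>

namespace sketch {

// How many bytes a Source may hand back for one pull.
// Strict: never exceed what the caller asked for.
// Non-strict: the Source may return its whole chunk, even if it is larger
// than the requested length.
constexpr std::size_t DeliverableLength(std::size_t available,
                                        std::size_t requested,
                                        bool strict) {
  return strict ? (available < requested ? available : requested) : available;
}

}  // namespace sketch
```

For example, with 100 bytes available and 16 requested, a strict pull would deliver 16 bytes, while a non-strict pull could deliver all 100 through Source-provided buffers.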

mscdex · 2017-10-23T17:43:26Z

Does this increase the likelihood/possibility of a pull-based JS stream API or is this mainly benefiting C++ node core internals only?

jasnell · 2017-10-23T17:47:39Z

@mscdex ... yes, the intent would be to extend a pull-based JS stream API also. I believe @Fishrock123 has been exploring that. The idea would be to make this and that sync up :-)

Another goal of this is to allow us to consistently bind at the native layer when possible... that is, if we have a JS Source and a JS Sink, both of which are backed by a C/C++ Sink and a C/C++ Source, we can do the Bind at the native layer to more efficiently transfer the data without pulling it into the JS layer... much like we do now with the TLS and http2 implementations writing to their underlying sockets.

addaleax · 2017-10-23T17:50:00Z

has been exploring that. The idea would be to make this and that sync up :-)

Fwiw, we have that; .on('readable') + .read() is pretty much a pull stream.

jasnell · 2017-10-23T17:57:05Z

Fwiw, we have that

Yes, I should have clarified: the idea would be a separate implementation that does not carry all of the existing Streams 1, 2, and 3 cruft along with it. It would be a separate API that is intentionally not backwards compatible with the existing stuff. Someone could write code that bridges the two, but that would be secondary.

mscdex · 2017-10-23T22:10:15Z

Fwiw, we have that; .on('readable') + .read() is pretty much a pull stream.

That's not quite the same thing as far as this API goes (from what I can tell). This pull-based API lets you supply a Buffer to use for writing the data, whereas 'readable' and read() is returning a new Buffer every time, which can be expensive (for parsers and other modules that consume the entirety of the data received).

jasnell · 2017-10-23T22:25:03Z

That's not quite the same thing as far as this API goes ...

Yep. This API actually supports both approaches. The caller can supply a buffer(s) to fill (and require that the Source use those), or may allow the Source to provide its own buffer.

Fishrock123 · 2017-10-23T22:26:33Z

src/node_io.h

+enum io_pull_status {
+  IO_PULL_STATUS_OK = 0x0,
+  // After this call, wait before asking again
+  IO_PULL_STATUS_WAIT = 0x1,


Isn't it better to just wait in the source and return when data is available?

If the sink needed to I suppose it could peek or unbind on a timeout, but it seems better to wait in the source for data if there will be some rather than polling?

Isn't it better to just wait in the source and return when data is available?

Not necessarily. In the http/2 implementation, for instance, the callback from nghttp2 is sync. If data is not currently available, the stream is put into a deferred state until nghttp2 is explicitly told that data is now available. That is precisely the kind of use case this is meant to support.

The way Pull is defined, we have a simple set of primitives that allow the following scenarios, which can be used based on the needs of the Source and Sink:

1. Sync Pull ... Source waits to return until data is available (Blocking-Pull)

2. Async Pull ... Source waits to call callback until data is available (Nonblocking-Pull)

3. Sync Pull With Deferred Polling ... Source returns immediately saying data is not yet available, tells caller to check back later

4. Sync Pull With Deferred Signal ... Source returns immediately saying data is not yet available, will let caller know later when data is available (see Sink::Signal)

5. Async Pull With Deferred Polling (like above but using callback)

6. Async Pull With Deferred Signal (like above but using callback)
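As a rough illustration of the "Sync Pull With Deferred Polling" scenario, the sketch below shows a caller reacting to the three outcomes of a pull: data, "wait", and end of stream. The numeric status values come from the diff excerpts in this thread. Everything else (`ScriptedSource`, `Drain`, `DrainResult`, and the simplified `Pull` signature) is hypothetical and is not the PR's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Values taken from the diff excerpts in this thread.
constexpr int kStatusOk = 0x0;    // IO_PULL_STATUS_OK
constexpr int kStatusWait = 0x1;  // IO_PULL_STATUS_WAIT
constexpr int kErrorEof = -768;   // IO_ERROR_EOF

// Hypothetical Source: each scripted step is either "not ready" (an empty
// string) or a chunk of data. Once the script runs out it reports EOF.
class ScriptedSource {
 public:
  explicit ScriptedSource(std::vector<std::string> steps)
      : steps_(std::move(steps)) {}

  // Sync pull: returns immediately with a status and never blocks.
  int Pull(std::string* out) {
    assert(out != nullptr);
    if (next_ == steps_.size()) return kErrorEof;
    const std::string& step = steps_[next_++];
    if (step.empty()) return kStatusWait;  // data not yet available
    *out = step;
    return kStatusOk;
  }

 private:
  std::vector<std::string> steps_;
  std::size_t next_ = 0;
};

struct DrainResult {
  std::string data;
  int waits = 0;
};

// On WAIT the caller checks back later. A real sink would yield to the
// event loop before retrying instead of looping immediately.
inline DrainResult Drain(ScriptedSource* source) {
  DrainResult result;
  std::string chunk;
  for (;;) {
    int status = source->Pull(&chunk);
    if (status == kErrorEof) break;
    if (status == kStatusWait) {
      ++result.waits;
      continue;
    }
    result.data += chunk;
  }
  return result;
}

}  // namespace sketch
```

In the "Deferred Signal" variants, the retry in `Drain` would instead be triggered by a notification from the Source, so the caller would not need to poll at all.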

the stream is put into a deferred state until nghttp2 is explicitly told that data is now available. That is precisely the kind of use case this is meant to support.

So what you're saying is "4. Sync Pull With Deferred Signal"? i.e. a notification push to the sink? That would make sense but I was under the assumption we were avoiding that flow more entirely. I'll try it in the JS impl when I can.

Fishrock123 · 2017-10-23T22:35:54Z

Fwiw, we have that; .on('readable') + .read() is pretty much a pull stream.

Right, in a way. (Just way more complex than necessary.) And that's kinda a good thing. It means we should, with high certainty, be able to shim the ends of these "streams" to Streams3 - allowing for compatibility with the existing Streams3 ecosystem.

Fishrock123 · 2017-10-23T23:34:12Z

Aight, my work thus far is here: https://github.com/Fishrock123/bob

Rough, but works for fs. I'll update it to share some naming conventions with this soon-ish. I think some modifications are needed to make buffered transforms work correctly, going to try to do a zlib one soon.

"Isn't that just pull streams?" In concept, yes. Error flow and buffer allocation are different from pull-stream though and should better suit our needs, I think.

@jasnell How does the bind structure work here to ferry re-calls to Pull()? Does it need to have its own functions in between? Mine does binding more directly from the sink right now, allowing direct calls from sinks and sources.

jasnell · 2017-10-24T03:17:17Z

Aight, my work thus far is here ...

Awesome :-) ... I'll take a look in detail tomorrow.

How does the bind structure work here to ferry re-calls to Pull

TBD at this point. I'm hoping to work up an implementation of that by end of week.

jasnell · 2017-10-25T19:23:26Z

FYI: https://github.com/nodejs/node/projects/7

mcollina · 2017-10-27T17:39:27Z

I would prefer to land this into a separate Node repo, and prove a point by porting some bits of the previous API.

jasnell · 2017-10-27T18:41:20Z

+1 on that @mcollina ... should I do like I did with the http2 repo and create a new nodejs/new-streams repo?

mcollina · 2017-10-27T18:43:02Z

Yes go ahead!

jasnell · 2017-10-27T18:46:51Z

I've got a few higher priority items to clear out but will do that by Tuesday of next week.

Fishrock123 · 2017-11-24T21:21:04Z

src/node_io.h

+  virtual int Pull(io_pull_t* handle,
+                   size_t* length = 0,
+                   io_buf_t* bufs = nullptr,
+                   size_t count = 0,


What is this?

Fishrock123 · 2017-11-24T21:21:53Z

src/node_io.h

+  virtual ~Source() {}
+
+  virtual int Pull(io_pull_t* handle,
+                   size_t* length = 0,


Is this length written to? (Guaranteed to be?)

What does the passed-in length do? Set the maximum?

Fishrock123 · 2017-11-24T21:33:33Z

It seems like the two things my JS implementation is not currently capable of covering are peek and sync-required read. Maybe I'm missing something else?

Note that my impl currently requires callbacks and requires that a buffer (either passed from the sink or directly from the source) be passed to the callback, and that the sink always uses that one.

gishmel · 2017-12-01T15:29:56Z

I would like to help with this effort. Is there anything that is low hanging fruit to do in order to get involved in this effort? Thanks for all your help and hard work on node.js core and I look forward to working with you all more in the future.

BridgeAR · 2018-02-10T16:11:31Z

What is the status here? Any further progress?

jasnell · 2018-02-10T16:42:28Z

Progress is being made. Please do not close this.

[WIP] Rough low level core pull streams API

dff1706

nodejs-github-bot added the build and c++ labels Oct 23, 2017

jasnell added the wip label Oct 23, 2017

addaleax changed the title ~~[WIP] Rough new streams API~~ [WIP] Rough new C++ streams API Oct 23, 2017

typos

7e82c99

addaleax reviewed Oct 23, 2017

Fishrock123 reviewed Oct 23, 2017

Fishrock123 mentioned this pull request Oct 24, 2017

Identifying core initiatives + Find Champions nodejs/TSC#390

Closed

Fishrock123 reviewed Nov 24, 2017

Fishrock123 added a commit to Fishrock123/bob that referenced this pull request Dec 1, 2017

Update naming, fix bugs.

1ffe4be

Updated naming to be more like James' C++ impl: nodejs/node#16414 Fixed some bugs. stdout-source works again.

MylesBorins force-pushed the master branch from b7405ab to 7f086dd Compare December 8, 2017 16:37

Fishrock123 mentioned this pull request Jan 15, 2018

Progress 15/1/2018 (2018 week 3) Fishrock123/bob#2

Closed

maclover7 force-pushed the master branch from bb5575a to 993b716 Compare January 26, 2018 22:02

cjihrig force-pushed the master branch from 993b716 to 082f952 Compare January 26, 2018 22:36

Fishrock123 mentioned this pull request Feb 16, 2018

Progress 16/2/2018 (2018 week 7) Fishrock123/bob#5

Closed

jasnell closed this Feb 16, 2018

