-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Rough new C++ streams API #16414
Conversation
One question upfront: This looks like it’s a mix between a C and a C++ API, which decreases readability a lot … can we pick one? 😄 (I’d prefer C++ – if we want something like public bindings for N-API, we can always do that the way N-API currently works.) |
awww ;-) ... yeah, we certainly can. I went with this simply because it was easy to hack together and I didn't want to bikeshed on stuff too much just yet. But yes, it would be better to go with one or the other. |
IO_ERROR_EOF = -768, | ||
// Source will not copy data into caller provided buffer | ||
IO_ERROR_MUST_NO_COPY = -1024, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are errors distinguished any further than this? I think you’d need at least a way to forward all errno
errors…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, not yet. This is just a rough error reporting mechanism right now, it would need to be significantly improved :-)
io_pull_set_pull_cb(&pull, AfterPull); | ||
|
||
int status = mySource->Pull(&pull, &buffer.len, &buffer, 1, | ||
IO_PULL_FLAG_MUST_COPY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea here that pull
is copied into memory managed by the source object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull
is just an async handle, really. Used to maintain context in the callback.
The IO_PULL_FLAG_MUST_COPY
tells the Source
that it must copy it's data into the provided buffer
rather than providing pointers to it's own buffers in the callback.
|
||
### Example 5: Binding a Source and Sink | ||
|
||
Binding is roughly the equivalent of `pipe()` in Readable, with the notable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I’d prefer to call this ‘piping’ then?
And just to be clear, I understand correctly that only one binding/piping operation is happening for any given source, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer to call this 'piping' then?
I considered that, but given that it is pull based rather than push based, I wanted to avoid accidentally conflating the two.
... only one binding/piping operation is happening for any given source, right?
That's the idea but we need to decide that for certain. When we discussed this over dinner in Vancouver, the idea was that data flow would be either One-Source-to-One-Sink, or One-Sink-Many-Sources, in order to keep things as simple as possible.
IO_PULL_FLAG_NONE = 0x0, | ||
// Callback must be invoked synchronously (also sets the | ||
// IO_PULL_FLAG_MUST_COPY and IO_PULL_FLAG_STRICT_LENGTH flags) | ||
IO_PULL_FLAG_SYNC = 0xD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this imply IO_PULL_FLAG_MUST_COPY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the only way for the Source
to provide pointers to it's own buffers is via the callback, which is not being used here. This is telling the Source
: fill these buffers now and return immediately when you do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe Pull
could yield its own buffers synchronously, if there are any? This sounds like something that would otherwise create unnecessary memcpy()s otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually the intent. The two approaches are:
- caller allocates the buffer, calls pull,
Source
memcpy's data into those - caller calls pull,
Source
provides pointers to it's own buffers without memcpy
Currently, for option 2, the Source
must use the callback to deliver it's own buffer pointers.
We could change this, however, so that Pull
can yield buffers back to allow Source
to provide it's own pointers without requiring the callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw; one of my (and others’) pain points with streams is that they delay a lot of events, whether necessary or not.
It might be nice if we could drop this requirement, and calls would be allowed to call their callback synchronously or asynchronously, whichever fits more nicely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I agree. With the callbacks here, that's exactly what I'm allowing for unless the code calling Pull
explicitly says that it needs sync processing. Unless that flag is set, the Source
is largely free to invoke the callback whenever it wants, sync or async, once the data is available. Unless the code calling Pull
sets that sync flag, it must not make any assumptions at all about when the callback is going to be invoked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(hopefully I didn't misunderstand your comment there tho :-) ...)
// Peek at what data is available, do not actually read. | ||
IO_PULL_FLAG_PEEK = 0x2, | ||
// Puller considers the length given to be strict, | ||
// Source must not overflow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overflow? i.e. return more data than requested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. There are a couple of options here:
-
The caller allocates it's own
io_buf_t
with a specific size, tells theSource
that it must fill those (this implies that overflowing is not allowed) -
The caller allocates it's own
io_buf_t
with a specific size, tells theSource
to provide data up tolength
and no more.Source
may choose to fill the providedio_buf_t
, or provide pointers to it's own allocated buffers. The caller may or may not allow theSource
provided buffers to be larger than what it asked for.
Does this increase the likelihood/possibility of a pull-based JS stream API or is this mainly benefiting C++ node core internals only? |
@mscdex ... yes, the intent would be to extend a pull-based JS stream API also. I believe @Fishrock123 has been exploring that. The idea would be to make this and that sync up :-) Another goal of this is to allow us to consistently bind at the native layer when possible... that is, if we have a JS Source and a JS Sink, both of which are backed by C/C++ Sink an C/C++ Source, we can do the |
Fwiw, we have that; |
Yes, I should have clarified: the idea would be a separate implementation that does not carry all of the existing Streams 1, 2, and 3 cruft along with it. It would be a separate API that is intentionally not backwards compatible with the existing stuff. Someone could write code that bridges the two, but that would be secondary. |
That's not quite the same thing as far as this API goes (from what I can tell). This pull-based API lets you supply a Buffer to use for writing the data, whereas |
Yep. This API actually supports both approaches. The caller can supply a buffer(s) to fill (and require that the Source use those), or may allow the Source to provide it's own buffer. |
enum io_pull_status { | ||
IO_PULL_STATUS_OK = 0x0, | ||
// After this call, wait before asking again | ||
IO_PULL_STATUS_WAIT = 0x1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to just wait in the source and return when data is available?
If the sink needed to I suppose it could peek or unbind on a timeout, but it seems better to wait in the source for data if there will be some rather than polling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to just wait in the source and return when data is available?
Not necessarily. In the http/2 implementation, for instance, the callback from nghttp2 is sync. If data is not currently available, the stream is put into a deferred state until nghttp2 is explicitly told that data is now available. That is precisely the kind of use case this is meant to support.
The way Pull is defined, we have a simple set of primitives that allow the following scenarios, which can be used based on the needs of the Source and Sink:
- Sync Pull ... Source waits to return until data is available (Blocking-Pull)
- Async Pull ... Source waits to call callback until data is available (Nonblocking-Pull)
- Sync Pull With Deferred Polling ... Source returns immediately saying data is not yet available, tells caller to check back later
- Sync Pull With Deferred Signal ... Source returns immediately saying data is not yet available, will let caller know later when data is available (see Sink::Signal)
- Async Pull With Deferred Polling (like above but using callback)
- Async Pull With Deferred Signal (like above but using callback)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the stream is put into a deferred state until nghttp2 is explicitly told that data is now available. That is precisely the kind of use case this is meant to support.
So what you're saying is "4. Sync Pull With Deferred Signal"? i.e. a notification push to the sink? That would make sense but I was under the assumption we were avoiding that flow more entirely. I'll try it in the JS impl when I can.
Right, in a way. (Just way more complex than necessary.) And that's kinda a good thing. It means we should, with high certainty, be able to shim the ends of these "streams" to Streams3 - allowing for compatibility with the existing Streams3 ecosystem. |
Aight, my work thus far is here: https://github.com/Fishrock123/bob Rough, but works for "Isn't that just pull streams?" — in concept, yes. Error flow and buffer allocation is different than @jasnell How does the bind structure work here to ferry re-calls to |
Awesome :-) ... I'll take a look in detail tomorrow.
TBD at this point. I'm hoping to work up an implementation of that by end of week. |
I would prefer to land this into a separate Node repo, and prove a point by porting some bits of the previous API. |
+1 on that @mcollina ... should I do like I did with the http2 repo and create a new |
Yes go ahead! |
I've got a few higher priorities items to clear out but will do that by tuesday of next week. |
virtual int Pull(io_pull_t* handle, | ||
size_t* length = 0, | ||
io_buf_t* bufs = nullptr, | ||
size_t count = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this?
virtual ~Source() {} | ||
|
||
virtual int Pull(io_pull_t* handle, | ||
size_t* length = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this length
written to? (Guaranteed to be?)
What does the passed-in length do? Set the maximum?
It seems like the two things my JS implementation is not currently capable of covering is peek and sync-required read. Maybe I'm missing something else? Note that my impl currently requires callbacks and requires a buffer (either passed form the sink or directly from the source) be passed to the callback, and that the sink always uses that one. |
I would like to help with this effort. Is there anything that is low hanging fruit to do in order to get involved in this effort? Thanks for all your help and hard work on node.js core and I look forward to working with you all more in the future. |
Updated naming to be more like James' C++ impl: nodejs/node#16414 Fixed some bugs. stdout-source works again.
What is the status here? Any further progress? |
Progress is being made. Please do not close this.
…On Feb 10, 2018 08:11, "Ruben Bridgewater" ***@***.***> wrote:
What is the status here? Any further progress?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#16414 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAa2eQ4m-SKCynuX21OO7S-QJ_UXPK9Eks5tTb-6gaJpZM4QDINt>
.
|
[DO NOT MERGE... This is a work in progress concept]
Consider everything in this mutable at this point... this is just a concept up for discussion
@mcollina @Fishrock123 @trevnorris @addaleax
This is an extremely rough WIP C/C++ level concept for the "new" pull-stream API we discussed.
@addaleax ... for context, this was something that @mcollina , @Fishrock123 , @trevnorris and I began brainstorming over dinner one night in Vancouver.
Look at the
doc/io_wip.md
to get a sense of how it works. This is extremely rough at this point but should cover most of the use cases. There is still a ton that would need to be worked through on this.Please ask questions in the comments here and be as brutal as you'd like. If we want to move forward on something like this we need to make sure we get it right.