-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add I/O interfaces to Roadmap.md #387
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking Go's approach seems like a good idea. ❤️
Roadmap.md
Outdated
#### I/O | ||
|
||
There will be two API layers for I/O: | ||
1. A low-level abstraction with (for sockets) non-blocking read/writes operating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A low-level abstraction
with(for sockets) with non-blocking read/writes
Roadmap.md
Outdated
function write(fd: number, p: Uint8Array, nbytes: number): number; | ||
|
||
// Low-level close. | ||
function close(fd: number): void; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you've missed open
function open(pathname: string, flags: OpenFlags): number;
Hey, - I'm not sure this is the correct approach for the high level API because there are very nice native constructs we can utilize instead (the low level API looks good). Namely, instead of defining a new Node also does this with its streams in Node.js 10. Basically instead of doing the following which the proposed solution does: async function copy(dst: Writer, src: Reader): Promise<number> {
let n = 0;
const b = new Uint8Array(1024);
let got_eof = false;
while (got_eof === false) {
let result = await src.read(b);
if (result.eof) got_eof = true;
n += await dst.write(b.subarray(0, result.nread));
}
return n;
} You could do: async function copy(dst: Writer, src: Reader): Promise<number> {
let n = 0;
for await(const chunk of reader) { // chunks reuse the same memory internally
n += await dst.write(chunk);
// doing `break;` here calls `.return` on the iterator automatically
}
return n; // not really needed since the state is on the iterator now and not a number which is safer.
} This is especially convenient for two reasons:
|
@benjamingr I don't see the benefit of your proposal. Your solution is a higher level, because the buffer allocation is now part of the interface. With the API I'm proposing, allocation is handled by the user -- which is useful in many applications where the user has specific knowledge of how the memory will be used. |
@ry I totally see the value of the user controlling allocation and in fact I think any good I/O implementation should provide that capability. This is what I meant with BYOB reader. The above example assumed that the buffer was passed when the reader was acquired. It is also possible to do by passing it to the part acquiring the iteration like this. The rationale for using the built in standard JavaScript mechanism for this (async iterators) is that the language already allows for a construct where:
I definitely think that any reasonable API should let the user control allocation as this is something that personally pained me a lot of times. |
Oh, I missed that. Nevertheless this all seems way more complex than what I've outlined.
None of these points are excluded in my proposal. Since my proposal is lower-level and simpler, I think it would make more sense for users to implement the "streams standard" on top of this, if they felt so inclined. |
Why is the Go interface proposal lower-level? When you say:
If I understand - you mean that you don't like the syntax or semantics in the async iterators example above (totally fair). May I ask why? I want to understand if it's something inherent to the solution (async iterators) that can't be fixed or something that's just related to the way I wrote it down. btw - another thing you can do is implement |
@benjamingr Maybe "lower-level" isn't the right term since it seems you could implement this Go interface in terms of "Standard Streams". It's simpler. The ideal of UNIX is this concept where "everything is a file". That description isn't quite right... The ideal of UNIX is that FDs are these core abstractions that can be manipulated and combined for all sorts of purposes. The ideal FD interface is just read/write/open/close. We never quite got this ideal. Linux is massively more complex than that - but everyone considers it a laudable goal. The Go I/O interfaces are basically the simplest lowest common denominator that supports all real world situations: async File I/O, non-blocking sockets, half-opened connections, backpressure. This is the interface Rob Pike returned with 20 years after Plan9. The "Standard Streams" proposal is unnecessarily complex. I have nothing against async iterators - but just because they are possible doesn't mean that everything should be an async iterator. "read" is a fundamental operation and should be exposed as such. |
The main question to me is how do we adapt these Go interfaces to JavaScript. For example: what errors are treated as exceptions, if any? |
@ry to be clear: I'm not saying deno should implement the whatwg standards proposal. I also totally get why the Go interfaces make sense for Go. I also think they would make an OK interface for TypeScript with the right semantics. I also like the unix "everything is a file" philosophy - as you said not everything is random access, readable and writeable. Let's ignore for a moment anything but non-random-access readable things. For that case JavaScript already ships with async iterators which have very similar semantics to what you are describing (for reading). The interface is basically having a const reader = getReader(); // get async iterator from somewhere
await reader.next(buffer); // returns Promise<{done: false }>, can also return other data
reader.return(); // this closes the underlying reader
reader.throw(); // may also choose to support it I have some slides in this pretty old 2016 talk that predates it making the standard. Basically all it requires is that I see async iterators as a way to "JavaScript"ize Go's Reader. They are not the only way but I think it's really nice to happen to have a built-in construct that happens to have very reasonable reader semantics. |
Ok good.
It will be very easy to wrap any "Reader" interface (as I've defined above) to do this. I don't think the default, fundamental, read operation should be called "next". Implementing "read" is a separate concern to whether you want to iterate on that object. |
I think we can agree it will be very easy to wrap every interface you (or I) come up with with probably every other interface that has reasonable pull properties.
I am not sure I understand that fundamental difference between iteration and reading at all. My understanding is that iteration just encapsulates the state management in the iterator (managing the number of bytes read) whereas reading does not. My concern here is that by not embracing the language's abstractions (like the iteration protocol) Deno would be working 'against the grain' as the language grows - much like how Node is stuck with callback APIs forever now while everyone is using async/await. Again - I totally understand why the Go APIs make sense for Go. Also - It's totally fine if you don't see it that way or think that the problem of iterating a file and the problem of reading it are sufficiently difference or if you think the API is footgunny or bad. I just feel like this might turn out to be a missed opportunity like promises in Node.
By the way: There is a nice paper by Erik Meijer that explains where all the concurrency primitives made sense when they did this with async/await (for dart?). I'll try to dig it up (hopefully the ping works). It's also worth trying to get Gilad Bracha to see if it's possible to get feedback about what modern readers/writers might look like with async/await. Of course - feedback from Rob Pike would also be nice given the vast amount of experience he has with this problem.
Normally I'd just ping them (didn't want to be rude, is that ok with you?) - since this is a new platform the people who worked on the abstractions and APIs of other languages and platforms typically have a lot of interesting things to say that are worth exploring. |
Iterating (in this context) is reading in a loop - read is a single read. Maybe you just want the first 10 bytes into a given buffer. Here this is
which is pretty close to the natural language phrasing. How does this look in your proposal?
I acknowledge your concern. But I think async iterators is not the correct abstraction here. We're defining the foundational I/O interfaces, they should be as orthogonal as possible. That's not to say that I don't think async iterators shouldn't be used for I/O - I just don't think it should be forced at this layer. If someone has a class that they want to be used with the I/O utilities, like
That's fine with me. The point of this Roadmap is to get more eyes on it in order to make fewer design mistakes. |
@headinthebox @robpike @gbracha - sorry for the ping but this has worked out too well in the past to not try. Deno is a new server-side platform written with TypeScript that is working on new I/O interfaces. We are currently discussing what those interfaces look like. All proposed solutions are pull based (as in the consumer asks for the data in chunks), allow the consumer to control allocations and differ in what control flow structures they use. If you got to design these sort of APIs all over again - what would you do?
If you prefer discussing this in email or in a smaller (or less public) forum - do feel free to reach out at benjamingr@gmail.com (or to Ryan who wrote the thing directly at ry@tinyclouds.com). |
While iterating in a loop is the most common case, just grabbing "the next item" is also a pretty common use case. I think it is important for any proposal to support both of these.
Well: let buf = new Uint8Array(10);
await fileIterator.next(buf); Which I concede is a bit uglier for this particular use case. It is also possible to expose
Well, implementing // this example polls an 'async memory stream' that can be a file, database, or whatever
// it's only async since iteration itself is async and a synchronous memory stream would be pointless
// proposed API
function transformStream(stream, processFunction) {
// omitting the need to check how many bytes we actually read here
// synchronise read calls since consumers are not guaranteed to await reads
let queue = Promise.resolve();
read(buffer) {
queue = queue.then(async () => {
await stream.read(buffer); // also need to check for eof here and propagate etc
await processFunction(buffer);
});
await queue;
}
}
// async iterators
async function* transformStream(stream, processFunction) {
// no need for queuing and synchronization
// no need for eof check since closing the source will finish the loop
// we can pass the buffer here if we want or omit it
// and a reference to the underlying source will be used if possible with 0 copies
for await(const chunk of stream) {
yield await processFunction(chunk);
}
} |
And maybe only 5 bytes were actually read. How do I find that out? I suppose EOF is signaled by |
The version of let buf = new Uint8Array(10);
// result.done for eof and result.value for the number of bytes read
// you can also do result.n or result.nread if you prefer
let result = await fileIterator.next(buf);
Yes, and in general any reader finishing as well as any iterator returned from ES2018 code using |
@benjamingr Ok, so it's very close to the same interface - the function is called 'next' and returns |
@ry thinking about this more - I think Reader should just expose a That way people will be able to use both This is what the other two languages with async iterators already landed (Python and Dart) do as well. Since it's only exposing a symbol (much like arrays expose Symbol.species and Symbol.iterator) it doesn't really increase the API surface people use or expect either. |
// return of 0 and nil as indicating that nothing happened; in particular it | ||
// does not indicate EOF. | ||
// | ||
// Implementations must not retain p. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure what this means in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that anyone who is implementing the Reader interface should not store copies of the TypedArrays passed to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood it to also imply that implementations should not store references to the TypedArrays passed to it as well (rather than just copies). I wonder if there is a better way to say this.
Roadmap.md
Outdated
// does not indicate EOF. | ||
// | ||
// Implementations must not retain p. | ||
async read(p: Uint8Array): Promise<ReadResult>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Maybe we should say
p: ArrayBufferView
, whereinterface ArrayBufferView { buffer: ArrayBuffer, byteOffset: number, byteLength: number }
-
I feel that
p
should be optional. Reading into a preallocated buffer is probably optimal if you can re-use the same buffer over and over again. But I think a lot of code will end up doing this:let buf = new Uint8Array(64436); const r = await conn.read(buf); r = r.subarray(0, nread);
The use of a preallocated buffer now gets in the way of deno doing something intelligent, e.g. allocate exactly the right size of buffer (if the stream type allows it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on ArrayBufferView
-1 on optional p. This is meant to be the most minimal interface that objects must implement to be a Reader - this ergonomics are left to higher layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on ArrayBufferView too.
If we expose Symbol.asyncIterator
around .read
that works with a view (and hopefully no buffer allocations at all but just a view into the memory) we get the minimal .read
as well as the usable for await
.
Edit: by no copies I mean that when you pass a buffer in at least one copy has to be made (to the buffer) whereas there are cases it is possible to perform zero copies of the data.
@benjamingr I'd rather add utility function to convert any reader to an async iterator. Does this work for you? import { readerIterator } from "deno";
function readerIterator(r: deno.Reader): AsyncIterator<ArrayBufferView>;
for await (let buf of readerIterator(socket)) {
console.log(`read ${buf.byteLength} from socket`);
} |
I'd like to ping @davidfowl He designed the new IO primitives for .net core. |
I don't know TypeScript well enough (at all, really), so have little to say besides showing the successful example of Go's I/O interfaces, which are already informing the conversation. It is important to specify all the details around who owns the memory in the buffer. Also, define or explicitly leave undefined every possible detail about what happens on incomplete or failing or partially failing I/O operations. |
@milutinovici oh sure, @davidfowl was a lot of help a few years ago talking interfaces (on jabbr?)! @robpike thanks for weighing in! is there anything in particular you regret about Go's interfaces or you would do differently today? @ry it's better than not having it at all since now users will be able to use The biggest pro I can think of for making it external is that it might make implementing readers easier. The con is that if readers are expected to be worked with directly then this will be very verbose whenever a stream is iterated (which I assume is a very common if not the most common case). It is also different from what other languages did. Another concern is that if readers are not async-iterable people might pass around two types (readers and iterators). Other platforms solve that part with subclassing (which I don't really think we should do). It's also possible to expose a factory (like |
@milutinovici Thanks for pinging me. FWIW the new IO abstractions on .NET being referred to are System.IO.Pipelines. They are a bit higher level but are trying to solve some of the hard problems you face when doing high performance IO.
+1, that's fundamental to making a good IO API. Most of the IO APIs in the various standard libraries require the user to pass a buffer. This API decision could end up forcing implementations to do more copying (from an internal buffer into a user buffer). The other interesting side effect of forcing the caller to allocate/control the memory is that they may end up allocating memory before there's any data to read. That's of course that's all subject to how you expose concepts like sockets and websockets in deno but it's something to think about (https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage) |
// does not indicate EOF. | ||
// | ||
// Implementations must not retain p. | ||
async read(p: ArrayBufferView): Promise<ReadResult>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does there need to be a way to cancel the current read/write operation without signaling EOF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth mentioning neither Node nor Go support aborting a single read operation as far as I know and it hasn't been very commonly requested since these calls are typically cheap.
It's also worth mentioning we don't have proper cancellation semantics for async functions in JavaScript yet (CancelTokens were proposed but not really adopted and cancellable async functions are stuck). It might be another 2-3 years before we can have a good cancel story.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjamingr shutdown is for half open TCP connections - not aborting read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the brainfart! That was incorrect from my end - I started writing about close semantics with abort and proceeded to write about cancellation semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it may not be common to cancel a single IO operation. So to abort the whole thing you need a ReaderCloser and you'd just call Close right? Usually you end up with a chain of calls where Reader(s)/Writer(s) decorate other Reader(s)/Writer(s) and you need to be able to abort/cancel that chain reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidfowl You'd (hopefully) .close
one which would close the one it's wrapping and so on cascading to close the calls and reject all the pending operations.
Ideally I'd want it to abort (and not reject) the operations but since we don't have promise cancellations or cancellation semantics yet - rejecting pending Read
s on Close
is probably the best we may be able to do but it's worth prototyping and checking for footguns.
Roadmap.md
Outdated
// from Read as an error to be reported. | ||
// | ||
// https://golang.org/pkg/io/#Copy | ||
async function copy(dst: Writer, src: Reader): Promise<number> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be optimized if there was an interface that the reader could implement to write directly into the writer. If a reader is managing internal buffers then this logic might force copies. .NET has a virtual method CopyToAsync
on the Stream
abstraction that allows for this. See https://docs.microsoft.com/en-us/dotnet/api/system.io.stream.copytoasync?view=netcore-2.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly would be nice to have sendfile(2)
support.
I will leave adding a copyTo
method for further revisions - as it's an optimization that won't exist in general - but thanks for bringing it up. (I'm not sure how/if Go does this...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this when reading the arguments for/against passing a buffer and forcing a copy and couldn't think of any case other than sendfile
where this works - it might be simpler to just support sendfile
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is https://github.com/golang/go/blob/15ac56fd60a28879fd65f6d761cc30c4c57ec0f1/src/io/io.go#L181.
// WriterTo is the interface that wraps the WriteTo method.
//
// WriteTo writes data to w until there's no more data to write or
// when an error occurs. The return value n is the number of bytes
// written. Any error encountered during the write is also returned.
//
// The Copy function uses WriterTo if available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the impl of Copy:
There's a fast path check for WriterTo and ReaderFrom to avoid buffer copies.
Just to be clear, I wasn't suggesting async iterator, I was however suggesting inverting who allocates the buffer. There are various tradeoffs with doing that. I would like however encourage thinking about higher level APIs that will build on top of these low level APIs. Looking at a few real implementations of things like servers and clients and transforms like TLS and compression might help inform some of the design. |
Just to clarify (in case it wasn't clear) - after the discussion (thank you) and the addition of (I also think David raises a very interesting point but am still for letting users avoid allocations by passing buffers) |
The ironic thing is that users really can't always avoid allocations when you stack these things when you force them to allocate. |
@piscisaureus PTAL - I'd like to land this without many more changes and fix up in later commits - as this has been hanging around for too long now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For posterity - I am not super happy with the fact that, to receive data, the caller always has to pre-allocate a buffer. We will need an alternative arrangement for C100K servers.
But let's land it and patch it up as we get closer to implementing it.
(also needs squash or rebase)
…oland#387) This commit adds "RuntimeOptions::wait_for_inspector_disconnect_callback" that allows to pass a callback that will be called when the event loop runs out of work, but there are active, non-blocking inspector sessions - that is sessions from "Chrome DevTools". This allows to print a message notifying user that the program has finished, but is waiting for DevTools to disconnect to exit the process.
No description provided.