Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reading bytes into buffers allocated by user code on platforms where only async read is available #253

Closed
domenic opened this issue Dec 10, 2014 · 121 comments · Fixed by #296
Assignees

Comments

@domenic
Copy link
Member

domenic commented Dec 10, 2014

Had a thought-provoking conversation with @piscisaureus today. He works on libuv and was giving me their perspective on how they do I/O and how it interfaces with Node.js's streams.

He pointed out that resources like files and pipes shared with other processes (seekable resources, he called them) don't have the epoll + read interface that ReadableByteStream's current design is based on. In Node.js, they do blocking I/O in a thread pool with pre-allocated buffers.

This works for a ready + read() JS interface, since then the implementation can pre-allocate (say) a 1 MiB buffer, read into it off-thread using blocking I/O, then fulfill the ready promise. When JS calls read(), it just returns the pre-allocated buffer.

It doesn't work as well with ReadableByteStream's ready + readInto(ab, offset, sizeDesired) interface, since we don't know what size buffer to allocate until too late. If we pre-allocate something too small, readInto will always be returning a number below the desired size, which is a bit user-hostile. If we pre-allocate too large, we are at the least wasting memory, and at the worst getting ourselves into a bunch of trouble as we need to figure out how to merge the "leftovers" with other chunks we read in the future.

(I am assuming here it is easy in V8 to map the [offset, offset + max(sizeRead, sizeDesired)] portion of ab onto a given C++ backing buffer.)

A readIntoAsync(ab, offset, desiredSize) model would work a bit better, since then we'd know to pre-allocate a buffer of size desiredSize. But, I was curious if you guys had any other thoughts? Did you think of this issue when designing the API, @tyoshino? I imagine @willchan could be helpful here too.

@tyoshino
Copy link
Member

If the blocking I/O:

  • accepts a buffer
  • based on the size of the given buffer, controls how much data it pulls from the source

Then, we want to allow the user code to allocate a buffer with size determined based on their status. To realize this, yes, we need some API like readIntoAsync(). But between ready + read() and ready + readInto(ab, offset, sizeDesired), I don't think there's so much difference. ready + read() doesn't know what size is appropriate. 1MiB may be too big or too small. The only disadvantage of ready + readInto() for this kind of I/O is an extra copy (internal buffer to user allocated buffer).

In what form can we provide readIntoAsync() together with read()? To allow sync read() we have on the ReadableStream, we need to pull data from the source at least when we're in the "waiting" state. Otherwise, we never enter "readable" state. But this spoils the readIntoAsync()'s benefit. We'll have some data already pulled (to serve for read() user) when readIntoAsync() is called.

So, we need to add some interface to suppress automatic pull when we want to use readIntoAsync() style reading?


If the blocking I/O:

  • tells us how much data is readable on unblocking
  • and then, accepts a buffer

Then, ready + read()'s

  • pros: can avoid using too small buffer (includes cost of leftover merging as you say, etc.). can avoid allocating too big buffer.
  • cons: the user cannot limit size to read. all available data will be passed to the user code.

(I assumed that the unblocking event leads to fulfillment of ready, and read() allocates a buffer with suitable size and populates it)

For this case, there's no big difference between readIntoAsync() and readInto(), I think.

@domenic
Copy link
Member Author

domenic commented Feb 10, 2015

Sorry for the delay on this. I had a hard time paging it back in.

Unfortunately I think it is the first case:

  • accepts a buffer
  • based on the size of the given buffer, controls how much data it pulls from the source

This can be seen by looking at e.g. Win32 ReadFile or Unix fread and pread.

In what form can we provide readIntoAsync() together with read()? To allow sync read() we have on the ReadableStream, we need to pull data from the source at least when we're in the "waiting" state. Otherwise, we never enter "readable" state. But this spoils the readIntoAsync()'s benefit. We'll have some data already pulled (to serve for read() user) when readIntoAsync() is called.

Yes, I agree, this is a problem. The only thing I can think of immediately is to remove sync read() and the whole waiting/readable state transition :(. Assuming that we would carry that over to ReadableStream that would be a lot of churn this late in the game which is frustrating. Maybe we can think of a better solution. E.g. maybe SeekableReadableByteStream only matches ReadableStream in its pipeTo API? :-/

@tyoshino
Copy link
Member

Shall we add a new mode to ReadableByteStream, say, manual feed mode, named after manual paper feed of printers.

ReadableByteStream would have a method named .feedArrayBuffer(arrayBuffer, offset, size). Via this method, the user feeds a tuple of an ArrayBuffer, offset and size. The tuples are queued and used by the ReadableByteStream to create the result ArrayBuffers to return from the .read() method. How to interpret the size argument is up to the API that returns the ReadableByteStream. The state and read() are kept the same as ReadableStream but ArrayBuffer feeding happens asynchronous to them.

@tyoshino
Copy link
Member

PR #282 is quick prototype of the manual feedable stream.

@wanderview
Copy link
Member

Warning: naming bikeshed ahead... feel free to ignore.

I think an interface for manual allocation might be useful, but not sure the "feedable" metaphor resonates. I've seen this kind of thing called an "allocation strategy" other places, etc.

It seems this would definitely be a power user feature, though. So lets make sure its optional?

@domenic
Copy link
Member Author

domenic commented Feb 10, 2015

@tyoshino this is sounding promising. I think I need a bit more detail to understand how it's supposed to work though. Especially how it interacts with readInto. I think I like the general approach though, of a more-specialized variant of ReadableByteStream that helps with this case.

Would love if you could flesh out the PR with some examples of consumer code, using read/readInto/feedArrayBuffer.

As for naming, I don't have any strong preferences. "Strategy" is kind of already taken, but something about allocation might be nice.

@tyoshino
Copy link
Member

wanderview: I agree that this kind of stuff is called allocation something or allocator. I used feeding since it's not passive (asked to allocate) but the user tells it to pull by putting a new ArrayBuffer via the feedArrayBuffer() method. I don't care so much about the naming but that's my feeling. Feedable might be misleading. ManualArrayBufferFeeding... is the most descriptive name I come up with but too long.

@wanderview
Copy link
Member

Could we use the revealing constructor pattern here again? Something like:

var rbs = new ReadableByteStream({
  allocate: function() { return new ArrayBuffer(myPreferredSize); }
});

Then we don't need a new stream name. :-)

If we add a free hook, then you could build a block pool allocator strategy:

// use ping pong buffers
var blocks = [];
blocks[0] = new ArrayBuffer(4096);
blocks[1] = new ArrayBuffer(4096);

var rbs = new ReadableByteStream({
  allocate: function() { return blocks.pop(); },
  free: function(buf) { return blocks.push(buf); }
});

If we don't like confusing these attributes with the constructor "source" concept, then it could be set in a different method. Something like:

var rbs = new ReadableByteStream(/*...*/);

// use ping pong buffers
var blocks = [];
blocks[0] = new ArrayBuffer(4096);
blocks[1] = new ArrayBuffer(4096);

rbs.setAllocator({
  allocate: function() { return blocks.pop(); },
  free: function(buf) { return blocks.push(buf); }
});

Would this address the problem?

@domenic
Copy link
Member Author

domenic commented Feb 11, 2015

I don't think so, unless I'm misunderstanding. The scenario I think we're trying to solve is that the consumer of the stream wants to be able to do "fill up count bytes of ab starting at offset" and then the stream will use this to do an async (i.e., blocking in another thread) operation like fread.

I guess setAllocator would do that, since the consumer gets access to it. The revealing constructor would not though, because it assumes the consumer is the same as the creator of the stream, which is rarely the case.

@domenic
Copy link
Member Author

domenic commented Feb 11, 2015

What would be most helpful for me to understand for all these proposed ideas is, how would a consumer---given a (Feedable?)ReadableByteStream from the UA---use the stream.

For example, if we took a drastic approach and overhauled everything so that readInto and read became async, it would be pretty clear:

// Passes ab, 0, CHUNK_SIZE to fread, which uses it
rbs.readInto(ab, 0, CHUNK_SIZE).then(...)

// Uses the default chunk size, currently specified by the stream constructor (although we could allow it to be specified by consumers, hmm).
rbs.read().then(...)

What would it look like for the feedBuffer and setAllocator situations?

@wanderview
Copy link
Member

Actually, I had not thought about setAllocator() in the context of a DOM API producing the stream. I'm not sure we want to make the UA depend on content script to allocate buffers. Seems like a recipe for a bad time.

Async read seems ok to me, but I think it opens some questions. What does it do for concurrent read() calls? Do we allow overlap? Ordering guaranteed?

@domenic
Copy link
Member Author

domenic commented Feb 11, 2015

Well, that was kind of the entire point of ReadableByteStream, was to make it possible to do zero- or one-copy reads into user-allocated ArrayBuffers. For example you could do repeated readInto()s into the same array buffer that was pre-allocated to be the size of the file (determined by stat-ing) or response (determined by Content-Length).

Async read was more of a strawman, and I'd rather not dive into it right now because it'd be a lot of spec churn. (I previously explored it here, although at the time I wasn't aware of this particular case.) It's also less efficient for read(2)-style streams like HTTP streams, at least in theory. If we do decide to go that way then ... I'd feel bad about the churn at what seems like a pretty late stage, but it'd be OK I guess. But you guys had some interesting ideas for meshing the existing wait-for-ready-then-sync-read(Into) design with this use case, and I mainly wanted to see what those would look like in practice.

@wanderview
Copy link
Member

Sorry. You're right. Sorry for my confusion. I do think setAllocator() could work.

To answer your question, I think an allocator function could handle this like so:

var rbs = new ReadableByteStream(/* ... */);

var bufferSize = 4096;
rbs.setAllocator({
  allocate: function() { new ArrayBuffer(bufferSize); }
});

rbs.read(); // uses 4096 byte buffer;

bufferSize = 136;
rbs.read() // uses 136 byte buffer

I don't really like depending on state between two statements like that, but it seems the feedBuffer() approach has the same problem.

@domenic
Copy link
Member Author

domenic commented Feb 11, 2015

I think I see.

OK, I think what we need to make this more concrete now is a series of use cases that we want to write sample code for.

Here is what I have off the top of my head. Please give me more :).

  • (all) Want to read a 10 MiB file into a single ArrayBuffer
  • (chunkwise) Want to read a 10 MiB file, 1 MiB at a time, re-using a single 1 MiB ArrayBuffer
  • (ping-pong) Want to read a 10 MiB file, 1 MiB at a time, re-using two separate 1 MiB ArrayBuffers (e.g. because you are asynchronously processing the 1 MiB chunks and want to get a potential 1 MiB headstart in parallel with the async processing)

Here is my sample code for setAllocator, let me know if I got this right

// Common prelude
const rbs = getReadableByteStreamForFile("/path/to/file.flac");
const ONE_MIB = 1024 * 1024;
const TEN_MIB = 10 * ONE_MIB;
// (all)
const ab = new ArrayBuffer(TEN_MIB);

// This begins the fread(ab, 0, ab.byteLength) call
rbs.setAllocator({
  allocate() { return ab; }
});

// fulfills when the fread() call finishes
rbs.ready.then(() => {
  const readValue = rbs.read();
  assert(readValue === ab);
  console.log("done!", ab);

  // stream is now closed
});
// (chunkwise)
const ab = new ArrayBuffer(ONE_MIB);

// This begins the fread(ab, 0, ab.byteLength) call
rbs.setAllocator({
  allocate() { return ab; }
});

// fulfills when the first fread(ab, 0, ONE_MIB) call finishes
rbs.ready.then(() => {
  const readChunk1 = rbs.read();
  assert(readChunk1 === ab);
  console.log("first chunk", ab);

  // queue is now empty but we're not done, so stream automatically kicks off
  // fread(ab, ONE_MIB, ONE_MIB). This next rbs.ready will fulfill when that finishes

  rbs.ready.then(() => {
    const readChunk2 = rbs.read();
    assert(readChunk2 === readChunk1); // as desired; re-using the same buffer
    console.log("second chunk", ab);

    // Etc., until chunk 10, at which point the stream is closed.
    // Recursion left as an exercise for the reader.
  });
});
// (ping-pong)
const abs = [new ArrayBuffer(ONE_MIB), new ArrayBuffer(ONE_MIB)];
const counter = 0;

// This begins the fread(abs[0], 0, abs[0].byteLength) call
rbs.setAllocator({
  allocate() {
    return abs[counter++ % 2];
  }
});

// fulfills when the first fread(abs[0], 0, abs[0].byteLength) call finishes
rbs.ready.then(() => {
  const readChunk1 = rbs.read();
  assert(readChunk1 === abs[0]);
  console.log("first chunk", readChunk1);

  // after the rbs.read() call, the stream's queue is now empty but we're not
  // done, so stream automatically kicks off fread(abs[1], ONE_MIB, ONE_MIB).
  // This next rbs.ready will fulfill when that finishes

  Promise.all([
    processChunkAsync(readChunk1),
    rbs.ready
  ])
  .then(() => {
    const readChunk2 = rbs.read();
    assert(readChunk2 === abs[1]); // as desired
    console.log("second chunk", readChunk2);

    // Note that after calling rbs.read() this second time the stream is doing
    // fread(abs[0], 2 * ONE_MIB, ONE_MIB), so abs[0] === readChunk1 is starting
    // to be reused.

    // Anyway, keep going until chunk 10, abstract into recursive function, etc.
  });
});

Notably I didn't use your free hook for the (ping-pong) case, so maybe I'm missing something?

@wanderview
Copy link
Member

I think you're right that free does not make sense. If anything, the consumer would need to be doing the free'ing not the stream. So it can just handle that logic itself if necessary.

I guess that raises the question if allocate() can return a promise to delay the next fread() until a re-usable block is ready to be processed again.

I didn't consider that fread() would actually be triggered by setAllocator(). I understand that a bit better now. Would we want a better name?

Its not completely clear to me how this would interact with readInto(), though. Consider:

rbs.setAllocator(function() { return new ArrayBuffer(1024);}
rbs.readInto(buf, 0, 512);
rbs.read();  // what exactly does this return?  a slice of the allocator returned buffer? 

@domenic
Copy link
Member Author

domenic commented Feb 11, 2015

I didn't consider that fread() would actually be triggered by setAllocator(). I understand that a bit better now. Would we want a better name?

I think maybe that is the motivation behind @tyoshino's feedArrayBuffer() paradigm. setAllocator() is a bit more automatic since you call it once instead of every time, but the ideas are starting to converge I think :).

Its not completely clear to me how this would interact with readInto(), though.

Hmm, yeah, readInto() makes less sense here. Maybe the right way to phrase it is, I'm not sure what use case it would address. Perhaps readInto() makes more sense for the read(2)-esque streams instead of the fread-based streams, and wouldn't exist on AllocatorControlledReadableByteStream (or whatever)?

@domenic
Copy link
Member Author

domenic commented Feb 11, 2015

One advantage of setAllocator over feedArrayBuffer is that once the allocator is set, you can pass off the ReadableByteStream to someone else and they can consume it as if it's a normal readable stream. I think with feedArrayBuffer they need to be aware of the feed/read cycle and can't just do ready/read like normal.

@wanderview
Copy link
Member

It would be nice if this didn't require a new type.

I think setAllocator + readInto would be ok. It would just copy the data an extra time. The fread into the allocator returned buffer and then again into the readInto buffer. Not recommended, but not breaking.

Any readInto call would be limited to the size of the allocator returned buffer. If readInto completely drains the buffer, then the allocate() function is called again.

@domenic
Copy link
Member Author

domenic commented Feb 11, 2015

I don't have much against new types, as long as they are duck-compatible on a large subset of the API. shrug

@wanderview
Copy link
Member

Well yea, but you were talking about removing a method... so a duck with only one foot. :-)

@tyoshino
Copy link
Member

Hmm, yeah, readInto() makes less sense here.

Yeah. Streams with fread style underlying source would be queue-backed stream (though it's likely to be a single element queue) (see #275 (comment)). So, readInto() is not useful but introduces an extra copy. Ben's #253 (comment) should just work.


One advantage of setAllocator over feedArrayBuffer is

I agree that the new user doesn't need to do the feeding, but the new user needs to:

  • understand the internal mechanism of AllocaterControlledReadableStream,
  • know what kind of allocator the previous user set to the stream,
  • and be aware of the timing the previously used ArrayBuffers will be reused.

These were not required for normal ReadableStream user. So, I think streams to pass to someone else should have an allocator that doesn't reuse ArrayBuffers.

For use cases in which the consumer code wants to adjust the size to pull (pass to fread()):

  • In such cases, .feedArrayBuffer() is simpler than updating some value in the allocator closure. Another approach is calling setAllocator() again and again for updating allocator. This might be as simple as .feedArrayBuffer(), but the user should be aware of when the internal mechanism invokes fread().
  • .feedArrayBuffer() approach can tell the stream to stop pulling (calling fread()) simply by not calling .feedArrayBuffer() while setAllocator() needs to return a pending Promise as Ben said like the new underlyingSource.pull().

So, I suggest that we encapsulate fread() style underlying byte source by say, ManualBufferFeedReadableStream that has .feedArrayBuffer() but also has some switch to make it "auto-pull" or even .setAllocator() method to accept custom ArrayBuffer allocator so that it can be passed to ReadableStream consumers. Or, we could also introduce kind of "joint" (like transform-stream concept) that encapsulates the fread() style ReadableStream, calls .feedArrayBuffer appropriately and exports ReadableStream interface.

Hmm, but now I wonder if this is something we should define in the Streams standard or not. Maybe we should just include some advice how to encapsulate blocking I/Os with the ReadableStream interface?

@domenic
Copy link
Member Author

domenic commented Feb 12, 2015

Hmm, but now I wonder if this is something we should define in the Streams standard or not. Maybe we should just include some advice how to encapsulate blocking I/Os with the ReadableStream interface?

What is really worrying to me is that we seem to have painted ourselves into a corner, so that now all the ideas that we're considering involve passing off the resulting complexity to authors. Whether it be feedArrayBuffer() or setAllocator() or "follow these guidelines in the spec," in all cases it feels like this kind of sucks. All of the stuff we're discussing now seems too low-level---it's more stuff that the stream creator should be concerned with, when developing their underlying source object, and not stuff the consumer should be concerned with.

Stated another way. One of the things the current design does really well, I think, is focus on unifying push and pull sources into a single author-facing API: a "readable stream." I think it should be within our charter to unify read(2)-backed and fread-backed readable byte streams. (And, I also think it's important to make readable byte streams a special case of readable streams---or an extension of them, depending on how you look at it.)

At this point I really don't see any truly good solution except moving to async read---everywhere. But I feel really bad about this. One of the very original points of contention between your W3C streams draft and my WHATWG streams draft, inspired by Node streams, was saying that sync read was important. Now, I don't think at the time either of us had a complete understanding of all the issues involved, and how it would impact the byte-stream case due to this fread vs. read(2) dichotomy combined with the desire to specify up front how many bytes you want to read. (Or maybe you did and I just wasn't listening? :-/) We've learned a lot about the problem domain since and in general it's good to revise APIs in light of new information. But it's late! You guys have an implementation in Chrome that you want to ship soon! And making this big of a change this late gives off instability signals that make me quite sad.

I guess what I'm asking is---if I can draft up complete semantics for a switch to async read over the next week, how disruptive do you think this would be? To the implementation you guys are hoping to ship soon for Fetch, and to the ecosystem overall? Here's a tentative sketch of what I'm thinking, happy to revise:

ReadableStream

  • ReadableStream.prototype.read() becomes promise-returning
    • It fulfills with ReadableStream.EOS ("end of stream") when there is no more data
  • ReadableStream.prototype.state combines "readable" and "waiting" into a single state, probably named "readable".
  • ReadableStream.prototype.ready disappears
  • Multiple calls end up asking for subsequent chunks, i.e. rs.read().then(...); rs.read().then(...); should pull two chunks out, instead of e.g. rejecting the second promise or fulfilling both with a single chunk.
    • Alternately: if the second call rejects, then we might be able to get rid of exclusive readers, since in that case if you do rs.read().then(chunk => { rs.read().then(...); }), i.e. if you immediately react to the read promise fulfilling by reading from it again, then I don't think anyone else can interfere.

ReadableByteStream

  • ReadableByteStream.prototype.readInto(ab, offset, size) becomes promise-returning, fulfilling with the number of bytes written into ab.
    • For fread-backed streams, it calls blocking fread(ab, offset, size) in another thread
    • For read(2)-backed streams, it can behave similarly to now, except that instead of writing zero bytes into ab it can block until there are bytes to write. (And maybe it should in general block until there are size bytes to write, except at end-of-stream?)

Please let me know ASAP if you think I should work on this. If you think it's too late for such a big change and we should explore other options, I can understand. But if you think it's worth doing then I want to get it ready ASAP so as to minimize disruption.

@tyoshino
Copy link
Member

I'd like to do some survey and create a table of APIs and their characteristics (sync/async, blocking/nonblocking, bring our own buffer/returns API allocated buffer, pull and read are separate/pull and read are combined, ...). Includes some non software API stuff for metaphor.

  • pull without size arg + pause
    • XON/XOFF
  • pull without size arg
    • returns API allocated buffer
      • XMLHttpRequest (each HTTP transaction viewed as pull)
  • pull with size arg
    • TCP (pull: ACK packet, receiver window update)
    • HTTP/2 flow control (pull: WINDOW_UPDATE)
    • bring your own buffer
      • fread(3) (pull/read: fread)
      • Win32 ReadFile() (pull/read: ReadFile)
  • always pull + manual pause
    • Ethernet PAUSE frame
  • always pull but with backpressure
    • tap water (backpressure: turning off a faucet)
    • socket: epoll(7) + read(2)
  • always pull without backpressure
    • WebSocket
    • XMLHttpRequest (response body receiving)

@tyoshino
Copy link
Member

Domenic: Oh, OK. I'll think about revisiting everything. But for now, I'm optimistic about sync read(). Like .feedArrayBuffer(), we could add, say .feedReadToken() to ReadableStream to allow users to manually control how much objects to pull. In the "precise flow control" world I envisioned, these pull/read separated interfaces are the base than the current ReadableStream in the spec.

Current ReadableStream is relieving users from doing this manually by having an automatic token (without size arg) feeder named "strategy". Current ReadableByteStream is relieving users from allocating ArrayBuffers manually by allocating them automatically but with size=underlyingByteSource.readBufferSize. I'd view them as easy to use variants of the base interface, ManualBufferFeed.*.

It looks that the pros/cons of the promise-returning version you've sketched in #253 (comment) are:

  • pros
    • we ask everyone to use the same interface while covering various kinds of underlying sources efficiently
    • pull (read()) and read (fulfillment of the promise) are associated. we can map request-response semantics to it
  • cons
    • more microtasks
    • I'm not fully sure if there're real use cases, but separation of pull and read basically gives users more degree of freedom. This approach loses it since pull and read are associated.

@domenic
Copy link
Member Author

domenic commented Feb 12, 2015

Current ReadableStream is relieving users from doing this manually by having an automatic token (without size arg) feeder named "strategy". Current ReadableByteStream is relieving users from allocating ArrayBuffers manually by allocating them automatically but with size=underlyingByteSource.readBufferSize. I'd view them as easy to use variants of the base interface, ManualBufferFeed.*.

This does make me feel better. If we can make everything fit into one conceptual model, maybe we can even add async read/readInto on top as new methods (named e.g. readWhenReady()) for people who don't want to have to worry about it. Would love to get some more fleshed out examples from you about how to fit this all together. I like the idea of strategy being a special case of a more general feeder concept, as a way of unifying things.

@yutakahirano
Copy link
Member

  • reader.read() returns { value: undefined, done: true }, or { value: view, done: true } for byte streams
  • whenever the reader is released, all pending read requests (which must have come from this reader) get fulfilled with { value: ..., done: true }, and then the list of pending read requests gets cleared.

Do we support a platform which has a BYOB async read but doesn't have a sync cancel-read? If we do, we cannot get back the buffer from the platform until the read ends.

@tyoshino
Copy link
Member

Maybe we say that we can get buffers back not immediately but at some point in the future.

@domenic
Copy link
Member Author

domenic commented Mar 10, 2015

Do we support a platform which has a BYOB async read but doesn't have a sync cancel-read?

Hmm, I think so. At least I am not sure how to cancel an ongoing read(2) call. But I do not know my Unix APIs all that well :-/.

If we do, we cannot get back the buffer from the platform until the read ends.

Yeah, as @tyoshino says, I think it is OK for the promises to take a while to fulfill with { value: view, done: true }, as long as they do eventually.

@domenic
Copy link
Member Author

domenic commented Mar 10, 2015

Question based on some stuff @yutakahirano I have been discussing. And probably related to the last couple posts, now that I re-read them. Given this scenario:

reader.read(view1).then(...);
reader.read(view2).then(...);
reader.releaseLock();

what should happen?

We think releaseLock() should throw if there are pending reads. Because, consider how reader.read(view1) has started a read in the background thread into view1. If we were to let the release succeed, the bytes being read would be skipped by future consumers who get new readers. That seems bad.

GIven that releaseLock throws, what else should happen? We have two good alternatives:

  • The stream and the reader become errored. This means we don't have to worry about anyone getting bytes that have "skipped ahead" since nobody gets any bytes at all, after you screwed up by trying this.
  • Nothing happens: the lock even stays in place. If you in the future wait long enough for the pending read requests to fulfill, you can even do releaseLock() again, and it works!

I am leaning toward the latter now that I write them out. But I wanted to record them.

domenic added a commit that referenced this issue Mar 12, 2015
Based on discussions in #253. The key differences here from the previous async read() commits are:

- ReadableStreams no longer have read() methods directly; those only exist on readers. This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior.
- read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems, and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them (using { value: zeroLengthViewOntoBuffer, done: true }).

Another new semantic worth mentioning is that you cannot release a reader if the reader has read()s pending; doing so will throw. This slightly complicates the pipe algorithm in the { preventCancel: true } case.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Finally, we re-merge all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.
domenic added a commit that referenced this issue Mar 12, 2015
Based on discussions in #253. The key differences here from the previous async read() commits are:

- ReadableStreams no longer have read() methods directly; those only exist on readers. This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior.
- read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems, and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them (using { value: zeroLengthViewOntoBuffer, done: true }).
- state property is removed (from readable stream)

Another new semantic worth mentioning is that you cannot release a reader if the reader has read()s pending; doing so will throw. This slightly complicates the pipe algorithm in the { preventCancel: true } case.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Finally, we re-merge all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.
@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

Should we allow a readable stream to be cancelled while locked to a reader? I would lean toward "no," except: consider the case where you have set up a pipe chain (which locks the stream). How do you cancel that pipe chain upon e.g. the user navigating to a different page? (You could abort the ultimate destination stream, but we have the same question there: are you allowed to abort a writable stream that is locked to a probably-will-exist-soon exclusive writer?)

domenic added a commit that referenced this issue Mar 12, 2015
This replaces the dual ready + read() approach previously, which was derived from the epoll(7) + read(2) paradigm. In #253 we did a deep dive on real-world implementation strategies for byte streams, and discovered that the ready + read() model causes a conflict with the semantics we want for byte streams. Briefly, because some byte streams will demand to know the size of the buffer they must fill before doing any I/O (the fread(3) model), the byte stream method that reads into a given typed array must be asynchronous. If such byte streams are then to conform to the readable stream interface, with a no-argument read() method that is an auto-allocating version of the read-into method, then read() must also be async, across all readable streams.

This is a usability upgrade for consumers, in many cases. However, for non-byte streams, it potentially costs more microtasks when multiple chunks of data would be available synchronously. We are hopeful that even if current JS engine microtask queues are not as fast as they could be, they will improve over time until this overhead is negligible (which in theory it should be).

Alongside this change, we moved the read() method from the readable stream to the reader (now called "readable stream reader" instead of "exclusive stream reader"). This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior---again, very helpful for byte streams. The usability impact is also positive, as it guides consumers toward using piping instead of directly reading chunks from the stream.

Note that read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems (see extensive discussion in #253), and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them, by using { value: zeroLengthViewOntoBuffer, done: true }.

Finally, the state property is now removed (from readable stream), since there is no longer a "waiting" vs. "readable" distinction.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Note that we re-merged all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.
domenic added a commit that referenced this issue Mar 12, 2015
This replaces the dual ready + read() approach previously, which was derived from the epoll(7) + read(2) paradigm. In #253 we did a deep dive on real-world implementation strategies for byte streams, and discovered that the ready + read() model causes a conflict with the semantics we want for byte streams. Briefly, because some byte streams will demand to know the size of the buffer they must fill before doing any I/O (the fread(3) model), the byte stream method that reads into a given typed array must be asynchronous. If such byte streams are then to conform to the readable stream interface, with a no-argument read() method that is an auto-allocating version of the read-into method, then read() must also be async, across all readable streams.

This is a usability upgrade for consumers, in many cases. However, for non-byte streams, it potentially costs more microtasks when multiple chunks of data would be available synchronously. We are hopeful that even if current JS engine microtask queues are not as fast as they could be, they will improve over time until this overhead is negligible (which in theory it should be).

Alongside this change, we moved the read() method from the readable stream to the reader (now called "readable stream reader" instead of "exclusive stream reader"). This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior---again, very helpful for byte streams. The usability impact is also positive, as it guides consumers toward using piping instead of directly reading chunks from the stream.

Note that read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems (see extensive discussion in #253), and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them, by using { value: zeroLengthViewOntoBuffer, done: true }.

Finally, the state property is now removed (from readable stream), since there is no longer a "waiting" vs. "readable" distinction.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Note that we re-merged all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.

- Closes #253!
- Closes #266 by simplifying reader-related stuff, removing the problematic `ready` promise, and ensuring that cancel() always returns a new promise instead of reusing [[closedPromise]].
- Closes #291 since the relevant tests have been re-written and the related infrastructure around pull simplified.
- Closes #264 by introducing templated tests.
domenic added a commit that referenced this issue Mar 12, 2015
This replaces the dual ready + read() approach previously, which was derived from the epoll(7) + read(2) paradigm. In #253 we did a deep dive on real-world implementation strategies for byte streams, and discovered that the ready + read() model causes a conflict with the semantics we want for byte streams. Briefly, because some byte streams will demand to know the size of the buffer they must fill before doing any I/O (the fread(3) model), the byte stream method that reads into a given typed array must be asynchronous. If such byte streams are then to conform to the readable stream interface, with a no-argument read() method that is an auto-allocating version of the read-into method, then read() must also be async, across all readable streams.

This is a usability upgrade for consumers, in many cases. However, for non-byte streams, it potentially costs more microtasks when multiple chunks of data would be available synchronously. We are hopeful that even if current JS engine microtask queues are not as fast as they could be, they will improve over time until this overhead is negligible (which in theory it should be).

Alongside this change, we moved the read() method from the readable stream to the reader (now called "readable stream reader" instead of "exclusive stream reader"). This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior---again, very helpful for byte streams. The usability impact is also positive, as it guides consumers toward using piping instead of directly reading chunks from the stream.

Note that read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems (see extensive discussion in #253), and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them, by using { value: zeroLengthViewOntoBuffer, done: true }.

Finally, the state property is now removed (from readable stream), since there is no longer a "waiting" vs. "readable" distinction.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Note that we re-merged all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.

- Closes #253!
- Closes #266 by simplifying reader-related stuff, removing the problematic `ready` promise, and ensuring that cancel() always returns a new promise instead of reusing [[closedPromise]].
- Closes #291 since the relevant tests have been re-written and the related infrastructure around pull simplified.
- Closes #264 by introducing templated tests.
@tyoshino
Copy link
Member

What's your opinion about making pipeTo() itself cancellable, Domenic?

@tyoshino
Copy link
Member

What's your opinion about making pipeTo() itself cancellable, Domenic?

Discussed at #297 (comment)

domenic added a commit that referenced this issue Mar 16, 2015
This replaces the dual ready + read() approach previously, which was derived from the epoll(7) + read(2) paradigm. In #253 we did a deep dive on real-world implementation strategies for byte streams, and discovered that the ready + read() model causes a conflict with the semantics we want for byte streams. Briefly, because some byte streams will demand to know the size of the buffer they must fill before doing any I/O (the fread(3) model), the byte stream method that reads into a given typed array must be asynchronous. If such byte streams are then to conform to the readable stream interface, with a no-argument read() method that is an auto-allocating version of the read-into method, then read() must also be async, across all readable streams.

This is a usability upgrade for consumers, in many cases. However, for non-byte streams, it potentially costs more microtasks when multiple chunks of data would be available synchronously. We are hopeful that even if current JS engine microtask queues are not as fast as they could be, they will improve over time until this overhead is negligible (which in theory it should be).

Alongside this change, we moved the read() method from the readable stream to the reader (now called "readable stream reader" instead of "exclusive stream reader"). This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior---again, very helpful for byte streams. The usability impact is also positive, as it guides consumers toward using piping instead of directly reading chunks from the stream.

Note that read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems (see extensive discussion in #253), and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them, by using { value: zeroLengthViewOntoBuffer, done: true }.

Finally, the state property is now removed (from readable stream), since there is no longer a "waiting" vs. "readable" distinction.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Note that we re-merged all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.

- Closes #253!
- Closes #266 by simplifying reader-related stuff, removing the problematic `ready` promise, and ensuring that cancel() always returns a new promise instead of reusing [[closedPromise]].
- Closes #291 since the relevant tests have been re-written and the related infrastructure around pull simplified.
- Closes #264 by introducing templated tests.
domenic added a commit that referenced this issue Mar 17, 2015
This replaces the dual ready + read() approach previously, which was derived from the epoll(7) + read(2) paradigm. In #253 we did a deep dive on real-world implementation strategies for byte streams, and discovered that the ready + read() model causes a conflict with the semantics we want for byte streams. Briefly, because some byte streams will demand to know the size of the buffer they must fill before doing any I/O (the fread(3) model), the byte stream method that reads into a given typed array must be asynchronous. If such byte streams are then to conform to the readable stream interface, with a no-argument read() method that is an auto-allocating version of the read-into method, then read() must also be async, across all readable streams.

This is a usability upgrade for consumers, in many cases. However, for non-byte streams, it potentially costs more microtasks when multiple chunks of data would be available synchronously. We are hopeful that even if current JS engine microtask queues are not as fast as they could be, they will improve over time until this overhead is negligible (which in theory it should be).

Alongside this change, we moved the read() method from the readable stream to the reader (now called "readable stream reader" instead of "exclusive stream reader"). This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior---again, very helpful for byte streams. The usability impact is also positive, as it guides consumers toward using piping instead of directly reading chunks from the stream.

Note that read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems (see extensive discussion in #253), and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them, by using { value: zeroLengthViewOntoBuffer, done: true }.

Finally, the state property is now removed (from readable stream), since there is no longer a "waiting" vs. "readable" distinction.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Note that we re-merged all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.

- Closes #253!
- Closes #266 by simplifying reader-related stuff, removing the problematic `ready` promise, and ensuring that cancel() always returns a new promise instead of reusing [[closedPromise]].
- Closes #264 by introducing templated tests.
@paulbrucecotton
Copy link

Now that this Issue 253 is close when will "Streams API Binary Extension" be folded into the main Streams spec?

Resolution of W3C MSE bug 27239 depends on this happening:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27239

/paulc

@domenic
Copy link
Member Author

domenic commented Apr 10, 2015

@paulbrucecotton the meta-bug for ReadableByteStream is #300, in case it helps. @tyoshino and I were just talking the other day about when we could move it into the spec. I want to defer to him for first opinion.

@paulbrucecotton
Copy link

@domenic Thanks for the update. The W3C HTML WG Media Task Force is meeting next week and it would certainly help us to know when the merge is going to happen.
@tyoshino: What is your opinion?

/paulc

@tyoshino
Copy link
Member

@domenic @paulbrucecotton I'd like to work on that asap but have been busy for other issues about the main non-binary-specific one. I'll try to start working on it this week and get something out on streams.spec.whatwg.org next week.

@paulbrucecotton
Copy link

@domenic and @tyoshino:

Can you confirm that ReadableByteStream at https://streams.spec.whatwg.org/#rbs is what the MSE spec should reference for W3C MSE bug 27239:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27239

/paulc

@domenic
Copy link
Member Author

domenic commented May 7, 2015

@paulbrucecotton sorry for the delay in answering. Yes, indeed that is the corret reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

7 participants