
Change the model for ReadableStream to have async read() #288

Closed
wants to merge 4 commits

Conversation

domenic
Member

@domenic domenic commented Feb 20, 2015

This replaces the previous dual ready + read() approach, which was derived from the epoll(7) + read(2) paradigm. In #253, we discussed how the ready + read() model conflicts with the semantics we want for byte streams. Briefly: because some byte streams will demand to know the size of the buffer they must fill before doing any I/O (the fread(3) model), the readInto(arrayBuffer, ...) method for byte streams must be asynchronous. If such byte streams are then to conform to the readable stream interface, with a read() method derived from their readInto() method, then read() must also be async, across all readable streams.

This is a slight usability upgrade for consumers, in some cases. However, it potentially costs more microtasks when multiple chunks of data would be available synchronously.
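To make the trade-off concrete, here is a rough sketch of consumer code under the two models. Everything below is illustrative: the mock stream, the readSync/read split, and the use of state as the end-of-stream signal are assumptions for the sketch, not the proposed API.

```javascript
// Mock stream for illustration only; `state`, `ready`, and the read methods
// approximate the designs under discussion, not a finished API.
function makeMockStream(chunks) {
  let i = 0;
  return {
    get state() { return i < chunks.length ? "readable" : "closed"; },
    get ready() { return Promise.resolve(); },        // "chunks available"
    readSync() { return chunks[i++]; },               // old model
    read() { return Promise.resolve(chunks[i++]); },  // new model
  };
}

// Old model (ready + synchronous read), epoll(7)/read(2) style: one await
// per batch, then drain available chunks synchronously.
async function consumeReadyRead(rs, processChunk) {
  while (rs.state !== "closed") {
    await rs.ready;
    while (rs.state === "readable") {
      processChunk(rs.readSync());
    }
  }
}

// New model (async read), compatible with fread(3)-style byte sources: one
// microtask per chunk, even when chunks are available synchronously.
async function consumeAsyncRead(rs, processChunk) {
  while (rs.state !== "closed") {
    processChunk(await rs.read());
  }
}
```

The microtask cost mentioned above is visible in consumeAsyncRead: each chunk costs at least one promise resolution, where the old model could drain a full queue in one turn.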

In the process of updating the tests to reflect async read, they picked up a number of small unrelated tweaks (e.g. to wording, or to eliminate some setTimeout(..., 0) calls).

TODO:

  • This commit eliminates ExclusiveStreamReader, but this was done in error based on mistaken assumptions. It will be reversed.
  • Almost none of the spec is updated: Examples.md and the examples in the spec were updated, but none of the algorithms or non-normative notes.

Getting this up there largely so that people can get a sense of how much of a change this is. The changes worth looking at are in lib/readable-stream.js, lib/readable-stream-abstract-ops.js, Examples.md, and index.bs. The rest are tests.

I thought we could get rid of ExclusiveStreamReader, since people would be able to get exclusive access to the chunks just by reading them as soon as they become available. However, sometimes you want to prevent other consumers from interfering with a stream while still letting chunks accumulate in the internal buffer instead of reading them immediately---for backpressure purposes. So we still need a lock to prevent other consumers from snatching chunks from you.
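As a toy illustration of why the lock matters (all names and the queueing below are my assumptions, deliberately simplistic): while a reader holds the lock, the stream presents itself as empty to everyone else, yet chunks keep accumulating for the lock holder.

```javascript
// Illustrative only: a toy stream whose queue keeps accumulating while a
// reader holds the lock, so chunks are neither lost nor snatched by others.
class ToyLockableStream {
  constructor() { this.queue = []; this.lockHolder = null; }
  enqueue(chunk) { this.queue.push(chunk); }  // underlying source pushes
  getReader() {
    if (this.lockHolder !== null) throw new TypeError("already locked");
    const stream = this;
    const reader = {
      read() {
        if (stream.lockHolder !== reader) throw new TypeError("released");
        return Promise.resolve(stream.queue.shift());
      },
      releaseLock() { stream.lockHolder = null; },
    };
    this.lockHolder = reader;
    return reader;
  }
  read() {
    // While locked, the stream presents itself as empty to everyone else.
    if (this.lockHolder !== null) return Promise.resolve(undefined);
    return Promise.resolve(this.queue.shift());
  }
}
```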

I think adding back ExclusiveStreamReader will actually reduce the diff pretty significantly, so this is not that big of a change to the non-test files. That is heartening. Feel free to put off reviewing for another day; I mainly wanted to get this up here to show that I am working on the problem, which is why I haven't had much time to review e.g. OperationStream or respond to comments in #253. I will do my best to catch up tomorrow and maybe a bit over the weekend.

@domenic
Member Author

domenic commented Feb 24, 2015

@tyoshino I am actually having a really hard time getting ExclusiveStreamReader working here. I remember you were able to make it work quite nicely in #277, and I was wondering if you saw a similarly straightforward solution here. I'm not sure which commit is a better starting place either, 2ee6a77 or 35c3e5b; at first I thought a readPromise would be helpful, but now I am not so sure.

@tyoshino
Member

What semantics do you plan to give to the locking?

I can come up with ...

  1. getReader() rejects any pending read() operation
  2. getReader() fails if there's a pending read() operation

Regarding release:

  1. There's no way to get notified of the release of the active reader
  2. Add some rs.state enum item to indicate the locking state. Revive rs.ready to allow notification (this would exist only for getting notified of release; basically we don't want to have it?)
  3. Allow the user to call rs.read(), but have it wait until the reader is released?

@domenic
Member Author

domenic commented Feb 24, 2015

I liked the semantics we have in the current spec, where the stream presents itself as empty while locked, and the reader presents as a closed stream after it is released.

  1. getReader() rejects any pending read() operation
  2. getReader() fails if there's a pending read() operation

This works too, right?

  3. getReader() causes any pending read() operation to stay pending until release, at which time it re-polls the queue.

Of these I think I prefer 3 > 1 > 2.

Regarding release:

Of these I think 3 is best and fits well with "the stream presents itself as empty while locked".

@yutakahirano
Member

  1. getReader() rejects any pending read() operation
  2. getReader() fails if there's a pending read() operation

I think on some platforms we can't cancel an async read operation. On such platforms, 1 doesn't work.

@domenic
Member Author

domenic commented Feb 24, 2015

We don't need to cancel it though, just redirect the result to the reader's next read() promise instead of the already-returned read() promise from the stream.

@yutakahirano
Member

We don't need to cancel it though, just redirect the result to the reader's next read() promise instead of the already-returned read() promise from the stream.

Interesting. Both 1 and 3 redirect pending read data to the reader, and the only difference is whether the result promise is rejected or kept pending, right?

Either way, detaching the given buffer is necessary.

@domenic
Member Author

domenic commented Feb 24, 2015

@yutakahirano oh oops, I hadn't thought about how this works with readInto + exclusive readers. Let me reevaluate. If you have already given fread() the array buffer, that might be problematic... I need to go stare at your suggestion about detaching some more.

@domenic
Member Author

domenic commented Feb 24, 2015

OK so.

  • rbs.readInto(ab, offset, size) will use ArrayBuffer.transfer to create a new array buffer, rbsAB, whose contents are transferred from ab. Thus ab is detached. Eventually the returned promise is fulfilled with rbsAB (instead of bytesRead). That is pretty nice! Except for the issue mentioned in #253 (comment) ("Support reading bytes into buffers allocated by user code on platforms where only async read is available").
  • const p1 = rbs.readInto(ab, size); const reader = rbs.getReader(); const p2 = reader.readInto(ab2, size2) will use ArrayBuffer.transfer to create a new array buffer, rbsAB, whose contents are transferred from ab. ab is now detached. Eventually the read finishes and rbsAB has bytes in it. Then, under 1, p1 is rejected, whereas under 3, it remains pending. In both cases, do we fulfill p2 with rbsAB? Do we just ignore ab2?

Hmm. Now that we have this detaching-and-returning-different-buffers thing, maybe async readInto is not the right model, and feedArrayBuffer or setAllocator would make more sense. It feels weird to supply an argument to readInto whose backing memory is used while the actual array buffer in question gets detached and is no longer useful. Even if we changed the name (readInto no longer makes sense, but e.g. readBytes is fine), it's still an awkward API.
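For reference, the detach dance described above looks roughly like this. ArrayBuffer.transfer was only a proposal at the time; in this sketch structuredClone's transfer list stands in for it, and fillWithBytes is an invented placeholder for the underlying source's async I/O.

```javascript
// Sketch of the awkward API shape being discussed: the caller donates ab's
// memory, ab gets detached, and a *different* buffer object comes back.
async function readBytes(ab, offset, size, fillWithBytes) {
  // Stand-in for ArrayBuffer.transfer: move ab's contents into rbsAB,
  // detaching ab in the process.
  const rbsAB = structuredClone(ab, { transfer: [ab] });
  // The underlying source performs its async I/O directly into rbsAB.
  await fillWithBytes(new Uint8Array(rbsAB, offset, size));
  // Fulfill with the new buffer rather than a byte count.
  return rbsAB;
}
```

The weirdness is visible in the signature: the caller passes ab in but must never touch it again, and must instead adopt the returned buffer.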

@yutakahirano
Member

Discussed with @tyoshino offline.

We think introducing ByteSource and removing ExclusiveStreamReader is the right direction.

class ByteSource {
  // Return an associated readable byte stream.
  stream(option)

  // Read |size| bytes from this source and write them into |sink|.
  // Return a promise that will be fulfilled when the operation is completed.
  pipeBytes(sink, size)
}

While a stream returned by stream() is not yet closed, or while pipeBytes() is running, further calls to stream() and pipeBytes() will fail.
Consequences:

  • There is no ExclusiveStreamReader. ReadableStream.pipeTo can be defined, but it is completely user-level.
  • Instead, ByteSource.pipeBytes offers the off-thread data transfer.

This greatly simplifies the situation. Regarding fetch, Body.body is a ByteSource.
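If I read the proposal right, the exclusivity rule could be modeled like the following toy (the names follow the comment above; the implementation is invented purely to make the rule concrete):

```javascript
// Toy model of the proposed ByteSource exclusivity rule: while a vended
// stream is open, or while a pipeBytes() is in flight, further calls fail.
class ToyByteSource {
  constructor(bytes) { this.bytes = bytes; this.busy = false; }
  stream() {
    if (this.busy) throw new TypeError("source is busy");
    this.busy = true;
    const src = this;
    let pos = 0;
    return {
      read() { return Promise.resolve(src.bytes[pos++]); },
      close() { src.busy = false; },  // closing the stream frees the source
    };
  }
  async pipeBytes(sink, size) {
    if (this.busy) throw new TypeError("source is busy");
    this.busy = true;
    try {
      // Off-thread transfer in the real proposal; a plain loop here.
      for (let i = 0; i < size; i++) await sink.write(this.bytes[i]);
    } finally {
      this.busy = false;
    }
  }
}
```

Note how the lock is implicit: there is no reader object to hand around, just the rule that stream() and pipeBytes() fail while the source is busy.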

@domenic
Member Author

domenic commented Feb 26, 2015

I really don't like that. pipeTo having exclusive access is a feature that should exist on all streams (byte streams or not). And people should not have to think about whether they're dealing with a byte stream or another kind of stream if they just want to pipe it somewhere else.

It seems like we could accomplish the same simplification if we gave up on having pipeTo be explainable in author code. That was actually the original idea of #241.

@tyoshino
Member

Domenic, one question. We designed ExclusiveStreamReader to be almost indistinguishable from ReadableStream. This was, IIRC, to allow the user of the reader to not be aware that they are touching a reader rather than the original stream. Does this mean that the reader should also have getReader() to create a nested reader?

@domenic
Member Author

domenic commented Feb 26, 2015

I think that's not quite right. Rather, we want users to be able to accomplish their tasks fully without having to switch back to the stream. That is why (in some iterations) we had a closed promise that just delegated to the stream itself; otherwise anyone writing manual-reading code would have to remember to use the stream for closed and the reader for other operations.

I don't think it needs to be a complete substitute, though. So no getReader, and no piping. That is, the purpose of the similar API is to reduce cognitive load, not to actually be interchangeable for arbitrary consumers.

@tyoshino
Member

OK. So it was just my misunderstanding. Thanks for the confirmation.

@yutakahirano
Member

We are about to introduce a pool mechanism (or pending read ops, if you prefer async read) into ReadableByteStream. We would have to replicate the mechanism in ExclusiveReader, and it will get more and more complex. When a user defines a custom stream, it should provide its own exclusive reader. Is that possible, given the difficulty we've had in doing it ourselves?

At the time of #241 (comment), I expected the problem to be easier than it really is.

@tyoshino
Member

I recommend that we stop and think about what's needed for WritableStream to add an ExclusiveStreamWriter to it. It would give us some insight into what states async reading should have for representing the feeding aspect, in order to support the many requirements we have, without being dragged down by the reading (result-emitting) aspect (thinking about both at the same time leads to confusion).

For example, "waiting" means that you cannot read() on a ReadableStream, but it doesn't disallow write() on a WritableStream. If we add a Writer to WritableStream, we cannot tell the user that they cannot call write() by returning a state of "waiting". We should add a "locked" state.

Regarding the question I posed in #288 (comment), I now prefer (2) for release behavior. I've been feeling that whoever called getReader() should be aware that the stream is locked and should wait until the reader is released. Allowing something like (3) (not only for this point but for other things too, such as using .closed to get notified of the transition to "closed") is a huge source of complexity and doesn't benefit users much, I guess.

I'd just add get released() and a "locked" state so that code which called getReader() and passed the reader to someone else can wait until the lock is released. Radically speaking, I think even get released() is unnecessary, and the original caller who obtained the reader should ask the user of the reader to manually notify them of the completion of its processing.

@domenic
Member Author

domenic commented Feb 27, 2015

@yutakahirano

We are about to introduce a pool mechanism (or pending read ops, if you prefer async read) into ReadableByteStream. We would have to replicate the mechanism in ExclusiveReader, and it will get more and more complex.

I don't think it's quite that bad. The design of ExclusiveReader is largely meant to just wrap the logic already present in Readable(Byte)Stream. There shouldn't be any duplication, at least in theory.

When a user defines a custom stream, it should provide its own exclusive reader. Is that possible, given the difficulty we've had in doing it ourselves?

I agree this is a concern. Hopefully we can come up with a fairly straightforward template for doing so. Maybe @tyoshino's suggestions can help simplify the model in that direction.

@tyoshino

I recommend that we stop and think about what's needed for WritableStream to add an ExclusiveStreamWriter to it. It would give us some insight into what states async reading should have for representing the feeding aspect, in order to support the many requirements we have, without being dragged down by the reading (result-emitting) aspect (thinking about both at the same time leads to confusion).

Good idea, I agree.

For example, "waiting" means that you cannot read() on a ReadableStream, but it doesn't disallow write() on a WritableStream. If we add a Writer to WritableStream, we cannot tell the user that they cannot call write() by returning a state of "waiting". We should add a "locked" state.

Hmm. Writable streams already have so many states. And the states there mean fairly different things from readable streams. So I am a bit reluctant. But maybe adding "locked" to both readable and writable streams would be a good simplification.

Regarding the question I posed in #288 (comment), I now prefer (2) for release behavior. I've been feeling that whoever called getReader() should be aware that the stream is locked and should wait until the reader is released. Allowing something like (3) (not only for this point but for other things too, such as using .closed to get notified of the transition to "closed") is a huge source of complexity and doesn't benefit users much, I guess.

Yeah, I think I agree now---having to implement even the closed getter just makes things annoying and complicated.

I am not 100% sure that a "locked" state will be better. But it is worth trying.

I'd just add get released() and "locked" state so that the code that called getReader() and passed the reader to someone else can wait until the lock is released.

Where would you put released---on the reader, or on the stream? I think having it on the reader is pretty OK. For the stream I would like to make ready or wait() (or pull()) notify of all state transitions, including from "locked" to some other unlocked state. But that might also be a source of complexity.

Radically speaking, I think even get released() is unnecessary, and the original caller who obtained the reader should ask the user of the reader to manually notify them of the completion of its processing.

Yeah. My mental model is generally that you should not pass readers around. Once you take a reader, you control the stream until you release it. And if you call a function (like pipeTo) that you expect to exclusively lock the stream, you should probably stop touching the stream until that function signals that it's done---e.g. by returning a promise, like pipeTo does.


What I am mainly getting from this is a few things:

  • Both here and in #289 (New proposal for byte stream use cases), we need to make sure the whole system works together. That should ideally include readable streams, writable streams, exclusive readers, exclusive writers, plus byte versions of all four of those. To validate our ideas here I think it will largely suffice to write examples, but they will have to be in-depth examples that consider a variety of underlying sources and sinks and use cases.
  • We need to carefully consider how we can simplify the complexity of ExclusiveReader. It should not require the level of hard work we've had to put into designing it. As @yutakahirano reminds me, we took a lot of time to get something for ReadableStream, and it was fairly complex; I got stuck trying to do it for async ReadableStream; and we haven't even thought about it yet for ReadableByteStream. It needs to be easier, both for us and for anyone who creates custom streams. One proposal is the "locked" state, which might simplify things a lot. To validate our ideas here we'll need to prototype them.
  • We need to figure all this out fast. Chrome would like to ship fetch + streams soon, and Igalia is implementing ReadableStream in WebKit, and Firefox is looking into it. I have faith that we can find the right solution, but we can't keep going back and forth too much. So my approach here is to try to evolve existing stuff, since we have worked out a lot of the bugs in the existing design already.

@tyoshino
Member

Where would you put released ...

This was a comment about the existing functionality whereby one can wait for the release of the reader by calling then() on rs.ready. I suggested that if we really want that, I'd realize it as rs.released() rather than rs.ready().

Sorry for not elaborating much on the motivation/context for this proposal. It came from an analysis I did yesterday of what states we should have for a WritableStream with a Writer.

Here is a list of states in the form X -> A, B, C, ..., where X is a state name (or the name of a group of states) and A, B, C, ... are the states X may transition to. Sorry, but this is written for WritableOperationStream, not WritableStream.

  • "locked" -> available
  • available -> "locked"
    • (write()/close() are allowed) -> "closed", "aborted", "cancelled", "errored"
      • "writable" (backpressure applied) -> "waiting"
      • "waiting" (backpressure not applied) -> "writable"
    • "closed" -> "aborted", "cancelled", "errored"
    • "aborted"
    • "cancelled"
    • "errored"
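One way to sanity-check a state list like this is to flatten it into a transition table. The table below is my reading of the list above (in particular, which states may enter "locked" is an assumption), so treat it as a sketch, not the proposal itself.

```javascript
// Allowed transitions, flattened from the list above. "writable" and
// "waiting" form the write()/close()-allowed group, so each also inherits
// that group's transitions to the terminal states.
const transitions = {
  locked:    ["writable", "waiting", "closed", "aborted", "cancelled", "errored"],
  writable:  ["waiting", "locked", "closed", "aborted", "cancelled", "errored"],
  waiting:   ["writable", "locked", "closed", "aborted", "cancelled", "errored"],
  closed:    ["aborted", "cancelled", "errored"],
  aborted:   [],
  cancelled: [],
  errored:   [],
};

// Returns true if the state machine permits moving from `from` to `to`.
function canTransition(from, to) {
  return (transitions[from] || []).includes(to);
}
```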

For convenience, ready on WritableStream catches all transitions from "waiting". But only a few transitions may happen unsolicited, from the point of view of the stream's user. For WritableOperationStream's case:

  • locked -> something
  • something -> errored
  • something -> writable

I thought it might lead to simplification in general if we have sufficient promises for capturing only these transitions. Sorry, I cannot give a concrete rationale for now. But this is the background...

@domenic
Member Author

domenic commented Feb 27, 2015

It's helpful background!

@yutakahirano
Member

We are about to introduce a pool mechanism (or pending read ops, if you prefer async read) into ReadableByteStream. We would have to replicate the mechanism in ExclusiveReader, and it will get more and more complex.

I don't think it's quite that bad. The design of ExclusiveReader is largely meant to just wrap the logic already present in Readable(Byte)Stream. There shouldn't be any duplication, at least in theory.

It depends on the design: if we choose (2) in #288 (comment), you are right. Otherwise, we would need two separate pools, one for the stream and the other for the active reader, and they would sometimes interact with each other.

@yutakahirano
Member

Sorry, I overlooked that you agreed with (2). Then you're right: the reader can use the byte stream's pool when active.

@domenic
Member Author

domenic commented Feb 27, 2015

Hmm, I thought there would be a way to do (3) without such work, but maybe I am wrong. It was in the async-read universe anyway, which I think we are probably not going to pursue.

@tyoshino
Member

tyoshino commented Mar 3, 2015

|  | pull() + sync read() | async read() | sync read() + OperationStatus |
| --- | --- | --- | --- |
| handling read data | easy: for-loop + readableness check | promise-based: one microtask for each chunk | for-loop + readableness check as long as status.state is "complete"; otherwise, need to wait using status.ready. Cumbersome and complex. |
| correspondence between pull() and read() | less clear | clear | clear |
| affinity with locking infrastructure | bad | good | good |

source.closed.catch(abortDest);
dest.closed.then(
  () => {
    if (!closedPurposefully) {
Member

This part is a little unfortunate.

Member

Hmm. If we reach here, it means the user called pipe after closing dest, or dest became closed unsolicited while locked, which is just a bug in dest.

So ... we need this handling in the pipeTo algorithm, which must detect any bad things; but under the assumption that dest works correctly and the application code is bug-free, this block is unnecessary for handling streams?

Member Author

I think you are right. I didn't consider locks on the dest when writing this, so if we have those, it might be taken care of.

We might even be able to omit this block if the consequences of doing so are not so bad. That is, if the developer has set up a weird situation (like using a duck-typed dest with a getWriter() method that returns a writer that conforms to the superficial API contract but doesn't actually prevent closing from the outside), they get weird results.

We'll just want to see which tests fail when we remove this block to make sure.

@domenic domenic force-pushed the async-read-again branch from 0673906 to bfcc05c on Mar 9, 2015, 06:21
@domenic
Member Author

domenic commented Mar 9, 2015

In bfcc05c I have uploaded an initial set of examples for the templated tests. (None of them pass or anything, as they are currently testing against a dummy implementation.) Happy to get feedback.

You can already see the templating in action: I am reusing the same tests for an errored stream three times, for three different ways of erroring the stream.

domenic added 2 commits March 12, 2015 12:49
Based on discussions in #253. The key differences here from the previous async read() commits are:

- ReadableStreams no longer have read() methods directly; those only exist on readers. This drastically simplifies the stream/reader interaction, and also allows the possibility of different types of readers which have different reading behavior.
- read() promises fulfill with { value, done } instead of using an EOS sentinel value. This avoids a number of problems, and also provides a mechanism by which readable byte streams can smuggle out "unused" buffers given to them (using { value: zeroLengthViewOntoBuffer, done: true }).
- The state property is removed (from readable streams)

Another new semantic worth mentioning is that you cannot release a reader if the reader has read()s pending; doing so will throw. This slightly complicates the pipe algorithm in the { preventCancel: true } case.

This commit also adds some new infrastructure for _templated tests_, and ports some portion of the existing tests there. This is our solution for #217 and #264.

Finally, we re-merge all related code into a single readable-stream.js file, as the setup with the three separate files (readable-stream.js, exclusive-stream-reader.js, and readable-stream-abstract-ops.js) was problematic in causing circular dependencies.
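For illustration, the { value, done } shape gives consumers a loop like the following. The mock reader is mine, added only so the sketch is self-contained; the loop itself follows the design described in the commit message above.

```javascript
// A reader whose read() fulfills with { value, done } instead of using an
// end-of-stream sentinel. Mock for illustration only.
function makeMockReader(chunks) {
  let i = 0;
  return {
    read() {
      return i < chunks.length
        ? Promise.resolve({ value: chunks[i++], done: false })
        : Promise.resolve({ value: undefined, done: true });
    },
  };
}

// Standard consumption loop: no sentinel value can collide with real data,
// and done: true can still carry a value (e.g. a zero-length view used to
// smuggle an unused buffer back out).
async function readAll(reader) {
  const received = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    received.push(value);
  }
  return received;
}
```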
@domenic
Member Author

domenic commented Mar 12, 2015

Updated and squashed the newest design into a couple of commits, but layered them on top of the previous ones so we have those around for reference. All tests passing, including the templated ones. Now to update the examples/spec.

@domenic
Member Author

domenic commented Mar 12, 2015

Closing in favor of a new clean pull request for the new model :)
