Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What types does ReadableByteStream's reader's read(x) accept and return? #295

Closed
domenic opened this issue Mar 11, 2015 · 32 comments
Closed

Comments

@domenic
Copy link
Member

domenic commented Mar 11, 2015

Per #253 the "bring your own buffer" reader for ReadableByteStream will generally be of the form

byobReader.read(view : ArrayBufferView) -> Promise<ArrayBufferView>

The reason for an ArrayBufferView for the argument is that it allows authors to supply the (buffer, offset, maxLength) trio all at once. The reason for the ArrayBufferView for the return type is that it gives back (buffer, bytesRead), plus the offset if one was supplied, all at once.

For the input, there are a few possibilities:

  • Only allow a specific type: Uint8Array and DataView are the big candidates
  • Allow any array buffer view (including DataView), with nominal typing
  • Allow any { buffer, byteOffset, byteLength } structure; all array buffer views match this, but it would also allow user-supplied structures if they only have a buffer handy (and we could even default byteOffset = 0 and byteLength = buffer.byteLength).
  • Allow any array buffer view (either nominal or structural), but also allow array buffers directly.

I think the most lenient option here is probably best: allow structural array buffer views or array buffers directly. Alternately we could just allow structural array buffer views. The only difference is that the former allows .read(ab) whereas the latter requires .read({ buffer: ab }).

For the return type, I think our choices are: Uint8Array or DataView. Here DataView feels a bit more general---less like "picking favorites". (Why not Int8Array? Why not Uint8ClampedArray?) But it is also less useful, I think---it doesn't have the nice array methods like map/filter/etc., or the typed array methods like set, and it exposes you to endianness concerns (I think)...

I think Uint8Array is best here.

@tyoshino
Copy link
Member

Here's today's snapshot of thin BYOB byte stream reader I've been prototyping.

Some notes

  • currently the tuple returning approach is employed.
  • two different error concepts. fatal and non-fatal. non-fatal errored reader will return the buffers. the fatal one, not.

@tyoshino
Copy link
Member

For the return type,
...
I think Uint8Array is best here.

I understand all of your analysis here.

I prefer Uint8Array and Int8Array to the others (including Uint8Clamped). But yeah, it would be nice if we could also eliminate the implication of sign and byte-wise processing.

It looks beautiful if we had a new concrete type on which only stuffs defined on the ArrayBufferView interface are exposed. I like it, but it doesn't add any practical benefit. I feel that we shouldn't add more type just for purity. If no one has strong opinion, I'd go with Uint8Array to finish the bikeshed :)

As I think we want to allow passing the read data directly to someone expecting ArrayBufferView, returning a tuple of (buffer, offset, size) is not an option.

@tyoshino
Copy link
Member

Is this the right issue to discuss how EOF signal should be returned from reader.read(view)? If we choose to return only a view, we need to represent EOF using any of the value on the ArrayBufferView. Are you fine with the idea that byteLength === 0 is EOF?

Do we want to give the EOF meaning to a view with byteLength === 0 more generally? I mean ByteStreamWriter should handle a view with byteLength === 0 as close? If so, piping will be a little simplified. For normal users, it's not useful, I think.

@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

Wait! I was re-reading my thread on es-discuss about this question and I realized @jorendorff had a great solution that I forgot to list in the OP: just make the return type be the same type of view as gets passed in to the read(view) method!

I guess in that solution we disallow .read(arrayBuffer) directly. (We could still allow .read({ buffer: arrayBuffer }) which is equivalent to .read({ buffer: arrayBuffer, byteOffset: 0, byteLength: arrayBuffer.byteLength, constructor: Uint8Array }), if we wanted. Or we could demand .read({ buffer: arrayBuffer, constructor: Uint8Array }) for explicitness.) What do you think?

Are you fine with the idea that byteLength === 0 is EOF?

We should be able to avoid making this assumption (even though it will always be true, I think) by using the done part of the { value, done } tuple.

@tyoshino
Copy link
Member

I think the most lenient option here is probably best: allow structural array buffer views or array buffers directly. Alternately we could just allow structural array buffer views. The only difference is that the former allows .read(ab) whereas the latter requires .read({ buffer: ab }).

I'd just keep it consistent with other web APIs such as XHR (accepting ArrayBufferView or ArrayBuffer) unless we find any strong benefit in using the structural array buffer view.

@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

Yeah, I am OK with that. (Although maybe we only allow ArrayBufferView if we go with the type-of-input-determines-type-of-output idea.)

@tyoshino
Copy link
Member

just make the return type be the same type of view as gets passed in to the read(view) method!

Is the benefit of that approach that we explicitly pass the underlying buffer than passing the view implying that the whole buffer behind it is subject to detaching? Hmm, I understand. Unlike others such as XHR, the BYOD stream temporarily takes the ownership of the provided buffer. If we really set a high value on that point, we should rather stop accepting ArrayBufferViews directly, I guess. And, ... that's just inconvenient?

Maybe we could establish another view point for ArrayBufferView type arguments:

  • an ArrayBufferView argument passes the ownership of the underlying buffer together with the target range using a single object for convenience.

in addition to the view point we already have, that is:

  • an ArrayBufferView argument shows the contents of the underlying buffer of the specified region to the API.

We should be able to avoid making this assumption (even though it will always be true, I think) by using the done part of the { value, done } tuple.

OK

@tyoshino
Copy link
Member

(Although maybe we only allow ArrayBufferView if we go with the type-of-input-determines-type-of-output idea.)

Makes sense.

@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

an ArrayBufferView argument passes the ownership of the underlying buffer together with the target range using a single object for convenience.

Yes, I definitely agree with this.

@domenic
Copy link
Member Author

domenic commented Mar 16, 2015

OK. I think the conclusion is that it should only accept an ArrayBufferView (not an ArrayBuffer, and not a structural ArrayBufferView, but DataView is allowed) and the promise should fulfill with the same type as was passed in. Agreed?

@tyoshino
Copy link
Member

OK. If we passed Float32Array, Float32Array is returned?

@domenic
Copy link
Member Author

domenic commented Mar 16, 2015

Yep!

@tyoshino
Copy link
Member

Sorry. The consensus about the return type for non-BYOB byte stream is Uint8Array? IIRC, we haven't made conclusion for it explicitly.

@domenic domenic reopened this Mar 19, 2015
@domenic
Copy link
Member Author

domenic commented Mar 19, 2015

I think that is good. I was tempted to say ArrayBuffer... but Uint8Array might allow more flexibility and will probably be more uniform.

@tyoshino
Copy link
Member

Oh, yeah. We don't have to use a view, but maybe having it consistent with BYOB (use ArrayBufferView for both) is beneficial in some points? (Sorry, I just forgot that we could also just use the ArrayBuffer and said Uint8Array :) )

@yutakahirano
Copy link
Member

Oh, yeah. We don't have to use a view, but maybe having it consistent with BYOB (use ArrayBufferView for both) is beneficial in some points?

+1

@domenic
Copy link
Member Author

domenic commented Mar 19, 2015

Yeah, I think the consistency is good. Plus I could imagine a hypothetical scenario where the stream allocates one large ArrayBuffer then hands out smaller Uint8Array views into it as it fills up (all with the same backing buffer). Probably will never happen given multi-threading means lots of transfers anyway, but imagining it helped convince me.

@annevk
Copy link
Member

annevk commented Apr 13, 2015

The pattern across most APIs so far has been to accept either ArrayBufferView/ArrayBuffer as input (now called BufferSource) and output ArrayBuffer (e.g. fetch(), XMLHttpRequest, and WebSocket). The exception is the Encoding Standard (which outputs Uint8Array), but we're not necessarily happy with that.

I'm missing the reason why we want to output a view in this thread. Please forgive my reopening if I missed it.

@annevk annevk reopened this Apr 13, 2015
@domenic
Copy link
Member Author

domenic commented Apr 13, 2015

A view gives the tuple (buffer, bytesRead, offsetIntoBuffer) which is necessary information anyway if you're reusing buffers. (ReadableByteStreams definitely are used for reusing buffers in the BYOB reader case; for the naive reader case they can't right now, but e.g. #324 would give a mechanism for doing so).

In balance an array buffer view seems better than a custom { buffer, bytesRead, offset } tuple (as Jason convinced me on es-discuss). The choice between DataView and Uint8Array is somewhat arbitrary; I thought Uint8Array would be more likely to be useful since it has all those nice array methods.

@annevk annevk closed this as completed Apr 13, 2015
@tyoshino tyoshino reopened this Jun 4, 2015
@tyoshino
Copy link
Member

tyoshino commented Jun 4, 2015

In prototyping, I found that:

  • accepting TypedArrays with non-1 BYTES_PER_ELEMENT, and
  • returning how many bytes have been filled by adjusting the size of the view (multiple of BYTES_PER_ELEMENT)

complicates the library much.

  • I think it's common that we pass ArrayBufferViews of different types to the same ReadableByteStream (E.g. read one Uint32 containing the number of polygons, and then read Float64Arrays of the number * 3). So, we should support this kind of use case.
    • Then, in what data type should the view passed to the underlying byte source? As-is? Maybe ok. It's the underlying byte source's responsibility to fill the view passed through the pullInto interface and call controller.respond() with the same TypedArray with an adjusted viewed range.
  • But, how about controller.enqueue() (see ReadableByteStream: should support an internal queue #353)? controller.enqueue() may have been called with insufficient bytes. E.g. Float64Array of 1 element is passed to the BYOB reader after just one enqueue() with 2 bytes as Uint8Array was done by the underlying byte source?
    • One simple solution is to error the stream when the underlying byte source enqueues a chunk that isn't aligned to match the data type the consume expects (as represented by the TypedArray type passed to the BYOB reader).
      • I think we shouldn't do this. E.g. TCP socket may deliver chunks cut at arbitrary points, even if the server sends aligned data.
    • So, what we should do for the case "2 bytes in queue, a Float64Array came". Should we do
      1. Call pullInto with new Uint8Array(view.buffer, 2, 6)?
      2. Call pullInto with a tuple of view, 2?
      3. Wait for enqueue()s enough to fill at least one element in the view?

@tyoshino
Copy link
Member

tyoshino commented Jun 4, 2015

I also considered returning (fulfill the returned promise with) a tuple of the view and the number of filled bytes.

But, ... if the consumer wants to fill only part (not aligned to the boundaries of elements) of a TypedArray with non-1 BYTES_PER_ELEMENT, they can still choose to extract the underlying buffer and pass wrapped with a Uint8Array to the BYOB interface. So, maybe, it makes more sense to give a functionality of aligned filling (i.e. returns only when at least one element is fully filled, and with no partially filled elements) to invocation of read(view) with a TypedArray with non-1 BYTES_PER_ELEMENT.

@domenic
Copy link
Member Author

domenic commented Jun 4, 2015

I see. This is a tough one. Hmm.

Then, in what data type should the view passed to the underlying byte source? As-is? Maybe ok. It's the underlying byte source's responsibility to fill the view passed through the pullInto interface and call controller.respond() with the same TypedArray with an adjusted viewed range.

Hmm. At first I thought as-is would be reasonable. But I am not sure underlying sources should have to care about this---it would probably be inconvenient for them, and they would always just unwrap it into an ArrayBuffer and/or a Uint8Array or DataView. Maybe underlying sources only get Uint8Arrays? See below.

I think we shouldn't do this. E.g. TCP socket may deliver chunks cut at arbitrary points, even if the server sends aligned data.

Agreed.


Here are three major possibilities:

  1. Require Uint8Array usage everywhere. Don't even accept other types as arguments. (I guess we could allow DataView... meh.) This puts the burden on the user. They have to say "hmm, I got back 2 bytes; that is not enough for me to extract and create a Float64 array, so I will call read() again using a view that is 2 bytes further into the backing ArrayBuffer."
    • Pro: gives control back to the user ASAP, without any long trips into I/O land.
    • Pro: will never need to error due to misaligned bytes, even at EOF.
    • Con: annoying for users who have to assemble larger structures more manually. (Especially since their ArrayBuffer objects will keep getting detached, and they have to retrieve new ones via returnedView.buffer.)
  2. Do our best to always fill the entire supplied view. So let's say in your case we have: controller.enqueue(new Uint8Array([1, 2])), then const view = new Float64Array(2); rbs.read(view). The implementation then copies the two bytes (1 and 2) into the passed view, then calls ubs.pullInto(new Uint8Array(view.buffer, 2, 14)). If pullInto responds with fewer than 14 bytes, it keeps calling pullInto with the appropriate range, until eventually all 16 total bytes (2 filled from queue plus 14 filled from pullInto) are filled. Only then does the promise returned by rbs.read(view) fulfill.
    • Pro: less annoying for users. Note how the only time you will get back fewer bytes than you requested is EOF.
    • Con: does more work behind the scenes, instead of giving full low-level control. A single call to read() could cause multiple I/O operations.
    • Con: if we get a request for a Float64Array but there are fewer than 8 bytes left in the file, we have to error (or do something weird like return a Uint8Array instead).
  3. Do our best to align to BYTES_PER_ELEMENT. Similar to (2), but we would go for 6 more bytes instead of 14 in that case.
    • Pro: if the user passes in a single-byte view, they get full control, and will not cause multiple I/O operations (obviating the above Con).
    • Con: if the users passes in a non-single-byte view, this could do more work behind the scenes, instead of giving full low-level control. A single call to read() could cause multiple I/O operations. (But, not as much as (2) would.)
    • Con: a bit annoying for users who really want to read a certain number of bytes.
    • Con: if we get a request for a Float64Array but there are fewer than 8 bytes left in the file, we have to error (or do something weird like return a Uint8Array instead).

What do you think of this analysis? It sounds like #295 (comment) is leaning more toward 2 or 3. Both seem kind of nice, although I am worried about how they are brittle in the face of EOF. I also imagine that they add a lot more complication than 1 does.

Maybe 1 is good enough? It is readable byte stream after all, not readable float64 stream.

@tyoshino
Copy link
Member

tyoshino commented Jun 5, 2015

(2) sounds good but it disables ability to issue multiple pullInto requests. E.g. suppose we got pullInto0 of 8 byte and pullInto1 of 8 byte. If we controller.respond(2), then the stream wants to re-issue pullInto to continue filling the first view, but the underlying byte source already received pullInto1. So, we need to refrain from issuing multiple pullInto to the underlying byte source.

@tyoshino
Copy link
Member

tyoshino commented Jun 5, 2015

Maybe, multiple pullInto() is not something we want to support by the default controller library. I'll make only one pullInto issued at a time.

@domenic
Copy link
Member Author

domenic commented Jun 5, 2015

Oh, yeah, I'd always been kind of assuming only a single pullInto at a time instead of allowing multiple concurrent. When thinking about read(2) and friends it seems like multiple pullIntos could be bad, but please correct me if I'm wrong.

@tyoshino
Copy link
Member

tyoshino commented Jun 5, 2015

@tyoshino
Copy link
Member

Reference implementation in https://github.com/whatwg/streams/tree/bytestream is now ready for review.

Current implementation is kinda combination of (2) and (3) of domenic's idea.


When getByobReader() and read(view) is called, underlyingByteSource.pullInto() is called.

When getReader() and read() is called, underlyingByteSource.pull() is called.

As far as the stream is readable and controller.close() has not yet been called, the underlyingByteSource can call controller.enqueue().

If there's pending pullInto call, the underlyingByteSource can call controller.respond().


If byobReader.read(new Float64Array(2)) is called when:

  • 1 byte queued -> fill the first byte, and repeat pullInto() with the not-yet-filled region
  • 8 byte queued -> fulfill with 1-element filled
  • 9 byte queued -> fulfill with 1-element filled, and then keep the remaining 1 byte queued
  • 16 byte queued -> fulfill whole view
  • 17 byte queued -> fulfill and keep the remaining 1 byte queued

When there is a pending read request of new Float64Array(2) with 0 byte filled:

  • if controller.respond(1) is called -> fill the first byte and repeat calling pullInto
  • if controller.respond(8) is called -> fulfill with 1-element
  • if controller.close() is called -> close the stream. as the buffer is currently owned by the underlying byte source, wait for respond() call. when respond() call happens, fulfill the pending request with the returned buffer and done of true.

When there is a pending read request of new Float64Array(2) with 1 byte filled:

  • if controller.respond(1) is called -> fill the second byte and repeat calling pullInto
  • if controller.respond(7) is called -> fulfill with 1-element
  • if controller.close() is called -> error the stream and throw

When to call pull or pullInto again:

  • if we're inside pull or pullInto, instruct the pull caller or pullInto caller to repeat pull or pullInto if needed
  • otherwise:
    • if it's a result of invocation of a method on the controller, queue a microtask to run pull or pullInto later if needed
    • otherwise, i.e. it's a result of read(), read(view) on a reader, run pull or pullInto immediately and repeat if needed.

@domenic
Copy link
Member Author

domenic commented Jun 11, 2015

Awesome! I'll do a PR so I can do some line comments.

When getByobReader() and read(view) is called, underlyingByteSource.pullInto() is called.

When getReader() and read() is called, underlyingByteSource.pull() is called.

What is the motivation for having pull() at all, instead of just using pullInto()? I imagine there is one and I just forgot it. Let's be sure to write it down, either in FAQ.md or in the spec or something?

If byobReader.read(new Float64Array(2)) is called when:

  • ...

When there is a pending read request of new Float64Array(2) with 0 byte filled:

  • ...

These sound pretty great. So to be clear, pullInto() always gets a Uint8Array? I like that.

if controller.close() is called -> close the stream. as the buffer is currently owned by the underlying byte source, wait for respond() call. when respond() call happens, fulfill the pending request with the returned buffer and done of true.

This one is a big strange. I am not sure I understand the reasoning. Maybe instead we should just give the buffer back as a zero-length view?

@tyoshino
Copy link
Member

What is the motivation for having pull() at all, instead of just using pullInto()? I imagine there is one and I just forgot it. Let's be sure to write it down, either in FAQ.md or in the spec or something?

pullInto(buffer, offset, length) passes the buffer given by the consumer while pull() just tells that a non-BYOB reader attached received a .read() call. In this (.read()) case, there's no buffer to pass to the underlying byte source.

Create #363 for reminder.

These sound pretty great. So to be clear, pullInto() always gets a Uint8Array? I like that.

Yes. buffer, offset and length.

This one is a big strange. I am not sure I understand the reasoning. Maybe instead we should just give the buffer back as a zero-length view?

Hmm, which point? We require the argument of respond() to be 0. The buffer returned by the respond() call is returned to the consumer with the byteLength set to 0, yes.

@domenic
Copy link
Member Author

domenic commented Jun 16, 2015

Hmm, which point? We require the argument of respond() to be 0. The buffer returned by the respond() call is returned to the consumer with the byteLength set to 0, yes.

I guess I kind of was thinking that close() should automatically do respond(0) for you. But actually that is perhaps not a good idea. Nevermind!

@tyoshino
Copy link
Member

I guess I kind of was thinking that close() should automatically do respond(0) for you. But actually that is perhaps not a good idea. Nevermind!

Yeah, we could. But it's possible if the underlying byte source wants to issue controller.respond(0). If the underlying byte source has detached the given buffer, and then noticed that it's done, it has to issue controller.close() and then controller.respond(0, buffer) with the active buffer reference.

@domenic
Copy link
Member Author

domenic commented Jun 18, 2015

OK, I think we got this tied off; closing again.

@domenic domenic closed this as completed Jun 18, 2015
tyoshino added a commit that referenced this issue Aug 5, 2015
- For now, most of the spec side texts are removed. To be added by a
  separate patch
  - Added some abstract operations on typed arrays to be used by the readable
    byte stream algorithms
- Added ReadableByteStreamByobReader
- The underlying byte source instance is now owned by the controller
- The interface exposed to the underlying byte source has been revised to have
  less constraints
- Handles typed arrays with multibyte elements as discussed in #295
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants