Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backpressure on fetch integrated with Streams #452

Closed
yutakahirano opened this issue Sep 10, 2014 · 121 comments
Closed

Backpressure on fetch integrated with Streams #452

yutakahirano opened this issue Sep 10, 2014 · 121 comments

Comments

@yutakahirano
Copy link
Contributor

Imagine ReadaleStream is integrated with fetch. The most naive way is to add stream() method to Body interface.

interface Body {
    ...
  Promise<ReadableStream> stream();
};

Streams is an API that enables the data producer to know that the consumer doesn't want the data and stop producing data. On the other hand, other data types (text, blob, ...) don't have such feature.
My question is, should the loader start loading body while the Body representation is undetermined?

fetch(request).then(function(res) {
  return wait(longtime).then(function() {
    return res.stream();
  });
}).then(function(stream) {
  // Here we have a readable stream.
});

The above code waits longtime and then demonstrates that it wants to get the body data via ReadableStream interface. Should the loader load the body data while waiting and use the internal buffer unlimitedly, or stop loading while the representation is undetermined?

We can provide that information explicitly when calling fetch, but I don't know if it's a good API.

@yutakahirano
Copy link
Contributor Author

I'm not sure this is the right place to discuss this issue, but I chose here because it could affect the fetch interface.

@yutakahirano
Copy link
Contributor Author

@annevk @tyoshino

@annevk
Copy link
Member

annevk commented Sep 10, 2014

It would be good to get input from @domenic.

Note that I expect we would not expose a Stream through a promise. I would expect it to be returned from response.body synchronously. And then if you use one of the methods such as json() it would no longer work or not work predictably.

@domenic
Copy link
Contributor

domenic commented Sep 14, 2014

To explain @annevk's comment a bit more, I believe our thinking was that body would be an instance of RequestBodyStream, which is a ReadableStream subclass that also has methods like json() etc. The .stream() idea doesn't seem bad on first glance, but I haven't thought about it too hard; in what follows I'll assume we're going with RequestBodyStream design.

(Conceptually, .json()s implementation would look roughly like the readableStreamToArray example, except it would decode and concat the chunks into a string instead of putting them in an array, and it would do JSON.parse before returning the result. In reality, it would probably be done much more efficiently, e.g. pre-allocate an ArrayBuffer, read all the data into it, and then decode all at once.)

My expectation would then be that upon fetch() being called, the body created and data is buffered internally (or left in the kernel buffer, perhaps?) up to the stream's high water mark (HWM). Thus, if the user never calls .json() or .read() or .pipeTo(...) or anything else that would read the stream, the maximum memory consumed is equal to max(HWM, Content-Length). (Or perhaps slightly above the HWM if backpressure cannot be exerted fast enough.) If the user later calls .json(), it un-pauses the stream until all the data is read and converted into JSON. Similarly if they call .read(), they consume buffered chunks, until they consume all the chunks that were buffered at which point the stream becomes "waiting" and they need to call .wait().

The choice of HWM could in theory be given to users (e.g. as an option to fetch()), but in practice in Node.js we see that nobody uses such a capability and they just accept the default. Node.js chooses a default of 16 MiB, but implementations could use heuristics (e.g. use a lower number on mobile phones). A HWM of 0 would be fine too.

/cc @tyoshino in case there's something I am forgetting

@annevk
Copy link
Member

annevk commented Sep 15, 2014

@domenic json() and friends moved directly to Response/Request. So body would be a "pure" stream object I think.

@yutakahirano
Copy link
Contributor Author

Thank you. My understanding is the following - Are those right?

  • Request / Response have 'body' property of type ReadableStream.
  • When json(), text(), etc are called the stream gets closed.
  • When read() is called on the stream bodyUsed becomes true.

@annevk
Copy link
Member

annevk commented Sep 18, 2014

Yeah, although I'm not sure if we can make the third bullet point work if we keep the stream object pure. Perhaps through some temporary observer? But even that would have to be synchronous for bodyUsed to return the correct value...

@domenic
Copy link
Contributor

domenic commented Sep 18, 2014

The third bullet point is definitely achievable since we'd control the creation of the stream. E.g.

var that = this;
this.body = new ReadableByteStream({
  readInto(/* ... */) {
    // actual work
    that.bodyUsed = true;
  }
});

@tyoshino
Copy link
Contributor

@domenic #452 (comment)

max(HWM, Content-Length)

min?

The proposed back pressure mechanism looks correct. Regardless of where json() etc. are placed, we want to just start receiving and buffer some amount of data to reduce latency. The amount would be limited by some "high water mark". The buffering could be implemented not by ReadableByteStream but some platform (e.g. Blink) specific buffering library.

third bullet point

I agree that it works.

To simplify implementation, we would want to use ReadableByteStream not only for the body property but also as a backend for json(), etc. as @domenic proposed.

But even if we adopt this approach, it seems we can still choose to determine the body reading method to provide to the user at access on body property. Not on body.read() call.

  1. The body attribute's getter must run the steps below:
    1. If the used flag is set, return null.
    2. Set used flag.
    3. Return a ReadableByteStream representing the associated body.

Unless we want to allow the user to investigate body before deciding what body reading method to use. I guess there's no such need.

@tyoshino
Copy link
Contributor

Fixed to return the ReadableByteStream for the second and later evaluation on the body property once we choose to use the stream.

Objects implementing the Body ... an associated byteStream (initially set to null), ...

  1. If byteStream is not null, return byteStream.
  2. If the used flag is set, return null.
  3. Set used flag.
  4. Set byteStream to a ReadableByteStream representing the associated body.
  5. Return byteStream.

@annevk
Copy link
Member

annevk commented Sep 19, 2014

@tyoshino I guess that could work, but perhaps we should make it a method then given all the side effects. But instead of returning a promise it would be synchronous. Not sure though.

@tyoshino
Copy link
Contributor

make it a method

@annevk Sounds good

@domenic
Copy link
Contributor

domenic commented Sep 19, 2014

I don't really like the side-effecting method. I'd rather just go with the approach from #452 (comment). It also makes the name bodyUsed a lie as it's more someoneOnceAccessedTheBodyPropertyForUnknownPurposes ;)

@yutakahirano
Copy link
Contributor Author

Thanks, I will write a draft for the integration. @tyoshino's proposal looks fine. @domenic, can you tell me why you don't like it?

@domenic
Copy link
Contributor

domenic commented Oct 21, 2014

@yutakahirano because it's more complex (both in internal code and user-facing code/concepts) than #452 (comment) and doesn't gain anything.

@tyoshino
Copy link
Contributor

@domenic

I see. json() and text() are only-once methods. So, we don't have to introduce a little complicated algorithm like #452 (comment).

In your idea, bodyUsed is set when json(), text() or any ReadableStream method is called, and are used to prohibit json() and text() but it must not disable the ReadableStream methods (we'll read() many times). Right? Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first? Then, we need something else than bodyUsed. Otherwise, we don't (bodyUsed suffices).

@domenic
Copy link
Contributor

domenic commented Oct 27, 2014

Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first?

Nah, if you do that you'll get what you asked for. (A mess.)

@tyoshino
Copy link
Contributor

Right. To choose not to do that, I want to give a justification supporting that we're putting a guard against json()-after-read() but not against read()-after-json(). Both of them look unexpected usage. json() would be pumping on the stream. It sounds good to give json() exclusive access to the stream.

@yutakahirano said that it might be useful to keep cancel() available even after json() call as it allows us to abort json(). Is this included in your motivation to keep ReadableStream methods available after json() call?

Regarding guard against read()-after-json(), alternative proposed by @yutakahirano is making stream() only-once method. It's cumbersome that you need to save the returned stream to some variable. But we can reduce the complexity of algorithm inside.

@domenic
Copy link
Contributor

domenic commented Oct 28, 2014

We don't really have any concept of "exclusive access to a stream." If you have a stream object, you can read from it; this is similar to a file handle object. it would be possible to introduce a library or subclass that adds this concept, but I would prefer to let libraries prove the worth of that idea and if we see everyone building such tooling them consider standardizing it.

.cancel() is indeed a good reason to keep ReadableStream methods available.


That said, this kind of situation where a "C++ reader" and a "JS reader" might share a stream seems very reminiscent of the off-main-thread piping question, whatwg/streams#97. In fact you can view it as a subset of that if you define .json() using a concat-stream equivalent:

Response.prototype.json = function () {
  var concatenator = new ConcatStream();
  this.body.pipeTo(concatenator);
  return concatenator.asArrayBuffer.then(convertToString).then(JSON.parse);
};

(npm concat-stream is for Node streams and uses a callback; our hypothetical version uses a promise-returning .asArrayBuffer property.)

I will open a new issue on whatwg/streams and consider if maybe we want to lock the stream during piping.

@tyoshino
Copy link
Contributor

We don't really have any concept of "exclusive access to a stream."

By "exclusive", I meant exclusive between arrayBuffer(), blob(), ... text() and direct access on the ReadableStream interface. I didn't mean exclusiveness between consumers who see the ReadableStream interface.

My idea in #452 (comment) realizes this by hiding the stream object itself when json(), etc. is called. The script can never touch any method of the ReadableStream interface. We can read from it only by waiting for the promise returned by json(), etc. which he/she called gets fulfilled.

.cancel() is ...

OK

... new issue ...

Good

@yutakahirano
Copy link
Contributor Author

It seems that whatwg/streams#241 allows "read-and-then-pipeTo", but you don't want to allow "read-and-then-call-json" in this issue, right?
The code in #452 (comment) allows that, I think.

With stream() method, we can solve the problem, though.

@domenic
Copy link
Contributor

domenic commented Oct 29, 2014

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

@tyoshino
Copy link
Contributor

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Yeah, it's useful. But now there's no way to tell the stream how big the header is. Using ReadableByteStream.prototype.readInto()? Then, don't we want to provide json(num_bytes) than just jsonUntilEndOfStream() which we have now? Or, readOneJsonThenStopReading()?

@yutakahirano
Copy link
Contributor Author

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Oh, sorry, I mistook your intention, but it is a bit different from what I said at #452 (comment) .

And for your use case, we need read(size) which we don't have now.

@domenic
Copy link
Contributor

domenic commented Oct 29, 2014

You would use readInto, yes. Not sure about the utility of json(num_bytes). In my view json() is just a stopgap until we have real streams, and/or a convenience for the 80% case.

@yutakahirano
Copy link
Contributor Author

What is "readInto"? Is it part of the Streams API?

@tyoshino
Copy link
Contributor

I got some offline comment about politically incorrectness of naming. I've renamed it to "body passed".

@tyoshino
Copy link
Contributor

Copied #452 (comment) to
https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md and added two more ideas (A)' and (C). Domenic, which is depicting your idea?

@tyoshino
Copy link
Contributor

Sorry. (A)' was not depicting Domenic's #452 (comment). Added (A)'' which should be depicting it more correctly.

@tyoshino
Copy link
Contributor

BTW, Yutaka questioned what we should do with the ReadableStream when e.g. .text() fails due to invalid UTF-8 data. We came up with the following options:

  1. stays "readable" with unconsumed data (or stays "closed" if the UTF-8 decoder has already drained all bytes)
  2. becomes "closed" after draining unconsumed data
  3. becomes "errored"

@domenic
Copy link
Contributor

domenic commented Jan 30, 2015

I'd like to confirm if your opinion is the same as Anne regarding the following question:

fetch(req), new Request(req) and cache.put(req, res) fails if any of fetch(req), new Request(req) and cache.put(req, res) has happened, even after the operation is finished.

I agree with this.

Domenic, which is depicting your idea?

(A)'' depicts my idea.

BTW, Yutaka questioned what we should do with the ReadableStream when e.g. .text() fails due to invalid UTF-8 data. We came up with the following options:

I think 1 or 2 are reasonable and I don't have a strong preference between them. 1 would be text() doing nothing, and 2 would be text() calling this.body.cancel(new TypeError("couldn't decode chunk!")). Maybe since we already think of text() as saying "I want all of the remaining data", 2 is best.

@annevk
Copy link
Member

annevk commented Jan 31, 2015

utf-8 would not suddenly fail to decode if you hit a trailing byte. You would just get a U+FFFD. I don't think we want to use a fatal decoder.

@tyoshino
Copy link
Contributor

OK. So, text() uses a replacement mode decoder, so it doesn't fail halfway. In json(), we may detect a fatal error in the middle of the data. Is such a case, maybe we could call cancel() as Domenic suggested. Similar handling for formData(), too?

@annevk
Copy link
Member

annevk commented Jan 31, 2015

I'm not sure how multipart/form-data is normally processed. I guess we don't parse it ourselves normally so making it a fatal error is fine if we keep doing that consistently.

@yutakahirano
Copy link
Contributor Author

Just a nitpick, but you don't have to have an entry for fetch(req) in https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md. It is included in new Request(req) case (at least in the current spec, and it seems nobody in this thread wants to change that).

@tyoshino
Copy link
Contributor

tyoshino commented Feb 3, 2015

Yutaka: Thanks. Done tyoshino/streams_integration@723bb07

@tyoshino
Copy link
Contributor

tyoshino commented Feb 3, 2015

Thanks, Anne, Domenic. Let's have json(), etc. cancel() the stream on any fatal error.


Re: Domenic (#452 (comment)),

OK. Let's proceed with that plan. I've moved (A)'' to the top in https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md.

@annevk
Copy link
Member

annevk commented Feb 3, 2015

Except text(), right? That is, it should not cancel() on incorrect bytes.

@tyoshino
Copy link
Contributor

tyoshino commented Feb 3, 2015

Yes. As you said, text() just replaces incorrect bytes, so it never fails and therefore is never required to do cancel().

@tyoshino
Copy link
Contributor

tyoshino commented Feb 3, 2015

s/never fails/never fails as long as body doesn't become `"errored"/

@yutakahirano
Copy link
Contributor Author

Thank you @tyoshino and others, I will update the draft.

@wanderview
Copy link
Member

What is the type of the "chunk" provided by Fetch body ReadableStream objects? Is it an ArrayBuffer? The streams spec says a chunk can be of any type, but we should probably define what Fetch body streams return explicitly. (Sorry if this is already defined and I missed it. Its a bit hard to follow the overall effort.)

@yutakahirano
Copy link
Contributor Author

Yes, read() returns an ArrayBuffer. I will describe that in the draft.

yutakahirano added a commit to yutakahirano/fetch-with-streams that referenced this issue Feb 12, 2015
Used 'passed flag' instead of 'used flag' as discussed
at w3c/ServiceWorker#452.
yutakahirano added a commit to yutakahirano/fetch-with-streams that referenced this issue Mar 25, 2015
Used 'passed flag' instead of 'used flag' as discussed
at w3c/ServiceWorker#452.
@yutakahirano
Copy link
Contributor Author

Hi!

Sorry that it took so long, but I updated the draft.
Please let me know if it has any errors regarding bodyUsed, passed flag, locked flag, etc.

@annevk
Copy link
Member

annevk commented Apr 5, 2015

I suggest any further review is done in that repository until the merge happens (at which point we should switch to https://github.com/whatwg/fetch I think for anything new).

@xgqfrms-GitHub
Copy link

TypeError: body stream already read at FetchDatas.fetch.then ???

what.s wrong with this? Anybody can help?

promise-bug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants