Uploading: write to a writable stream, or pass in a readable stream? #30
Let's think about uploading chunks of raw bytes manually. I don't want to write code (create a ReadableStream and enqueue data into it) just to pretend to be a ReadableStream. I'd rather have the Request object expose some simple API with which I can write the chunks: that's the WritableByteStream API. We also need a method to upload data from a ReadableStream. We can have a separate API surface that accepts a ReadableStream and consumes data to upload from it. But we can represent that as …

I agree with Ben's point that a Request is not always wrapping a socket, etc.
We could make this better by making the underlying source API more user-friendly by using a Promise, reducing the code needed to construct a ReadableStream (define an underlying source class and call new ReadableStream()). Then that's the same as exposing a WritableStream on a Request.
I also like the idea that passing a ReadableStream to the constructor means we have exactly one place in the API for specifying the body of a Request/Response.

I thought we already had a way to convert a push source to a ReadableStream. Is that not adequate here for some reason?
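The push-source-to-ReadableStream conversion mentioned above can be sketched roughly like this (a hypothetical illustration, not code from the thread; the `pushSource` interface with `ondata`/`onend`/`onerror` callbacks is invented):

```js
// Sketch: wrapping a "legacy" push source in a ReadableStream.
// The pushSource callback interface here is an assumption for illustration.
function wrapPushSource(pushSource) {
  return new ReadableStream({
    start(controller) {
      pushSource.ondata = chunk => controller.enqueue(chunk);
      pushSource.onend = () => controller.close();
      pushSource.onerror = e => controller.error(e);
    }
  });
}
```

This is the kind of library-level adapter the next comment refers to: fine for wrapping lower-level sources, but more boilerplate than most page authors want to write.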
I think that's solvable independently. It does seem like we eventually landed on overloading
Well, that mechanism is generally used for wrapping lower-level things, and isn't supposed to be used by normal authors, but instead by library writers who are creating custom streams to wrap "legacy" push sources. (Similar to library writers creating promises to wrap "legacy" callback code.) As @tyoshino mentions, the underlying source API is not as well-suited as the writable stream API. For example, when you enqueue something from an underlying source into a readable stream, that can never fail, so it just returns …
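The asymmetry described here can be sketched with a writable stream (a hypothetical illustration, not code from the thread; the failing sink and chunk values are invented):

```js
// Sketch: a writable stream's write() promise can surface per-chunk sink
// failure back to the producer, while enqueue() into a readable stream
// has no equivalent failure channel.
const failingSink = new WritableStream({
  write(chunk) {
    // simulate the HTTP stack / cache refusing a chunk
    if (chunk === "bad") throw new Error("sink rejected this chunk");
  }
});

async function demo() {
  const writer = failingSink.getWriter();
  await writer.write("ok");       // fulfills: the sink accepted the chunk
  return writer.write("bad")      // rejects: the failure reaches the producer
    .then(() => "accepted", () => "rejected");
}
```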
@domenic Can you write some example code showing where passing a ReadableStream to the constructor is difficult? I don't think I understand the case you are concerned with.

Another nice thing about passing a ReadableStream to the constructor is that it's clear fixed bodies are immutable. For example, it avoids this kind of confusion:
@wanderview …
@wanderview sure. Here's one scenario: I want to upload three chunks while reporting progress. With a writable stream, this is easy:

```js
fetch('https://example.com', {
  body(ws) {
    ws.write(chunk1).then(() => updateProgress(1/3));
    ws.write(chunk2).then(() => updateProgress(2/3));
    ws.write(chunk3).then(() => updateProgress(3/3));
    ws.closed.then(showUploadComplete, showUploadFailed);
    // NB: we will probably have writers in the final writable stream API.
    // Ignoring them for now.
  }
});
```

I don't see how you would do this with the pass-in-a-readable-stream version.
I don't think …
Does

```js
var req = new Request("https://example.com", {
  body(ws) { ... }
});
fetch(req);
```

also work?
@domenic isn't this equivalent to your example? Slightly more code, but you get back-pressure. The problem with just exposing a WritableStream is that it makes it more likely back-pressure is forgotten.

So it seems no matter what, we end up with the concept of a pipe here: an object with ReadableStream and WritableStream interfaces and a buffer. Either we can have Request instantiate the pipe and then expose the writer to the creator of the Request, or we can have the creator instantiate the pipe and pass in the ReadableStream. I guess I favor having the script instantiate the pipe and pass the ReadableStream, because you can then avoid the whole pipe/buffer if you already have a ReadableStream. A pipe mock-up:
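A minimal sketch of the kind of pipe being described: a shared queue with a writable side for the producer and a readable side for the consumer (all names invented, hypothetical code, not the actual mock-up):

```js
// Sketch of a queue-backed pipe. Note that write() resolves as soon as the
// chunk is buffered, NOT when the final consumer has processed it -- which
// is exactly the progress-reporting limitation discussed below.
function makePipe() {
  const queue = [];
  let closed = false;
  let wake = null;
  return {
    write(chunk) {
      queue.push(chunk);
      if (wake) { wake(); wake = null; }
      return Promise.resolve(); // resolves on buffering, not on consumption
    },
    close() {
      closed = true;
      if (wake) { wake(); wake = null; }
    },
    async read() {
      while (queue.length === 0) {
        if (closed) return { done: true, value: undefined };
        await new Promise(resolve => { wake = resolve; });
      }
      return { done: false, value: queue.shift() };
    }
  };
}
```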
@wanderview no. There's no way to know when the chunk is finished uploading; your updateProgress will fire too fast. The pipe example has the same problem: the writes will complete as soon as the readable stream accepts the chunk, whereas we want to know when the HTTP stack has accepted the chunk. There's no way to extract that information, since the code that reads from …
@annevk yeah, it should.
In other words, the WritableStream-UnderlyingSink pair communicates data and acks, while the ReadableStream-UnderlyingSource pair communicates only data. In whatwg/streams#287 I was prototyping an interface named OperationStream, which has a WritableStream and a modified ReadableStream that allows the consumer to acknowledge completion of consumption of data read from the ReadableStream.

This will be especially problematic when we want to reuse a buffer passed to the sink. Since the current ReadableStream-UnderlyingSource pair cannot tell when the sink has finished consuming all the bytes in the given ArrayBuffer, the underlying source cannot return the buffer to the user for reuse.
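A hedged sketch of the acknowledgement idea: read() hands the consumer the value together with resolve/reject functions, so the producer's write() promise settles only when the consumer signals completion (names are illustrative, not taken from whatwg/streams#287):

```js
// Sketch: an ack-carrying queue. The consumer calls op.resolve() (or
// op.reject()) once it has truly finished with the chunk, e.g. after the
// HTTP stack or cache has consumed the bytes, and that settles the
// producer's write() promise.
function makeAckQueue() {
  const queue = [];
  let wake = null;
  return {
    write(value) {
      return new Promise((resolve, reject) => {
        queue.push({ value, resolve, reject });
        if (wake) { wake(); wake = null; }
      });
    },
    async read() {
      while (queue.length === 0) {
        await new Promise(resolve => { wake = resolve; });
      }
      return queue.shift(); // { value, resolve, reject }
    }
  };
}
```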
I don't see how you can accomplish this with a Request object that can be passed anywhere (fetch, cache, random script). All of these consumers are just going to be reading the body as a stream. Unless you are suggesting fetch should do something magical that is not represented in the stream API? |
@wanderview, that's a good point. As I said in #30 (comment), the sink (cache, fetch, etc.) cannot tell the Request to propagate successful completion of writing/sending/etc. via …
At a conceptual level, I'm not sure it makes sense to try to make a source stream express downstream progress of its consumer. It feels like trying to force a square peg into a round hole to me. I think something like this is a better fit:
That wasn't how I envisioned this.
No, not at all. Here, let me demonstrate one quickly-hacked-together implementation that should do the trick:

```js
if (typeof requestInit.body === "function") {
  let resolveIsStarted;
  const isStarted = new Promise(resolve => resolveIsStarted = resolve);
  let dest;
  const bodyWS = new WritableStream({
    start() { return isStarted; },
    write(chunk) {
      return isStarted.then(() => {
        if (dest === "cache") {
          return write(cacheFD, chunk);
        }
        if (dest === "upload") {
          return send(socketFD, chunk);
        }
      });
    }
  });
  requestInit.body(bodyWS);
}

// adding to a cache: set `dest = "cache"` and call `resolveIsStarted()`
// uploading: set `dest = "upload"` and call `resolveIsStarted()`
```

This kind of adapter is what I was getting at with my OP. This doesn't handle letting script read from the body, but you could add:

```js
const bodyRS = new ReadableStream({
  pull() {
    dest = "readablestream";
    resolveIsStarted();
  }
});
```

and then add another branch for it.

Re the last post: at a conceptual level, I don't see how it makes sense to fit the square peg of readable streams into the round hole of uploading/writing to the cache.
I would agree with this statement if we had …

Having a hard-coded list of "if consumer is this type then do this" conditionals in the fetch Request sets alarm bells off for me. Is there a way for a js library-defined consumer to take part in this behavior? It seems not.
Then I think we need to start talking about design changes to fetch/cache that will allow writing directly. If you don't think the request/response object is a good way to expose this, we need something better. |
Domenic, maybe code like what you just introduced is what Ben called "magical". It's no longer using the publicly-facing APIs of the Request. We're peeling back the envelope of the Request to expose its internals and attach the consumer (cache, fetch, etc.).
That seems reasonable. Are there other reasons to want a consumer-specific WritableStream here besides hooking up progress events? If it's purely for progress, it seems we might want something that supports getting progress for the upload of fixed bodies, like …
I investigated whether we can delay fulfillment of the promise returned by .write() until the data is read from the ReadableStream, to address this issue. When ReadableStreamReader.read() was specced to be synchronous, we could use this approach; the only limitation was that the consumer must consume or copy all the bytes from the ArrayBuffer returned by read() synchronously. This allows the consumer (cache, fetch, etc.) to indirectly tell the data-producing code that's using the writable side when the data has been consumed.

Now ReadableStreamReader.read() is asynchronous, so we cannot use this approach. An extended version of ReadableStream like what I prototyped in whatwg/streams#287 is necessary. If we have this, completion of a write can be propagated from the final consumer (cache, fetch) to the producer code through the Request object.
Definitely! It's all the reasons you want a WritableStream class to exist, instead of just saying your ecosystem is ReadableStreams + functions accepting them. Most generally, you want to be able to give this writable stream to generic code that deals with writable streams.
I think progress, in whatever form, needs to be built on top of the underlying streaming primitives. Since we've just abandoned Request/Response as the streaming primitives, we need to figure out what lies underneath them before we can talk about progress... |
Where is that written up? Sorry if I missed it above.
Again, I don't see why a pipe construct can't resolve this issue with the current Request/Response.
Well, I think @annevk should chime in before we declare Request/Response dead as streaming primitives. :-) As I mentioned on IRC, I expect browsers don't really report "bytes written to the wire" for progress today; they probably just report based on internal buffer consumption. I think this would be consistent with reading from the Request body stream.
Oh, I kind of assumed it was clear that we wanted a primitive that wraps underlying data sinks into a generic interface that libraries and authors could operate on, with backpressure etc. So you think instead of

```js
const ws = fs.createWriteStream("/path/to/file.flac");
ws.write(generateHeaders());
rs.pipeTo(ws).then(() => ws.write(generateTrailers()));
ws.closed.then(showSuccessUI, showFailureUI);
```

we should do something like

```js
fs.writeToFile("/path/to/file.flac", new ReadableStream({
  start(c) {
    c.enqueue(generateHeaders());
    pumpRS(rs, c).then(() => c.enqueue(generateTrailers()));
  }
}))
.then(showSuccessUI, showFailureUI);
```

?
To clarify, by "streaming primitives" I mean "exposing directly the ability to do uploads or cache writes in a streaming way".
Agreed, they would be consistent. So I guess for XHR there's no way to know if an upload succeeded, besides the server giving back a 2xx? I guess that makes sense. It doesn't work for filesystem streams, though...
I was incorrect about XHR. At least in Gecko, I'm told by our network gurus that we report progress based on bytes sent to the kernel. OS buffering means progress is really only meaningful for very large uploads, though.
See http://krijnhoetmer.nl/irc-logs/whatwg/20150406#l-775 for the discussion that happened on IRC.
Is the problem that these need different primitives? Why is that? Can't we provide the underlying source the moment …
Progress report from IRC (logs): One potential plan would be to explain things like this:
This doesn't look so good. The WritableStream exposed via … If we're not going to have a buffer in a Request/Response by default, I think the WritableStream revealer should be part of the final destination API; for example, fetch() could take a FetchInit as a third parameter and we pass the revealer in the FetchInit. It looks strange to me that the writer is exposed as above.

I find it a little inconvenient that we cannot start piping until fetch() is invoked. If there's an already-started push source, we need a queue (e.g. wrapping the push source using a ReadableStream library, or having a queue with a readable side and a writable side and pushing generated data into it, ...) separate from the Request object, to buffer the data being generated until we're ready to invoke …

As I have been proposing, we can choose to optionally expose an extended ReadableStreamReader on Request/Response whose read() returns a promise of a tuple `{ value, resolve, reject }`, so that commit-to-file, write-to-kernel, etc. can be propagated to the user via the WritableStream side of the Request/Response. We can have this as an option so that the basic read() method, which just returns a promise of a value, is kept as-is.
I guess I'm less concerned about the details of the pipe implementation at this point. I think something is possible there, though. So I really think this API is right for fetch:
In this case I believe it's the time-disconnected Request which leads us here. So no, I don't think it's a precedent for:
That example does not have a time-disconnected object, so it doesn't have the same problems. To me the Request object is conceptually:
To me this exactly mirrors a ReadableStream:
I think the symmetry there makes this API just favor ReadableStream. From a practical implementation side of things, I want this to be optimizable:
In this case, response.body file data from the Cache API should be streamable off-main-thread to the IPC connected to the fetch upload socket. That seems straightforward with Request.body being a simple ReadableStream; it seems harder to me if we have to accommodate Request.body.write() or something like it.

Finally, I will note that there are few cases where the body always being a ReadableStream even requires a pipe to be made. You can still do things like this:
Anyway, I think I have exhausted my commentary on this issue. :-) Sorry for the wall of text.
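The no-pipe-needed case can be sketched like this (illustrative names, a hedged sketch rather than the actual example from the thread): when the body is already a ReadableStream, any consumer simply drains it, with no extra buffer in between.

```js
// Sketch: a body that is already a ReadableStream. fetch, the cache, or
// plain script can all consume it the same way -- by reading it.
const body = new ReadableStream({
  start(controller) {
    controller.enqueue("chunk1");
    controller.enqueue("chunk2");
    controller.close();
  }
});

// A generic consumer: read until done, collecting chunks.
async function drain(stream) {
  const reader = stream.getReader();
  const chunks = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return chunks;
    chunks.push(value);
  }
}
```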
OK, thanks, that does help alleviate my concerns about the precedent. It would have been nice to have a non-time-disconnected API (like Node.js has), but maybe it is not worth adding if we already have the time-disconnected version.
I'm a little scared. But maybe we should start a whatwg/streams thread about that. I will do so. |
Sorry for being away for a long time. So @wanderview and @domenic agreed to pass a ReadableStream rather than passing a revealer function, relying on the pipe optimization, right? I'm not unhappy with the idea, but it looks like we've gone back to the initial position, such as in this comment. I just want to understand the reasoning and what we learned from the entire discussion.
Are these correct?
I think so. I would quibble that it's not easier to use for some cases, and people will be forced to create pipes with some frequency, but overall those look correct.
Thank you. @tyoshino, are you OK with the idea?
OK. Let's start with that approach. |
Thanks, I'll write something. |
What is the latest on this? This can replace WebSocket (which is not available over H2). It would be great to get it out there. |
Sorry, no progress yet. I've concentrated on "response construction from stream" recently; I'll put more effort into this issue. IIUC we need to address the redirect problem, right? Are there any other problems?
I'm not sure, but for redirects I think we should do the same as we do today: tee the request body and use the tee on a redirect (if any). If redirect mode is "error" and window is set to null, we can avoid teeing. The server can assist the user agent here by sending a response early, which will allow the user agent to stop teeing.
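The teeing strategy maps onto ReadableStream's tee() (a sketch; the real fetch algorithm operates on internal streams, and the branch names here are invented):

```js
// Sketch of tee-for-redirect: one branch feeds the first request attempt,
// the other is held in reserve in case a redirect needs to replay the body.
const source = new ReadableStream({
  start(c) {
    c.enqueue("payload");
    c.close();
  }
});

const [firstAttemptBody, redirectBody] = source.tee();
```

A consequence noted below: because chunks are considered "sent" as soon as the tee buffers them, this makes the stream a poor indicator of true upload progress.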
The somewhat more significant issue is that streaming requires chunked encoding, I think. That is basically something browsers have not done historically, and not all servers are capable of supporting it. None of that is necessarily problematic, but it might require some work beyond just hooking up the API. It will certainly require some changes to Fetch, which assumes a request content length in places.
It seems this strategy for redirects means streams aren't a good indication of upload progress. Will that need to be added as a separate feature? In terms of chunked encoding, the chunking itself isn't the problem, right? HTTP/2 data is always delivered in frames. I guess the issue is just the lack of …
#24 is the issue for upload progress, and I agree that we need a separate upload progress mechanism. |
@jakearchibald it's not any different from how progress events work today... |
@jakearchibald the issue with chunked is that HTTP/1 is still around. |
Sure, I mean that streaming doesn't require chunked encoding if you know the content-length. Rejecting is an option if there isn't a content-length and the browser can't establish an HTTP/2 connection.
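The known-length case amounts to this (an illustrative sketch only; as the following comment points out, in practice the user agent, not the developer, would have to compute the length):

```js
// Sketch: a fixed body's byte length is known up front, so a
// Content-Length header can be sent instead of using chunked encoding.
const bodyBytes = new TextEncoder().encode("hello world");
const headers = { "Content-Length": String(bodyBytes.byteLength) };
```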
We cannot rely on a developer-supplied length; it's not reliable. (See also past discussion.)
(See also earlier in this issue for this very discussion.) |
Hm, sorry. Did a search for "length" and couldn't find it. Guess I should read the whole thing. |
Should we close this issue (since the question in the OP is settled) and use a new issue label to track uploading? |
I guess I see the following rough outline of a solution:
I think we could probably write a spec for uploading now? Most of these are settled. I guess redirects is the big open question from my perspective. |
Given the length of this issue I'm happy with someone closing it and continuing the debate in the more focused issues. I think we're pretty close to having satisfactory answers for a minimal v1.
Previously: #3 and #25 (comment)
My opinion is that writing to a response body, i.e. doing an upload or writes to a cache, should be represented by a writable stream. It feels like a major failure of the entire writable stream design if we can't use it for cases like this.
It's true that in general you never "need" writable streams, as you can always have functions that accept readable streams and then directly feed their contents to underlying resources (e.g. directly call write(2) on the cache or send(2) on the socket). But kind of the entire point is to have an abstraction that lets you hide away that complexity.
@wanderview is not so sure...
It's also worth noting there is an equivalence here. Given

```js
function f(bodyWriter) { bodyWriter(ws); }
```

we can get `g(rs)` via …

Given

```js
function g(rs) { doSomethingWith(rs); }
```

we can get `f(bodyWriter)` via …

But I am more interested in what is natural to implementations. My hope would be that you could supply a writable stream that would directly wrap the socket for upload, or the file descriptor for a write to the cache, and not have to do any adaptation of the type shown in the second example.