Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure the JS thread doesn't have to get hit when piping two streams to each other #97

Closed
domenic opened this issue Mar 11, 2014 · 22 comments

Comments

@domenic
Copy link
Member

domenic commented Mar 11, 2014

E.g., an XHR stream to a file system stream, xhrStream.pipeTo(fsStream).

As-is, there might be a problem if e.g. the user does

var oldWrite = fsStream.write;
fsStream.write = function (chunk) {
  console.log("AOP logging!!", chunk);
  oldWrite.apply(this, arguments);
};

after the pipe is already set up. The implementation would have to detect this and start routing data to the main thread, which kind of sucks.

A possible fix would be caching the value of dest.write at the top of pipeTo.

/cc @sicking

@sicking
Copy link

sicking commented Mar 11, 2014

One important question is if we actually want to support code like the above?

My concern is that it makes it very easy to create footguns where you suddenly get dramatically slower performance for something that is seemingly a harmless no-op.

For example

var oldWrite = fsStream.write;
fsStream.write = function (chunk) {
  if (debug)
    console.log("saved", chunk);
  oldWrite.apply(this, arguments);
};

or even

fsStream.write = fsStream.write.bind(fsStream)

I don't know the answer. But it's not obvious to me that it's beneficial that overriding write affects pipeTo.

@domenic
Copy link
Member Author

domenic commented Mar 11, 2014

The issue is that if pipeTo doesn't go through a public API like write then it is impossible for users to create streams themselves; they can only use platform-provided streams.

@tyoshino
Copy link
Member

users to create streams themselves

This is important, but I don't think we need to enable replacing write of dest with ongoing piping.

We should also clearly specify how pipeTo determines whether to use such an offloaded/platform piping or call JS function.

@domenic
Copy link
Member Author

domenic commented Mar 13, 2014

The base pipeTo should always follow the specified algorithm; subclasses can always override it with more specialized versions.

@tyoshino
Copy link
Member

The base pipeTo should always follow the specified algorithm; subclasses can always override it with more specialized versions.

I meant how to check if dest can receive piped data out of JS thread or not. If dest's write is overridden by JS code, it needs to be run in JS thread. Otherwise, we could do some optimized offloaded pipe depending on dest's type.

@domenic
Copy link
Member Author

domenic commented Mar 13, 2014

Well, my understanding of @sicking and others' ideas was that if you did certain special combinations, e.g. xhrStream.pipeTo(fsStream), then XHRStream.prototype.pipeTo would recognize it's dealing with an FSStream, and do off-main-thread piping work. That is, something like

XHRStream.prototype.pipeTo = function (dest, options) {
  if (dest.[[InternalStreamType]] === "file") {
    cppBindings.pipeTogetherOffMainThread(this, dest, options);
  } else {
    ReadableStream.prototype.pipeTo.apply(this, arguments);
  }
};

@domenic
Copy link
Member Author

domenic commented Mar 13, 2014

Er, wait, I realize that doesn't address what you're talking about. Yes, I agree, determining clearly what the criteria are is important... the spec should provide some hook other specs can use so that they can do e.g.

XHRStream.prototype.pipeTo = function (dest, options) {
  if (dest.[[InternalStreamType]] === "file" && isUndisturbed(dest)) {
    cppBindings.pipeTogetherOffMainThread(this, dest, options);
  } else {
    ReadableStream.prototype.pipeTo.apply(this, arguments);
  }
};

Here isUndisturbed is the hook. It wouldn't be an actual function (I don't think?), but just an algorithm other specs could refer to.


This is all of course assuming our solution is indeed of the caching-write flavor described. Other solutions would be some kind of double-dispatch, which we could conceivably even bake into the base spec... e.g.

ReadableStream.pipeTo = function (dest, options) {
  if (dest.pipeFrom(this)) {
    // successfully hooked up at the OS level; all done
  } else {
    // the dest had no special way of hooking itself up to `this`, so do the usual algorithm
  }
};

@tyoshino
Copy link
Member

Right. Thanks.

One more point I'd like to clarify is how dest's state getter should work during offloaded pipeTo() is working. I guess the right value to return in such a case (not yet closed) is basically "waiting".

@ferasm
Copy link

ferasm commented Apr 2, 2014

I like the concept of isUndisturbed.

I also agree it would not be an exposed function, and would like to have it be well defined in the spec - I can see other specs needing this behavior documented and will be able to reference this spec for a definitive algorithm.

@tyoshino - I don't quite understand why the state getter should behave differently?

@tyoshino
Copy link
Member

tyoshino commented Apr 2, 2014

@ferasm state getter is the only synchronous (closed getter is also, but what it returns is the constant reference to the [[closedPromise]] instance) method on BaseWritableStream. So, I wondered how it should behave while pipe algorithm is running asynchronous to the JS thread. We should also discuss how the other methods (except for closed. It's clear) should behave but I was especially concerned with state.

@othiym23
Copy link

othiym23 commented Apr 4, 2014

/cc @Gozala

@tyoshino
Copy link
Member

@yutakahirano pointed out that not only for pipeTo() but we want similar functionality for the constructor of Resource of the Fetch API. If we add ReadableStream to the RequestInit, we would want to prohibit read()-ing from the passed ReadableStream. I.e. giving exclusive access to the Request object.

See also w3c/ServiceWorker#413 and w3c/ServiceWorker#452

@sicking
Copy link

sicking commented Oct 28, 2014

I also think that we really need the ability to send a ReadableStream or WritableStream to another thread using worker.postMessage({ theStream: stream }) or port.postMessage({ theStream: stream }).

In that case too we should make sure to not hit more threads than needed.

@tyoshino
Copy link
Member

@sicking Is what you're proposing a Streams API based pipe for communication between workers/windows?

@sicking
Copy link

sicking commented Oct 30, 2014

I'll avoid using the word "pipe" since it means different things to different people.

What I mean is that if I get a ReadableStream or WriableStram from something like writablestream = filesystemDirectory.writeToFile('filename'); or readablestream = fileReader.readAsStream(); or xhr.responseType = 'stream'; xhr.send(); readablestream = xhr.response I would expect to be able to pass that stream to a different thread. And that I could do that using port.postMessage(readablestream/writablestream). And that that data read or written would go directly to/from the worker thread to the filesystem or network.

@tyoshino
Copy link
Member

tyoshino commented Nov 4, 2014

@sicking Is the ReadableStream neutered in the original scope? What I have in my mind is something like socketpair(2). We call e.g. worker.createByteStreamPair(). It returns a WritableByteStream, and the worker receives an event with a ReadableByteStream attached. These streams are connected. Then, you do as follows:

In the main thread:

var fileStream = fileReader.readAsStream();
var streamToWorker = port.createByteStreamPair();
fileStream.pipeTo(streamToWorker);

In the worker thread:

self.addEventListener('message', function(e) {
  var streamToMain = e.data;
  // do something
}, false);

@sicking
Copy link

sicking commented Nov 4, 2014

@sicking Is the ReadableStream neutered in the original scope?

Yes. Just like with any other transferable objects.

What's the advantage of the createByteStreamPair() approach? It seems more awkward to me, but maybe I'm missing some advantages? Should we do the same thing for MessagePorts? I.e. would there be advantages to adding something like port.createPortPair() so that people could use that rather than sending MessagePorts over postMessage?

@sicking
Copy link

sicking commented Nov 4, 2014

And let me know if this discussion should maybe be done as a separate issue?

@tyoshino
Copy link
Member

tyoshino commented Nov 5, 2014

advantage of the createByteStreamPair()

Sorry, I hadn't give it much thoughts yet. A little more study follows:

The streams are so sophisticated than ArrayBuffer (just buffer. no state), String, Blob (immutable. no state), etc. I think we don't want to accept general streams. But even byte streams, string streams, etc. have states. In postMessage(readableStream) approach, do we need to pickle readableStream in the original scope and restore it in the new scope? It complicates, and I thought we don't have to do that and it's sufficient if we can connect windows/workers with back-pressured channel. Then, what should postMessage(readableStream) do?

  1. Neuter references to readableStream. Create a bridge that reads from readableStream via the ReadableStream interface. Read data is cross-thread transferred to an object with the ReadableStream interface in the destination scope.
  2. Do something special intruding the ReadableStream interface. E.g. for a stream obtained from XHR, detach TCP/TLS endpoint (file descriptor) from the stream and pass it to the destination scope and create a new object with the ReadableStream interface with the descriptor.

Thinking about (1), as we have pipeTo(), I thought we could just provide a way to generate a connected pair of ReadableStream and WritableStream and pass the read-side directly to the destination scope to avoid use of postMessage(readableStream) which implies serialization of the stream.

If we want to realize (2), we need to take postMessage(readableStream) approach.

@tyoshino
Copy link
Member

tyoshino commented Nov 5, 2014

And let me know if this discussion should maybe be done as a separate issue?

Created. #244

@domenic
Copy link
Member Author

domenic commented Mar 12, 2015

Quick update on this (old) issue: this is one of the highest priorities of the current design, pioneered with the stream reader concept. The only missing piece is a similar stream writer for writable streams. Once we have that in place this issue is done. We are very much on track though and have been designing the system around this goal.

@tyoshino
Copy link
Member

The only missing piece is a similar stream writer for writable streams.

This is happening at #462

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants