
stream-whatwg: add whatwg streams #22352

Closed · wants to merge 3 commits

Conversation

@devsnek (Member) commented Aug 16, 2018

This PR adds a WHATWG streams implementation from Chromium and a WPT test runner.

I was inspired to work on this specifically because of the WebAssembly streaming PR that was recently opened, but I think these streams stand well enough on their own.

This provides a piece of the puzzle for WASM streaming, fetch, and some things I'd like to see used in ESM.

I realize that this isn't really in @nodejs/streams's plans, so I'm marking this as discuss and in progress.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@devsnek devsnek added wip Issues and PRs that are still a work in progress. discuss Issues opened for discussions and feedbacks. labels Aug 16, 2018
@nodejs-github-bot nodejs-github-bot added the lib / src Issues and PRs related to general changes in the lib or src directory. label Aug 16, 2018
@devsnek devsnek force-pushed the feature/whatwg-streams branch from bc166d7 to 0755fea Compare August 16, 2018 06:51
@mcollina (Member) left a comment

I’m very skeptical about adding this feature to Node.js. There is a lot of momentum, and many compatible APIs, in the Node.js ecosystem around Node streams (with their problems). Adding another implementation might not be in the best interest of the runtime.

What is the goal? Should we look for interoperability with whatwg streams? Can you link the other PRs or topics you needed these for?

Tagging as semver-major because it adds new globals.

Can you please remove the git submodule?

(I’m putting a -1 for this not to land without a long discussion).

@mcollina mcollina added the semver-major PRs that contain breaking changes and should be released in the next major version. label Aug 16, 2018
@devsnek (Member, Author) commented Aug 16, 2018

@mcollina

Adding another implementation might not be in the best interest of the runtime.

I thought about this a bit. The conclusion I came to is that it would certainly be weird, but I couldn't think of any actual problems here that we haven't hit before. The biggest issue would be the confusion of having multiple interfaces, which is obviously a valid concern, but I think WHATWG streams have benefits that outweigh the confusion. As a parallel, we ship two different versions of URL in core as well. In some future I would like to imagine that we only really use WHATWG streams, the same way we've mostly transitioned to WHATWG URLs.

What is the goal?

To add WHATWG streams to enable various other features down the line, such as creating interfaces in Node which would be compatible with WebAssembly streaming, or implementing isomorphic HTTP interfaces. If we did implement fetch, it would also give us the Response object, which could be used in the ESM loader to pass data around, although that idea hasn't really gone beyond an idea. There aren't really any PRs to link because no one's tried to implement this stuff yet. It is also just an excellent interface that users of Node can benefit from having exposed.

Can you please remove the git submodule?

Why? How should I bring WPT into the repo?

@mcollina (Member)

All of the most used features of Node.js are based on Node streams. This will likely double the size of our API in a lot of areas: what is the WHATWG version of net.Socket? tls? http?

Reimplementing all of the above is a lot of work and a lot of disruption for the community. In comparison, the URL change was tiny, and we used it very little in Node core. Streams are used everywhere, and some of the semantics go down to C++.

why? how should i bring wpt into the repo?

We vendor (copy) our dependencies.

@bricss commented Aug 16, 2018

It would be great to have WHATWG streams in Node, at least for compatibility reasons 🎉

@mafintosh (Member)

I share @mcollina's concerns about this and definitely believe this belongs in userland.

@devsnek devsnek force-pushed the feature/whatwg-streams branch from 0755fea to 2514c91 Compare August 16, 2018 14:38
@devsnek (Member, Author) commented Aug 16, 2018

@mafintosh At the very least, a situation where we allow people to pass WHATWG streams into Node APIs would be a win for me. I'd like to actively pursue features like fetch and WASM streaming being in core.

@mcollina FWIW, WHATWG streams are actually a lot safer and easier to use than Node streams from C++, because of the reader locks, the separation of stream events and stream producer events, etc. Chromium has some nice APIs here for interacting via C++ that I think we could take advantage of.

@devsnek (Member, Author) commented Aug 16, 2018

Also, after reading through the WHATWG streams spec and the Node stream implementation, it looks like it would be safe to either make Node streams extend WHATWG streams, or, if that makes people uncomfortable, I think something like nodestream.acquireStandardStream() would be usable.

@joyeecheung (Member)

This seems to be importing the entire WPT, including tests that don't make sense for us?

common.gypi Outdated
@@ -24,6 +24,17 @@

'openssl_fips%': '',

'v8_extra_library_files': [
'./lib/v8_extras/ByteLengthQueuingStragety.js',
@jasnell (Member) commented Aug 16, 2018

typo in file name... s/ByteLengthQueuingStragety/ByteLengthQueuingStrategy

@jasnell (Member) commented Aug 16, 2018

Not yet convinced that this should be in core, and definitely concerned about the massive import of WPT. One way we could make progress on this is to follow a development model much like we did with http2: create an experimental fork repo, get this landed there, allow folks to iterate on it, then decide whether to pull it in. That said, this is something that could definitely be done as a userland module. We already have three versions of the existing streams API in core, and we definitely need a decision on how to handle this.

Also, in general, I'm not convinced that using v8_extras is the right approach for this. I think there are some large, thorny issues we need to address in the implementation and in terms of how this would land, so a separate working repository could really help iron those things out.

@devsnek devsnek force-pushed the feature/whatwg-streams branch 2 times, most recently from 6d1a805 to 0f3b850 Compare August 16, 2018 18:19
@devsnek (Member, Author) commented Aug 16, 2018

@joyeecheung @jasnell I did another trim of WPT so that it hopefully only contains interfaces we have in core.

One way we could make progress on this is to follow a development model much like we did with http2... that is, create an experimental fork repo, get this landed there, allow folks to iterate on it, then decide whether to pull it in.

Do you mean developing interop between Node streams and WHATWG streams? The WHATWG streams implementation here is complete.

this is something that could definitely be done as a userland module

As I keep saying... I'd like to take advantage of the streams inside core. Something I could have a PR for the day after this lands is WASM streaming. We would also be able to add Text{Encoder,Decoder}Stream interfaces.

@devsnek devsnek force-pushed the feature/whatwg-streams branch 2 times, most recently from a7d2f2f to f6e3c30 Compare August 16, 2018 18:29
@joyeecheung (Member)

@devsnek Maybe only include the stream tests in this PR, since the commit message only says it adds WHATWG streams? The diff seems much bigger than necessary to review.

Also, do the tests pass locally? From what I can tell, at least the linter won't be happy; the WPT files should probably be placed in fixtures to avoid being linted. A lot of tests ending with .any.js in WPT still make use of document methods; I don't think they'll pass as-is?

@devsnek (Member, Author) commented Aug 16, 2018

@joyeecheung All the selected tests pass (you can look at the Travis build to see the TAP output).

@devsnek devsnek force-pushed the feature/whatwg-streams branch 6 times, most recently from 8679b96 to 7227280 Compare August 19, 2018 15:18
@devsnek devsnek force-pushed the feature/whatwg-streams branch from f615c9e to e0117e7 Compare August 20, 2018 15:37
@@ -24,6 +24,17 @@

'openssl_fips%': '',

'v8_extra_library_files': [
'./lib/v8_extras/ByteLengthQueuingStrategy.js',
Contributor

How are these files consumed by the build? I see that the gypfiles in deps/v8 reference v8_extra_library_files; are you relying on the compilation of V8 to grab these lib/v8_extras JS files and compile them in? That will be more awkward for node-chakracore to work with.

Member

That's exactly what is happening. The V8 build compiles the v8_extras in. They are not loaded the same way as every other thing in core.

Member Author

Can Chakra not just put these files somewhere and eval them when a context is created? We do plenty of stuff elsewhere that Chakra has to shim.

Contributor

Yes, Chakra shims a lot of things, but so far it has been native APIs (or JS APIs), not side effects of the build system. Sure, it can be done, but I'd much prefer not to.

Personally, I also feel this makes the Node build more complex to reason about; it becomes that much less of a dependency tree and more of a dependency graph.

Contributor

If you wanted to have these JS files live in lib and be compiled to native, then my preferred approach would be to explicitly put that logic in Node's build, not use a side effect of V8's build.

Member Author

@MSLaguana Are you saying to just put the list in node.gyp instead of common.gypi?

Contributor

Not the list, but the thing that consumes the list. The bit that takes the list of filenames and processes it to the point where the relevant build artifact is produced.

Member

Looking through the implementation, I'm not sure which aspects of it would be infeasible to implement as a normal module. Doing so would make it significantly easier to polyfill, would make it more likely that it could just be dropped in to readable-stream, and wouldn't require any changes to the build system or any coordination between v8 and chakra-core. @devsnek ... can you explain a bit more about why you think such a port wouldn't be feasible?

Member Author

V8 consumes the list and turns it into a C++ file that is linked into the engine. chakrashim would need to do something similar, calling those extras functions when new contexts are created.

@devsnek (Member, Author) commented Aug 20, 2018

@jasnell Because it's a lot of stuff to rewrite (I don't have the time to do that), and I'd like to be able to collaborate with Chromium by keeping at least similar implementations. readable-stream can use any of the polyfills on npm, or even vendor the reference implementation from the streams repo.

@jasnell (Member) commented Aug 20, 2018

... I purposely keep it in our lib because I want us to be able to continue working on it separately from Chromium.

That would also be possible with it in deps/v8_extras. The point here is, it's a vendored-in dependency that is compiled into V8 and is not loaded or bootstrapped the same way as anything else in lib, which creates a code-management inconsistency.

I don't understand this problem. You always have to know what type of thing you are using to be able to use it.

I don't have to know that process.stdout is a Duplex in order to use it as a Writable

Assuming, for a moment, that we stay with the approach of introducing new methods on the Readable and Writable prototypes, half of my concern here can easily be dealt with simply by not having a single overloaded acquireStandardStream() method, but by splitting those into separate methods based on the type that is being returned. e.g.

const rs = socket.toReadableStream()
const ws = socket.toWritableStream()

const rs2 = anyReadable.toReadableStream()
const ws2 = anyWritable.toWritableStream()

This would eliminate any ambiguity that would exist in that API and would eliminate the need to special case Duplex.

Imagine you want to only use WHATWG streams in your app. There are a lot of places where streams come from. Throwing around global functions from requires to all those places is not a nice API to use.

Again, "nice" is entirely subjective here. I don't find the following to be particularly "nice"...

const acquireStandardStream = Readable.prototype.acquireStandardStream
const rs = acquireStandardStream.apply(someArbitraryStream)

But this is likely the pattern that I would need to implement in order to be certain that all things conforming and pretending to be stream.Readable can be appropriately wrapped.

The following is certainly no worse:

const { toReadableStream } = require('stream')
const rs = toReadableStream(someArbitraryStream)

You are absolutely right that there are lots of things that streams come from, including things that do not extend from stream.Readable or stream.Writable but still can be handled as streams.

In any case, I've pointed out these limitations in the current proposed design a couple of times now so won't continue to belabor the point.

Maybe we can link existing resources like whatwg/streams:FAQ.md@master#what-are-the-main-differences-between-these-streams-and-nodejs-streams

Including such links in the documentation is fine, but they are not a substitute for proper and consistent API documentation.

@jasnell (Member) commented Aug 20, 2018

In terms of handling Duplex with the interop API... additional thought will need to be given to handling error state across the shared underlying Duplex and the wrapper ReadableStream and WritableStream instances. Consider the following example for instance:

const http2 = require('http2')

const server = http2.createServer();

server.on('stream', (stream) => {
  stream.on('error', console.log)

  const rs = stream.acquireStandardReadableStream()
  rs.pipeTo(process.stdout.acquireStandardWritableStream())

  stream.respond()

  const ws = stream.acquireStandardWritableStream()
  ws.abort('no reason')
})

server.listen(8000, () => console.log('ok'))

Running a client against this code causes the server to crash with the following:

(node:98798) ExperimentalWarning: The http2 module is an experimental API.
ok
(node:98798) ExperimentalWarning: Readable.acquireStandardStream is an experimental feature. This feature could change at any time
(node:98798) ExperimentalWarning: Writable.acquireStandardStream is an experimental feature. This feature could change at any time
no reason
_stream_readable.js:1000
      this.on('end', () => controller.close());
                                      ^

TypeError: Cannot close an errored readable stream
    at ServerHttp2Stream.on (_stream_readable.js:1000:39)
    at ServerHttp2Stream.emit (events.js:182:13)
    at endReadableNT (_stream_readable.js:1108:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)

The close here, I believe, is emanating from the Readable side that is piping to process.stdout. When the WritableStream is aborted, we need to decide whether that should tear down the entire Duplex or only close the Writable side of the Duplex. That decision could be based on whether reason is undefined, or we could just decide to always tear down the entire thing. If the latter, the Writable needs to propagate the abort to the Readable side, and vice versa.
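To make the "always tear down" option concrete, here is a hypothetical sketch (wrapDuplex and its rule are illustrative, not the PR's implementation, and it assumes the WHATWG globals): aborting the Writable wrapper destroys the whole Duplex, which in turn errors the Readable wrapper instead of closing it, avoiding the "cannot close an errored readable stream" crash above.

```javascript
'use strict';

// Hypothetical interop wrapper illustrating one propagation rule:
// abort on the Writable side destroys the entire Duplex, and the
// Readable wrapper is errored (never closed) once that happens.
function wrapDuplex(duplex) {
  let errored = false;
  const readable = new ReadableStream({
    start(controller) {
      duplex.on('data', (chunk) => controller.enqueue(chunk));
      duplex.on('end', () => { if (!errored) controller.close(); });
      duplex.on('error', (err) => { errored = true; controller.error(err); });
    },
  });
  const writable = new WritableStream({
    write(chunk) { duplex.write(chunk); },
    abort(reason) {
      errored = true;
      duplex.destroy(
        reason instanceof Error ? reason : new Error(String(reason)));
    },
  });
  return { readable, writable };
}
```

The errored flag is the whole trick: once either side has torn the Duplex down, the 'end' handler must never call controller.close() on a stream that is already errored.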

@joyeecheung (Member) commented Aug 27, 2018

If I understand correctly, this PR tries to run the WPT by running testharness.js in the context first, then running the test files? I just noticed that the WPT harness does not seem to work that way. For example:

const harness = fs.readFileSync('path/to/testharness.js', 'utf8');
vm.runInThisContext(harness);
vm.runInThisContext('test(function() { assert_false(true) })');  // nothing happens
vm.runInThisContext('assert_false(true)');  // throws exception

@devsnek (Member, Author) commented Aug 27, 2018

@joyeecheung You need to install test callbacks; you can see the ones I have in wpt.js.

@joyeecheung (Member)

you need to install test callbacks, you can see the ones I have in wpt.js

@devsnek Indeed, sorry for not looking into the docs more carefully before!

@Fishrock123 (Contributor)

The first and most obvious take, which I think will be pretty universally held, is "whoa, that's a lot of code".
To be clear, there are ~2k LOC here, excluding tests, that would need to be maintained. We may be able to vendor it, but what happens if there are things that Chrome doesn't fix fast enough? Or fixes in a different way after we do? I think it is most likely that we'd end up having to maintain the implementation.


This adds an API that we already have, with no clear benefit aside from being a different API.

I'm presently not convinced the API is better than streams3. Let's do a bit of a run down.

WhatWG Pros

  • More things are hidden behind symbols.
    • (Or symbol-ish-es - the spec isn't clear that these are JS properties in any way.)
  • Has "locking" and some concept of hidden "optimized" "piping" when "locked".
  • Does not use events, as far as I can tell.

WhatWG Cons

  • Push APIs as part of the protocol.
    • It's still an overly complex state machine just like Node Streams are.
    • Backpressure is still quite confusing as a result.
  • Uses promises.
    • I'm not 100% sure to what extent, but this may cause unnecessary memory and GC pressure.
  • Largely duplicated API that is quite large.

Other WhatWG notes

  • Supports push sources / pull sinks, object / binary streams, just like Node Streams do.
  • Supports "tee"-ing, or multicasting. Probably easier due to the "locking".
    • Multicast streams still have confusing state implications regardless.
  • Both WhatWG and Node Streams support some form of simplified construction.
  • A lot of vendored JS code we are publicly exposing.

Uncertainties

  • If we wanted to transition more, I am unclear how well this API would work on a C++ level.
  • Performance & memory pressure.
    • If someone feels that the WhatWG streams should be favored due to perf or memory, they should gather some numbers and report back.
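The "locking" and "tee"-ing listed above can be sketched in a few lines (assuming the WHATWG ReadableStream global; a minimal illustration, not a benchmark):

```javascript
'use strict';

const rs = new ReadableStream({
  start(controller) {
    controller.enqueue('chunk');
    controller.close();
  },
});

// tee() multicasts into two independent branches and locks the original.
const [a, b] = rs.tee();

// Acquiring a reader locks a branch: no second consumer can attach
// until the reader releases its lock.
const reader = a.getReader();
```

While the lock is held, both rs.locked and a.locked are true, and a second a.getReader() throws a TypeError; the state implications for the unread branch b are exactly the multicast concern noted above.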

Due to the relatively small benefit, as far as I can tell, I'm not sure what Node.js would be gaining here.

Having more than one API could be confusing: which one do you pick if you are building things?

Also, in a future supporting both, we are likely to run into the unfortunate problem of going back and forth with transforms between the two streaming methods, likely resulting in unnecessary queuing as both streaming APIs maintain different types of queuing internally. That seems pretty bad to me, and it seems favorable for Node.js development[1] to keep WhatWG streams to the web.


As far as web compat, we aren't a web browser, so that should be largely irrelevant to us. We don't have to lock ourselves to web standards like browsers do, and so we should be able to ensure development with Node.js is nice and works well for Node.js's uses without web-standard impediments. (This is different from the language spec, I should note.)

I know people seem to like "isomorphic" JS code, but in my experience non-simple isomorphic dependencies tend to either be unreasonably troublesome to use, or perform significantly worse than ones built to cater to either Node.js or the Web, respectively.

[1] Ok sure, the more technical debt the more job security for support companies and consultants, but that's not what I'm here for.

@Fishrock123 (Contributor) left a comment

I think that userland is a better place for this, as I outlined above.

Additionally, on a purely PR and not feature level:

  • GitHub breaks on the PR size, making it difficult to review. Do we really need all of this? I feel like we almost certainly do not.
  • I don't think we should be vendoring v8_extras unless otherwise impossible.

@fhinkel (Member) left a comment

I’m very skeptical about adding this feature to Node.js. There is a lot of momentum and compatible APIs in the Node.js ecosystem about Node Streams (with their problems). Adding another implementation might not be in the best interest of the runtime.

@TimothyGu (Member)

So much of the discussion here has been about "how good is this PR", which, while useful, does not, in my opinion, constitute the whole picture.

I would invite you all to look at the bigger picture associated with this PR for a moment, rather than the PR itself; to think of the opportunities for growth that will be unlocked by this PR.

fetch()

One of the most oft-requested features of Node.js is a built-in fetch(). For now, the module I maintain, node-fetch, has been filling in the gaps. However, even with that module, people cannot write streaming code that is compatible with both browsers and Node.js without a ReadableStream implementation.

A major argument has been brought up in this thread: "why can't this be in userland?" Right now, the biggest advantage of node-fetch is its size: in the latest version, v2.2.0, the installation size of node-fetch is 135 KB with 0 external dependencies. Even that number is inflated, since we package two copies of the same library, a CommonJS version and an ESM version (with the .mjs extension).

This should be compared with got (285 KB), axios (387 KB), and request (4.45 MB), which are some of the other most popular modules for doing network fetches.

We do not want this to change. We are strong believers in the fact that Node.js alone should allow us to do requests. So far we have been fairly successful in doing that, in all areas except a uniform streaming API. It is our hope that Node.js would allow us to do that as well.

(This would also allow fetch() to become a part of Node.js, but that is a discussion for another day.)

More opportunities for isomorphic apps and libraries

Let's take a step back from specifically fetch().

Last year was an exciting time. URL, TextDecoder, and TextEncoder all got implemented in Node.js core to bring much parity between web browsers and Node.js. Several modules started supporting ArrayBuffer, TypedArray, and DataView, in addition to just the Node.js-specific Buffer. A web-compatible Worker was still being discussed (and is now implemented, thanks to the work by @addaleax). An implementation of ES modules became part of the official Node.js core project. And personally, I became a Collaborator on the project, in large part because of my work on the new URL parser.

Inspired and greatly encouraged by the progress we had seen, I did a talk last summer on how the Web Platform and Node.js are starting to work together to build a better developer experience for all JavaScript programmers. In the talk, I proposed an idea:

With uniform adoption of Web APIs, JavaScript programmers will be able to write truly isomorphic apps without the need for polyfills.

Much work still remains to accomplish that, but I see Web streams as a crucial part of the story to make this possible.

Not only does fetch() depend on Web streams; we are seeing more and more web APIs that use Web streams being proposed and implemented. A great example of this is the proposed TextDecoder.stream and TextEncoder.stream properties. In fact, as soon as they become part of the standard for TextDecoder and TextEncoder, our implementations of those classes will become incompatible with the API found in browsers due to the missing property.

Let's not let streams be the blocker in our progress for better developer experience for JavaScript programmers everywhere.

WHATWG streams are never going to replace Node.js streams

Let's face it: nowhere in this PR did the idea of replacement ever come into play. Let us instead focus on what use cases Web streams would enable. Comments like

I'm presently not convinced the API is better than streams3.

in my opinion should be inconsequential in how we evaluate this pull request specifically and the venture to bring Web streams into Node.js, as it implies that we'd have to choose one (in which case we would obviously choose the "better" one).

WHATWG streams are not trying to compete against Node.js streams. There is no winner or loser here. In fact, the WHATWG-Node.js stream bridge proposed by @devsnek and improved upon (in my opinion) by @jasnell would make both options fully interoperable.

WHATWG streams are easier to learn for new Node.js developers

Node.js is nothing without its developers, and in my experience developing Node.js applications before joining the project itself, the streaming API is by far the most difficult to understand and use. Even though the JavaScript community has moved away from templating engines, it is no coincidence that none of the major templating engines (EJS, Jade/Pug, Handlebars; the first two of which I maintained for a significant amount of time) supported streaming.

Even now, we do not see any straightforward definition of "stream1", "stream2", and "stream3" in the Node.js documentation. Even the StrongLoop blog post where I learned about these different concepts has bitrotted; the best source of information I can find now is a StackOverflow post.

New additions to the stream API, like async iterators, go a long way toward bringing Node.js streams up to date with new features in JavaScript like promises. But it is not enough. .pause(), .resume(), .on('data'), and the multiple end-like concepts ('end', 'close', 'error', close(), destroy()) confuse everyone who is not a stream expert, myself and other TSC members I have personally been in contact with included, and remain a source of bugs. Not to mention the Zalgo problem inherent in how Node.js streams work, which is cited as giving a performance boost in many cases but has a tendency to confuse newcomers to the community.

WHATWG streams, by contrast, avoid many of those problems by having a refreshing API design that embraces promises. With the Node.js stream interop, we can provide a valuable stepping stone for developers with experience writing async/await-based code. Meanwhile, developers who want to squeeze the last bit of performance out of Node.js could still use the existing Node.js APIs. Isn't that quite nice?
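As a concrete taste of that stepping stone, consuming a stream with the promise-based reader API looks like this (a sketch assuming the WHATWG ReadableStream global; readAll is an illustrative helper, not a proposed API):

```javascript
'use strict';

// Illustrative helper: drain a WHATWG ReadableStream with async/await.
// One loop and one completion path, with no 'data'/'end'/'error'
// event juggling and no pause()/resume() state machine.
async function readAll(stream) {
  const reader = stream.getReader();
  const chunks = [];
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return chunks;
}
```

Errors surface as ordinary promise rejections from reader.read(), so a single try/catch around the loop replaces the multiple end-like events listed above.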


My fellow TSC members, let us not write off Web streams too hastily, just based on this pull request or its approach alone. As the technical stewards of the project, we have an obligation to maintain the long-term health of the project. In the JavaScript world, promises are here to stay, Web streams are here to stay [1] [2]. We should not be left behind.

@mcollina (Member)

Unfortunately I cannot join the TSC meeting, and that post merits a long response in writing. Please postpone any decisions on the matter.

@Trott (Member) commented Aug 29, 2018

Discussed briefly at the TSC meeting. Removing the tsc-agenda label. Please re-add as appropriate.

@Trott Trott removed the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Aug 29, 2018
@Fishrock123 (Contributor)

However, even with that module, people cannot write code that is compatible with both browsers and Node.js that uses streaming without a ReadableStream implementation.

This doesn't seem correct to me. You could use a WhatWG stream module instead. As far as I'm aware what is in this PR could be put in a user module and provide an interoperability point from Node streams to WhatWG streams. Do note that in either case you are probably going to run into unnecessary queuing, and that is no different without a wholesale move to WhatWG streams, which doesn't seem very realistic (or perhaps even desirable) to me.

We do not want this to change. We are strong believers in the fact that Node.js alone should allow us to do requests. So far we have been fairly successful in doing that, in all areas except a uniform streaming API. It is our hope that Node.js would allow us to do that as well.

I don't see any reason why we'd presently be preventing user module WhatWG streams. If you want your user module to do it, then go and do it. Node core is not actually stopping you here. The above caveats apply in any case, and this doesn't seem to make them any better.

@devsnek (Member, Author) commented Aug 29, 2018

@Fishrock123 It is unsafe to depend on polyfills at the library level; if different libraries use different polyfills, it can break things hardcore when you mix them. We also have at least two features within core (encoding streams and compileStreaming) that can use this.
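The breakage mode being described is brand checking: a library built against one polyfill rejects stream instances produced by another. A toy illustration (both classes and the consume function are stand-ins, not real polyfills):

```javascript
'use strict';

// Two independent "polyfills" of the same interface.
class ReadableStreamA { getReader() { return {}; } }
class ReadableStreamB { getReader() { return {}; } }

// A library compiled against polyfill A brand-checks its input, so a
// perfectly functional stream from polyfill B is rejected outright.
function consume(stream) {
  if (!(stream instanceof ReadableStreamA)) {
    throw new TypeError('not a ReadableStream');
  }
  return stream.getReader();
}
```

A single implementation in core gives every library the same brand to check against, which is the compatibility argument being made here.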

@mcollina (Member)

This is a long response; I'm sorry about it.


node-fetch

If you would like to take node-fetch to WHATWG streams, I'm up to help build the modules and the ecosystem to make it work well in the Node.js world. However, this must involve a polyfill because of our LTS cycle, as Node.js 8 is maintained for ~2 years. I think this is something we should resolve earlier than that.

Some of the usage that I've seen for node-fetch is:

  global.fetch = async function(url, params) {
    // External call, just use node-fetch
    if (!url.startsWith('/api')) return nodeFetch(url, params)

    // Local fetching, serve with fastify.inject
    const response = await server.inject({ method: 'GET', url, ...params })

    response.status = response.statusCode
    response.statusText = response.statusMessage
    return new nodeFetch.Response(response.payload, response)
  }

As a platform, we need to do a better job so that our users do not need to monkey-patch the global object to make their code work. I think we need a better pattern than pure fetch() for isomorphic applications, as that code makes reuse and portability extremely hard. Note that shipping a global fetch() would break Node.js users.

I think a pure, spec-compliant and global fetch is not good for Node.js, because Node.js is not a browser. Real-world Node.js applications need an http.Agent to support keep-alive, proxies, certificates, and a lot of other things, and all of those settings should be able to change between different HTTP calls. fetch() assumes global behavior because the user of a browser is one individual, and those configurations are governed by them through the browser. Node.js stopped shipping the limit of 5 concurrent HTTP connections long ago for these (and other) reasons (https://nodejs.org/dist/latest-v0.10.x/docs/api/http.html#http_agent_maxsockets).

WHATWG streams are never going to replace Node.js streams

Let's face it. Nowhere in this PR did the idea of replacement ever come into play.

There is a lot of baggage behind this. Several collaborators walked away from the project because of this and the not-so-great interaction between individuals. For these reasons, merely mentioning WHATWG Streams immediately stirs pushback. Let's leave all of that in the past.

If we consider WHATWG Streams are never going to replace Node.js streams as a design goal, we can start working towards improving the compatibility of Node.js with them. I'm happy to work with that assumption.

WHATWG streams are easier to learn for new Node.js developers

Can we improve the docs of Node.js Streams, considering that they are there to stay? Can somebody help?

Not to mention the Zalgo problem that is inherent in how Node.js streams work, which is cited as giving a performance boost in many cases but has a tendency to confuse newcomers to the community.

I think the major source of that was fixed in #17979 and readable-stream@3. If there are more weird things let's talk about it again.
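For context, the "Zalgo" problem referenced here is an API that delivers its callback synchronously on one code path and asynchronously on another, so callers can't reason about ordering. A minimal sketch (all names hypothetical, not an actual Node.js streams API):

```javascript
// The callback fires synchronously on a cache hit but asynchronously on
// a miss, so statements after the call run in a different order
// depending on data the caller can't see.
const cache = new Map([['hot', 'cached']]);

function unpredictableGet(key, cb) {
  if (cache.has(key)) return cb(null, cache.get(key)); // sync path
  setImmediate(() => cb(null, 'fresh'));               // async path
}

const order = [];
unpredictableGet('hot', () => order.push('hot callback'));
order.push('after hot call');   // runs AFTER the hot callback
unpredictableGet('cold', () => order.push('cold callback'));
order.push('after cold call');  // runs BEFORE the cold callback
setImmediate(() => console.log(order));
```

An always-async (promise-based) API removes this ambiguity at the cost of a tick of latency.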

In contrast, WHATWG streams avoid many of those problems by having a refreshing API design that embraces promises. With the Node.js stream interop, we can provide a valuable stepping stone for developers with experience writing async-await based code, while developers who want to squeeze the last bit of performance out of Node.js could still use the existing Node.js APIs. Isn't that quite nice?

I envision a world where the streams complexities blurs in the background (but are available for experts), and there is a common-ground way of doing things just using language constructs, and easy, portable helpers.

async function run (origin, dest) {
  try {
    // buildWriter is an illustrative helper wrapping dest's write/end in
    // promises (needed because of emit('error'))
    const writer = buildWriter(dest);
    for await (const chunk of origin) {
      await writer.write(chunk.toString().toUpperCase());
    }
    await writer.end();
    await finished(dest); // promisified stream.finished
  } catch (err) {
    // cleanup is an illustrative helper that safely destroys a stream
    cleanup(origin);
    cleanup(dest);
  }
}

or

async function saveWebsite(url, dest) {
  const response = await fetch(url)

  // pipeline can pipe things between WHATWG Streams and Node.js Streams
  await pipeline(response, fs.createWriteStream(dest))
}

IMHO both approaches are simple. I would prefer that we focus on shipping a runtime where that is feasible.


@Fishrock123 it is unsafe to depend on polyfills at a library level, if different libraries use different polyfills it can break things hardcore when you mix them.

Node.js streams work in exactly this way, and it is a pretty similar concept: there is a public API, and if you adhere to it, things should work well. If the same concept does not apply to WHATWG streams, let's open an issue on their repo, as this looks like a bug to me. As an example, native promises and async-await work well with promise libraries such as Bluebird.
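The promise analogy holds because `await` only requires the "thenable" protocol (a `.then()` method), not a particular Promise class. A minimal sketch, with `FakeBluebird` as a hypothetical stand-in for any third-party promise library:

```javascript
// Native await interoperates with foreign promise libraries because the
// language only calls .then() on whatever is awaited.
class FakeBluebird {
  constructor(value) { this.value = value; }
  then(resolve) { setImmediate(() => resolve(this.value)); }
}

(async () => {
  const result = await new FakeBluebird(42); // native await, foreign "promise"
  console.log('awaited foreign thenable:', result); // 42
})();
```

A shared public stream API in core could play the same role for WHATWG stream polyfills that the thenable protocol plays for promise libraries.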


Conclusions

I propose to spin up a team that has the following goals:

  1. improve the developer experience of using WHATWG Streams on top of the Node.js releases without WHATWG Streams (Node 6, Node 8, Node 10).
  2. evaluate if/how/when we should add WHATWG Streams to Node.js.
  3. extend our streams API and ecosystem so that they work well in a world where stream processing is governed by async iterators and other language constructs.

@rvagg
Member

rvagg left a comment

I'm not seeing a case in here for throwing more stuff into core when it could easily be done in userland. On top of that, we have additional work going on around streams, so I'd suggest adding more confusion to something that's in flux is a pretty bad idea.

if (mode === 'byob') {
  // TODO(ricea): When BYOB readers are supported:
  //
  // Return ? AcquireReadableStreamBYOBReader(this).

Contributor

This version doesn't even support BYOB readers...

@Fishrock123
Contributor

In discussions, Trevor Norris tipped me off that WhatWG streams may not clean up resources correctly after errors when Tee'd (multiplexed). (Or perhaps even to the extent we'd like at all after errors?)

So far I'm largely unable to tell, though; the complexity of this whole thing is unreasonable. (Which is why I have been approaching stream APIs completely differently.)
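For what it's worth, the spec's cancellation behavior for tee'd streams can be observed directly. The sketch below assumes a runtime with a global WHATWG ReadableStream (present in modern Node, but not in the Node of this thread's era): the underlying source is released only after both branches cancel.

```javascript
// tee() cleanup semantics: cancelling ONE branch does not cancel the
// underlying source; the source's cancel() runs only once BOTH
// branches have cancelled.
async function teeCleanupDemo() {
  let sourceCancelled = false;
  const source = new ReadableStream({
    cancel() { sourceCancelled = true; }
  });

  const [a, b] = source.tee();
  // NOTE: a branch's cancel() promise only settles once BOTH branches
  // have cancelled, so don't await the first one in isolation.
  const firstCancel = a.cancel();
  const afterOne = sourceCancelled;  // false: source not yet released
  await b.cancel();
  await firstCancel;
  const afterBoth = sourceCancelled; // true: released once both cancelled
  return { afterOne, afterBoth };
}

teeCleanupDemo().then((r) => console.log(r));
```

So an error or abandonment on a single branch can indeed leave the source held open, which matches the concern raised above.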

@devsnek
Member Author

devsnek commented Sep 18, 2018

school is back up so i don't really have time to pursue this change anymore... if anyone else wants to use this diff feel free.

Labels
discuss: Issues opened for discussions and feedback.
lib / src: Issues and PRs related to general changes in the lib or src directory.
semver-major: PRs that contain breaking changes and should be released in the next major version.
wip: Issues and PRs that are still a work in progress.