Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream: support Uint8Array input to methods #11608

Closed
wants to merge 2 commits into from

Conversation

addaleax
Copy link
Member

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

streams

$ ./node benchmark/compare.js --new ./node --old ./node-d08836003c57 --runs 5 streams | Rscript benchmark/compare.R
[00:10:00|% 100| 6/6 files | 10/10 runs | 1/1 configs]: Done
                                          improvement confidence      p.value
 streams/readable-bigread.js n=1000           20.95 %        *** 8.399686e-07
 streams/readable-bigunevenread.js n=1000     18.08 %        *** 1.672017e-06
 streams/readable-boundaryread.js n=2000      15.66 %         ** 1.908462e-03
 streams/readable-readall.js n=5000           42.03 %        *** 1.686883e-08
 streams/readable-unevenread.js n=1000         2.66 %            3.173964e-01
 streams/writable-manywrites.js n=2000000     13.60 %        *** 4.770480e-05

(Yes, those are improvements, not losses… I find it odd that instanceof is so heavy compared to isUint8Array, but this is what the benchmarks say, and I won’t complain. d088360 is the master head at the time of writing.)

@nodejs/streams

@addaleax addaleax added buffer Issues and PRs related to the buffer subsystem. semver-minor PRs that contain new features and should be released in the next minor version. stream Issues and PRs related to the stream subsystem. labels Feb 28, 2017
@nodejs-github-bot nodejs-github-bot added buffer Issues and PRs related to the buffer subsystem. stream Issues and PRs related to the stream subsystem. labels Feb 28, 2017
lib/stream.js Outdated
}

const version = process.version.substr(1).split('.');
if (version[0] <= 0 && version[1] < 12) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would the major version be less than zero?

Also, why are we adding an explicit check for such old versions of node (v0.10 and older) if they're no longer supported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because readable-stream still supports them and removing that support would at least be a breaking change…

When would the major version be less than zero?

Never, I can update this to ===

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh... I can't help but think readable-stream should polyfill this instead of node core...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @nodejs/streams is willing to do that I’m all for it.

- version: v6.0.0
pr-url: https://github.com/nodejs/node/pull/6170
description: Passing `null` as the `chunk` parameter will always be
considered invalid now, even in object mode.
-->

* `chunk` {String|Buffer} The data to write
* `chunk` {String|Buffer|Uint8Array} The data to write
Copy link
Contributor

@mscdex mscdex Feb 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit unrelated, but this doesn't match the chunk description of .end() which is confusing, specifically the 'any' part.

-->

* `chunk` {Buffer|String} Chunk of data to unshift onto the read queue
* `chunk` {Buffer|Uint8Array|String} Chunk of data to unshift onto the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here about the chunk description and missing 'any' I think.


* `chunk` {Buffer|Null|String} Chunk of data to push into the read queue
* `chunk` {Buffer|Uint8Array|Null|String} Chunk of data to push into
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unrelated nit: it seems like 'Null' should be all lowercase since there is no such constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By that logic String should also be lowercase since it doesn't mean the String constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seishun I agree, but I recall others possibly arguing for the uppercase version in the past? There might need to be changes to the doc tool...

@addaleax
Copy link
Member Author

@mscdex addressed your comments, PTAL

@addaleax
Copy link
Member Author

@seishun right… should’ve seen that. done!

@addaleax
Copy link
Member Author

@jasnell
Copy link
Member

jasnell commented Feb 28, 2017

Overall this LGTM but I'd definitely prefer more attention and signoff from @nodejs/streams folks


// Internal utilities
try {
Stream._isUint8Array = process.binding('util').isUint8Array;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this into internal/util?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joyeecheung The thing is that the streams modules can’t really unconditionally rely on internal modules because they are also exported as https://github.com/nodejs/readable-stream. @mscdex expressed a preference for having this code polyfilled on the readable-stream side and I agree, but ultimately that’s @nodejs/streams’ call …

(Please let me know if I’m misunderstanding you!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, forgot about the vendoring out part :) Nevermind then!

@mcollina
Copy link
Member

mcollina commented Mar 1, 2017

Before LGTMing this I would like to consider if we have alternative fast approaches here to avoid instanceof, maybe @fhinkel can check things out.

I'm not 100% on board with accepting UInt8Array as input, as it changes the behavior of when decoding is not enabled. Also, there are plenty of objectMode streams in the community leveraging Buffer objects. If we go ahead with this, it is semver-major for me (also in readable-stream).

Regarding readable-stream compatibility, I am afraid that Object.getPrototypeOf(chunk) !== Buffer.prototype might cause issues and we would likely have to rewrite this line. Plus, Buffer.prototype.slice.call(chunk) is likely to cause perf regressions on old node versions.
A safer approach might be to completely swap out UInt8Array support in readable-stream. @calvinmetcalf what do you think?

@addaleax
Copy link
Member Author

addaleax commented Mar 1, 2017

I would like to consider if we have alternative fast approaches here to avoid instanceof

What are the semantics that you would like to see in such an alternative?

I'm not 100% on board with accepting UInt8Array as input, as it changes the behavior of when decoding is not enabled.

Could you give an example for that? I tried to specifically avoid that.

Also, there are plenty of objectMode streams in the community leveraging Buffer objects.

Why would they be affected by this change?

I am afraid that Object.getPrototypeOf(chunk) !== Buffer.prototype might cause issues and we would likely have to rewrite this line.

What does readable-stream use as Buffers in non-Node environments?

Plus, Buffer.prototype.slice.call(chunk) is likely to cause perf regressions on old node versions.

This only hits when the input is not already a Buffer, so I’m not sure why this would be relevant.

A safer approach might be to completely swap out UInt8Array support in readable-stream

Does that mean skipping this PR for readable-stream in some way?

@mcollina
Copy link
Member

mcollina commented Mar 3, 2017

I would like to consider if we have alternative fast approaches here to avoid instanceof

What are the semantics that you would like to see in such an alternative?

To some extent, I do not care. I care about the speed improvement this brings without breaking any existing code, plus readable-stream.
Also, it should be something we can easily polyfill for older nodes, i.e. switch back to instanceof.

I'm not 100% on board with accepting UInt8Array as input, as it changes the behavior of when decoding is not enabled.

Could you give an example for that? I tried to specifically avoid that.

Here is the difference:

class MyTransform {

  constructor(opts) {
    super(opts)
  }

  transform(chunk, enc, cb) {
    // chunk can be both a UInt8Array or a Buffer
    // because this stream can be set in objectMode or not
    cb()
  })
}

I know it might be a very narrow case, but there is a lot of code where objectMode is a passed-through option.

Plus, Buffer.prototype.slice.call(chunk) is likely to cause perf regressions on old node versions.

This only hits when the input is not already a Buffer, so I’m not sure why this would be relevant.

What about

  return Object.prototype.toString.call(obj) === '[object Uint8Array]';

What does readable-stream use as Buffers in non-Node environments?

https://github.com/feross/buffer

A safer approach might be to completely swap out UInt8Array support in readable-stream

Does that mean skipping this PR for readable-stream in some way?

Yes

@addaleax addaleax force-pushed the stream-uint8array branch 2 times, most recently from b282929 to 5105ec0 Compare March 3, 2017 08:19
@addaleax
Copy link
Member Author

addaleax commented Mar 3, 2017

Here is the difference:

Thanks for pointing that out, that was a bug in this PR and not intentional. Fixed!

Plus, Buffer.prototype.slice.call(chunk) is likely to cause perf regressions on old node versions.

This only hits when the input is not already a Buffer, so I’m not sure why this would be relevant.

What about

 return Object.prototype.toString.call(obj) === '[object Uint8Array]';

The _uint8ArrayToBuffer call is still guarded by Object.getPrototypeOf(chunk) !== Buffer.prototype

What does readable-stream use as Buffers in non-Node environments?

https://github.com/feross/buffer

Right, this might indeed get tricky for platforms that don’t have Uint8Array… I’ll think about that.

@fhinkel
Copy link
Member

fhinkel commented Mar 3, 2017

Looking into the instanceof vs Stream._isUint8Array performance difference.

@addaleax
Copy link
Member Author

addaleax commented Mar 5, 2017

Just read @joyeecheung’s comment in #11690 (review) and checked the benchmarks on canary instead of master:

$ ./node benchmark/compare.js --new ./node --old ./node-ab7a25b93edb --runs 5 streams | Rscript benchmark/compare.R
[00:07:42|% 100| 6/6 files | 10/10 runs | 1/1 configs]: Done
                                          improvement confidence      p.value
 streams/readable-bigread.js n=1000            5.69 %        *** 2.415267e-04
 streams/readable-bigunevenread.js n=1000      6.76 %        *** 5.907977e-05
 streams/readable-boundaryread.js n=2000       4.13 %            7.522906e-02
 streams/readable-readall.js n=5000            5.27 %        *** 1.201201e-04
 streams/readable-unevenread.js n=1000         0.95 %            3.617545e-01
 streams/writable-manywrites.js n=2000000     -0.39 %            6.255211e-01

So this would still seem okay from a performance viewpoint on V8 5.8.

But anyway, I’m not proposing this change because of performance but because I think Uint8Array support is actually valuable for Node’s API.

@mcollina
Copy link
Member

mcollina commented Mar 5, 2017

@addaleax So, I'd like to have the speed boost in core, but on the other side I am not 100% in with the support of UInt8Array.

What's the use case to support UInt8Array? Most users leverage .pipe(), and after the first pass in write() the chunk is converted into a Buffer anyway.

IMHO, we should expose _uint8ArrayToBuffer as it seems to be the fastest way to do this.

@addaleax
Copy link
Member Author

addaleax commented Mar 6, 2017

@mcollina So … for the other modules where I’m doing this, it makes sense because:

  • There’s not really any reason not to support generic Uint8Arrays
  • The isUint8Array check is the accurate check when it comes to what the C++ bindings actually need
  • This kind of API change has been suggested in buffer: discuss future direction of Buffer constructor API #9531 and I don’t think that’s a silly approach; if Node didn’t already have Buffer as part of its API it wouldn’t introduce it now and just use typed arrays instead
  • And finally the small performance gain by using the C++ check instead of letting instanceof iterate over the prototype chain.

I realize that these things don’t apply to streams the same way they do for other parts of the API, but I’d still say that there’s no real reason not to support Uint8Arrays.

Maybe other @nodejs/collaborators have some thoughts to offer on whether we should do this, or a general 👍/👎 feeling?

IMHO, we should expose _uint8ArrayToBuffer as it seems to be the fastest way to do this.

We could expose FastBuffer directly if we really want to; otherwise that’s basically already exposed as Buffer.prototype.slice().

@fhinkel
Copy link
Member

fhinkel commented Mar 6, 2017

Not really part of the discussion anymore, just fyi, I looked the into performance of instance of vs Stream._isUint8Array. instance of is doing an (EcmaScript spec compliant) instance of check, which is quite involved, even if optimized. _isUint8Array calls a V8 API function v8::IsUint8Array, which is a quick cast and simple check internally. So the benchmark results look reasonable.

@mcollina
Copy link
Member

mcollina commented Mar 6, 2017

@addaleax all the streams API speaks buffers. Specially the _read, _write and _transform method, will keep supporting only Buffer for the time being. I think adding support for UInt8Array for some of the API will just make things more confusing for the end user.

I think we should talk about this with the rest of @nodejs/streams.

@addaleax
Copy link
Member Author

addaleax commented Mar 6, 2017

I think adding support for UInt8Array for some of the API will just make things more confusing for the end user.

Right – that’s kind of my point. 😄 It makes sense for the rest of Node core’s API, therefore it would be consistent and less confusing if that was the same for streams.

Specially the _read, _write and _transform method, will keep supporting only Buffer for the time being.

Yeah, of course, all the APIs that give back Buffers to user code will need to keep doing so. To me this means that they already support Uint8Arrays, though? Just not very explicitly…

I think we should talk about this with the rest of @nodejs/streams.

Yes! :)

@mcollina
Copy link
Member

mcollina commented Mar 6, 2017

I think adding support for UInt8Array for some of the API will just make things more confusing for the end user.

Right – that’s kind of my point. 😄 It makes sense for the rest of Node core’s API, therefore it would be consistent and less confusing if that was the same for streams.

So it might make sense from core's point of view, but not on require('streams') alone. At least we framed the problem.

Specially the _read, _write and _transform method, will keep supporting only Buffer for the time being.

Yeah, of course, all the APIs that give back Buffers to user code will need to keep doing so. To me this means that they already support Uint8Arrays, though? Just not very explicitly…

In Node.js, yes. However the whole streams ecosystem does not rely on Buffer inheriting from UInt8Array, see https://github.com/feross/buffer/blob/master/index.js#L21-L35.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mcollina
Copy link
Member

@addaleax this was approved during the WG meeting in Berlin! Would you mind updating the conflicting doc and land this? :)

@mcollina mcollina added the notable-change PRs with changes that should be highlighted in changelogs. label May 22, 2017
@mcollina
Copy link
Member

@addaleax can you rebase? I think we can release this.

@addaleax
Copy link
Member Author

@addaleax
Copy link
Member Author

Landed in 9f610b5

@addaleax addaleax closed this May 27, 2017
@addaleax addaleax deleted the stream-uint8array branch May 27, 2017 09:20
addaleax added a commit that referenced this pull request May 27, 2017
PR-URL: #11608
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@mcollina
Copy link
Member

Fantastic!

jasnell pushed a commit that referenced this pull request May 28, 2017
PR-URL: #11608
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@jasnell jasnell mentioned this pull request May 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. notable-change PRs with changes that should be highlighted in changelogs. semver-minor PRs that contain new features and should be released in the next minor version. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants