Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string_decoder : support typed array or data view #22562

Closed
wants to merge 7 commits into from
Closed

string_decoder : support typed array or data view #22562

wants to merge 7 commits into from

Conversation

BeniCheni
Copy link
Contributor

@BeniCheni BeniCheni commented Aug 28, 2018

  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

Fixes: #1826
Per this comment with the checklist, this PR attempts to help out the string_decoder scope from the check list. The PR supports explicitly TypedArray or DataView, insead of the original Uint8Array.

@nodejs-github-bot nodejs-github-bot added the string_decoder Issues and PRs related to the string_decoder subsystem. label Aug 28, 2018
@BeniCheni BeniCheni changed the title String decoder typed array data view string_decoder : support typed array or data view Aug 28, 2018
buf);
throw new ERR_INVALID_ARG_TYPE(
'buf',
['Buffer', 'TypedArray', 'DataView', 'ArrayBufferView'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're in here, does it make sense to alphabetize the list? Or is the existing ordering logical for other reasons?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the list makes sense in that Buffer will be the most common type of argument (by far), followed by TypedArrays.

I am not sure why ArrayBufferView is listed explicitly, since currently there are (afaik) only TypedArray and DataView ABVs.

@@ -91,7 +91,7 @@ decoder = new StringDecoder('utf8');
assert.strictEqual(decoder.write(Buffer.from('E1', 'hex')), '');

// A quick test for lastChar, lastNeed & lastTotal which are undocumented.
assert(decoder.lastChar.equals(new Uint8Array([0xe1, 0, 0, 0])));
assert(decoder.lastChar.equals(Buffer.from([0xe1, 0, 0, 0])));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate a bit: There's nothing in this change that breaks new Uint8Array() working in this context, so why is the test case being changed instead of a new test case being added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't test cases be added for DataView and TypedArray? Do they even work properly? No code has been added in this PR to alter how they work with string_decoder. Were they working already or something?

Copy link
Contributor Author

@BeniCheni BeniCheni Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was working with the suggestion of this comment here:

[ ] string_decoder
This list is for TypedArray/DataView support only.

I was trying not to use Unit8Array. Hopefully I understood the #1826 correctly.

If this Unit8Array doesn't need to change to Buffer to align with others, I can revert this change here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer extends Uint8Array, I don't think this changed anything.

@@ -72,9 +72,11 @@ StringDecoder.prototype.write = function write(buf) {
if (typeof buf === 'string')
return buf;
if (!ArrayBuffer.isView(buf))
throw new ERR_INVALID_ARG_TYPE('buf',
['Buffer', 'Uint8Array', 'ArrayBufferView'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is Uint8Array being removed here?

Copy link
Contributor Author

@BeniCheni BeniCheni Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the suggestion of this comment in #1826, "string_decoder" seems to be on the list TypedArray/DataView support only. After viewing the fs PR and the child_process, I just gave my best attempt to follow the "idea" to handle string_decode.

Please let me know if I misunderstood too much off from the context of the checklist & goals from the comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay and somewhat consistent with how we handle it elsewhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @BeniCheni , I think broadly we can categorize the changes for these PRs into:

  1. adding missing tests, for TypedArrays and DataViews wherever only Buffers or Uint8Arrays were tested earlier
  2. Updating code to work with these new types, where necessary(should mostly involve only JS side of things, in lib and maybe lib/internal)
  3. Updating documentation to reflect these changes.

Here, your PR is updating the errors being thrown, but is neither adding test cases nor adding functionality to work with the TypedArrays or DataViews. Please note, it might well be the case that string_decoder already works for these, but adding test cases is a way of validation for the same. Hope that helps!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 for the collective reviews. Will analyze & update accordingly.

Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM when ArrayBufferView is removed.

@BeniCheni
Copy link
Contributor Author

BeniCheni commented Sep 1, 2018

@ Trott: Shouldn't test cases be added for DataView and TypedArray?
@ SirR4T: adding missing tests, for TypedArrays and DataViews

Researched & wrote tests to cover TypedArray & DataView.

@ Trott: Why is Uint8Array being removed here?
@ devsnek: Buffer extends Uint8Array, I don't think this changed anything.

Researched more and updated to revert the previously incorrect removal of Uint8Array.

@ addaleax: I am not sure why ArrayBufferView is listed explicitly,
@ BridgeAR: LGTM when ArrayBufferView is removed.

Removed any context of ArrayBufferView in string_decoder source per suggestion.


Also confirmed if any context needed to update in string_decoder.md doc, but didn't find any necessary edit. Please suggest if I miss anything in the string_decoder.md doc. Please review again.

Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconfirming my LG

@SirR4T
Copy link
Contributor

SirR4T commented Sep 1, 2018

Hi @BeniCheni , thanks for updating the tests! I think we could also update string_decoder.md, specifically for the .end() and the .write() function documentations, since they accept not just buffers, but other types too.

@BeniCheni
Copy link
Contributor Author

BeniCheni commented Sep 1, 2018

Hi @SirR4T, updated doc per your suggestion.

Notice the last CI instance failed with just a doc change in the commit. Awaiting to get help/advice from other reviewers to validate the failed CI instance.

Thoughts, any reviewer(s)?

@SirR4T
Copy link
Contributor

SirR4T commented Sep 1, 2018

Hi @BeniCheni , doc changes look good to me, thanks!

But as I'm myself a new enough contributor here, I'll refrain from approving PRs, in deference to someone with more review experience.

As for the CI failure, it doesn't seem related to your changes, but instead looks like a certificate failure from npm. This may or may not be related to npm's current issue at the time of this post.

@addaleax
Copy link
Member

addaleax commented Sep 2, 2018

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Sep 2, 2018
@addaleax
Copy link
Member

addaleax commented Sep 2, 2018

@addaleax
Copy link
Member

addaleax commented Sep 2, 2018

Resume CI: https://ci.nodejs.org/job/node-test-pull-request/16971/ (:heavy_check_mark:)

@@ -97,6 +97,21 @@ assert.strictEqual(decoder.lastTotal, 3);

assert.strictEqual(decoder.end(), '\ufffd');

// TypedArray with Int8Array test
decoder = new StringDecoder('utf8');
assert.strictEqual(decoder.write(new Int8Array([1, 2])), '\u0001\u0002');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this test use getArrayBufferViews() to cover all possible TypedArray types and DataView. An example usage of this function by @SirR4T is

for (const expectView of common.getArrayBufferViews(inputBuffer)) {
console.log('Sync test for ', expectView[Symbol.toStringTag]);
fs.writeFileSync(filename, expectView);
assert.strictEqual(
fs.readFileSync(filename, 'utf8'),
inputBuffer.toString('utf8')
);
}

@BeniCheni
Copy link
Contributor Author

BeniCheni commented Sep 5, 2018

@ TimothyGu:
I'd prefer this test use getArrayBufferViews() to cover all possible TypedArray types and DataView.

Hi, @TimothyGu. Updated to use the common getArrayBufferViews util to cover possible types. Please review at convenience? Thanks.

const arrayBufferViewStr = 'String for ArrayBufferView tests\n';
const inputBuffer = Buffer.from(arrayBufferViewStr.repeat(8), 'utf8');
for (const expectView of common.getArrayBufferViews(inputBuffer)) {
console.log(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the console.log() statement is part of the test, please add a comment indicating such. If it is not, I'd prefer to use a comment instead of having the extraneous console output.

@BeniCheni
Copy link
Contributor Author

@ jasnell
If the console.log() statement is part of the test, please add a comment indicating such. If it is not, I'd prefer to use a comment instead of having the extraneous console output.

Hello @jasnell,
I've checked & removed the unneeded console log message from the string_decode test code. Please review at convenience. Thanks.

@BeniCheni
Copy link
Contributor Author

Ping/Dear reviewers,

Would you mind helping to guide this “author ready” PR to landing?

(Made all suggested updates according to previous review comments; thanks!)

@addaleax
Copy link
Member

@addaleax
Copy link
Member

Landed in e2325bc

@addaleax addaleax closed this Sep 17, 2018
addaleax pushed a commit that referenced this pull request Sep 17, 2018
Refs: #1826
PR-URL: #22562
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
@BeniCheni BeniCheni deleted the string-decoder-typed-array-data-view branch September 18, 2018 02:56
targos pushed a commit that referenced this pull request Sep 18, 2018
Refs: #1826
PR-URL: #22562
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
targos pushed a commit that referenced this pull request Sep 19, 2018
Refs: #1826
PR-URL: #22562
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
targos pushed a commit that referenced this pull request Sep 20, 2018
Refs: #1826
PR-URL: #22562
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. string_decoder Issues and PRs related to the string_decoder subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants