Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child_process: make the maxBuffer correct with unicode #1902

Closed
wants to merge 1 commit into from

Conversation

JacksonTian
Copy link
Contributor

The maxLen just caculate the string length, but to unicode string,
the string size is less than buffer size.

@mscdex mscdex added the child_process Issues and PRs related to the child_process subsystem. label Jun 5, 2015
@cjihrig
Copy link
Contributor

cjihrig commented Jun 5, 2015

Would it work (and be simpler) to just use Buffer.byteLength() here and here?

@cjihrig
Copy link
Contributor

cjihrig commented Jun 5, 2015

Ref: #1901

@JacksonTian
Copy link
Contributor Author

caculate byte length maybe make performance slow. Would like to hear more suggestion.

@trevnorris
Copy link
Contributor

Ditto on the byteLength() call. That's the (or at least should be) "standard" way.

@JacksonTian JacksonTian force-pushed the max_buffer branch 2 times, most recently from d609ac1 to d90d98c Compare June 6, 2015 06:06
@JacksonTian
Copy link
Contributor Author

Thanks @cjihrig @trevnorris . Use byteLength now.

@@ -230,7 +230,11 @@ exports.execFile = function(file /* args, options, callback */) {
}

child.stdout.addListener('data', function(chunk) {
stdoutLen += chunk.length;
if (!encoding) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs, the unit of maxBuffer is bytes, so I think these should unconditionally use Buffer.byteLength().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JacksonTian missed this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when don't set the encoding, the chunk is a buffer.

@JacksonTian
Copy link
Contributor Author

@cjihrig updated.

@JacksonTian
Copy link
Contributor Author

ping @nodejs/collaborators

@cjihrig
Copy link
Contributor

cjihrig commented Jun 8, 2015

@yosuke-furukawa
Copy link
Member

LGTM

@cjihrig
Copy link
Contributor

cjihrig commented Jun 8, 2015

CI is not happy.

@yosuke-furukawa
Copy link
Member

@JacksonTian
Copy link
Contributor Author

the windows doesn't like unicode. hmmm.

@JacksonTian
Copy link
Contributor Author

I think maybe reached an edge case for windows platform.

@JacksonTian
Copy link
Contributor Author

Anyone can help me to restart the CI?

@jbergstroem
Copy link
Member

@cjihrig
Copy link
Contributor

cjihrig commented Jun 10, 2015

The test failures seem to be related to a known issue on Windows - nodejs/node-v0.x-archive#7940.

@piscisaureus is this something that can be skipped on Windows during testing?

@JacksonTian
Copy link
Contributor Author

can we skip the test case for win32? ping @piscisaureus

@brendanashworth
Copy link
Contributor

You can probably skip the test case with something like the crypto skip, but with common.isWindows.

@JacksonTian
Copy link
Contributor Author

anyone can help me to restart the CI?

@bnoordhuis
Copy link
Member

I think what @piscisaureus is getting at is that partial character sequences throw off the length calculation; the .setEncoding(encoding) call makes the stream convert bytes to characters and incomplete multi-byte sequences get buffered by the StringDecoder until more data is available (which may never come so it's imprecise at best.)

Apart from that stipulation the patch looks correct to me. It's rather inefficient though: we first convert from bytes to characters and then we convert that back to bytes for the length calculation.

@JacksonTian
Copy link
Contributor Author

@bnoordhuis We can don't set encoding, so don't use StringDecoder, just only encode buffer before execute callback.

@JacksonTian JacksonTian force-pushed the max_buffer branch 3 times, most recently from aa6a9d7 to 8e08e84 Compare December 17, 2015 06:35
@bnoordhuis
Copy link
Member

I think you will still need a StringDecoder instance to deal with partial character sequences. If you encode through just buf.toString(encoding), it's either going to throw an exception or lose bytes.

I don't think we can just remove the .setEncoding() calls; that would almost certainly break existing code. Perhaps it's possible to add a fast path to stream.Readable, presumably in readableAddChunk(), although that code is already a twisty little maze of if statements.

@JacksonTian
Copy link
Contributor Author

With the commit bfb2cd0 be landed, the PR can be implemented more light-weight.

maxBuffer: 10
}, common.mustCall(function(err, stdout, stderr) {
assert(err);
assert.equal(err.message, 'stdout maxBuffer exceeded');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use assert.strictEqual() here. Also, you can drop the first assertion.

@cjihrig
Copy link
Contributor

cjihrig commented Jan 15, 2016

LGTM. This should hopefully avoid any encoding issues.

@bnoordhuis @piscisaureus @trevnorris thoughts?

The maxLen just caculate the string length, but to unicode string,
the string size is less than buffer size.
@Trott
Copy link
Member

Trott commented Mar 10, 2016

Would it be possible to get another round of reviews from some combination of @bnoordhuis @piscisaureus and @trevnorris ? It would be nice to see this finally land if the objections have been sufficiently addressed. (And if not, it would be nice to have the objections documented or reiterated.)

@Trott
Copy link
Member

Trott commented Mar 10, 2016

Nit: Can you add Fixes: https://github.com/nodejs/node/issues/1901 metadata line to the commit message?

@Trott
Copy link
Member

Trott commented Mar 10, 2016

Nit: Can you find a way to shorten the first line of the commit message so that it is 50 characters or less?

@trevnorris
Copy link
Contributor

For the test would it be possible to spawn a child node process and write to its stderr/stdout? That should work for all platforms.

@bnoordhuis
Copy link
Member

It looks like the test still needs an update.

@JacksonTian
Copy link
Contributor Author

bfb2cd0 was reverted, this issue needs hold.

@Trott Trott added the stalled Issues and PRs that are stalled. label May 14, 2016
@jasnell
Copy link
Member

jasnell commented Jun 7, 2016

I believe this can be closed now given that #6764 landed.

@jasnell jasnell closed this Jun 7, 2016
@JacksonTian
Copy link
Contributor Author

thanks @jasnell @Trott

@JacksonTian JacksonTian deleted the max_buffer branch May 4, 2017 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. stalled Issues and PRs that are stalled.
Projects
None yet
Development

Successfully merging this pull request may close these issues.