Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Buffer issues with external two bytes string #8683

Closed
ghostoy opened this issue Nov 6, 2014 · 3 comments
Closed

Buffer issues with external two bytes string #8683

ghostoy opened this issue Nov 6, 2014 · 3 comments

Comments

@ghostoy
Copy link

ghostoy commented Nov 6, 2014

Node Buffer doesn't work correctly with two bytes external strings. Once passing such string to Buffer constructor, the content of the Buffer is just first half of the string in UCS2 encoding.

Steps to reproduce:

  1. Run node with --expose_externalize_string flag
  2. Run following code
var str = 'ABCD';
var buf = new Buffer(str);
console.log(buf.toString('hex')); // 41424344
externalizeString(str, true); // force two bytes string
var extbuf = new Buffer(str);
console.log(extbuf.toString('hex')); // 41004200

As documented for new Buffer(str, [encoding]):

new Buffer(str, [encoding])

  • str String - string to encode.
  • encoding String - encoding to use, Optional.

Allocates a new buffer containing the given str. encoding defaults to 'utf8'.

Therefore the expected content of extbuf should also be encoded in UTF8 (i.e. 41424344 in this case) for any JS string by default.

This issue should be the root cause of an node-webkit issue nwjs/nw.js#2439

@vkurchatkin
Copy link

Interesting variation:

var str = 'ABCD';
var buf = new Buffer(str);
console.log(buf.toString('hex')); // 41424344
externalizeString(str, true); // force two bytes string
var extbuf = new Buffer(str);
console.log(extbuf.toString('hex')); // 41004200

var buf = new Buffer('ABCD');
console.log(buf.toString('hex')); // 41004200

Although second 'ABCD' was not explicitly externalized, it still is. Only works for same string literals, so probably this is an effect of constant pool in action

@vkurchatkin
Copy link

This line causes the error : https://github.com/joyent/node/blob/master/src/string_bytes.cc#L331 Probably memcpy should be used only for one byte external strings

@ghostoy
Copy link
Author

ghostoy commented Nov 6, 2014

Also this line should return doubled length: https://github.com/joyent/node/blob/master/src/string_bytes.cc#L285

ghostoy pushed a commit to ghostoy/node that referenced this issue Nov 20, 2014
@trevnorris trevnorris self-assigned this Feb 11, 2015
bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Mar 5, 2015
StringBytes::Write() did a plain memcpy() when is_extern is true but
that's wrong when the source is a two-byte string and the destination
a one-byte or UTF-8 string.

The impact is limited to strings > 1,031,913 bytes because those are
normally the only strings that are externalized, although the use of
the 'externalize strings' extension (--expose_externalize_string) can
also trigger it.

This commit also cleans up the bytes versus characters confusion in
StringBytes::Write() because that was closely intertwined with the
UCS-2 encoding regression.  One wasn't fixable without the other.

Fixes: nodejs#1024
Fixes: nodejs/node-v0.x-archive#8683
PR-URL: nodejs#1042
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
@trevnorris trevnorris removed their assignment Feb 18, 2016
richardlau pushed a commit to ibmruntimes/node that referenced this issue Jan 21, 2017
Part of nodejs#8683, increase coverage of the internal
state machine of streams.

PR-URL: nodejs/node#10249
See: nodejs/node#8683
See: nodejs/node#10230
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
gibfahn pushed a commit to ibmruntimes/node that referenced this issue Feb 22, 2017
Part of nodejs#8683, increase coverage of the internal
state machine of streams.

PR-URL: nodejs/node#10249
See: nodejs/node#8683
See: nodejs/node#10230
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
gibfahn pushed a commit to ibmruntimes/node that referenced this issue Feb 22, 2017
Part of nodejs#8683, increase coverage of the internal
state machine of streams.

PR-URL: nodejs/node#10249
See: nodejs/node#8683
See: nodejs/node#10230
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@Trott Trott closed this as completed Apr 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants