buffer: implement WHATWG Encoding Standard API #13644

jasnell · 2017-06-13T03:19:20Z

Provide an (initially experimental) implementation of the WHATWG Encoding Standard API (TextDecoder and TextEncoder). This is the same API implemented on the browser side.

By default, with small-icu, only the UTF-8, UTF-16le and UTF-16be decoders are supported. With full-icu enabled, every encoding other than iso-8859-16 is supported.
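Because the set of available decoders depends on the ICU data Node.js was built with, callers may want to probe for a label before relying on it. The following is a minimal sketch, not part of this PR. It assumes `TextDecoder` is reachable as a global (true in browsers and in later Node.js releases; in this PR it is exposed from `require('buffer')`) and that the constructor throws for an unsupported label. `isEncodingSupported` is a hypothetical helper name.

```js
// Hypothetical helper: reports whether this runtime can construct a decoder
// for the given label. Assumes the constructor throws for unsupported labels.
function isEncodingSupported(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (err) {
    return false;
  }
}

// UTF-8 is available even with small-icu.
console.log(isEncodingSupported('utf-8'));
// The answer for legacy encodings depends on the ICU data in use.
console.log(isEncodingSupported('shift_jis'));
```

The probe constructs a throwaway decoder, which keeps it independent of however the runtime stores its label table.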

This includes a basic test, but not the full web platform tests. Note: many of the web platform tests would fail out of the box because we ship with small-icu by default.

The implementation is added without changing any of the existing encoding support in core.

Refs: https://encoding.spec.whatwg.org/

/cc @domenic @TimothyGu @addaleax

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Affected core subsystem(s)

buffer

mscdex · 2017-06-13T03:57:01Z

What's special about iso-8859-16?

jasnell · 2017-06-13T04:20:59Z

It's not implemented by ICU at all, even with full-icu

TimothyGu

Some preliminary comments. I'll need some time to digest the intricacies of BOM and the various flags related to it.

TimothyGu · 2017-06-13T08:04:11Z

lib/internal/encoding.js

+
+function normalizeEncoding(encoding) {
+  if (encoding === undefined)
+    return 'utf-8';


This function doesn't have to handle defaults -- nor should it. The two places that use this function both ensure encoding !== undefined. I also somewhat prefer the signature function getEncodingFromLabel(label) to align better with the spec.

TimothyGu · 2017-06-13T08:05:02Z

lib/internal/encoding.js

+    return 'utf-8';
+  let enc = encodings.get(encoding);
+  if (enc !== undefined) return enc;
+  enc = encodings.get(encoding.toLowerCase());


ASCII whitespace (and ASCII whitespace only, not Unicode) needs to be trimmed as well per step 1 of the spec.
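A minimal sketch of a label lookup that includes the trimming step. It assumes a deliberately tiny label table (the real `encodings` map is far larger) and borrows the `getEncodingFromLabel` name suggested earlier in this review. Note that `toLowerCase()` is Unicode-aware, while the spec asks for ASCII lowercasing; the sketch accepts that difference.

```js
// Illustrative subset of the label table; the real map has many more entries.
const encodings = new Map([
  ['utf-8', 'utf-8'],
  ['utf8', 'utf-8'],
  ['unicode-1-1-utf-8', 'utf-8'],
  ['utf-16le', 'utf-16le'],
  ['utf-16', 'utf-16le'],
  ['utf-16be', 'utf-16be'],
]);

// Strip leading and trailing ASCII whitespace only: TAB, LF, FF, CR, SPACE.
// String.prototype.trim() would also strip Unicode whitespace such as U+00A0.
function trimAsciiWhitespace(label) {
  return label.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, '');
}

// Returns the canonical encoding name, or undefined for an unknown label.
function getEncodingFromLabel(label) {
  return encodings.get(trimAsciiWhitespace(label).toLowerCase());
}

console.log(getEncodingFromLabel(' \tUTF8\r\n'));
console.log(getEncodingFromLabel('\u00A0utf8'));
```

The second call keeps its leading no-break space, so the lookup misses and returns `undefined`.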

TimothyGu · 2017-06-13T08:05:30Z

lib/internal/encoding.js

+  let enc = encodings.get(encoding);
+  if (enc !== undefined) return enc;
+  enc = encodings.get(encoding.toLowerCase());
+  if (enc !== undefined) return enc;


Just return enc w/o if should be fine and simpler.

TimothyGu · 2017-06-13T08:07:05Z

src/node_i18n.cc

+
+    UErrorCode status = U_ZERO_ERROR;
+    UConverter* conv = ucnv_open(*label, &status);
+    args.GetReturnValue().Set(U_SUCCESS(status) == 1);


How about just !!U_SUCCESS(status)? The == 1 looks kinda fragile.

TimothyGu · 2017-06-13T08:12:30Z

lib/internal/encoding.js

+    if (handle === undefined)
+      throw new errors.Error('ERR_ENCODING_NOT_SUPPORTED', encoding);
+
+    Object.defineProperties(this, {


Object.defineProperties is pretty slow to be used on per object construction. I don't see any harm in making kHandle configurable, enumerable, or writable, as it's a symbol property already (and it's what we do for URL). encoding, fatal, and ignoreBOM should all be defined on the prototype as getter functions per spec.
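The layout suggested here can be sketched as follows. `SketchDecoder` is a hypothetical stand-in for the real class, and the symbol names other than the general `kHandle` idea are assumptions made for illustration; only the property layout matters.

```js
// Symbol-keyed state, assigned directly in the constructor instead of
// calling Object.defineProperties() for every instance.
const kEncoding = Symbol('encoding');
const kFatal = Symbol('fatal');
const kIgnoreBOM = Symbol('ignoreBOM');

// Hypothetical stand-in for TextDecoder.
class SketchDecoder {
  constructor(encoding = 'utf-8', options = {}) {
    this[kEncoding] = encoding;
    this[kFatal] = Boolean(options.fatal);
    this[kIgnoreBOM] = Boolean(options.ignoreBOM);
  }

  // Read-only accessors are defined once, on the prototype.
  get encoding() { return this[kEncoding]; }
  get fatal() { return this[kFatal]; }
  get ignoreBOM() { return this[kIgnoreBOM]; }
}

const decoder = new SketchDecoder('utf-16le', { fatal: true });
console.log(decoder.encoding, decoder.fatal, decoder.ignoreBOM);
// Symbol-keyed properties do not show up as string keys on the instance.
console.log(Object.keys(decoder));
```

Since the accessors have no setter, assigning to `decoder.encoding` leaves the stored value untouched (and throws in strict mode).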

TimothyGu · 2017-06-13T08:28:47Z

src/node_i18n.cc

+    ConverterObject* converter;
+    ASSIGN_OR_RETURN_UNWRAP(&converter, args[0].As<Object>());
+    SPREAD_BUFFER_ARG(args[1], input_obj);
+    int flags = args[2]->Uint32Value();


Use the non-deprecated Uint32Value(Local<Context>) signature of the function.

TimothyGu · 2017-06-13T08:31:53Z

src/node_i18n.cc

+    }
+
+    Local<ObjectTemplate> t = ObjectTemplate::New(env->isolate());
+    t->SetInternalFieldCount(1);


Independent of this PR, we should find a way to cache this ObjectTemplate.

Potentially dumb question: why not use ObjectWrap?

Ping this question...

For no particular reason other than ObjectWrap really isn't used anywhere else in core. I've actually been contemplating whether or not it is something we could eliminate at some point.

TimothyGu · 2017-06-13T08:33:50Z

src/node_i18n.cc

+                   NULL, flush, &status);
+
+    if (U_SUCCESS(status)) {
+      result.SetLength(target - &result[0]);


SetLength, too, takes number of entries as argument, not number of bytes.

TimothyGu · 2017-06-13T08:38:27Z

tools/icu/icu-generic.gyp

@@ -35,7 +35,7 @@
             'defines': [
                # ICU cannot swap the initial data without this.
                # http://bugs.icu-project.org/trac/ticket/11046


Does the comment still apply?

Nope. Was planning to remove the whole section.

TimothyGu · 2017-06-13T08:40:14Z

doc/api/buffer.md

+Per the [WHATWG Encoding Standard][], the encodings supported by the
+`TextDecoder` API are outlined in the table below. For each encoding,
+one or more aliases may be used. Support for some encodings is enabled
+only when Node.js is using the full ICU data.


Is it documented anywhere how to get full ICU?

https://github.com/nodejs/node/blob/master/BUILDING.md#intl-ecma-402-support, plus https://github.com/nodejs/node/wiki/Intl (there is a link to it in the first fragment).

This should be better documented.

TimothyGu · 2017-06-13T08:58:45Z

lib/internal/encoding.js

+
+  encode(input) {
+    const buf = lazyBuffer().from(String(input));
+    return new Uint8Array(buf.buffer, buf.offset, buf.length);


Per spec, the returned Uint8Array must be unpooled and have an entire block of ArrayBuffer dedicated to it.
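One way to meet that requirement is to copy the encoded bytes into a freshly allocated `Uint8Array`, which owns an `ArrayBuffer` of exactly its own size. This is a sketch assuming Node.js's global `Buffer`; `encodeUnpooled` is a hypothetical name, and the plain copy shown here is the simplest approach rather than necessarily what the PR ended up doing.

```js
// Hypothetical helper: UTF-8 encode a string into a Uint8Array that does not
// share its ArrayBuffer with Buffer's internal allocation pool.
function encodeUnpooled(input) {
  // Small Buffers created by Buffer.from(string) may be slices of a shared pool.
  const pooled = Buffer.from(`${input}`, 'utf8');
  // new Uint8Array(length) allocates a dedicated ArrayBuffer of that length.
  const out = new Uint8Array(pooled.length);
  out.set(pooled);
  return out;
}

const bytes = encodeUnpooled('abc');
console.log(bytes.byteOffset, bytes.buffer.byteLength, bytes.length);
```

The cost is one extra copy per call, which is the trade-off for handing out a buffer that nothing else aliases.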

vsemozhetbyt · 2017-06-13T11:33:08Z

doc/api/buffer.md

-`http.get()`, if the returned charset is one of those listed in the WHATWG spec
-it's possible that the server actually returned win-1252-encoded data, and
-using `'latin1'` encoding may incorrectly decode the characters.
+*Note*: Today's browsers follow the [WHATWG Encoding Standard] which aliases


[WHATWG Encoding Standard][] (+ second pair of brackets) for consistency?

vsemozhetbyt · 2017-06-13T11:38:14Z

doc/api/buffer.md

+const decoder = new TextDecoder('shift_jis');
+let string = '';
+let buffer;
+while(buffer = getNextChunkSomehow()) {


while( -> while ( ?

vsemozhetbyt · 2017-06-13T11:38:57Z

doc/api/buffer.md

+let string = '';
+let buffer;
+while(buffer = getNextChunkSomehow()) {
+  string += decoder.decode(buffer, { stream:true });


stream:true -> stream: true ?
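For reference, a self-contained variant of the streaming loop in the documentation snippet under review. Since `shift_jis` needs full ICU data, this sketch uses UTF-8 instead, and a fixed `chunks` array stands in for the snippet's `getNextChunkSomehow()`. It assumes `TextDecoder` is available as a global.

```js
// "h€!" in UTF-8, split so that the three-byte euro sign (E2 82 AC)
// straddles a chunk boundary.
const chunks = [
  Uint8Array.of(0x68, 0xE2),
  Uint8Array.of(0x82, 0xAC),
  Uint8Array.of(0x21),
];

const decoder = new TextDecoder('utf-8');
let text = '';
for (const chunk of chunks) {
  // stream: true keeps an incomplete trailing sequence buffered for the next call.
  text += decoder.decode(chunk, { stream: true });
}
// A final call without stream: true flushes anything still buffered.
text += decoder.decode();

console.log(text);
```

Decoding each chunk without `{ stream: true }` would instead emit replacement characters for the split sequence.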

vsemozhetbyt · 2017-06-13T11:57:14Z

doc/api/buffer.md

+    `false`.
+  * `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
+     order mark in the decoded result. This option is only used when `encoding`
+     is `'utf-8'`, `'utf-16be'` or `'utf-16le'`.


Note about defaults (false = strip the BOM)?

(false = strip the BOM)

That's correct.

var dec = new TextDecoder('utf-16be', { ignoreBOM: true });
[...dec.decode(new Uint8Array([0xfe, 0xff, 0x00, 0x20]))].map(str => str.codePointAt(0).toString(16)); // ["feff", "20"]

var dec = new TextDecoder('utf-16be', { ignoreBOM: false });
[...dec.decode(new Uint8Array([0xfe, 0xff, 0x00, 0x20]))].map(str => str.codePointAt(0).toString(16)); // ["20"]

I mean it may be worth adding a note about the default value.

vsemozhetbyt · 2017-06-13T12:13:26Z

test/parallel/test-whatwg-encoding.js

+
+const common = require('../common');
+const assert = require('assert');
+const { Buffer, TextDecoder, TextEncoder } = require('buffer');


Why do we need to require Buffer when the doc states:

The Buffer class is a global within Node.js, making it unlikely that one would need to ever use require('buffer').Buffer

It's in a lint rule we use.

@TimothyGu Could you elaborate?

https://github.com/nodejs/node/blob/master/tools/eslint-rules/require-buffer.js

(added in #1794)

Why do we use it only in the lib/.eslintrc.yaml and not in the test/.eslintrc.yaml?

I'm guessing here: tests are supposed to be self-contained scripts, while everything in lib is user-visible and -contaminatable, like a seemingly innocent var Buffer = {} line in REPL. See nodejs/node-convergence-archive#21.

@TimothyGu Thank you.

jasnell · 2017-06-13T18:45:02Z

Updated

sam-github · 2017-06-13T18:48:51Z

doc/api/buffer.md

+one or more aliases may be used. Support for some encodings is enabled
+only when Node.js is using the full ICU data.
+
+<table>


If https://help.github.com/articles/organizing-information-with-tables/ works as advertised, github markdown tables would be easier to read in-source and to review.

As far as I understand (and I do not see anything in the referenced link) .. markdown tables do not support rowspan.

sam-github · 2017-06-13T18:52:25Z

doc/api/buffer.md

+* `options` {object}
+  * `fatal` {boolean} `true` if decoding failures are fatal. Defaults to
+    `false`.
+  * `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte


I wonder if withBOM (EDIT: or leaveBOM) would make more sense for this, if it's not a fixed name due to the spec? I thought the docs might have had a typo and reversed the sense, because if the BOM is ignored it wouldn't be in the output; then I thought about it and realized the sense of ignore is "don't remove from input stream if present". I'm not sure what would happen if the BOM is not in the input stream, though: would it get added to the decoded result?

ignoreBOM is fixed within the spec. To change this would require a change in the spec.

Understood. I know this is all experimental, but FYI, it's not clear to me from the current docs whether a BOM is added if not present, or just passed through ("ignored") if present.
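A small sketch of the three cases in question, using UTF-8 so that it also runs with small-icu. It assumes a spec-conforming `TextDecoder` exposed as a global: the decoder either strips or passes through a BOM that is present, and never adds one.

```js
const withBOM = Uint8Array.of(0xEF, 0xBB, 0xBF, 0x68, 0x69); // BOM + "hi"
const withoutBOM = Uint8Array.of(0x68, 0x69);                // "hi"

// Default (ignoreBOM: false): a leading BOM is stripped from the output.
const stripped = new TextDecoder('utf-8').decode(withBOM);

// ignoreBOM: true: a leading BOM is passed through as U+FEFF.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBOM);

// ignoreBOM: true with no BOM in the input: nothing is added.
const untouched = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withoutBOM);

console.log(JSON.stringify([stripped, kept, untouched]));
```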

TimothyGu · 2017-06-13T23:58:53Z

lib/internal/encoding.js

+      throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'input',
+                                 ['ArrayBuffer', 'ArrayBufferView']);
+    }
+    if (options === null ||


The Web IDL spec treats null the same way as undefined. See https://heycam.github.io/webidl/#es-dictionary, step 4.1.2.
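A sketch of that conversion rule, using a hypothetical helper named `toOptionsDictionary`: `undefined` and `null` both become an empty dictionary, objects (including functions) pass through, and any other value is a `TypeError`.

```js
// Hypothetical helper mirroring the Web IDL dictionary conversion entry check.
function toOptionsDictionary(value, name = 'options') {
  // null is treated exactly like undefined: every member takes its default.
  if (value === undefined || value === null) return {};
  // Functions are objects too, so they are acceptable dictionary sources.
  if (typeof value !== 'object' && typeof value !== 'function') {
    throw new TypeError(`The "${name}" argument must be of type object`);
  }
  return value;
}

console.log(toOptionsDictionary(null));
console.log(toOptionsDictionary({ stream: true }));
```

With this in place, the call sites no longer need a separate `options === null` branch.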

TimothyGu · 2017-06-14T00:00:19Z

lib/internal/encoding.js

+      throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'options', 'object');
+    }
+
+    var flags = (options.stream === true) ? 0 : CONVERTER_FLAGS_FLUSH;


IDL recognizes all truthy values as true, so just options.stream ?.

TimothyGu · 2017-06-14T00:02:39Z

lib/internal/encoding.js

+class TextDecoder {
+  constructor(encoding = 'utf-8', options = {}) {
+    if (typeof encoding !== 'string')
+      throw new errors.Error('ERR_INVALID_ARG_TYPE', 'encoding', 'string');


This should coerce encoding to a string rather than check its type, per IDL rules. The URL implementation shows how it's done.

TimothyGu · 2017-06-14T00:03:20Z

lib/internal/encoding.js

+  constructor(encoding = 'utf-8', options = {}) {
+    if (typeof encoding !== 'string')
+      throw new errors.Error('ERR_INVALID_ARG_TYPE', 'encoding', 'string');
+    if (options !== undefined && typeof options !== 'object')


Here too: null should be treated just like undefined. Also, options cannot be undefined here due to the default parameter above.

TimothyGu · 2017-06-14T00:05:18Z

lib/internal/encoding.js

+  }
+
+  decode(input = empty, options = {}) {
+    if (isAnyArrayBuffer(input)) {


SharedArrayBuffer is not supported at this moment in Web platform APIs that do not have [AllowShared] extended attribute. See https://heycam.github.io/webidl/#es-buffer-source-types.

Yep, understood. Unfortunately, the separate process.binding('util').isArrayBuffer() and process.binding('util').isSharedArrayBuffer() methods no longer exist. It looks like those were condensed recently into a single IsAnyArrayBuffer(). I think this is one we can safely ignore for now.

Seems bad to ignore, as becoming spec-compliant later will be a backward-incompatible breaking change.

Yeah, agreed. I've added the isArrayBuffer() util method back in to process.binding('util')
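The distinction can be sketched in plain JavaScript. This is an illustration only: the PR uses an internal `isArrayBuffer()` binding, whereas the hypothetical `toBytes` helper below relies on `instanceof`, which does not work across realms.

```js
// Hypothetical helper: accept ArrayBuffer and ArrayBufferView input, but
// reject SharedArrayBuffer and views backed by one (no [AllowShared]).
function toBytes(input) {
  if (input instanceof ArrayBuffer) {
    return new Uint8Array(input);
  }
  if (ArrayBuffer.isView(input) && input.buffer instanceof ArrayBuffer) {
    // Covers every TypedArray as well as DataView.
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  throw new TypeError(
    'The "input" argument must be an ArrayBuffer or ArrayBufferView');
}

console.log(toBytes(new ArrayBuffer(2)).length);
console.log(toBytes(new DataView(new ArrayBuffer(4), 1, 2)).length);
```

Rejecting shared memory from the start avoids the backward-incompatible tightening mentioned above.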

TimothyGu · 2017-06-14T00:18:16Z

src/node_i18n.cc

+  static void Decode(const FunctionCallbackInfo<Value>& args) {
+    Environment* env = Environment::GetCurrent(args);
+
+    CHECK_GE(args.Length(), 3);  // Converter, ArrayBuffer, Flags


s/ArrayBuffer/Buffer/

TimothyGu · 2017-06-14T00:18:46Z

src/node_i18n.cc

+
+    CHECK_GE(args.Length(), 2);
+    Utf8Value label(env->isolate(), args[0]);
+    int flags = args[1]->Uint32Value();


This should use the Local<Context> version as well.

TimothyGu · 2017-06-14T00:22:08Z

src/node_i18n.cc

+    const char* source = input_obj_data;
+    size_t source_length = input_obj_length;
+
+    if (converter->unicode_ && !converter->bomSeen_) {


Put && !converter->ignoreBOM_ here as well?

TimothyGu · 2017-06-14T00:22:42Z

src/node_i18n.cc

+      }
+      source += bomOffset;
+      source_length -= bomOffset;
+      converter->bomSeen_ = true;


This flag should only be set when there actually is a BOM, i.e. bomOffset != 0.

Not quite ...

In: https://encoding.spec.whatwg.org/#interface-textdecoder

* If encoding is UTF-8, UTF-16BE, or UTF-16LE, and ignore BOM flag and BOM seen flag are unset, then:
  * If token is U+FEFF, then set BOM seen flag.
  * Otherwise, if token is not end-of-stream, then set BOM seen flag and append token to output.
  * Otherwise, return output.
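The quoted steps can be sketched over an array of code points. This is an illustration of the flag logic only, with the hypothetical name `makeBomFilter`; it is not the C++ code under review, which works on bytes before conversion.

```js
// Returns a filter that applies the spec's BOM handling to successive
// batches of decoded code points (the "tokens" in the spec text).
function makeBomFilter(ignoreBOM) {
  let bomSeen = false;
  return function filter(codePoints) {
    const output = [];
    for (const token of codePoints) {
      if (!ignoreBOM && !bomSeen) {
        // The first real token always sets the flag, BOM or not.
        bomSeen = true;
        if (token === 0xFEFF) continue; // a leading BOM is dropped
      }
      output.push(token);
    }
    // An empty batch (end-of-stream) leaves bomSeen untouched.
    return output;
  };
}

const filter = makeBomFilter(false);
console.log(filter([0xFEFF, 0x20])); // leading BOM dropped
console.log(filter([0xFEFF, 0x20])); // flag already set, U+FEFF passes through
```

This is why the flag is set even when the first token is not a BOM: only the very first token of a stream is ever eligible for stripping.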

TimothyGu · 2017-06-14T00:25:22Z

src/node_i18n.cc

+    UChar* target = *result;
+    ucnv_toUnicode(converter->conv,
+                   &target, target + (limit * sizeof(UChar)),
+                   &source, source + source_length,


The ICU docs say:

For most Unicode charsets it is also possible to ignore the indicated number of initial stream bytes and start converting after them. However, there are stateful Unicode charsets (UTF-7 and BOCU-1) for which this will not work. Therefore, it is best to ignore the first output UChar instead of the input signature bytes.

AFAICT source is incremented by the signature bytes, so we are doing what the ICU docs advise not to do. Are the negative consequences applicable to us?

I do not believe so. TextDecoder does not support UTF-7 nor BOCU-1. We are only skipping the signature bytes for UTF-8, UTF-16le, and UTF-16be, none of which have this issue.

Ah, in that case LGTM.
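For the three encodings in question the signature is a fixed byte sequence, so the skip amounts to a prefix comparison. A sketch with a hypothetical `bomLength` helper (the PR computes the offset in C++):

```js
// Byte order marks for the only encodings whose signature bytes are skipped.
const BOMS = new Map([
  ['utf-8', [0xEF, 0xBB, 0xBF]],
  ['utf-16le', [0xFF, 0xFE]],
  ['utf-16be', [0xFE, 0xFF]],
]);

// Returns how many leading bytes form a BOM for the given encoding, else 0.
function bomLength(encoding, bytes) {
  const bom = BOMS.get(encoding);
  if (bom === undefined || bytes.length < bom.length) return 0;
  return bom.every((byte, i) => bytes[i] === byte) ? bom.length : 0;
}

console.log(bomLength('utf-8', Uint8Array.of(0xEF, 0xBB, 0xBF, 0x20)));
console.log(bomLength('utf-16be', Uint8Array.of(0xFE, 0xFF, 0x00, 0x20)));
```

Stateful charsets such as UTF-7 and BOCU-1 are simply absent from the table, which matches the reasoning in the reply above.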

TimothyGu · 2017-06-14T08:53:54Z

lib/internal/encoding.js

+    return 'utf-8';
+  }
+
+  encode(input) {


input is an optional parameter too. See https://encoding.spec.whatwg.org/#interface-textencoder

TimothyGu · 2017-06-14T08:54:43Z

lib/internal/encoding.js

+  }
+
+  encode(input) {
+    return new Uint8Array(lazyBuffer().from(String(input)));


You have to use template literal syntax to prevent symbols from getting converted to strings. See node/lib/internal/url.js, line 56 in 448c4c6:

const str = `${val}`;
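The difference between the two coercions is easy to demonstrate. `coerceToString` below is a hypothetical helper name used only for this illustration.

```js
// Template literals apply the abstract ToString operation, which throws a
// TypeError for symbols. String(value) special-cases symbols instead.
function coerceToString(value) {
  return `${value}`;
}

const sym = Symbol('label');

// String() quietly produces the symbol's descriptive string.
const viaString = String(sym);

// The template literal refuses to convert the symbol.
let threw = false;
try {
  coerceToString(sym);
} catch (err) {
  threw = err instanceof TypeError;
}

console.log(viaString, threw, coerceToString(42));
```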

TimothyGu · 2017-06-14T08:55:09Z

lib/internal/encoding.js

+}
+
+class TextEncoder {
+  constructor() {}


Can this be omitted?

jasnell · 2017-06-14T19:35:19Z

Updated :-)
@TimothyGu ... as always, I'm loving the thorough review :-)

TimothyGu

Looking much better, thanks a lot!

TimothyGu · 2017-06-15T05:53:44Z

doc/api/buffer.md

+
+### textDecoder.decode([input[, options]])
+
+* `input` {ArrayBuffer|TypedArray} An `ArrayBuffer` or `%TypedArray%` instance


Also DataView. The %TypedArray% notation is generally reserved to specs, so do you think it is appropriate to use it here?
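To illustrate the accepted input types, the sketch below decodes the same bytes from a `Uint8Array`, a `DataView`, and a plain `ArrayBuffer`. It assumes `TextEncoder` and `TextDecoder` are available as globals.

```js
const bytes = new TextEncoder().encode('h\u00E9llo');

// A DataView over the same bytes.
const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

// A standalone ArrayBuffer holding exactly those bytes.
const buffer = bytes.buffer.slice(
  bytes.byteOffset, bytes.byteOffset + bytes.byteLength);

const decoder = new TextDecoder();
const fromTypedArray = decoder.decode(bytes);
const fromDataView = decoder.decode(view);
const fromArrayBuffer = decoder.decode(buffer);

console.log(fromTypedArray === fromDataView, fromDataView === fromArrayBuffer);
```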

TimothyGu · 2017-06-15T05:54:30Z

doc/api/buffer.md

+
+* `input` {ArrayBuffer|TypedArray} An `ArrayBuffer` or `%TypedArray%` instance
+  containing the encoded data.
+* `options` {object}


TimothyGu · 2017-06-15T05:55:36Z

doc/api/buffer.md

+```js
+const { TextEncoder } = require('buffer');
+const encoder = new TextEncoder();
+const int8array = encoder.encode('this is some data');


uint8array?
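The suggested name matches what `encode()` returns. A short check, assuming `TextEncoder` and `TextDecoder` are available as globals (in this PR they come from `require('buffer')`):

```js
const encoder = new TextEncoder();
const uint8array = encoder.encode('this is some data');

// The result is a Uint8Array holding the UTF-8 bytes of the input.
console.log(uint8array instanceof Uint8Array, uint8array.length);

// Decoding the bytes gives back the original string.
const roundTrip = new TextDecoder().decode(uint8array);
console.log(roundTrip);

// The input argument is optional and defaults to the empty string.
const empty = encoder.encode();
console.log(empty.length);
```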

TimothyGu · 2017-06-15T05:58:29Z

lib/internal/encoding.js

+
+class TextDecoder {
+  constructor(encoding = 'utf-8', options = {}) {
+    encoding = String(encoding);


Template string literal is needed here as well (to disallow symbols).

TimothyGu · 2017-06-15T06:00:01Z

lib/internal/encoding.js

+
+  [inspect](depth, opts) {
+    if (this == null || this[kEncoding] === undefined) {
+      throw new errors.TypeError('ERR_INVALID_THIS', 'TextDecoder');


This check should technically be used for the user-visible getters and the methods as well.

Yeah, for now I'd rather leave it just here tho as it's not a check we commonly do.

FYI, we are doing this for URLs (see Web IDL create an operation function step 2.1.2.4). I'll defer to your opinion regarding these classes though.

I would recommend adding appropriate type-checking here if possible; not a big deal, but seems unnecessary to diverge from the spec and from URL in that way.

@domenic ... just want to make sure I'm clear on what kind of checking you're recommending? instanceof or is the more efficient hidden-symbol check ok?

Hidden symbol check for sure. Although ideally one that distinguishes encoder vs. decoder.

TimothyGu · 2017-06-15T06:00:46Z

lib/internal/encoding.js

+    // This is not perfect, but there's really nothing else to key off since
+    // there are no internal properties specific to TextEncoder
+    if (this == null || this[Symbol.toStringTag] !== 'TextEncoder') {
+      throw new errors.TypeError('ERR_INVALID_THIS', 'TextEncoder');


Same here. encoding and encode should have this check too.

For the time being, I'd prefer not.

TimothyGu · 2017-06-15T06:02:23Z

lib/internal/encoding.js

+
+  [inspect](depth, opts) {
+    // This is not perfect, but there's really nothing else to key off since
+    // there are no internal properties specific to TextEncoder


:/ In this case I'd rather add a dummy property through a constructor, just so that subclasses that extend TextEncoder but override Symbol.toStringTag can work.
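A sketch of the constructor-assigned brand idea, with one symbol per class so that encoders and decoders can be told apart. All names here (`SketchEncoder`, `SketchDecoder`, `LoudEncoder`, the brand symbols) are hypothetical and chosen only for illustration.

```js
// One hidden brand per class, so the check can tell the two apart.
const kEncoderBrand = Symbol('SketchEncoder brand');
const kDecoderBrand = Symbol('SketchDecoder brand');

class SketchEncoder {
  constructor() { this[kEncoderBrand] = true; }

  get encoding() {
    // Brand check: works for subclasses, fails for foreign receivers.
    if (this == null || this[kEncoderBrand] !== true) {
      throw new TypeError('Value of "this" must be of type SketchEncoder');
    }
    return 'utf-8';
  }
}

class SketchDecoder {
  constructor() { this[kDecoderBrand] = true; }
}

// A subclass that overrides Symbol.toStringTag still passes the check.
class LoudEncoder extends SketchEncoder {
  get [Symbol.toStringTag]() { return 'LoudEncoder'; }
}

const encodingGetter =
  Object.getOwnPropertyDescriptor(SketchEncoder.prototype, 'encoding').get;

console.log(encodingGetter.call(new LoudEncoder()));
```

Calling `encodingGetter` with a `SketchDecoder` instance as the receiver throws, which a check keyed on `Symbol.toStringTag` or on a shared property could not guarantee.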

TimothyGu · 2017-06-15T06:02:48Z

src/node_i18n.cc

+    }
+
+    Local<ObjectTemplate> t = ObjectTemplate::New(env->isolate());
+    t->SetInternalFieldCount(1);


Ping this question...

TimothyGu · 2017-06-15T06:03:42Z

lib/internal/encoding.js

+  var s = 0;
+  var e = label.length;
+  while (s < e && (
+         label[s] === '\u0009' ||


Would a regex like /[\t\n\f\r ]/ be faster?

Not in my experience.
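For context, the index-scanning approach in the diff can be completed along these lines. This is a sketch of the technique, not the PR's exact code; it makes no performance claim either way, and the helper names are hypothetical.

```js
// ASCII whitespace per the Encoding Standard: TAB, LF, FF, CR, SPACE.
function isAsciiWhitespace(ch) {
  return ch === '\u0009' || ch === '\u000A' || ch === '\u000C' ||
         ch === '\u000D' || ch === '\u0020';
}

// Advance a start index and retreat an end index, then slice once.
function trimAsciiWhitespace(label) {
  let s = 0;
  let e = label.length;
  while (s < e && isAsciiWhitespace(label[s])) s++;
  while (e > s && isAsciiWhitespace(label[e - 1])) e--;
  return label.slice(s, e);
}

console.log(JSON.stringify(trimAsciiWhitespace('\t utf-8 \r\n')));
```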

TimothyGu · 2017-06-15T06:11:40Z

A bug I noticed: https://encoding.spec.whatwg.org/#dom-textdecoder-decode step 1 unsets the BOM seen flag if the do not flush flag left by the previous run is not set (i.e. this is either the first call to decode() or the first call after a decode() call with { stream: false }). However, the implementation doesn't appear to do so:

> var dec = new buffer.TextDecoder('utf-8');
> [...dec.decode(Uint8Array.of(0xEF, 0xBB, 0xBF, 0x20))].map(str => str.codePointAt(0).toString(16));
[ '20' ]
> [...dec.decode(Uint8Array.of(0xEF, 0xBB, 0xBF, 0x20))].map(str => str.codePointAt(0).toString(16));
[ 'feff', '20' ]

The correct behavior seems to also be the one implemented by Chrome:

> var dec = new TextDecoder('utf-8');
> [...dec.decode(Uint8Array.of(0xEF, 0xBB, 0xBF, 0x20))].map(str => str.codePointAt(0).toString(16));
[ '20' ]
> [...dec.decode(Uint8Array.of(0xEF, 0xBB, 0xBF, 0x20))].map(str => str.codePointAt(0).toString(16));
[ '20' ]
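The expected behavior can be written as a repeatable check. The sketch assumes a spec-conforming `TextDecoder` exposed as a global; `hexCodePoints` is a hypothetical helper that mirrors the mapping used in the transcripts above.

```js
// Render a string as an array of hexadecimal code points.
function hexCodePoints(str) {
  return [...str].map((ch) => ch.codePointAt(0).toString(16));
}

const bytes = Uint8Array.of(0xEF, 0xBB, 0xBF, 0x20); // UTF-8 BOM + space
const decoder = new TextDecoder('utf-8');

// Neither call passes { stream: true }, so each one starts a fresh stream
// and the leading BOM should be stripped both times.
const first = hexCodePoints(decoder.decode(bytes));
const second = hexCodePoints(decoder.decode(bytes));

console.log(first, second);
```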

TimothyGu

One more observation...

TimothyGu · 2017-06-15T08:26:21Z

lib/internal/encoding.js

+  // ['hz-gb-2312', 'replacement'],
+  // ['iso-2022-cn', 'replacement'],
+  // ['iso-2022-cn-ext', 'replacement'],
+  // ['iso-2022-kr', 'replacement'],


Rather than mentioning these as "unsupported", we can in fact say they don't necessarily need to be supported, since TextDecoder errors out on 'replacement'.

TimothyGu · 2017-06-15T08:28:35Z

lib/internal/encoding.js

+      throw new errors.Error('ERR_INVALID_ARG_TYPE', 'options', 'object');
+
+    const enc = getEncodingFromLabel(encoding);
+    if (enc === undefined || enc === 'replacement')


If we make sure no entries in encodings Map actually map to 'replacement' as value, we don't have to explicitly check for it here.

jasnell · 2017-06-15T20:45:27Z

@TimothyGu ... updated!

TimothyGu

LGTM as is, but some further observations are below. WPT integration can be worked on after this PR is merged.

I do have one more suggested edit: TimothyGu@0d0f7fc, which eliminates the need to copy from a Buffer to a Uint8Array.

TimothyGu · 2017-06-16T01:18:58Z

lib/internal/encoding.js

+
+  [inspect](depth, opts) {
+    if (this == null || this[kEncoding] === undefined) {
+      throw new errors.TypeError('ERR_INVALID_THIS', 'TextDecoder');


FYI, we are doing this for URLs (see Web IDL create an operation function step 2.1.2.4). I'll defer to your opinion regarding these classes though.

TimothyGu · 2017-06-16T01:19:47Z

lib/internal/encoding.js

+  }
+
+  [inspect](depth, opts) {
+    if (this == null || this[kEncoding] === undefined) {


This unfortunately will not distinguish TextEncoder and TextDecoder objects, as both have this property defined.

Provide an (initially experimental) implementation of the WHATWG Encoding Standard API (`TextDecoder` and `TextEncoder`). This is the same API implemented on the browser side.

By default, with small-icu, only the UTF-8, UTF-16le and UTF-16be decoders are supported. With full-icu enabled, every encoding other than iso-8859-16 is supported.

This provides a basic test, but does not include the full web platform tests. Note: many of the web platform tests for this would fail by default because we ship with small-icu by default.

A process warning will be emitted on first use to indicate that the API is still experimental. No runtime flag is required to use the feature.

Refs: https://encoding.spec.whatwg.org/
PR-URL: nodejs#13644
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

Provide an (initially experimental) implementation of the WHATWG Encoding Standard API (`TextDecoder` and `TextEncoder`). This is the same API implemented on the browser side.

By default, with small-icu, only the UTF-8, UTF-16le and UTF-16be decoders are supported. With full-icu enabled, every encoding other than iso-8859-16 is supported.

This provides a basic test, but does not include the full web platform tests. Note: many of the web platform tests for this would fail by default because we ship with small-icu by default.

A process warning will be emitted on first use to indicate that the API is still experimental. No runtime flag is required to use the feature.

Backport-PR-URL: #14585
Backport-Reviewed-By: Anna Henningsen <anna@addaleax.net>
Refs: https://encoding.spec.whatwg.org/
PR-URL: #13644
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>

V8 6.0: The V8 engine has been upgraded to version 6.0, which has a significantly changed performance profile. [#14574](#14574)

More detailed information on performance differences can be found at https://medium.com/the-node-js-collection/get-ready-a-new-v8-is-coming-node-js-performance-is-changing-46a63d6da4de

Other notable changes:

* **DNS**
  * Independent DNS resolver instances are supported now, with support for cancelling the corresponding requests. [#14518](#14518)
* **REPL**
  * Autocompletion support for `require()` has been improved. [#14409](#14409)
* **Utilities**
  * The WHATWG Encoding Standard (`TextDecoder` and `TextEncoder`) has been implemented. [#13644](#13644)
* **Added new collaborators**
  * [XadillaX](https://github.com/XadillaX) – Khaidi Chu

V8 6.0: The V8 engine has been upgraded to version 6.0, which has a significantly changed performance profile. [#14574](#14574)

More detailed information on performance differences can be found at https://medium.com/the-node-js-collection/get-ready-a-new-v8-is-coming-node-js-performance-is-changing-46a63d6da4de

Other notable changes:

* **DNS**
  * Independent DNS resolver instances are supported now, with support for cancelling the corresponding requests. [#14518](#14518)
* **N-API**
  * Multiple N-API functions for error handling have been changed to support assigning error codes. [#13988](#13988)
* **REPL**
  * Autocompletion support for `require()` has been improved. [#14409](#14409)
* **Utilities**
  * The WHATWG Encoding Standard (`TextDecoder` and `TextEncoder`) has been implemented as an experimental feature. [#13644](#13644)
* **Added new collaborators**
  * [XadillaX](https://github.com/XadillaX) – Khaidi Chu

bnoordhuis · 2017-08-09T10:10:58Z

tools/icu/icu-generic.gyp

-             'defines': [
-                # ICU cannot swap the initial data without this.
-                # http://bugs.icu-project.org/trac/ticket/11046
-                'UCONFIG_NO_LEGACY_CONVERSION=1'


FYI, Coverity is none too happy about this being re-enabled.

V8 6.0: The V8 engine has been upgraded to version 6.0, which has a significantly changed performance profile. [#14574](#14574)

More detailed information on performance differences can be found at https://medium.com/the-node-js-collection/get-ready-a-new-v8-is-coming-node-js-performance-is-changing-46a63d6da4de

Other notable changes:

* **DNS**
  * Independent DNS resolver instances are supported now, with support for cancelling the corresponding requests. [#14518](#14518)
* **N-API**
  * Multiple N-API functions for error handling have been changed to support assigning error codes. [#13988](#13988)
* **REPL**
  * Autocompletion support for `require()` has been improved. [#14409](#14409)
* **Utilities**
  * The WHATWG Encoding Standard (`TextDecoder` and `TextEncoder`) has been implemented as an experimental feature. [#13644](#13644)
* **Added new collaborators**
  * [XadillaX](https://github.com/XadillaX) – Khaidi Chu
  * [gabrielschulhof](https://github.com/gabrielschulhof) – Gabriel Schulhof

V8 6.0: The V8 engine has been upgraded to version 6.0, which has a significantly changed performance profile. [#14574](#14574)

More detailed information on performance differences can be found at https://medium.com/the-node-js-collection/get-ready-a-new-v8-is-coming-node-js-performance-is-changing-46a63d6da4de

Other notable changes:

* **DNS**
  * Independent DNS resolver instances are supported now, with support for cancelling the corresponding requests. [#14518](#14518)
* **N-API**
  * Multiple N-API functions for error handling have been changed to support assigning error codes. [#13988](#13988)
* **REPL**
  * Autocompletion support for `require()` has been improved. [#14409](#14409)
* **Utilities**
  * The WHATWG Encoding Standard (`TextDecoder` and `TextEncoder`) has been implemented as an experimental feature. [#13644](#13644)
* **Added new collaborators**
  * [XadillaX](https://github.com/XadillaX) – Khaidi Chu
  * [gabrielschulhof](https://github.com/gabrielschulhof) – Gabriel Schulhof

Conflicts: src/node_version.h

PR-URL: nodejs#13916 Refs: nodejs#13644 (comment) Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>

PR-URL: #13916 Refs: #13644 (comment) Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>

gibfahn · 2018-01-15T22:05:26Z

Release team decided not to land on v6.x, if you disagree let us know.

jasnell added the buffer and semver-minor labels Jun 13, 2017

domenic mentioned this pull request Jun 13, 2017

Discussion of implementing the Encoding Standard with ICU, and associated web platform tests #13646

Closed

TimothyGu reviewed Jun 13, 2017

vsemozhetbyt reviewed Jun 13, 2017

sam-github reviewed Jun 13, 2017

TimothyGu reviewed Jun 14, 2017

jasnell force-pushed the encoding-standard branch from 1e8402f to 97f8476 Compare June 14, 2017 19:33

TimothyGu reviewed Jun 15, 2017

TimothyGu approved these changes Jun 16, 2017

addaleax added the backport-requested-v8.x label Jul 27, 2017

kunalspathak mentioned this pull request Jul 28, 2017

fix build break from JsCopyString nodejs/node-chakracore#349

Merged

2 tasks

Trott removed the ctc-review label Jul 31, 2017

jasnell mentioned this pull request Aug 1, 2017

[8.x] backport of util refactor and whatwg encoding #14585

Closed

4 tasks

addaleax added the backported-to-v8.x and notable-change labels and removed the backport-requested-v8.x label Aug 2, 2017

addaleax mentioned this pull request Aug 2, 2017

v8.3.0 proposal #14594

Merged

bnoordhuis reviewed Aug 9, 2017

TimothyGu mentioned this pull request Sep 12, 2017

[v6.x backport] doc: add documentation on ICU #15353

Closed

2 tasks

TimothyGu added a commit to TimothyGu/node that referenced this pull request Sep 15, 2017

doc: add documentation on ICU

d7293be

PR-URL: nodejs#13916 Refs: nodejs#13644 (comment) Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>

MylesBorins pushed a commit that referenced this pull request Sep 19, 2017

doc: add documentation on ICU

90fcccd

PR-URL: #13916 Refs: #13644 (comment) Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>

gibfahn added the dont-land-on-v6.x label Jan 15, 2018

lance mentioned this pull request Mar 14, 2018

test: add assertions for TextEncoder/Decoder #18132

Closed

3 tasks

RReverser mentioned this pull request Mar 25, 2019

Add TextEncoder.prototype.encodeInto #26904

Closed

skyclouds2001 mentioned this pull request Nov 20, 2024

doc: Missing version info for TextEncoder.prototype.encodeInto() #55938

Closed

