Default to TEXTDECODER=2 in -Oz builds #13304

Merged: 2 commits merged into emscripten-core:master from the default_textdecoder_2_oz branch on Feb 14, 2021

Conversation

@juj (Collaborator) commented on Jan 22, 2021:

Default to TEXTDECODER=2 in -Oz builds to save 0.5KB-1KB of build output size.

@juj added the code size label on Jan 22, 2021
@juj force-pushed the default_textdecoder_2_oz branch from d454220 to a76a2a6 on January 22, 2021 at 17:56
@curiousdannii (Contributor) commented:

Could it be made the default unless in legacy browser mode? Support is pretty universal now: https://caniuse.com/textencoder

@juj (Collaborator, Author) commented on Jan 23, 2021:

TextDecoder is enabled by default, but it is not always used, because decoding short strings with it is slower than with a hand-rolled JavaScript loop. That is why the current default, TEXTDECODER=1, decodes short strings manually and only falls back to TextDecoder for long strings (the cutoff is at 16 bytes).
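
For context, a minimal sketch of the shape of the TEXTDECODER=1 strategy is shown below (illustrative only; the names and details are simplified and do not match runtime_strings.js exactly). TEXTDECODER=2 would keep only the TextDecoder branch and drop the hand-rolled fallback, which is where the -Oz size saving in this PR comes from.

```js
// Sketch of the TEXTDECODER=1 strategy; with TEXTDECODER=2 only the
// UTF8Decoder.decode() branch remains and the manual fallback is dropped.
var UTF8Decoder = typeof TextDecoder !== 'undefined' ? new TextDecoder('utf8') : undefined;

function utf8ArrayToString(heapU8, idx, endPtr) {
  // endPtr points at the terminating NUL byte of the string in the heap.
  if (UTF8Decoder && endPtr - idx > 16) {
    // Long string: the native decoder wins despite its per-call overhead.
    return UTF8Decoder.decode(heapU8.subarray(idx, endPtr));
  }
  // Short string: a hand-rolled loop avoids TextDecoder's call overhead.
  var str = '';
  while (idx < endPtr) {
    var u0 = heapU8[idx++];
    if (u0 < 0x80) { str += String.fromCharCode(u0); continue; }
    var u1 = heapU8[idx++] & 63;
    if ((u0 & 0xE0) === 0xC0) { str += String.fromCharCode(((u0 & 31) << 6) | u1); continue; }
    var u2 = heapU8[idx++] & 63;
    if ((u0 & 0xF0) === 0xE0) {
      u0 = ((u0 & 15) << 12) | (u1 << 6) | u2;
    } else {
      u0 = ((u0 & 7) << 18) | (u1 << 12) | (u2 << 6) | (heapU8[idx++] & 63);
    }
    if (u0 < 0x10000) {
      str += String.fromCharCode(u0);
    } else {
      var ch = u0 - 0x10000;
      str += String.fromCharCode(0xD800 | (ch >> 10), 0xDC00 | (ch & 0x3FF));
    }
  }
  return str;
}
```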

Also, in multithreaded builds TextDecoder cannot be used, because .decode() may not accept views into a SharedArrayBuffer. The spec was changed in whatwg/encoding#172 to allow it, but in a way that makes it hard to have any visibility into when each browser will have shipped the update. (Once they have, we can drop the "TEXTDECODER does not support USE_PTHREADS" case in the compiler.)
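
To illustrate the SharedArrayBuffer limitation, here is a hypothetical workaround sketch (this is not what emscripten or this PR does; it only shows the extra copy a pthreads build would need, since decode() rejects SAB-backed views):

```js
// Hypothetical sketch for a pthreads build, where heapU8 is a view over a
// SharedArrayBuffer. TextDecoder.decode() throws on SAB-backed views, so the
// bytes must first be copied into a regular, non-shared ArrayBuffer.
function decodeFromSharedHeap(heapU8, idx, endPtr) {
  // .slice() allocates a new Uint8Array backed by a non-shared ArrayBuffer.
  var copy = heapU8.slice(idx, endPtr);
  return new TextDecoder('utf8').decode(copy);
}
```

That extra copy is exactly the kind of overhead the whatwg/encoding#172 change is meant to remove.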

@annevk commented on Jan 23, 2021:

@juj is there a benchmark somewhere for TextDecoder for short strings? I wonder if that is something we could optimize in Firefox.

cc @hsivonen

@hsivonen commented:

I have a presumed optimization rotting in Phabricator, because I was unable to show that it's an optimization for real workloads.

However, that still doesn't optimize away the general overhead of the WebIDL layer.

@kripken (Member) left a review comment:

This seems reasonable to me, but I'm curious what others think. In general we don't flip many options like this based on optimization/shrink levels.

(Review thread on emcc.py: resolved)
@juj (Collaborator, Author) commented on Jan 27, 2021:

> This seems reasonable to me, but I'm curious what others think. In general we don't flip many options like this based on optimization/shrink levels.

That is true, though for -Oz I think we should adjust default settings in scenarios where the result is not functionally observable (as is the case for TEXTDECODER), as long as we retain the usual ability to override them. The only other setting I can find that could be flipped in this manner under -Oz is GL_POOL_TEMP_BUFFERS=0.

@juj (Collaborator, Author) commented on Jan 27, 2021:

> @juj is there a benchmark somewhere for TextDecoder for short strings? I wonder if that is something we could optimize in Firefox.

There are https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf8.cpp and https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf16.cpp . See tests/test_browser.py test_utf8_textdecoder and test_utf16_textdecoder for examples of how to build them.

The main part of the slowdown when using TextDecoder is the need to scan for the string length in advance: https://github.com/emscripten-core/emscripten/blob/master/src/runtime_strings.js#L32.
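
Roughly, that scan has the following shape (a simplified sketch, not the exact runtime_strings.js code): the glue code only receives a pointer to a NUL-terminated string, so it has to walk the heap to find the end before it can hand a byte range to TextDecoder.

```js
// Simplified sketch of the up-front length scan for a NUL-terminated C string.
function findStringEnd(heapU8, ptr) {
  var endPtr = ptr;
  // O(n) pass over the bytes before any decoding can happen.
  while (heapU8[endPtr]) ++endPtr;
  return endPtr;
}
```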

Something to note is that the decision about which default to use is not based on Firefox performance alone, but on general performance across Firefox, Chrome and Safari. So even if Firefox were super fast, it would probably not change anything if the other browsers are not.

That being said, I don't know what the most recent numbers are for each browser; it has been a while since this was benchmarked.

@hsivonen commented on Feb 1, 2021:

> There are https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf8.cpp and https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf16.cpp . See tests/test_browser.py test_utf8_textdecoder and test_utf16_textdecoder for examples of how to build them.

These don't appear to be extracted from real-world applications. The key question for how to proceed in Gecko is getting realistic workloads for benchmarking. With unrealistic workloads, the case for the patch can be argued either way depending on workload selection.

> The main part of the slowdown when using TextDecoder is the need to scan for the string length in advance: https://github.com/emscripten-core/emscripten/blob/master/src/runtime_strings.js#L32.

Does emscripten provide conversion from UTF-8-holding C++ std::string (that knows its length) to JavaScript strings?

@juj (Collaborator, Author) commented on Feb 13, 2021:

> Does emscripten provide conversion from UTF-8-holding C++ std::string (that knows its length) to JavaScript strings?

There is embind, which provides marshalling of higher-level types, but currently it does not rely on the C++ object ABI layout on the JS side; it marshals using the same C string marshalling functions.
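
For illustration only, a hypothetical length-aware helper (not an existing emscripten or embind API) shows what such a conversion could look like if the byte length were passed down from std::string::size(), skipping the NUL scan discussed above:

```js
// Hypothetical helper, not part of emscripten: convert a known-length UTF-8
// byte range in the heap to a JS string without scanning for a NUL terminator.
var KnownLengthDecoder = typeof TextDecoder !== 'undefined' ? new TextDecoder('utf8') : undefined;

function UTF8ToStringWithLength(heapU8, ptr, lengthInBytes) {
  return KnownLengthDecoder.decode(heapU8.subarray(ptr, ptr + lengthInBytes));
}
```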

@juj force-pushed the default_textdecoder_2_oz branch from 6e9d439 to 2ea4390 on February 13, 2021 at 16:01
@juj (Collaborator, Author) commented on Feb 13, 2021:

Ping @kripken @sbc100, would you have a moment to review this?

@juj force-pushed the default_textdecoder_2_oz branch from 2ea4390 to 086a18f on February 13, 2021 at 16:51
@juj merged commit 8dd277d into emscripten-core:master on Feb 14, 2021
@hsivonen commented:

> Does emscripten provide conversion from UTF-8-holding C++ std::string (that knows its length) to JavaScript strings?

> There is embind, which provides marshalling of higher-level types, but currently it does not rely on the C++ object ABI layout on the JS side; it marshals using the same C string marshalling functions.

I filed an issue about that.
