Default to TEXTDECODER=2 in -Oz builds #13304

Merged: 2 commits merged into emscripten-core:master from the default_textdecoder_2_oz branch on Feb 14, 2021

Conversation

@juj (Collaborator) commented on Jan 22, 2021:

Default to TEXTDECODER=2 in -Oz builds to save 0.5KB-1KB of build output size.

@juj added the code size label on Jan 22, 2021
@juj force-pushed the default_textdecoder_2_oz branch from d454220 to a76a2a6 on January 22, 2021 at 17:56
@curiousdannii (Contributor) commented:

Could it be made the default unless in legacy browser mode? Support is pretty universal now: https://caniuse.com/textencoder

@juj (Collaborator, Author) commented on Jan 23, 2021:

TextDecoder is enabled by default, but it is not always used, because decoding short strings with it is slower than with a hand-rolled JavaScript loop. That is why the current default, TEXTDECODER=1, decodes short strings manually and only falls back to TextDecoder for long strings (the cutoff is at 16 bytes).
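
For context, a minimal sketch of the shape of the TEXTDECODER=1 strategy is shown below (illustrative only; the names and details are simplified and do not match runtime_strings.js exactly). TEXTDECODER=2 would keep only the TextDecoder branch and drop the hand-rolled fallback, which is where the -Oz size saving in this PR comes from.

```js
// Sketch of the TEXTDECODER=1 strategy; with TEXTDECODER=2 only the
// UTF8Decoder.decode() branch remains and the manual fallback is dropped.
var UTF8Decoder = typeof TextDecoder !== 'undefined' ? new TextDecoder('utf8') : undefined;

function utf8ArrayToString(heapU8, idx, endPtr) {
  // endPtr points at the terminating NUL byte of the string in the heap.
  if (UTF8Decoder && endPtr - idx > 16) {
    // Long string: the native decoder wins despite its per-call overhead.
    return UTF8Decoder.decode(heapU8.subarray(idx, endPtr));
  }
  // Short string: a hand-rolled loop avoids TextDecoder's call overhead.
  var str = '';
  while (idx < endPtr) {
    var u0 = heapU8[idx++];
    if (u0 < 0x80) { str += String.fromCharCode(u0); continue; }
    var u1 = heapU8[idx++] & 63;
    if ((u0 & 0xE0) === 0xC0) { str += String.fromCharCode(((u0 & 31) << 6) | u1); continue; }
    var u2 = heapU8[idx++] & 63;
    if ((u0 & 0xF0) === 0xE0) {
      u0 = ((u0 & 15) << 12) | (u1 << 6) | u2;
    } else {
      u0 = ((u0 & 7) << 18) | (u1 << 12) | (u2 << 6) | (heapU8[idx++] & 63);
    }
    if (u0 < 0x10000) {
      str += String.fromCharCode(u0);
    } else {
      var ch = u0 - 0x10000;
      str += String.fromCharCode(0xD800 | (ch >> 10), 0xDC00 | (ch & 0x3FF));
    }
  }
  return str;
}
```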

Also, in multithreaded builds TextDecoder cannot be used, because .decode() may not accept views into a SharedArrayBuffer. The spec was changed in whatwg/encoding#172 to allow it, but in a way that makes it hard to have any visibility into when each browser will have shipped the update. (Once they have, we can drop the "TEXTDECODER does not support USE_PTHREADS" case in the compiler.)
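
To illustrate the SharedArrayBuffer limitation, here is a hypothetical workaround sketch (this is not what emscripten or this PR does; it only shows the extra copy a pthreads build would need, since decode() rejects SAB-backed views):

```js
// Hypothetical sketch for a pthreads build, where heapU8 is a view over a
// SharedArrayBuffer. TextDecoder.decode() throws on SAB-backed views, so the
// bytes must first be copied into a regular, non-shared ArrayBuffer.
function decodeFromSharedHeap(heapU8, idx, endPtr) {
  // .slice() allocates a new Uint8Array backed by a non-shared ArrayBuffer.
  var copy = heapU8.slice(idx, endPtr);
  return new TextDecoder('utf8').decode(copy);
}
```

That extra copy is exactly the kind of overhead the whatwg/encoding#172 change is meant to remove.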

@annevk commented on Jan 23, 2021:

@juj is there a benchmark somewhere for TextDecoder for short strings? I wonder if that is something we could optimize in Firefox.

cc @hsivonen

@hsivonen commented:

I have a presumed optimization rotting in Phabricator, because I was unable to show that it's an optimization for real workloads.

However, that still doesn't optimize away the general overhead of the WebIDL layer.

@kripken (Member) left a review comment:

This seems reasonable to me, but I'm curious what others think. In general we don't flip many options like this based on optimization/shrink levels.

(Review thread on emcc.py: resolved)
@juj (Collaborator, Author) commented on Jan 27, 2021:

> This seems reasonable to me, but I'm curious what others think. In general we don't flip many options like this based on optimization/shrink levels.

That is true, though for -Oz I think we should adjust default settings in scenarios where the result is not functionally observable (as is the case for TEXTDECODER), as long as we retain the usual ability to override them. The only other setting I can find that could be flipped in this manner under -Oz is GL_POOL_TEMP_BUFFERS=0.

@juj (Collaborator, Author) commented on Jan 27, 2021:

> @juj is there a benchmark somewhere for TextDecoder for short strings? I wonder if that is something we could optimize in Firefox.

There are https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf8.cpp and https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf16.cpp . See tests/test_browser.py test_utf8_textdecoder and test_utf16_textdecoder for examples of how to build them.

The main part of the slowdown when using TextDecoder is the need to scan for the string length in advance: https://github.com/emscripten-core/emscripten/blob/master/src/runtime_strings.js#L32.
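
Roughly, that scan has the following shape (a simplified sketch, not the exact runtime_strings.js code): the glue code only receives a pointer to a NUL-terminated string, so it has to walk the heap to find the end before it can hand a byte range to TextDecoder.

```js
// Simplified sketch of the up-front length scan for a NUL-terminated C string.
function findStringEnd(heapU8, ptr) {
  var endPtr = ptr;
  // O(n) pass over the bytes before any decoding can happen.
  while (heapU8[endPtr]) ++endPtr;
  return endPtr;
}
```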

Something to note is that the decision about which default to use is not based on Firefox performance alone, but on general performance across Firefox, Chrome and Safari. So even if Firefox were super fast, it would probably not change anything if the other browsers are not.

That being said, I don't know what the most recent numbers are for each browser; it has been a while since this was benchmarked.

@hsivonen commented on Feb 1, 2021:

> There are https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf8.cpp and https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf16.cpp . See tests/test_browser.py test_utf8_textdecoder and test_utf16_textdecoder for examples of how to build them.

These don't appear to be extracted from real-world applications. The key question for how to proceed in Gecko is getting realistic workloads for benchmarking. With unrealistic workloads, the case for the patch can be argued either way depending on workload selection.

> The main part of the slowdown when using TextDecoder is the need to scan for the string length in advance: https://github.com/emscripten-core/emscripten/blob/master/src/runtime_strings.js#L32.

Does emscripten provide conversion from UTF-8-holding C++ std::string (that knows its length) to JavaScript strings?

@juj (Collaborator, Author) commented on Feb 13, 2021:

> Does emscripten provide conversion from UTF-8-holding C++ std::string (that knows its length) to JavaScript strings?

There is embind, which provides marshalling of higher-level types, but currently it does not rely on the C++ object ABI layout on the JS side; it marshals using the same C string marshalling functions.
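
For illustration only, a hypothetical length-aware helper (not an existing emscripten or embind API) shows what such a conversion could look like if the byte length were passed down from std::string::size(), skipping the NUL scan discussed above:

```js
// Hypothetical helper, not part of emscripten: convert a known-length UTF-8
// byte range in the heap to a JS string without scanning for a NUL terminator.
var KnownLengthDecoder = typeof TextDecoder !== 'undefined' ? new TextDecoder('utf8') : undefined;

function UTF8ToStringWithLength(heapU8, ptr, lengthInBytes) {
  return KnownLengthDecoder.decode(heapU8.subarray(ptr, ptr + lengthInBytes));
}
```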

@juj force-pushed the default_textdecoder_2_oz branch from 6e9d439 to 2ea4390 on February 13, 2021 at 16:01
@juj (Collaborator, Author) commented on Feb 13, 2021:

Ping @kripken @sbc100, would you have a moment to review this?

@juj force-pushed the default_textdecoder_2_oz branch from 2ea4390 to 086a18f on February 13, 2021 at 16:51
@juj merged commit 8dd277d into emscripten-core:master on Feb 14, 2021
@hsivonen commented:

> Does emscripten provide conversion from UTF-8-holding C++ std::string (that knows its length) to JavaScript strings?

> There is embind, which provides marshalling of higher-level types, but currently it does not rely on the C++ object ABI layout on the JS side; it marshals using the same C string marshalling functions.

I filed an issue about that.
