Default to TEXTDECODER=2 in -Oz builds #13304
Conversation
Force-pushed from d454220 to a76a2a6
Could it be made the default unless in legacy browser mode? Support is pretty universal now: https://caniuse.com/textencoder
TextDecoder is enabled by default, but it is not always used, because decoding short strings with it is slower than with a hand-rolled JavaScript implementation. That is why the current default, TEXTDECODER=1, decodes short strings manually and falls back to TextDecoder for long strings (the cutoff is at 16 bytes). Also, in multithreaded builds TextDecoder cannot be used, because .decode() may not accept views backed by a SharedArrayBuffer. The spec was changed at whatwg/encoding#172 to allow it, but it was done in a way that makes it hard to have any visibility into when each browser will have shipped the update. (Once they do, we can drop the "TEXTDECODER does not support USE_PTHREADS" case in the compiler.)
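The hybrid TEXTDECODER=1 strategy described above can be sketched roughly as follows. This is a simplified illustration, not emscripten's actual runtime code; the function name `decodeUTF8` and the hard-coded 16-byte cutoff are assumptions for the sketch:

```javascript
// Sketch of the hybrid strategy: hand-rolled UTF-8 decoding for short
// strings, native TextDecoder for long ones (illustrative, not the real
// emscripten implementation).
const utf8Decoder =
  typeof TextDecoder !== 'undefined' ? new TextDecoder('utf-8') : undefined;

function decodeUTF8(heap, start, end) {
  // Short strings: the per-call overhead of TextDecoder dominates, so a
  // manual loop is faster. Assumes well-formed UTF-8 input.
  if (!utf8Decoder || end - start <= 16) {
    let str = '';
    let i = start;
    while (i < end) {
      const u0 = heap[i++];
      if (!(u0 & 0x80)) { str += String.fromCharCode(u0); continue; }
      const u1 = heap[i++] & 0x3f;
      if ((u0 & 0xe0) === 0xc0) {
        str += String.fromCharCode(((u0 & 0x1f) << 6) | u1);
        continue;
      }
      const u2 = heap[i++] & 0x3f;
      let u;
      if ((u0 & 0xf0) === 0xe0) {
        u = ((u0 & 0x0f) << 12) | (u1 << 6) | u2;
      } else {
        u = ((u0 & 0x07) << 18) | (u1 << 12) | (u2 << 6) | (heap[i++] & 0x3f);
      }
      if (u < 0x10000) {
        str += String.fromCharCode(u);
      } else {
        // Encode as a UTF-16 surrogate pair.
        const ch = u - 0x10000;
        str += String.fromCharCode(0xd800 | (ch >> 10), 0xdc00 | (ch & 0x3ff));
      }
    }
    return str;
  }
  // Long strings: hand off to the browser's native decoder.
  return utf8Decoder.decode(heap.subarray(start, end));
}
```

Setting TEXTDECODER=2 drops the manual branch entirely and always calls `decode()`, which is what shrinks the build output.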
I have a presumed optimization rotting in Phabricator, because I was unable to show that it's an optimization for real workloads. However, that still doesn't optimize away the general overhead of the WebIDL layer.
This seems reasonable to me, but I'm curious what others think. In general we don't flip many options like this based on optimization/shrink levels.
That is true, though for -Oz I think we should adjust the default settings in scenarios where the result is not functionally observable (as is the case for TEXTDECODER), as long as we retain the ability to override them as usual. The only other setting I can find that could be flipped this way in -Oz is GL_POOL_TEMP_BUFFERS=0.
There are https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf8.cpp and https://github.com/emscripten-core/emscripten/blob/master/tests/benchmark_utf16.cpp. See tests/test_browser.py.

The main part of the slowdown when using TextDecoder is due to needing to scan the string length in advance: https://github.com/emscripten-core/emscripten/blob/master/src/runtime_strings.js#L32.

Something to note is that the decision on which default to use is not based on Firefox performance alone, but on general performance across Firefox, Chrome and Safari. So even if Firefox were super fast, it would probably not change anything if the other browsers are not. That being said, I don't know what the most recent numbers are on each browser; it has been a while since this was benchmarked.
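The length-scan overhead mentioned above can be illustrated in isolation. `TextDecoder.decode()` needs an explicit byte range, but a NUL-terminated C string only carries its length implicitly, so an extra linear pass over the heap is required before decoding can even begin. This is an illustrative sketch (the function name `cStringToJS` is made up for the example, not emscripten's API):

```javascript
// Why decoding a C string via TextDecoder costs an extra pass:
// we must first scan for the NUL terminator to learn the byte length.
function cStringToJS(heap, ptr, decoder) {
  let end = ptr;
  while (heap[end]) ++end;          // O(n) scan just to find the length
  return decoder.decode(heap.subarray(ptr, end));
}
```

A hand-rolled decoder can fold this scan into the decode loop itself, checking for the terminator as it goes, which is part of why it wins on short strings.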
These don't appear to be extracted from real-world applications. The key question for how to proceed in Gecko is getting realistic workloads for benchmarking. With unrealistic workloads, the case for the patch can be argued either way depending on workload selection.
Does emscripten provide conversion from UTF-8-holding C++ …
There is embind, which provides marshalling of higher-level types, but currently it does not depend on C++ object ABI layout on the JS side; it marshals using the same C string marshalling functions.
Force-pushed from 6e9d439 to 2ea4390
Force-pushed from 2ea4390 to 086a18f
I filed an issue about that.
Default to TEXTDECODER=2 in -Oz builds to save 0.5KB-1KB of build output size.