Fix performance issue with UTF8ToString (#12517) #12544

LanderlYoung · 2020-10-19T06:48:59Z

~~- add extra parameter exactStringLength to UTF8ToString,~~
~~This parameter is optional, and if given, the maxBytesToRead parameter is ignored.~~
~~to keep consistent API flavor, UTF16ToString, UTF32ToString, is also changed.~~

Fix performance issue with UTF8ToString (#12517)

introduce new functions
- UTF8ToStringWithLength
- UTF16ToStringWithLength
- UTF32ToStringWithLength

Decode string exactly length with given length, any '\0' in between will
be kept as-is.

Those functions require an argument `lengthInBytes`, so no need to iterator
the heap to find a null-terminator, thus have better performance.

welcome · 2020-10-19T06:49:00Z

Thank you for submitting a pull request! If this is your first PR, make sure to add yourself to AUTHORS.

juj · 2020-10-19T07:49:24Z

This regresses generated code size for users that do not use the new variant.

Also unless I am mistaken, this does not really fix any performance issue, because no caller code site is actually changed to use the exact length form. The PR just slows down the functions when exact length is not specified.

Maybe this fixes a performance issue in your JS library code calls that are updated to use this form, at the expense of slowing down generated code for every other user?

I would recommend implementing a separate set of functions UTF*ToStringExactNBytes where maxBytesToRead would be replaced with a non-optional parameter exactBytesToRead. Then callers can use that variant when they need to marshal exactly requested number of bytes.

For the new variant, add a clear comment that if there are null bytes in the middle of the string, then those null bytes will also be deserialized into the generated string.

LanderlYoung · 2020-10-19T08:07:20Z

@juj Thanks for the suggestion. BTW: previous discussion here #12517

This does not fix the performance issue, because the function still needs to be backward compatible.
But it provides a way to be more efficient.

...at the expense of slowing down generated code for every other user?

I don't think this PR would slow down other's code. Just added some simple if checks.

I would recommend implementing a separate set of functions

This is nice.

juj · 2020-10-19T08:20:20Z

I don't think this PR would slow down other's code. Just added some simple if checks.

The loop condition changed from

while (idx < endPtr)

to

    while (hasExactLength ? idx < endPtr : !(idx >= endIdx)) {

which will slow down code for all users.

Let's go with a separate set of functions, that will be best for code size and performance in each case.

LanderlYoung · 2020-10-19T11:36:04Z

Ok, new commit latter

LanderlYoung · 2020-10-20T08:48:47Z

@juj I rewrote this PR and added corresponding UnitTest, please review.

LanderlYoung · 2020-10-20T08:50:51Z

BTW, the tests/utf32.cpp file didn't properly formated with clang-format, and since this file is small, and my test now makes up more lines, I formatted it with project's .clang-format config.

src/runtime_strings.js

juj · 2020-10-20T14:35:59Z

src/runtime_strings.js

@@ -118,6 +118,85 @@ function UTF8ToString(ptr, maxBytesToRead) {
 #endif
 }

+


Please move all the added content in this file over to src/library_strings.js. The file runtime_strings.js should not receive any more content, it exists for historical reasons. All new functions can go to library_strings.js.

I tried to move code to src/library_strings.js, but it seems to have something to be related to the MINIMAL_RUNTIME flag, no matter how the unit test complains ReferenceError: UTF8ToStringNBytes is not defined.

The half-done commit is here LanderlYoung@f9149d7

If there is any doc related, please let me know, thanks.

The way the library_xxx.js system works is that symbols are only ever included on on demand.

There are several ways a library symbol can get included:

By being directly referenced by native code

By being add to -s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE list.

Via the __deps entry to another symbol that is incuded.

(1) doesn't apply to JS functions that are only callable from JS such as these ones so you can ignore that. To signal these are JS-only functions you should prefix them with $.

It looks like src/library_strings.js is only currently included in MINIMAL_RUNTIME mode. Its job is currently to make the functions in runtime_strings_extra.js available as JS library, but only MINIMAL_RUNTIME mode.

runtime_strings_extra.js is include directly in normal (non-MINIMAL) mode.

So some refactoring would need to be doing before these functions can be make universally available via the library system.

I'd be tempted to add all these new functions to runtime_strings_extra.js + src/library_strings.js for now. At least then they will be used in library form under MINIMAL_RUNTIME.

The alternative is that we do some refactoring such at src/library_strings.js is used in all modes.

juj · 2020-10-20T14:38:59Z

src/runtime_strings_extra.js

@@ -77,6 +77,37 @@ function UTF16ToString(ptr, maxBytesToRead) {
 #endif // TEXTDECODER
 }

+function UTF16ToStringWithLength(ptr, lengthInBytes) {


Please move all the added code from this file into file src/library_strings.js. This file exists only for historical compatibility reasons, but all new code can be added to src/library_strings.js.

juj · 2020-10-20T14:41:28Z

Great work! Thanks for championing this. One code organization comment only, we don't need to use the runtime_x.js files for new code, but can use the proper library_x.js structuring there.

LanderlYoung · 2020-10-20T15:00:55Z

Thanks for the advice, will do it tomorrow. (I'm from China, it's night now😃)

LanderlYoung · 2020-10-21T12:10:41Z

update:

Fixed cr issues:

renamed UTF*ToStringWithLength to UTF*ToStringNBytes
fixed typo in doc
add docs to preamble.js.rst

BUT, failed to move code to library_strings.js, more detail explained in cr conversation.

introduce new functions: - UTF8ToStringNBytes - UTF16ToStringNBytes - UTF32ToStringNBytes add docs to preamble.js.rst Decode string exactly length with given length, any '\0' in between will be kept as-is. Those functions require an argument `lengthInBytes`, so no need to iterator the heap to find a null-terminator, thus have better performance.

stale · 2022-04-16T13:21:31Z

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

LanderlYoung marked this pull request as ready for review October 19, 2020 07:57

LanderlYoung force-pushed the feature/UTF8ToString_performance_issue_12517 branch from 41ce84e to d626038 Compare October 20, 2020 08:47

LanderlYoung force-pushed the feature/UTF8ToString_performance_issue_12517 branch 2 times, most recently from 7bac16d to 7dfe671 Compare October 20, 2020 11:00

juj reviewed Oct 20, 2020

View reviewed changes

src/runtime_strings.js Show resolved Hide resolved

juj reviewed Oct 20, 2020

View reviewed changes

LanderlYoung force-pushed the feature/UTF8ToString_performance_issue_12517 branch from 7dfe671 to ad88e7d Compare October 21, 2020 12:01

LanderlYoung force-pushed the feature/UTF8ToString_performance_issue_12517 branch 2 times, most recently from 9599cb2 to 10fce0d Compare October 23, 2020 02:43

LanderlYoung force-pushed the feature/UTF8ToString_performance_issue_12517 branch from 10fce0d to 5a20826 Compare October 26, 2020 02:20

Base automatically changed from master to main March 8, 2021 23:49

stale bot added the wontfix label Apr 16, 2022

LanderlYoung closed this Aug 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix performance issue with UTF8ToString (#12517) #12544

Fix performance issue with UTF8ToString (#12517) #12544

LanderlYoung commented Oct 19, 2020 •

edited

Loading

welcome bot commented Oct 19, 2020

juj commented Oct 19, 2020

LanderlYoung commented Oct 19, 2020

juj commented Oct 19, 2020

LanderlYoung commented Oct 19, 2020

LanderlYoung commented Oct 20, 2020

LanderlYoung commented Oct 20, 2020

juj Oct 20, 2020

LanderlYoung Oct 20, 2020

LanderlYoung Oct 21, 2020 •

edited

Loading

sbc100 Oct 21, 2020

sbc100 Oct 21, 2020

juj Oct 20, 2020

LanderlYoung Oct 20, 2020

juj commented Oct 20, 2020

LanderlYoung commented Oct 20, 2020

LanderlYoung commented Oct 21, 2020

stale bot commented Apr 16, 2022

		@@ -118,6 +118,85 @@ function UTF8ToString(ptr, maxBytesToRead) {
		#endif
		}

Fix performance issue with UTF8ToString (#12517) #12544

Fix performance issue with UTF8ToString (#12517) #12544

Conversation

LanderlYoung commented Oct 19, 2020 • edited Loading

welcome bot commented Oct 19, 2020

juj commented Oct 19, 2020

LanderlYoung commented Oct 19, 2020

juj commented Oct 19, 2020

LanderlYoung commented Oct 19, 2020

LanderlYoung commented Oct 20, 2020

LanderlYoung commented Oct 20, 2020

juj Oct 20, 2020

Choose a reason for hiding this comment

LanderlYoung Oct 20, 2020

Choose a reason for hiding this comment

LanderlYoung Oct 21, 2020 • edited Loading

Choose a reason for hiding this comment

sbc100 Oct 21, 2020

Choose a reason for hiding this comment

sbc100 Oct 21, 2020

Choose a reason for hiding this comment

juj Oct 20, 2020

Choose a reason for hiding this comment

LanderlYoung Oct 20, 2020

Choose a reason for hiding this comment

juj commented Oct 20, 2020

LanderlYoung commented Oct 20, 2020

LanderlYoung commented Oct 21, 2020

stale bot commented Apr 16, 2022

LanderlYoung commented Oct 19, 2020 •

edited

Loading

LanderlYoung Oct 21, 2020 •

edited

Loading