-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance issue with UTF8ToString (#12517) #12544
Fix performance issue with UTF8ToString (#12517) #12544
Conversation
Thank you for submitting a pull request! If this is your first PR, make sure to add yourself to AUTHORS. |
This regresses generated code size for users that do not use the new variant. Also unless I am mistaken, this does not really fix any performance issue, because no caller code site is actually changed to use the exact length form. The PR just slows down the functions when exact length is not specified. Maybe this fixes a performance issue in your JS library code calls that are updated to use this form, at the expense of slowing down generated code for every other user? I would recommend implementing a separate set of functions For the new variant, add a clear comment that if there are null bytes in the middle of the string, then those null bytes will also be deserialized into the generated string. |
@juj Thanks for the suggestion. BTW: previous discussion here #12517 This does not fix the performance issue, because the function still needs to be backward compatible.
I don't think this PR would slow down other's code. Just added some simple if checks.
This is nice. |
The loop condition changed from
to
which will slow down code for all users. Let's go with a separate set of functions, that will be best for code size and performance in each case. |
Ok, new commit latter |
41ce84e
to
d626038
Compare
@juj I rewrote this PR and added corresponding UnitTest, please review. |
BTW, the tests/utf32.cpp file didn't properly formated with clang-format, and since this file is small, and my test now makes up more lines, I formatted it with project's .clang-format config. |
7bac16d
to
7dfe671
Compare
@@ -118,6 +118,85 @@ function UTF8ToString(ptr, maxBytesToRead) { | |||
#endif | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move all the added content in this file over to src/library_strings.js
. The file runtime_strings.js
should not receive any more content, it exists for historical reasons. All new functions can go to library_strings.js
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to move code to src/library_strings.js
, but it seems to have something to be related to the MINIMAL_RUNTIME
flag, no matter how the unit test complains ReferenceError: UTF8ToStringNBytes is not defined
.
The half-done commit is here LanderlYoung@f9149d7
If there is any doc related, please let me know, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the library_xxx.js
system works is that symbols are only ever included on on demand.
There are several ways a library symbol can get included:
- By being directly referenced by native code
- By being add to
-s DEFAULT_LIBRARY_FUNCS_TO_INCLUDE
list. - Via the
__deps
entry to another symbol that is incuded.
(1) doesn't apply to JS functions that are only callable from JS such as these ones so you can ignore that. To signal these are JS-only functions you should prefix them with $
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like src/library_strings.js
is only currently included in MINIMAL_RUNTIME mode. Its job is currently to make the functions in runtime_strings_extra.js
available as JS library, but only MINIMAL_RUNTIME mode.
runtime_strings_extra.js
is include directly in normal (non-MINIMAL) mode.
So some refactoring would need to be doing before these functions can be make universally available via the library system.
I'd be tempted to add all these new functions to runtime_strings_extra.js
+ src/library_strings.js
for now. At least then they will be used in library form under MINIMAL_RUNTIME.
The alternative is that we do some refactoring such at src/library_strings.js
is used in all modes.
src/runtime_strings_extra.js
Outdated
@@ -77,6 +77,37 @@ function UTF16ToString(ptr, maxBytesToRead) { | |||
#endif // TEXTDECODER | |||
} | |||
|
|||
function UTF16ToStringWithLength(ptr, lengthInBytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move all the added code from this file into file src/library_strings.js
. This file exists only for historical compatibility reasons, but all new code can be added to src/library_strings.js
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted
Great work! Thanks for championing this. One code organization comment only, we don't need to use the |
Thanks for the advice, will do it tomorrow. (I'm from China, it's night now😃) |
7dfe671
to
ad88e7d
Compare
update: Fixed cr issues:
BUT, failed to move code to |
9599cb2
to
10fce0d
Compare
introduce new functions: - UTF8ToStringNBytes - UTF16ToStringNBytes - UTF32ToStringNBytes add docs to preamble.js.rst Decode string exactly length with given length, any '\0' in between will be kept as-is. Those functions require an argument `lengthInBytes`, so no need to iterator the heap to find a null-terminator, thus have better performance.
10fce0d
to
5a20826
Compare
This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant. |
- add extra parameterexactStringLength
to UTF8ToString,This parameter is optional, and if given, themaxBytesToRead
parameter is ignored.to keep consistent API flavor,UTF16ToString
,UTF32ToString
, is also changed.