
performance issue with UTF8ToString #12517

Open
LanderlYoung opened this issue Oct 14, 2020 · 13 comments

LanderlYoung commented Oct 14, 2020

My code frequently passes a string from C++ to JS. To do this I used the UTF8ToString built-in function.

Something like this:

EM_JS(void, consumeAString, (const char* str, size_t length), {
  const string = UTF8ToString(str, length);
  // ... use the string ...
});

void myFunc() {
  std::string str;
  consumeAString(str.c_str(), str.size());
}

However, I found this code executes slower than expected (a Chrome profile shows UTF8ToString taking 85% of the total execution time), so I looked into the code.

call chain: UTF8ToString -> UTF8ArrayToString

UTF8ArrayToString boils down to this logic:

function UTF8ArrayToString(heap, idx, maxBytesToRead) {
  var endIdx = idx + maxBytesToRead;
  var endPtr = idx;

  // (1)  find the '\0' terminator -- this is slow
  while (heap[endPtr] && !(endPtr >= endIdx)) ++endPtr;

  if (endPtr - idx > 16 && heap.subarray && UTF8Decoder) {
    // (2) do decode
    return UTF8Decoder.decode(heap.subarray(idx, endPtr));

...

It turns out that step (1) takes as much time as (or even more than) step (2).

So I wrote a simple test:

const testIteration = 100;
const decoder = new TextDecoder('utf-8');

function test(length) {
    const buffer = new Int8Array(length);
    for (let i = 0; i < length; i++) {
        buffer[i] = 65; // ascii 'A'
    }

    function decode() {
        decoder.decode(buffer);
    }

    function iterate() {
        let i = 0;
        while(i < length && buffer[i]) i++;
    }

    let start = Date.now();
    for (let i = 0; i < testIteration; i++) {
        decode();
    }
    const decodeTime = (Date.now() - start) / testIteration;

    start = Date.now();
    for (let i = 0; i < testIteration; i++) {
        iterate();
    }
    const iterateTime = (Date.now() - start) / testIteration;

    console.log("TypedArray length:[" + length + "] decode:" + decodeTime + "ms" + " iterate:" + iterateTime + "ms");
}

test(1024);

test(1024 * 1024);

test(10 * 1024 * 1024);

Online test code on repl.it

On my machine (Chrome 85, macOS 10.15, i9-9900K) it shows:

TypedArray length:[1024] decode:0ms iterate:0.01ms
TypedArray length:[1048576] decode:1.11ms iterate:0.91ms
TypedArray length:[10485760] decode:11.54ms iterate:8.96ms

Plus: I also tested strlen performance in C (Wasm) and in JS.

StrlenTest.cc

#include <cstring>
#include <iostream>
#include <chrono>
#include <emscripten/emscripten.h>

using namespace std;
using namespace std::chrono;

EM_JS(void, strlenTestJS, (const char* str), {
    function strlen(str) {
        let end = str;
        while (Module.HEAP8[end]) end++;
        return end - str;
    }

    for (let i = 0; i < 100; i++) {
        strlen(str);
    }
});

void strlenTestC(const char* str) {
    for (int i = 0; i < 100; i++) {
         std::strlen(str);
    }
}

void test(size_t length) {
    std::string str;
    str.resize(length, 'A');

    auto start = steady_clock::now().time_since_epoch();
    strlenTestC(str.c_str());
    auto c = steady_clock::now().time_since_epoch() - start;

    start = steady_clock::now().time_since_epoch();
    strlenTestJS(str.c_str());
    auto js = steady_clock::now().time_since_epoch() - start;

    cout << "length:" << length
        << " C:" << duration_cast<milliseconds>(c).count() / 100 << "ms"
        << " JS:" << duration_cast<milliseconds>(js).count() / 100 << "ms"
        << std::endl;
}

int main() {
    test(1024);
    test(1024 * 1024);
    test(10 * 1024 * 1024);
    test(50 * 1024 * 1024);
}

Run with Node.js:

# em++ test.cpp -sALLOW_MEMORY_GROWTH=1
# Build with `-O0`
length:1024 C:0ms JS:0ms
length:1048576 C:0ms JS:0ms
length:10485760 C:1ms JS:6ms
length:52428800 C:9ms JS:37ms

# em++ test.cpp -O3 -sALLOW_MEMORY_GROWTH=1
# Build with `-O3`
length:1024 C:0ms JS:0ms
length:1048576 C:0ms JS:0ms
length:10485760 C:0ms JS:6ms
length:52428800 C:0ms JS:37ms

Known facts:

  1. Iterating over a TypedArray from JS is slow.
  2. Computing strlen in C/Wasm is really fast.

Discussion

UTF8ToString already takes a maxBytesToRead argument, but that is not the actual string length, so internally the string length still has to be computed.

C/C++ code already knows (or can very quickly determine) the actual length of a string.

So a simple solution is to pass the string length to UTF8ToString.


Possible solutions:

  1. Change the maxBytesToRead semantics to stringLength.
     This breaks backward compatibility.

  2. Add a third argument named stringLength.

PS: the same applies to UTF16ToString and UTF32ToString.
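For illustration, here is a minimal sketch of what a length-taking variant could look like in plain JS. The name `utf8ToStringNBytes` and the `heap` parameter (standing in for Emscripten's HEAPU8 view) are hypothetical, not actual Emscripten API:

```javascript
// Hypothetical length-taking variant: decode exactly `lengthInBytes`
// bytes starting at `ptr`, with no scan for a '\0' terminator.
const decoder = new TextDecoder('utf-8');

function utf8ToStringNBytes(heap, ptr, lengthInBytes) {
  // subarray() creates a cheap view into the heap, not a copy.
  return decoder.decode(heap.subarray(ptr, ptr + lengthInBytes));
}
```

Note that because no terminator scan happens, any embedded '\0' bytes inside the given range end up in the resulting JS string.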


sbc100 commented Oct 14, 2020

This is a pretty widely used function, so I don't think we can break backward compatibility here. But an extra argument or a separate function sounds fine. This will be useful for C++ strings, where the length of the string is already known a priori.

I do have a question, though: do you have a real-world application where the cost of doing strlen in JS has a noticeable effect on performance? It doesn't seem that surprising that this cost dominates in your microbenchmark, since that is all you are doing there.


kripken commented Oct 15, 2020

Do we actually need to compute the length of the string? I'm not sure from the docs if TextDecoder.decode stops at a 0 or not.

If it doesn't, we should ask for an API that does...

@LanderlYoung (Author)

Thanks for the reply~

@sbc100 I'm still developing my ScriptEngine library, which is intended to provide a unified C++ API for V8/JSCore/Lua and now WASM.

My library doesn't have any real-world application running on WASM yet (but it will very soon).

The results above come from my unit tests.

But since we have already found the performance problem, I'd like to open a PR to fix it.

I'm not sure from the docs if TextDecoder.decode stops at a 0 or not.

@kripken According to the comment, TextDecoder needs to know the byte length in advance; it doesn't stop at a null terminator by itself.

LanderlYoung added a commit to LanderlYoung/emscripten that referenced this issue Oct 19, 2020
- add an extra parameter `exactStringLength` to UTF8ToString.
This parameter is optional; if given, the `maxBytesToRead` parameter is ignored.

To keep the API consistent, `UTF16ToString` and `UTF32ToString` are also changed.

kripken commented Oct 19, 2020

@LanderlYoung I wonder if that comment is accurate, though. Reading the MDN docs I'm not sure. Also, there may be another Web API for this, as it does seem pretty obvious. Worth looking for if someone has time.

LanderlYoung pushed a commit to LanderlYoung/emscripten that referenced this issue Oct 20, 2020
introduce new functions
- UTF8ToStringWithLength
- UTF16ToStringWithLength
- UTF32ToStringWithLength

Decode exactly the given number of bytes; any '\0' in between is kept as-is.

These functions take a `lengthInBytes` argument, so there is no need to iterate
the heap to find a null terminator, which gives better performance.

LanderlYoung commented Oct 20, 2020

@LanderlYoung I wonder if that comment is accurate, though. Reading the MDN docs I'm not sure. Also, there may be another Web API for this, as it does seem pretty obvious. Worth looking for if someone has time.

@kripken As my test shows, the comment is accurate. MDN does not clarify what happens when there is a '\0' inside the string, but the Unicode standard does allow '\0' inside strings. So, as far as I can tell, the behavior is correct.

As for other APIs, I don't have any ideas either.
BTW, there was once a StringView API, but it now seems deprecated: https://developer.mozilla.org/en-US/docs/Archive/Add-ons/Code_snippets/StringView_library

And here is the test code:

const decoder = new TextDecoder("utf-8");

const raw = new Int8Array([
    0,  // '\0'
    0,  // '\0'
    0,  // '\0'
    65, // 'A'
    65, // 'A'
    0,  // '\0'
    65, // 'A'
    0,  // '\0'
    0,  // '\0'
]);

const str = "\0\0\0AA\0A\0\0";

const decodedString = decoder.decode(raw);

console.assert(str[0] === '\0');
console.assert(str[3] === 'A');
console.assert(str[7] === '\0');
console.assert(str === decodedString);

console.log("test done.");

LanderlYoung added a commit to LanderlYoung/emscripten that referenced this issue Oct 21, 2020
introduce new functions:
- UTF8ToStringNBytes
- UTF16ToStringNBytes
- UTF32ToStringNBytes

add docs to preamble.js.rst

Decode exactly the given number of bytes; any '\0' in between is kept as-is.

These functions take a `lengthInBytes` argument, so there is no need to iterate
the heap to find a null terminator, which gives better performance.

kripken commented Oct 21, 2020

I see, thanks @LanderlYoung

@RReverser, maybe you know: is there really no Web API for "get a JS string from UTF-8 data starting at an offset in a typed array, possibly not using the entire typed array"?

@RReverser (Collaborator)

@kripken That's what subarray is for - it creates a cheap view into the existing ArrayBuffer, which can then be passed to TextDecoder.

The bigger problem, however, is that TextDecoder and TextEncoder themselves are fairly slow, as they're parts of DOM API (implemented in C++) and any usage, especially for small strings, ends up dominated by JS <-> C++ communication overhead.

When I worked on wasm-bindgen, I did a deeper analysis of this problem and managed to get significant speed-ups by avoiding the TextEncoder / TextDecoder code path as long as the string is ASCII-only (and switching to those APIs once I hit a Unicode character).

You can check rustwasm/wasm-bindgen#1470 (comment) for some benchmark comparisons and more details - perhaps Emscripten would be able to do the same? (AFAIK it already has code for manual encoding / decoding, it's just currently unused when TextEncoder & TextDecoder are detected, so it's just a matter of merging code paths together.)
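The ASCII fast path described above can be sketched roughly like this. This is a simplified illustration of the wasm-bindgen idea, not Emscripten's actual code; `decodeWithAsciiFastPath` and `heap` are made-up names:

```javascript
const decoder = new TextDecoder('utf-8');

// Decode `len` bytes at `ptr`: walk the bytes in JS while they are ASCII
// (< 0x80), and only fall back to TextDecoder once a multi-byte sequence
// is hit, avoiding the JS <-> C++ call overhead for small ASCII strings.
function decodeWithAsciiFastPath(heap, ptr, len) {
  let out = '';
  let i = ptr;
  const end = ptr + len;
  while (i < end && heap[i] < 0x80) {
    out += String.fromCharCode(heap[i++]);
  }
  if (i < end) {
    // Hit a non-ASCII byte: let TextDecoder handle the remainder.
    out += decoder.decode(heap.subarray(i, end));
  }
  return out;
}
```

A production version would also build the ASCII part in chunks rather than one character at a time, but the structure is the same.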

@RReverser (Collaborator)

Oh, and FWIW, as for step (1), one more thing you could try is replacing the manual

  // (1)  find the '\0' terminator -- this is slow
  while (heap[endPtr] && !(endPtr >= endIdx)) ++endPtr;

with something like

  endPtr = heap.subarray(0, endIdx).indexOf(0, endPtr);
  if (endPtr == -1) endPtr = endIdx;

Not 100% sure if it would be faster due to extra subarray indirection though.
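One quick way to check that hypothesis is a micro-benchmark of the two scans side by side. Timings vary by engine, so this is just a sketch; `scanLoop` and `scanIndexOf` are illustrative names:

```javascript
// Compare a manual JS scan for '\0' against TypedArray.prototype.indexOf,
// which runs in native code inside the engine.
function scanLoop(heap, idx, endIdx) {
  let endPtr = idx;
  while (heap[endPtr] && endPtr < endIdx) ++endPtr;
  return endPtr;
}

function scanIndexOf(heap, idx, endIdx) {
  const endPtr = heap.subarray(0, endIdx).indexOf(0, idx);
  return endPtr === -1 ? endIdx : endPtr;
}

const bench = new Uint8Array(1024 * 1024).fill(65); // no '\0' present

for (const [name, fn] of [['loop', scanLoop], ['indexOf', scanIndexOf]]) {
  const start = Date.now();
  for (let i = 0; i < 100; i++) fn(bench, 0, bench.length);
  console.log(name + ': ' + (Date.now() - start) / 100 + 'ms');
}
```

Both functions return the same index, so they can be swapped freely once one is measured to be faster.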

@kripken
Copy link
Member

kripken commented Oct 22, 2020

Thanks @RReverser , sounds interesting to investigate those.

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

liquidaty commented Oct 16, 2022

An alternative that works better for me than `UTF8ToString` is `String.fromCharCode.apply(null, new Uint8Array(Module.HEAP8.buffer, s, length))`.

Performance is roughly the same, but it handles invalid bytes such as \uDEAD better for my use case (perhaps "properly", but you be the judge): with malformed input, JSON.stringify() outputs valid JSON from the result of String.fromCharCode, but not from the output of UTF8ToString.

EDIT: this is a bad idea

@RReverser (Collaborator)

@liquidaty Well, two big differences are that 1) this doesn't handle UTF-8, it can only handle ASCII characters correctly, and 2) on any long string it will overflow the stack, which is why you should almost never use String.fromCharCode.apply on inputs of arbitrary length.

So yes, it's faster, but it's because it does something quite different and in a more unsafe manner.
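The UTF-8 point is easy to demonstrate in Node or a browser console (the stack-overflow point can be triggered the same way with a large enough array):

```javascript
// 'é' is two bytes in UTF-8 (0xC3 0xA9). TextDecoder decodes the pair as one
// character, while String.fromCharCode treats each byte as a separate code
// unit (Latin-1 style), producing mojibake.
const bytes = new Uint8Array([0xC3, 0xA9]);

const viaDecoder = new TextDecoder('utf-8').decode(bytes); // 'é'
const viaCharCode = String.fromCharCode.apply(null, bytes); // 'Ã©'

console.log(viaDecoder, viaCharCode);
```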

@liquidaty
@RReverser ah. duh. terrible idea. Thanks for clarifying!
