Jsrt: Modify signature of JsCopyString #3433
Looks like there are still some references to …
lib/Jsrt/ChakraCore.h (outdated)

```c++
/// <param name="writtenLength">Total number of characters written. This is only
/// populated when passed with non-null `buffer`.
/// </param>
/// <param name="actualLength">Total number of UTF8 characters present in `value`.
```
Is this "utf8 characters"/codepoints, or "bytes"?
I think it should be the number of UTF8-encoded bytes corresponding to the code points present in `value`. I will fix that up.
Yes, the fix for …
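To illustrate the distinction the reviewer raised, here is a minimal sketch (not ChakraCore code) that counts both the UTF8 bytes and the code points of a UTF16 string. `Utf8ByteLength` and `CodePointCount` are hypothetical helpers, and counting an unpaired surrogate as 3 bytes (the size of U+FFFD) is an assumption, not documented ChakraCore behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if s[i] and s[i + 1] form a valid UTF16 surrogate pair.
static bool IsSurrogatePair(const std::u16string& s, std::size_t i) {
    return s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < s.size() &&
           s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
}

// Hypothetical helper: number of bytes needed to UTF8-encode a UTF16 string.
// Assumption: an unpaired surrogate is counted as 3 bytes (U+FFFD).
std::size_t Utf8ByteLength(const std::u16string& s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;              // ASCII
        } else if (c < 0x800) {
            bytes += 2;              // e.g. U+00E9
        } else if (IsSurrogatePair(s, i)) {
            bytes += 4;              // one supplementary code point
            ++i;                     // skip the low surrogate
        } else {
            bytes += 3;              // rest of the BMP
        }
    }
    return bytes;
}

// Hypothetical helper: number of Unicode code points (a surrogate pair counts once).
std::size_t CodePointCount(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (IsSurrogatePair(s, i)) {
            ++i;
        }
        ++count;
    }
    return count;
}
```

For `u"h\u00e9"` the two counts differ (3 bytes versus 2 code points), which is why the parameter description should say bytes rather than characters.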
lib/Jsrt/ChakraCore.h (outdated)

@@ -292,7 +292,13 @@ CHAKRA_API
```c++
/// <param name="value">JavascriptString value</param>
/// <param name="buffer">Pointer to buffer</param>
/// <param name="bufferSize">Buffer size</param>
/// <param name="written">Total number of characters written</param>
/// <param name="writtenLength">Total number of characters written. This is only
```
I think this should also be clarified in the same way as the `actualLength` parameter. Might also be worth clarifying that there is no trailing null character?
Oh, also just noticed that the remarks section above still refers to `written` instead of `actualLength`.
For some reason the xplat builds can't see …
Yes, just realized that …
@kunalspathak since …

```c++
size_t wlength = 0;
size_t alength = 0;
JsCopyString(jsString, nullptr, 0, &wlength, &alength);
char* buffer = (char*)malloc(alength + 1);
JsCopyString(jsString, buffer, alength + 1, &wlength, &alength);
```
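The two-call pattern in the snippet above, including the trailing null the reviewer asked about, can be exercised without ChakraCore by using a hypothetical stand-in. `StubCopyString` and `CopyToNewBuffer` are illustrative names, and the stub's semantics (nothing written for a null buffer, at most `bufferSize` bytes copied, no trailing null) are assumptions based on this discussion rather than the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for JsCopyString; `utf8` plays the role of the
// JavaScript string. Assumed semantics: with a null buffer nothing is
// written; otherwise up to bufferSize bytes are copied, with no trailing null.
void StubCopyString(const std::string& utf8, char* buffer, std::size_t bufferSize,
                    std::size_t* writtenLength, std::size_t* actualLength) {
    std::size_t n = 0;
    if (buffer != nullptr) {
        n = utf8.size() < bufferSize ? utf8.size() : bufferSize;
        std::memcpy(buffer, utf8.data(), n);
    }
    if (writtenLength) *writtenLength = n;
    if (actualLength) *actualLength = utf8.size();
}

// Two-call pattern: query the byte count, allocate, copy, then null-terminate.
// The caller releases the result with free().
char* CopyToNewBuffer(const std::string& value) {
    std::size_t wlength = 0;
    std::size_t alength = 0;
    StubCopyString(value, nullptr, 0, &wlength, &alength);   // size query only
    // malloc returns void*, so C++ needs an explicit cast.
    char* buffer = static_cast<char*>(std::malloc(alength + 1));
    if (buffer == nullptr) return nullptr;
    StubCopyString(value, buffer, alength + 1, &wlength, &alength);
    buffer[wlength] = '\0';   // the copy itself is not null-terminated
    return buffer;
}
```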
@kunalspathak sorry for joining the conversation late, but isn't it very expensive to check whether the source string is all ASCII?
I just didn't understand the scenario.
Currently in node we do roughly what @liminzhu mentions, calling …
If we have the length (in UTF16 characters) of the string, then we can guess that the string is made of characters with a 1-byte UTF8 encoding (often true), so if we provide a buffer of that size up front we may be able to avoid a second call, unless it turns out we actually need more space.
Our initial version was a single call (the previous method was allocating the necessary space). This version will call `Utf8Str`, and won't that end up allocating memory anyway? If you know the string length (as in the number of letters, ASCII and UTF8), you can safely pass a buffer of `(length * 3) + 1` (if the buffer is `char`), otherwise `(length * 2) + 1` (if the buffer is UTF16). Actually, the internal `Utf8Str` does a similar trick to make things faster. Checking whether a string is all ASCII is a basic loop; couldn't you also introduce that before calling this?
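The `(length * 3) + 1` rule for a `char` buffer follows from UTF8 arithmetic: a UTF16 code unit in the Basic Multilingual Plane encodes to at most 3 bytes, and a surrogate pair (2 code units) encodes to 4 bytes, so 3 bytes per code unit always suffices. A minimal sketch with a hypothetical helper `WorstCaseUtf8BufferSize`; it models only the `char` case and adds an overflow guard that the comment does not mention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: upper bound, in bytes, on the UTF8 buffer needed for a
// string of `utf16Length` UTF16 code units, plus one byte for a trailing null.
// Returns 0 if the computation would overflow size_t.
std::size_t WorstCaseUtf8BufferSize(std::size_t utf16Length) {
    if (utf16Length > (SIZE_MAX - 1) / 3) {
        return 0;   // length * 3 + 1 would not fit in size_t
    }
    return utf16Length * 3 + 1;
}
```

The trade-off is memory: an all-ASCII string uses only about a third of such a buffer, which is what the optimistic one-byte-per-character guess in this PR tries to avoid.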
If there's significant overhead of calling …
Do we need to do anything for …
IMHO, calling malloc once or twice doesn't make much difference; the loop is the most important part here. If we know the string length, we don't need to change anything. Otherwise, I'm not sure about the perf gain here.
@obastemur, it makes a difference. From what I see, there are 300MB worth of heap allocations and frees happening inside …
I am not sure what you mean by this. Could you elaborate? I spoke offline with @MSLaguana about this, and we think this is the problem. Today, to perform a copy, we do 3 allocations …
What should happen inside …
Thus, in the best case we can replace 3 malloc/free pairs with just 1, but that is beyond the scope of this PR. I would still like to get in the new …
See https://github.com/Microsoft/ChakraCore/blob/master/lib/Common/Codex/Utf8Codex.cpp#L326 for checking ASCII. If we just need to know whether a string has any multi-byte characters in it, could this be the approach? The current design also helps as a fail-safe in case a basic ASCII check fails.
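The plain-loop ASCII check discussed above could look roughly like this. `IsAllAscii` is a hypothetical helper and is not the code at the linked Utf8Codex.cpp line, which may be implemented differently. When it returns true, the UTF8 byte length equals the UTF16 length, so a `length + 1` buffer is enough.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if every UTF16 code unit is below 0x80, i.e. the
// UTF8 encoding uses exactly one byte per code unit. Surrogates (>= 0xD800)
// are correctly reported as non-ASCII.
bool IsAllAscii(const char16_t* s, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        if (s[i] >= 0x80) {
            return false;   // needs 2 or more UTF8 bytes
        }
    }
    return true;
}
```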
I would be really surprised if this costs anything tangible. IIRC, we were losing the majority of the time to double looping inside `Utf8ToStr`. Now we don't.
@kunalspathak Sorry, I'm chiming in late too. To me adding a …

Also, there is another workaround for your purpose: …

If you really prefer an API... We don't have an API to get Array length, do we? How about a common `JsGetLength` that works on String, Array, (TypedArray)... any object?

[Update]: The String APIs and implementations were put up quickly and are likely not optimized for many scenarios. I recall node has a code path that is only interested in the UTF8 byte length? For that you can skip …

One question: You mentioned …
Regarding experimental: I haven't seen any flag, only a mention of it in the wiki.
Modified the signature of `JsCopyString` to also return the actual count of UTF8 bytes present in `jsString`. With this information, the host can simply allocate a buffer assuming all characters are ASCII and, based on the `writtenLength` and `actualLength` values returned by the API, decide whether the assumption was correct (i.e. `writtenLength == actualLength`) or whether it should take the slow path and call `JsCopyString` again with a bigger buffer of size `actualLength`.

Today, if the host wants to copy the UTF8 representation of a JavaScript string into a buffer, here is how it is done:

```c++
size_t length = 0;
JsCopyString(jsString, nullptr, 0, &length);
char* buffer = (char*)malloc(length);
size_t written = 0;
JsCopyString(jsString, buffer, length, &written);
assert(written == length);
```

This can be changed to:

```c++
size_t actualLength = 0;
size_t writtenLength = 0;
size_t strLength = 0;
JsStringToPointer(jsString, nullptr, &strLength);
char* buffer = (char*)malloc(strLength + 1);
JsCopyString(jsString, buffer, strLength, &writtenLength, &actualLength);

// slow path if jsString contains non-ASCII characters
if (writtenLength != actualLength)
{
    free(buffer);
    buffer = (char*)malloc(actualLength + 1);
    JsCopyString(jsString, buffer, actualLength, &writtenLength, &actualLength);
}
```
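Below is a minimal, self-contained sketch of the fast-path/slow-path pattern from the description. Because the real ChakraCore API can't be linked here, `StubCopyString` is a hypothetical stand-in with assumed semantics (copies at most `bufferSize` bytes, no trailing null, always reports the full UTF8 byte count), `CopyOptimistic` is an illustrative name, and the UTF16 length is passed in directly instead of coming from `JsStringToPointer`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the new JsCopyString signature; `utf8` plays the
// role of the JavaScript string. Assumed semantics: copies at most bufferSize
// bytes (no trailing null) and always reports the full byte count.
void StubCopyString(const std::string& utf8, char* buffer, std::size_t bufferSize,
                    std::size_t* writtenLength, std::size_t* actualLength) {
    std::size_t n = 0;
    if (buffer != nullptr) {
        n = utf8.size() < bufferSize ? utf8.size() : bufferSize;
        std::memcpy(buffer, utf8.data(), n);
    }
    *writtenLength = n;
    *actualLength = utf8.size();
}

// Optimistic copy: size the buffer as if every character were ASCII (one byte
// per UTF16 code unit) and retry only when that guess was too small.
// *retried reports which path ran. The caller frees the result with free().
char* CopyOptimistic(const std::string& utf8, std::size_t utf16Length, bool* retried) {
    *retried = false;
    std::size_t writtenLength = 0;
    std::size_t actualLength = 0;
    char* buffer = static_cast<char*>(std::malloc(utf16Length + 1));
    if (buffer == nullptr) return nullptr;
    StubCopyString(utf8, buffer, utf16Length, &writtenLength, &actualLength);
    if (writtenLength != actualLength) {
        // Slow path: the string contains non-ASCII characters.
        std::free(buffer);
        buffer = static_cast<char*>(std::malloc(actualLength + 1));
        if (buffer == nullptr) return nullptr;
        StubCopyString(utf8, buffer, actualLength, &writtenLength, &actualLength);
        *retried = true;
    }
    buffer[writtenLength] = '\0';   // terminate after the copied bytes
    return buffer;
}
```

Since every UTF16 code unit needs at least one UTF8 byte, the first call succeeds exactly when the string is all ASCII; any other string costs one extra allocation and copy.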
Branch updated from a7a2e41 to 356656c.
@dotnet-bot test OSX static_osx_osx_release please
Thanks everyone for the feedback and review!
Merge pull request #3433 from pr/kunalspathak/1.7