Jsrt: Modify signature of JsCopyString #3433

kunalspathak · 2017-07-26T01:01:48Z

Modified the signature of JsCopyString to also return the actual count of UTF8 bytes present in jsString.
With this information, the host can simply allocate a buffer assuming all characters are ASCII and,
based on the writtenLength and actualLength values returned by the API, decide whether the assumption was correct
(i.e. writtenLength == actualLength) or whether it should take the slow path and call JsCopyString again, passing a bigger buffer
of size actualLength.

Today, if the host wants to copy the UTF8 representation of a JavaScript string into a buffer, here is how it is done:

   size_t length = 0;
   JsCopyString(jsString, nullptr, 0, &length);  // query the UTF8 length only

   char* buffer = (char*)malloc(length);

   size_t written = 0;
   JsCopyString(jsString, buffer, length, &written);
   assert(written == length);

can be changed to

   size_t actualLength = 0;
   size_t writtenLength = 0;
   size_t strLength = 0;
   const wchar_t* strPtr = nullptr;
   // guess one UTF8 byte per character, which holds when jsString is all ASCII
   JsStringToPointer(jsString, &strPtr, &strLength);
   char* buffer = (char*)malloc(strLength + 1);
   JsCopyString(jsString, buffer, strLength, &writtenLength, &actualLength);

   // slow path if jsString contains non-ascii characters
   if(writtenLength != actualLength) {
      free(buffer);
      buffer = (char*)malloc(actualLength + 1);

      JsCopyString(jsString, buffer, actualLength, &writtenLength, &actualLength);
   }

   buffer[writtenLength] = '\0';  // JsCopyString does not write a trailing NUL
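A minimal sketch of the fast path/slow path logic above that runs without ChakraCore. `MockCopyString` is a hypothetical stand-in that only assumes the semantics described in this PR (copy at most `bufferSize` UTF8 bytes with no trailing NUL, report `writtenLength` and `actualLength`), and `std::u16string` stands in for the engine's UTF-16 string; this is not the ChakraCore implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Appends the UTF-8 encoding of one code point to out.
static void AppendUtf8(char32_t cp, std::string& out) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts UTF-16 to UTF-8; unpaired surrogates become U+FFFD (an assumption).
static std::string ToUtf8(const std::u16string& s) {
    std::string out;
    for (size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;
        }
        AppendUtf8(c, out);
    }
    return out;
}

// Hypothetical stand-in for the 5-argument JsCopyString (not the real API):
// copies at most bufferSize UTF-8 bytes (no trailing NUL) and reports both
// the bytes written and the bytes the full encoding needs.
static void MockCopyString(const std::u16string& value, char* buffer, size_t bufferSize,
                           size_t* writtenLength, size_t* actualLength) {
    const std::string utf8 = ToUtf8(value);
    size_t n = 0;
    if (buffer != nullptr) {
        n = std::min(bufferSize, utf8.size());
        std::memcpy(buffer, utf8.data(), n);
    }
    if (writtenLength != nullptr) *writtenLength = n;
    if (actualLength != nullptr) *actualLength = utf8.size();
}

// Fast path assumes one byte per UTF-16 code unit; the slow path runs only
// when that guess was too small. Returns how many copy calls were made.
static int CopyAsUtf8(const std::u16string& value, std::string& result) {
    size_t writtenLength = 0;
    size_t actualLength = 0;
    std::vector<char> buffer(value.size() + 1);
    MockCopyString(value, buffer.data(), value.size(), &writtenLength, &actualLength);
    int calls = 1;
    if (writtenLength != actualLength) {  // slow path: non-ASCII content
        buffer.assign(actualLength + 1, '\0');
        MockCopyString(value, buffer.data(), actualLength, &writtenLength, &actualLength);
        ++calls;
    }
    result.assign(buffer.data(), writtenLength);
    return calls;
}
```

Because every UTF-16 code unit needs at least one UTF8 byte, `writtenLength == actualLength` after the first call holds exactly when the string is all ASCII, so only non-ASCII strings pay for a second copy.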

MSLaguana · 2017-07-26T16:02:04Z

Looks like there are still some references to JsCopyString with 4 arguments in ch, and you need to initialize writtenLength to appease PREfast.

MSLaguana · 2017-07-26T16:05:45Z

lib/Jsrt/ChakraCore.h

+/// <param name="writtenLength">Total number of characters written. This is only
+///                             populated when passed with non-null `buffer`.
+/// </param>
+/// <param name="actualLength">Total number of UTF8 characters present in `value`.


Is this "utf8 characters"/codepoints, or "bytes"?

I think it should be the number of UTF8-encoded bytes corresponding to the code points present in "value". I will fix that up.
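To make the characters-vs-bytes distinction concrete, a small sketch that counts both for a UTF-16 string. `CountUtf8` is a hypothetical helper, and counting unpaired surrogates as U+FFFD is an assumption of this sketch, not necessarily what ChakraCore does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of UTF-8 bytes needed to encode one code point.
static size_t Utf8Width(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

struct Utf8Counts {
    size_t codePoints;
    size_t bytes;
};

// Walks a UTF-16 string and tallies both code points and UTF-8 bytes.
static Utf8Counts CountUtf8(const std::u16string& s) {
    Utf8Counts counts{0, 0};
    for (size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        bool high = c >= 0xD800 && c <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            c = 0x10000;  // any supplementary code point needs 4 bytes
            ++i;          // a surrogate pair is two UTF-16 code units
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;   // unpaired surrogate, replaced (assumption)
        }
        ++counts.codePoints;
        counts.bytes += Utf8Width(c);
    }
    return counts;
}
```

For example, `u"h\u00e9llo"` has 5 code points but needs 6 UTF8 bytes, and U+1F600 is 2 UTF-16 code units, 1 code point, and 4 bytes, so all three counts can differ.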

kunalspathak · 2017-07-26T17:27:30Z

Yes, the fix for ch is on the way.

MSLaguana · 2017-07-26T18:15:14Z

lib/Jsrt/ChakraCore.h

@@ -292,7 +292,13 @@ CHAKRA_API
 /// <param name="value">JavascriptString value</param>
 /// <param name="buffer">Pointer to buffer</param>
 /// <param name="bufferSize">Buffer size</param>
-/// <param name="written">Total number of characters written</param>
+/// <param name="writtenLength">Total number of characters written. This is only


I think this should also be clarified in the same way as the actualLength parameter.

Might also be worth clarifying that there is no trailing null character?

Oh, I also just noticed that the remarks section above still refers to written instead of actualLength.

MSLaguana · 2017-07-26T18:17:29Z

For some reason the xplat builds can't see JsStringToPointer

kunalspathak · 2017-07-26T19:24:33Z

Yes, I just realized that JsStringToPointer is a Windows-only API. I was using it to easily get the string length (the number of characters, which equals the UTF8 byte count when the string is all ASCII), on which my assumption is based. I could use the JsGetStringLength API, but it returns the length as an int rather than a size_t, which I think is dangerous. Another option would be to read the length property of the string object, but that might trigger a getter, a trap, etc.
Instead, I am proposing a new API, JsGetStringLengthSizeT (I am open to a better name). I will add that API as experimental and use it in ch to do what I am trying to do.

liminzhu · 2017-07-26T21:16:03Z

@kunalspathak since JsCopyString would return the actual length, how about calling JsCopyString(jsString, nullptr, 0, &wlength, &alength) to determine the buffer size needed instead of exposing JsGetStringLengthSizeT? It would look like this.

   size_t wlength = 0;
   size_t alength = 0;
   JsCopyString(jsString, nullptr, 0, &wlength, &alength);  // size query only
   char* buffer = (char*)malloc(alength+1);
   JsCopyString(jsString, buffer, alength+1, &wlength, &alength);
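A sketch of this query-then-copy pattern against a hypothetical `MockCopyString`, which only assumes the 5-argument semantics discussed here (a null buffer reports sizes only). It is not ChakraCore's implementation; `std::u16string` stands in for the engine's string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Appends the UTF-8 encoding of one code point to out.
static void AppendUtf8(char32_t cp, std::string& out) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts UTF-16 to UTF-8; unpaired surrogates become U+FFFD (an assumption).
static std::string ToUtf8(const std::u16string& s) {
    std::string out;
    for (size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;
        }
        AppendUtf8(c, out);
    }
    return out;
}

// Hypothetical stand-in for the 5-argument JsCopyString (not the real API).
static void MockCopyString(const std::u16string& value, char* buffer, size_t bufferSize,
                           size_t* writtenLength, size_t* actualLength) {
    const std::string utf8 = ToUtf8(value);
    size_t n = 0;
    if (buffer != nullptr) {  // a null buffer means "report sizes only"
        n = std::min(bufferSize, utf8.size());
        std::memcpy(buffer, utf8.data(), n);
    }
    if (writtenLength != nullptr) *writtenLength = n;
    if (actualLength != nullptr) *actualLength = utf8.size();
}

// First call sizes the buffer, second call fills it.
static std::string CopyTwoCalls(const std::u16string& value) {
    size_t wlength = 0;
    size_t alength = 0;
    MockCopyString(value, nullptr, 0, &wlength, &alength);
    std::vector<char> buffer(alength + 1);
    MockCopyString(value, buffer.data(), alength + 1, &wlength, &alength);
    return std::string(buffer.data(), wlength);
}
```

The trade-off is two conversions per string even when it is pure ASCII, which is what the single-call fast path in the PR description tries to avoid.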

obastemur · 2017-07-26T21:21:41Z

@kunalspathak sorry for joining the conversation late, but isn't it very expensive to check whether the source string is all ASCII?

obastemur · 2017-07-26T21:22:54Z

I just didn't understand the scenario.

MSLaguana · 2017-07-26T21:34:49Z

Currently in node we do roughly what @liminzhu mentions, calling JsCopyString twice, once with no buffer to get the length, and then once with a sufficiently large buffer to get the string out. However, JsCopyString internally performs allocations to put the utf8-encoded string somewhere, since we store as utf-16 internally. One of the main motivations of this change is to try and reduce the number of those allocations by having fewer calls to JsCopyString.

If we have the length (in utf16 characters) of the string, then we can guess that maybe the string is made of characters with a 1 byte utf8 encoding (often true), and so if we provide a buffer of that size up front we may be able to avoid a second call unless it turns out we actually need more space.

obastemur · 2017-07-26T21:55:04Z

Our initial version was a single call (the previous method allocated the necessary space).

Won't this version call Utf8Str, which ends up allocating memory anyway? If you know the string length (the number of characters, ASCII or not), you can safely pass a buffer of (length * 3) + 1 if the buffer is char, or (length * 2) + 1 if the buffer is UTF-16.
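The (length * 3) + 1 bound holds because a single UTF-16 code unit encodes to at most 3 UTF8 bytes, and a surrogate pair (2 code units) encodes to 4. A sketch that checks the bound against an exact byte count; both helpers are hypothetical, and counting unpaired surrogates as 3 bytes (U+FFFD) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Worst-case UTF-8 buffer (plus one byte for a NUL) for utf16Length code units.
// Returns 0 if the multiplication would overflow size_t.
static size_t MaxUtf8BufferSize(size_t utf16Length) {
    if (utf16Length > (SIZE_MAX - 1) / 3) return 0;
    return utf16Length * 3 + 1;
}

// Exact UTF-8 byte count for a UTF-16 string.
static size_t Utf8Length(const std::u16string& s) {
    size_t bytes = 0;
    for (size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;
        } else if (c < 0x800) {
            bytes += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;  // surrogate pair: 2 code units, 4 bytes
            ++i;
        } else {
            bytes += 3;  // rest of the BMP, or an unpaired surrogate as U+FFFD
        }
    }
    return bytes;
}
```

For the ASCII string "hello" the bound reserves 16 bytes where 5 (+1) suffice, which is the overallocation concern for the common all-ASCII case.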

Actually, the internal Utf8Str does a similar trick to make things faster.

Checking whether a string is all ASCII is a basic loop; couldn't you introduce that before calling this, too?
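The ASCII check being suggested can be as simple as the loop below. This is a plain scalar sketch with hypothetical names (`IsAllAscii`, `TryNarrowAscii`); the engine's own check may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if every UTF-16 code unit is below 0x80, i.e. the UTF-8 encoding is
// exactly one byte per code unit.
static bool IsAllAscii(const char16_t* s, size_t length) {
    for (size_t i = 0; i < length; ++i) {
        if (s[i] >= 0x80) return false;
    }
    return true;
}

// If the string is ASCII, narrow it directly into an exact-size buffer.
// Returns false so the caller can fall back to a full UTF-8 conversion.
static bool TryNarrowAscii(const std::u16string& s, std::string& out) {
    if (!IsAllAscii(s.data(), s.size())) return false;
    out.resize(s.size());
    for (size_t i = 0; i < s.size(); ++i) {
        out[i] = static_cast<char>(s[i]);
    }
    return true;
}
```

It costs one extra pass over the string, but for ASCII input the result can be narrowed directly without any converter.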

liminzhu · 2017-07-26T22:16:42Z

If there's significant overhead in calling JsCopyString, maybe we can have a fast path for when buffer is nullptr that just computes the written/actual lengths (you need to pay that cost somewhere and get those anyway, per Kunal's opening comment)? The pattern of calling JsCopyString twice is relatively concise and consistent with how we use other APIs like JsSerializeScript.

Do we need to do anything for JsCopyStringUtf16?

obastemur · 2017-07-26T22:35:28Z

IMHO, calling malloc once or twice doesn't make much difference. The loop is the most important part here. If we know the string length, we don't need to change anything. Otherwise, I'm not sure about the perf gain here.

kunalspathak · 2017-07-26T23:09:33Z

> IMHO, calling malloc once or twice doesn't make much difference.

@obastemur - it makes a difference. From what I see, 300MB worth of heap allocations and frees happen inside JsCopyString, and that can be cut in half. I thought about simply allocating a buffer of (strLen * 3) + 1, but in most cases we would be allocating more memory than needed. To give you an example, in one of the sample apps I was running, strings contained only ASCII characters 108K times and contained Unicode characters only 60 times (mainly characters in script files like this). So avoiding a call to JsCopyString definitely helps.

> Checking a string (whether if it's all ASCII or not) is a basic loop that you may also introduce before calling this too?

I am not sure what you mean by this. Could you elaborate?

I spoke offline with @MSLaguana about this, and we think this is the problem. Today, to perform a copy, we do 3 allocations:

1. One by NarrowWideConverter to calculate the length
2. One for the buffer of size length
3. One by NarrowWideConverter to copy the contents into that buffer

What should happen inside JsCopyString is:

1. If the user doesn't pass a buffer, skip the allocation inside NarrowWideConverter and calculate the length a different way.
2. If the user passes a buffer, copy the contents into it instead of into a buffer that NarrowWideConverter creates itself.
3. If the buffer passed was not big enough, return the UTF-16 offset up to which the contents were copied. The host can then realloc with the proper length and call JsCopyString again with that offset to resume copying. This step also avoids allocating a buffer inside NarrowWideConverter.
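A sketch of the proposed resume-from-offset behavior, with `std::u16string` standing in for the engine's UTF-16 storage. `CopyUtf8From` and `CopyWithResume` are hypothetical; the JsCopyString signature in this PR has no offset parameter. Stopping only at code point boundaries is a design choice of this sketch so the host never receives a split UTF8 sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Encodes one code point into out (up to 4 bytes) and returns the byte count.
static size_t EncodeUtf8(char32_t cp, char out[4]) {
    if (cp < 0x80) { out[0] = static_cast<char>(cp); return 1; }
    if (cp < 0x800) {
        out[0] = static_cast<char>(0xC0 | (cp >> 6));
        out[1] = static_cast<char>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = static_cast<char>(0xE0 | (cp >> 12));
        out[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out[2] = static_cast<char>(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = static_cast<char>(0xF0 | (cp >> 18));
    out[1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out[2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out[3] = static_cast<char>(0x80 | (cp & 0x3F));
    return 4;
}

// Hypothetical resumable copy: encodes src from UTF-16 index `offset` into buf,
// never splitting a code point, and returns the offset to resume from.
static size_t CopyUtf8From(const std::u16string& src, size_t offset,
                           char* buf, size_t bufSize, size_t* written) {
    size_t n = 0;
    size_t i = offset;
    while (i < src.size()) {
        char32_t c = src[i];
        size_t units = 1;
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < src.size() &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            units = 2;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;  // unpaired surrogate (assumption)
        }
        char enc[4];
        size_t len = EncodeUtf8(c, enc);
        if (n + len > bufSize) break;  // stop before a code point that does not fit
        std::memcpy(buf + n, enc, len);
        n += len;
        i += units;
    }
    *written = n;
    return i;
}

// Host side: start with the ASCII-sized guess, grow only when needed, and
// resume from the returned offset instead of re-encoding from the start.
static std::string CopyWithResume(const std::u16string& src) {
    std::vector<char> buffer(src.size() + 1);
    size_t used = 0;
    size_t offset = 0;
    for (;;) {
        size_t written = 0;
        offset = CopyUtf8From(src, offset, buffer.data() + used,
                              buffer.size() - used, &written);
        used += written;
        if (offset == src.size()) break;
        buffer.resize(buffer.size() * 2);  // stands in for the host's realloc
    }
    return std::string(buffer.data(), used);
}
```

Each resize keeps the bytes already written, so ASCII input is encoded once into the initial guess and non-ASCII input only encodes the remaining suffix after growing.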

Thus, in the best case we can replace 3 malloc/free pairs with just 1, but that is beyond the scope of this PR. I would still like to get the new JsGetStringLengthSizeT and the JsCopyString changes into 1.7, and from there into node-chakracore. We can come back and fine-tune JsCopyString later.

obastemur · 2017-07-26T23:54:15Z

See https://github.com/Microsoft/ChakraCore/blob/master/lib/Common/Codex/Utf8Codex.cpp#L326 for checking ASCII. If we just need to know whether a string has any multi-byte chars in it, could this be the approach? The current design also helps as a fail-safe in case a basic ASCII check fails.

> it makes a difference. From what I see there is 300MB worth of heap allocation and free happening inside JsCopyString which can be reduced to half.

I would be really surprised if this costs something tangible. IIRC, we were losing the majority of the time to double looping inside Utf8ToStr. Now we don't.

jianchun · 2017-07-27T17:06:06Z

@kunalspathak Sorry, I'm chiming in late too. To me, adding a JsGetStringLengthSizeT looks a bit awkward when we already have JsGetStringLength. Although int isn't ideal, it works, at least for now and probably for a long time to come. We have other APIs taking int parameters (JsCopyStringUtf16).

Also there is another workaround for your purpose: JsCopyStringUtf16(strVar, 0, -1, nullptr, &length).

If you really prefer an API... We don't have an API to get Array length, do we? How about a common JsGetLength, that works on String, Array, (TypedArray)... any object?

[Update]: The String APIs and implementations were put up quickly and are likely not optimized for many scenarios. I recall node has a code path that is only interested in the utf8 byte length? For that you can skip NarrowWideConverter and instead change the utf8 converter function into a template that takes a dummy output iterator which only counts output bytes. Use that when only the utf8 byte length is requested. No memory allocation at all.
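A sketch of the counting-iterator idea: one encoder templated on its output iterator, used with `std::back_inserter` to produce bytes, or with a counting iterator that allocates nothing. `EncodeUtf8`, `CountingIterator`, and `Utf8Length` are hypothetical names, not ChakraCore's converter, and mapping unpaired surrogates to U+FFFD is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// UTF-16 -> UTF-8 encoder templated on the output iterator, so one code path
// can either emit bytes or just count them.
template <typename OutIt>
OutIt EncodeUtf8(const std::u16string& s, OutIt out) {
    for (size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;
        }
        if (c < 0x80) {
            *out++ = static_cast<char>(c);
        } else if (c < 0x800) {
            *out++ = static_cast<char>(0xC0 | (c >> 6));
            *out++ = static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *out++ = static_cast<char>(0xE0 | (c >> 12));
            *out++ = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (c & 0x3F));
        } else {
            *out++ = static_cast<char>(0xF0 | (c >> 18));
            *out++ = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            *out++ = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Output iterator that discards every byte and only increments a counter.
// The counter lives behind a pointer so copies made by out++ share it.
struct CountingIterator {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    size_t* count;

    CountingIterator& operator*() { return *this; }
    CountingIterator& operator=(char) { ++*count; return *this; }
    CountingIterator& operator++() { return *this; }
    CountingIterator operator++(int) { return *this; }
};

// UTF-8 byte length with no allocation at all.
static size_t Utf8Length(const std::u16string& s) {
    size_t count = 0;
    EncodeUtf8(s, CountingIterator{&count});
    return count;
}
```

To produce the bytes instead, the same template can be called as `EncodeUtf8(s, std::back_inserter(bytes))` with a `std::string bytes`.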

One question: you mentioned adding the API under experimental. What's our mechanism for experimental? Do we #ifdef, mark some compile-time attributes, or anything else?

kunalspathak · 2017-07-27T19:20:54Z

Thanks @jianchun for the suggestion. I think you are right that it's OK to have the length as an int rather than introducing a new API for size_t. I will use JsGetStringLength() instead.
Regarding optimizing the converter, I agree, and that is what I mentioned here, but I would keep it as a separate PR.

kunalspathak · 2017-07-27T19:26:26Z

Regarding experimental: I haven't seen any flag, except mentioning it in the wiki.


kunalspathak · 2017-07-27T22:34:52Z

@dotnet-bot test OSX static_osx_osx_release please

kunalspathak · 2017-07-27T22:48:56Z

Thanks everyone for the feedback and review!

Merge pull request #3433 from pr/kunalspathak/1.7


msftclas added the cla-not-required label Jul 26, 2017

kunalspathak requested review from digitalinfinity, MSLaguana and liminzhu July 26, 2017 01:02

MSLaguana reviewed Jul 26, 2017


kunalspathak force-pushed the pr/kunalspathak/1.7 branch from a7a2e41 to 356656c Compare July 27, 2017 21:38

MSLaguana approved these changes Jul 27, 2017


chakrabot merged commit 356656c into release/1.7 Jul 27, 2017

kunalspathak deleted the pr/kunalspathak/1.7 branch July 27, 2017 22:59

kunalspathak mentioned this pull request Jul 28, 2017

optimize StringUtf8::From nodejs/node-chakracore#348

Merged


liminzhu mentioned this pull request Aug 23, 2017

Linux_OSX_Shared/Sample.cpp Build Error microsoft/Chakra-Samples#80

Closed

ctaggart mentioned this pull request Aug 23, 2017

JsCopyString expected 5 parameters darfink/chakracore-rs#18

Closed

Cellule mentioned this pull request Aug 25, 2017

Keep AutoString::length in sync w/ AutoString::data #3588

Merged
