📝 add note for wstring handling

nlohmann · Aug 1, 2021 · eb488bb
1 parent a544032

Showing 2 changed files with 41 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1612,6 +1612,7 @@ The library supports **Unicode input** as follows:
 - Invalid surrogates (e.g., incomplete pairs such as `\uDEAD`) will yield parse errors.
 - The strings stored in the library are UTF-8 encoded. When using the default string type (`std::string`), note that its length/size functions return the number of stored bytes rather than the number of characters or glyphs.
 - When you store strings with different encodings in the library, calling [`dump()`](https://nlohmann.github.io/json/api/basic_json/dump/) may throw an exception unless `json::error_handler_t::replace` or `json::error_handler_t::ignore` are used as error handlers.
+- To store wide strings (e.g., `std::wstring`), you need to convert them to a UTF-8 encoded `std::string` first; see [an example](https://json.nlohmann.me/home/faq/#wide-string-handling).
 
 ### Comments in JSON
 

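The README bullet above notes that `std::string::size()` counts stored bytes, not characters. As a minimal sketch, assuming valid UTF-8 input, the helper below counts Unicode code points by skipping UTF-8 continuation bytes (bit pattern `10xxxxxx`). `count_code_points` is a hypothetical name and not part of the library. Code points can still differ from user-perceived characters (glyphs), for example when combining marks are used.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of nlohmann/json): counts the Unicode code
// points in a UTF-8 encoded string. Every code point starts with exactly one
// byte that is NOT a continuation byte (10xxxxxx), so counting those bytes
// gives the number of code points. Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char c : utf8)
    {
        if ((c & 0xC0) != 0x80)
        {
            ++count;
        }
    }
    return count;
}
```

For example, `"h\xC3\xA9llo"` ("héllo") has a `size()` of 6 bytes but contains 5 code points.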
diff --git a/doc/mkdocs/docs/home/faq.md b/doc/mkdocs/docs/home/faq.md
@@ -44,7 +44,7 @@ for objects.
 
 !!! question
 
-	- Can you add an option to ignore trailing commas?
+	Can you add an option to ignore trailing commas?
 
 This library does not support any feature which would jeopardize interoperability.
 
@@ -70,6 +70,45 @@ The library supports **Unicode input** as follows:
 In most cases, the parser is right to complain, because the input is not UTF-8 encoded. This is especially true for Microsoft Windows where Latin-1 or ISO 8859-1 is often the standard encoding.
 
 
+### Wide string handling
+
+!!! question
+
+    Why are wide strings (e.g., `std::wstring`) dumped as arrays of numbers?
+
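+As described [above](#parse-errors-reading-non-ascii-characters), the library assumes UTF-8 as encoding. A wide string is treated as a sequence of `wchar_t` values and is therefore stored as an array of numbers. To store it as a JSON string, convert it to a UTF-8 encoded `std::string` first.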
+
+!!! example
+
+    ```cpp
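+    #include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
+    #include <iostream>  // std::cout
+    #include <locale>    // std::wstring_convert (deprecated since C++17)
+    #include <string>
+    #include <nlohmann/json.hpp>
+
+    using json = nlohmann::json;
+
+    // encoding function: converts a wide string to a UTF-8 encoded std::string
+    std::string to_utf8(const std::wstring& wide_string)
+    {
+        static std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
+        return utf8_conv.to_bytes(wide_string);
+    }
+
+    int main()
+    {
+        json j;
+        std::wstring ws = L"車B1234 こんにちは";
+
+        j["original"] = ws;          // stored as an array of wchar_t values
+        j["encoded"] = to_utf8(ws);  // stored as a UTF-8 encoded string
+
+        std::cout << j << std::endl;
+    }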
+    ```
+
+    The output (whitespace added for readability) is:
+
+    ```json
+    {
+      "encoded": "車B1234 こんにちは",
+      "original": [36554, 66, 49, 50, 51, 52, 32, 12371, 12435, 12395, 12385, 12399]
+    }
+    ```
+
 ## Exceptions
 
 ### Parsing without exceptions
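The FAQ example above relies on `std::wstring_convert` and `std::codecvt_utf8`, both deprecated since C++17. In addition, when `wchar_t` is 16 bits wide (as on Windows), `std::codecvt_utf8<wchar_t>` converts from UCS-2, so characters outside the Basic Multilingual Plane are not encoded correctly. The following is a minimal, dependency-free sketch of an alternative. It assumes the wide string holds UTF-16 when `wchar_t` is 16 bits and UTF-32 otherwise (as on Linux and macOS). The names `wide_to_utf8` and `append_utf8` are hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to out.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);                          // 1 byte
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a wide string to UTF-8. ASSUMPTION: the
// input is UTF-16 if wchar_t is 16 bits, UTF-32 otherwise. Throws
// std::invalid_argument on unpaired surrogates or out-of-range values.
std::string wide_to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i)
    {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF)
        {
            // UTF-16: a high surrogate must be followed by a low surrogate
            if (i + 1 == ws.size())
            {
                throw std::invalid_argument("incomplete surrogate pair");
            }
            const std::uint32_t low = static_cast<std::uint32_t>(ws[i + 1]);
            if (low < 0xDC00 || low > 0xDFFF)
            {
                throw std::invalid_argument("incomplete surrogate pair");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;  // the low surrogate has been consumed
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            throw std::invalid_argument("unpaired surrogate");
        }
        else if (cp > 0x10FFFF)
        {
            throw std::invalid_argument("code point out of range");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, the FAQ example would use `j["encoded"] = wide_to_utf8(ws);` in place of `to_utf8(ws)`.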