📝 add note for wstring handling
nlohmann committed Aug 1, 2021
1 parent a544032 commit eb488bb
Showing 2 changed files with 41 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -1612,6 +1612,7 @@ The library supports **Unicode input** as follows:
- Invalid surrogates (e.g., incomplete pairs such as `\uDEAD`) will yield parse errors.
- The strings stored in the library are UTF-8 encoded. When using the default string type (`std::string`), note that its length/size functions return the number of stored bytes rather than the number of characters or glyphs.
- When you store strings with different encodings in the library, calling [`dump()`](https://nlohmann.github.io/json/api/basic_json/dump/) may throw an exception unless `json::error_handler_t::replace` or `json::error_handler_t::ignore` are used as error handlers.
- To store wide strings (e.g., `std::wstring`), you need to convert them to a UTF-8 encoded `std::string` first; see [an example](https://json.nlohmann.me/home/faq/#wide-string-handling).
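To illustrate the byte-versus-character distinction from the notes above, here is a minimal sketch that counts Unicode code points in a UTF-8 encoded `std::string`. The helper `utf8_code_points` is hypothetical and not part of the library; it assumes valid UTF-8 input, and code points are still not the same as glyphs (combining characters count separately).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string.
// Every byte that is not a continuation byte (bit pattern 10xxxxxx)
// starts a new code point. Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s)
    {
        if ((c & 0xC0) != 0x80)  // not a continuation byte
        {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 string `"äpfel"`, `size()` returns 6 (the `ä` takes two bytes), while the sketch counts 5 code points.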

### Comments in JSON

41 changes: 40 additions & 1 deletion doc/mkdocs/docs/home/faq.md
@@ -44,7 +44,7 @@ for objects.

!!! question

Can you add an option to ignore trailing commas?

This library does not support any feature which would jeopardize interoperability.

@@ -70,6 +70,45 @@ The library supports **Unicode input** as follows:
In most cases, the parser is right to complain, because the input is not UTF-8 encoded. This is especially common on Microsoft Windows, where Latin-1 (ISO 8859-1) is often the default encoding.
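If the input really is Latin-1, one option is to re-encode it as UTF-8 before parsing. The following is a minimal sketch, assuming the input is strictly ISO 8859-1; the helper `latin1_to_utf8` is hypothetical and not part of the library. Every Latin-1 byte maps to the Unicode code point with the same value, so bytes below `0x80` are copied and all others become two-byte UTF-8 sequences. Text in Windows-1252 differs in the range `0x80` to `0x9F` (e.g., the euro sign), which this sketch would turn into control characters instead.

```cpp
#include <string>

// Hypothetical helper: re-encodes an ISO 8859-1 (Latin-1) byte string as UTF-8.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte needs two bytes
    for (unsigned char c : latin1)
    {
        if (c < 0x80)
        {
            // ASCII range: identical in Latin-1 and UTF-8
            utf8 += static_cast<char>(c);
        }
        else
        {
            // U+0080 to U+00FF: two-byte sequence 110000xx 10xxxxxx
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
}
```

The result can then be handed to the parser, e.g. `json::parse(latin1_to_utf8(input))`, where `input` stands for your Latin-1 data.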


### Wide string handling

!!! question

Why are wide strings (e.g., `std::wstring`) dumped as arrays of numbers?

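As described [above](#parse-errors-reading-non-ascii-characters), the library assumes UTF-8 encoding. A `std::wstring` is not one of the string types the library stores as a JSON string, so it is treated like any other container: each `wchar_t` element becomes a number. To store a wide string as a JSON string, convert it to a UTF-8 encoded `std::string` first.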

!!! example

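```cpp
#include <codecvt>   // std::codecvt_utf8
#include <iostream>  // std::cout
#include <locale>    // std::wstring_convert
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// encoding function: converts a wide string to a UTF-8 encoded std::string
// (with 16-bit wchar_t, as on Windows, std::codecvt_utf8_utf16<wchar_t> is
// needed to handle characters outside the Basic Multilingual Plane)
std::string to_utf8(const std::wstring& wide_string)
{
    static std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
    return utf8_conv.to_bytes(wide_string);
}

int main()
{
    json j;
    std::wstring ws = L"車B1234 こんにちは";

    j["original"] = ws;          // stored as an array of wchar_t values
    j["encoded"] = to_utf8(ws);  // stored as a UTF-8 string

    std::cout << j << std::endl;
}
```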

The output (formatted here for readability) is:

```json
{
    "encoded": "車B1234 こんにちは",
    "original": [36554, 66, 49, 50, 51, 52, 32, 12371, 12435, 12395, 12385, 12399]
}
```
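Note that `std::wstring_convert` and `std::codecvt_utf8` are deprecated since C++17. If you want to avoid them, a small hand-written encoder is one alternative. The following is a minimal sketch, not part of the library: the helpers `append_utf8` and `wide_to_utf8` are hypothetical names, and the code assumes that `wchar_t` holds UTF-32 (as on Linux) or UTF-16 (as on Windows) and that the input is valid, so unpaired surrogates and values above U+10FFFF are not checked.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of code point cp (assumed valid) to out.
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a wide string (UTF-32 or UTF-16 wchar_t) to a UTF-8 std::string.
std::string wide_to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i)
    {
        char32_t cp = static_cast<char32_t>(ws[i]);
        // with UTF-16 wchar_t: combine a high surrogate with the following low surrogate
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size())
        {
            char32_t low = static_cast<char32_t>(ws[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

In the example above, `j["encoded"] = wide_to_utf8(ws);` could then take the place of the call to `to_utf8`.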

## Exceptions

### Parsing without exceptions