Decoding examples
- Basic decoding
- Error handling with exceptions
- Error handling without exceptions
- Error handling with substitutions
Basic decoding
Text_view iterators produce characters (class objects with an associated character set and code point value) as their element type. In the following example, note that \u00F8 (LATIN SMALL LETTER O WITH STROKE) is encoded in UTF-8 using two code units (\xC3\xB8), but iterator-based enumeration sees just the single code point.
using CT = utf8_encoding::character_type;
auto tv = make_text_view<utf8_encoding>(u8"J\u00F8erg is my friend");
auto it = tv.begin();
assert(*it++ == CT{0x004A}); // 'J'
assert(*it++ == CT{0x00F8}); // 'ø'
assert(*it++ == CT{0x0065}); // 'e'
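To make the two-code-unit claim concrete, here is a minimal standalone sketch of how well-formed UTF-8 maps to code points. It does not use Text_view and is not how the library is implemented; `decode_utf8_unchecked` is a hypothetical helper that assumes valid input and performs no error checking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Text_view: decodes well-formed UTF-8
// into code points. Validation is deliberately omitted.
inline std::vector<char32_t> decode_utf8_unchecked(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        // The high bits of the lead code unit give the sequence length
        // and the number of payload bits it carries.
        std::size_t len = 1;
        char32_t cp = lead;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        assert(i + len <= s.size()); // precondition: input is well-formed
        // Each continuation code unit contributes its low six bits.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the code units `\xC3\xB8`, the lead byte contributes `0x03` and the continuation byte `0x38`, which combine to `0xF8`. When writing such input as a literal, split it before a following hex-digit letter (`"J\xC3\xB8" "erg"`), since a `\x` escape consumes every hex digit that follows it.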
The iterators and ranges that Text_view provides are compatible with the non-modifying sequence utilities provided by the standard C++ <algorithm> library. This enables use of standard algorithms to search encoded text.
it = std::find(tv.begin(), tv.end(), CT{0x00F8});
assert(it != tv.end());
The iterators provided by Text_view also provide access to the underlying code unit sequence.
auto base_it = it.base_range().begin();
assert(*base_it++ == '\xC3');
assert(*base_it++ == '\xB8');
assert(base_it == it.base_range().end());
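The relationship between a decoded character and its underlying code units can be sketched without the library. The following assumes well-formed UTF-8; `code_unit_span` and `code_unit_spans` are hypothetical names introduced for illustration and are not part of Text_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for this sketch: the half-open range [first, last) of
// code unit offsets that encodes one code point.
struct code_unit_span {
    std::size_t first;
    std::size_t last;
};

// Returns one span per code point in well-formed UTF-8 input. The sequence
// length is read from the lead code unit; no validation is performed.
inline std::vector<code_unit_span> code_unit_spans(const std::string& s) {
    std::vector<code_unit_span> spans;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        const std::size_t len =
            lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : lead >= 0xC0 ? 2 : 1;
        assert(i + len <= s.size()); // precondition: input is well-formed
        spans.push_back({i, i + len});
        i += len;
    }
    return spans;
}
```

For the input `"J\xC3\xB8" "erg"`, the second span covers offsets 1 and 2, the same two code units that `base_range()` exposes in the example above.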
Text_view ranges can be used in range-based for statements.
for (const auto &ch : tv) {
...
}
Error handling with exceptions
By default, exceptions are thrown when errors occur during decoding operations.
auto tv = make_text_view<utf8_encoding>("\xc2"); // Invalid UTF-8 code unit sequence.
auto it = tv.begin();
try {
auto c = *it; // Throws 'text_decode_error'.
} catch (text_decode_error &tde) {
// Exception caught.
}
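Why `"\xc2"` is an error can be seen in a standalone sketch of a throwing decoder. This is an illustration only, not Text_view's implementation: `utf8_decode_error` and `decode_one_or_throw` are hypothetical names, and the sketch does not reject overlong three- and four-unit forms or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical exception type for this sketch; Text_view's own type is
// text_decode_error.
struct utf8_decode_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Decodes the code point starting at s[pos] and advances pos past it.
// On error, throws and leaves pos unchanged.
inline char32_t decode_one_or_throw(const std::string& s, std::size_t& pos) {
    assert(pos < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    char32_t cp = 0;
    if (lead < 0x80)                       { len = 1; cp = lead; }
    else if (lead >= 0xC2 && lead <= 0xDF) { len = 2; cp = lead & 0x1F; }
    else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0F; }
    else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07; }
    else throw utf8_decode_error("invalid lead code unit");
    if (pos + len > s.size()) {
        throw utf8_decode_error("truncated code unit sequence");
    }
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char cu = static_cast<unsigned char>(s[pos + k]);
        if ((cu & 0xC0) != 0x80) {
            throw utf8_decode_error("invalid continuation code unit");
        }
        cp = (cp << 6) | (cu & 0x3F);
    }
    pos += len;
    return cp;
}
```

The lead code unit `\xC2` announces a two-unit sequence, so a string that ends immediately after it is truncated and the sketch throws.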
Error handling without exceptions
Text_view iterators allow checking for error conditions before exceptions are thrown.
auto tv = make_text_view<utf8_encoding>("\xc2"); // Invalid UTF-8 code unit sequence.
auto it = tv.begin();
assert(it.error_occurred());
decode_status ds = it.get_error();
assert(ds == decode_status::invalid_code_unit_sequence);
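The same check-then-inspect pattern can be sketched outside the library with a decoder that reports a status instead of throwing. The names below (`sketch_status`, `decode_result`, `try_decode_one`) are hypothetical and only mirror the idea of `decode_status`; they are not Text_view definitions, and overlong or surrogate encodings are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical status enumeration for this sketch.
enum class sketch_status { no_error, invalid_code_unit_sequence };

struct decode_result {
    sketch_status status;
    char32_t code_point; // meaningful only when status == no_error
    std::size_t length;  // code units consumed; 0 on error
};

// Attempts to decode one code point starting at s[pos]; never throws.
inline decode_result try_decode_one(const std::string& s,
                                    std::size_t pos) noexcept {
    const decode_result failure{sketch_status::invalid_code_unit_sequence, 0, 0};
    if (pos >= s.size()) return failure;
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    char32_t cp = 0;
    if (lead < 0x80)                       { len = 1; cp = lead; }
    else if (lead >= 0xC2 && lead <= 0xDF) { len = 2; cp = lead & 0x1F; }
    else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0F; }
    else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07; }
    else return failure;                      // cannot start a sequence
    if (pos + len > s.size()) return failure; // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char cu = static_cast<unsigned char>(s[pos + k]);
        if ((cu & 0xC0) != 0x80) return failure; // not a continuation unit
        cp = (cp << 6) | (cu & 0x3F);
    }
    return {sketch_status::no_error, cp, len};
}
```

A caller inspects `status` before using `code_point`, in the same way the example above consults `error_occurred()` before dereferencing the iterator.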
Error handling with substitutions
Text_view's error policies allow creating views and iterators that substitute a character-set-specific substitution character when errors are encountered in a code unit sequence. For example:
auto tv = make_text_view<utf8_encoding, text_permissive_error_policy>("\xc2"); // Invalid UTF-8 code unit sequence.
auto it = tv.begin();
assert(*it == CT{0xFFFD}); // Unicode substitution character.
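The substitution behavior can be approximated in a standalone sketch that emits U+FFFD wherever decoding fails. This is an assumption-laden illustration, not the library's policy implementation: `decode_utf8_with_substitution` is a hypothetical helper, it resynchronizes by skipping a single code unit after each error, and it does not reject overlong or surrogate encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of Text_view: decodes UTF-8 and emits
// U+FFFD (REPLACEMENT CHARACTER) for each code unit that does not begin
// a decodable sequence.
inline std::vector<char32_t> decode_utf8_with_substitution(const std::string& s) {
    constexpr char32_t replacement = 0xFFFD;
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0; // stays 0 if lead cannot start a sequence
        char32_t cp = 0;
        if (lead < 0x80)                       { len = 1; cp = lead; }
        else if (lead >= 0xC2 && lead <= 0xDF) { len = 2; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07; }
        bool ok = len != 0 && i + len <= s.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char cu = static_cast<unsigned char>(s[i + k]);
            ok = (cu & 0xC0) == 0x80; // must be a continuation code unit
            cp = (cp << 6) | (cu & 0x3F);
        }
        if (ok) {
            out.push_back(cp);
            i += len;
        } else {
            out.push_back(replacement);
            i += 1; // simplification: resynchronize one code unit later
        }
    }
    return out;
}
```

With this sketch, the input `"\xc2"` yields a single U+FFFD, and valid text on either side of a bad code unit is still decoded normally.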