From da1069aa5a83127cb556087942e8f1feddd0a1fb Mon Sep 17 00:00:00 2001
From: Paul Bowen-Huggett
Date: Fri, 12 Jan 2024 18:19:28 +0100
Subject: [PATCH 1/2] Move explicit conversion docs to their own page.

---
 README.md                   |  44 ++--------------
 docs/explicit-conversion.md |  52 +++++++++++++++++++
 docs/index.md               | 101 +-----------------------------------
 3 files changed, 57 insertions(+), 140 deletions(-)
 create mode 100644 docs/explicit-conversion.md

diff --git a/README.md b/README.md
index a9714aef..f6a390d2 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ There are broadly three ways to use the icubaby library:
 
 1. [C++ 20 Range Adaptor](#c-20-range-adaptor)
 2. An iterator interface
-3. Manually driving the conversion
+3. [Converting one code-unit at a time](#converting-one-code-unit-at-a-time)
 
 ### C++ 20 Range Adaptor
 
@@ -54,7 +54,7 @@ it = t.end_cp (it);
 
 The `icubaby::iterator<>` class offers a familiar output iterator for using a transcoder. Each code unit from the input encoding is written to the iterator and this writes the output encoding to a second iterator. This enables use to use standard algorithms such as [`std::copy`](https://en.cppreference.com/w/cpp/algorithm/copy) with the library.
 
-### Manually Driving the Conversion
+### Converting One Code-Unit at a Time
 
 Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).
 
@@ -68,45 +68,7 @@ for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
 it = t.end_cp (it);
 ~~~
 
-The `out` vector will contain a two UTF-16 code units 0xD83D and 0xDE00.
-
-#### Disecting this code
-
-1. Define where and how the output should be written:
-
-    ~~~cpp
-    std::vector<char16_t> out;
-    auto it = std::back_inserter (out);
-    ~~~
-
-    For the purposes of this example, we write the encoded output to a `std::vector<char16_t>`. Use the container of your choice!
-
-2. Create the transcoder instance:
-
-    ~~~cpp
-    icubaby::t8_16 t;
-    ~~~
-
-    [`transcoder<>`](#transcoder) is a template class which requires two arguments to define the input and output encoding. You may use `char8_t` (in C++ 20, or [`icubaby::char8`](#char8) in C++ 17 and later) for UTF-8, `char16_t` for UTF-16, and `char32_t` for UTF-32. For example, `icubaby::transcoder<char16_t, char32_t>` will convert from UTF-16 to UTF-32; `icubaby::transcoder<char8_t, char16_t>` will convert from UTF-8 to UTF-16.
-
-    There is a collection of [nine typedefs](#helper-types) to make this a little more compact. Each is named `icubaby::t_I_O` where I and O are 8, 16, or 32. For example, `icubaby::t16_32` is equivalent to `icubaby::transcoder<char16_t, char32_t>` and `icubaby::t8_16` means `icubaby::transcoder<char8_t, char16_t>`.
-
-3. Pass each code unit and the output iterator to the transcoder.
-
-    ~~~cpp
-    for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-      it = t (cu, it);
-    }
-    ~~~
-
-4. Tell the transcoder that we’ve reached the end of the input. This ensures that the sequence didn’t end part way through a code point.
-
-    ~~~cpp
-    it = t.end_cp (it);
-    ~~~
-
-    It’s only necessary to make a single call to `end_cp()` once *all* of the input has been fed to the transcoder.
-
+The `out` vector will contain a two UTF-16 code units 0xD83D and 0xDE00. See the [explicit conversion documentation](https://paulhuggett.github.io/icubaby/explicit-conversion.html) for more details.
 
 ## API
 
diff --git a/docs/explicit-conversion.md b/docs/explicit-conversion.md
new file mode 100644
index 00000000..a390ca7f
--- /dev/null
+++ b/docs/explicit-conversion.md
@@ -0,0 +1,52 @@
+# Explicit Conversion
+
+Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).
+
+~~~cpp
+std::vector<char16_t> out;
+auto it = std::back_inserter (out);
+icubaby::t8_16 t;
+for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
+  it = t (cu, it);
+}
+it = t.end_cp (it);
+~~~
+
+The `out` vector will contain a two UTF-16 code units 0xD83D and 0xDE00.
+
+## Disecting this code
+
+1. Define where and how the output should be written:
+
+    ~~~cpp
+    std::vector<char16_t> out;
+    auto it = std::back_inserter (out);
+    ~~~
+
+    For the purposes of this example, we write the encoded output to a `std::vector<char16_t>`. Use the container of your choice!
+
+2. Create the transcoder instance:
+
+    ~~~cpp
+    icubaby::t8_16 t;
+    ~~~
+
+    [`transcoder<>`](#transcoder) is a template class which requires two arguments to define the input and output encoding. You may use `char8_t` (in C++ 20, or [`icubaby::char8`](#char8) in C++ 17 and later) for UTF-8, `char16_t` for UTF-16, and `char32_t` for UTF-32. For example, `icubaby::transcoder<char16_t, char32_t>` will convert from UTF-16 to UTF-32; `icubaby::transcoder<char8_t, char16_t>` will convert from UTF-8 to UTF-16.
+
+    There is a collection of [nine typedefs](#helper-types) to make this a little more compact. Each is named `icubaby::t_I_O` where I and O are 8, 16, or 32. For example, `icubaby::t16_32` is equivalent to `icubaby::transcoder<char16_t, char32_t>` and `icubaby::t8_16` means `icubaby::transcoder<char8_t, char16_t>`.
+
+3. Pass each code unit and the output iterator to the transcoder.
+
+    ~~~cpp
+    for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
+      it = t (cu, it);
+    }
+    ~~~
+
+4. Tell the transcoder that we’ve reached the end of the input. This ensures that the sequence didn’t end part way through a code point.
+
+    ~~~cpp
+    it = t.end_cp (it);
+    ~~~
+
+    It’s only necessary to make a single call to `end_cp()` once *all* of the input has been fed to the transcoder.
diff --git a/docs/index.md b/docs/index.md
index 6038dfea..308fd0b6 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,105 +13,8 @@ C++ 17 [deprecated the standard library's `<codecvt>` header file](https://www.o
 There are broadly three ways to use the icubaby library:
 
 1. [C++ 20 Range Adaptor](cxx20-range-adaptor.md)
-2. An iterator interface
-3. Manually driving the conversion
-
-### The Iterator Interface
-
-~~~cpp
-auto const in = std::vector{char8_t{0xF0}, char8_t{0x9F}, char8_t{0x98}, char8_t{0x80}};
-std::vector<char16_t> out;
-icubaby::t8_16 t;
-auto it = icubaby::iterator{&t, std::back_inserter (out)};
-for (auto cu: in) {
-  *(it++) = cu;
-}
-it = t.end_cp (it);
-~~~
-
-The `icubaby::iterator<>` class offers a familiar output iterator for using a transcoder. Each code unit from the input encoding is written to the iterator and this writes the output encoding to a second iterator. This enables use to use standard algorithms such as [`std::copy`](https://en.cppreference.com/w/cpp/algorithm/copy) with the library.
-
-### Manually Driving the Conversion
-
-Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).
-
-~~~cpp
-std::vector<char16_t> out;
-auto it = std::back_inserter (out);
-icubaby::t8_16 t;
-for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-  it = t (cu, it);
-}
-it = t.end_cp (it);
-~~~
-
-The `out` vector will contain a two UTF-16 code units 0xD83D and 0xDE00.
-
-#### Disecting this code
-
-1. Define where and how the output should be written:
-
-    ~~~cpp
-    std::vector<char16_t> out;
-    auto it = std::back_inserter (out);
-    ~~~
-
-    For the purposes of this example, we write the encoded output to a `std::vector<char16_t>`. Use the container of your choice!
-
-2. Create the transcoder instance:
-
-    ~~~cpp
-    icubaby::t8_16 t;
-    ~~~
-
-    [`transcoder<>`](#transcoder) is a template class which requires two arguments to define the input and output encoding. You may use `char8_t` (in C++ 20, or [`icubaby::char8`](#char8) in C++ 17 and later) for UTF-8, `char16_t` for UTF-16, and `char32_t` for UTF-32. For example, `icubaby::transcoder<char16_t, char32_t>` will convert from UTF-16 to UTF-32; `icubaby::transcoder<char8_t, char16_t>` will convert from UTF-8 to UTF-16.
-
-    There is a collection of [nine typedefs](#helper-types) to make this a little more compact. Each is named `icubaby::t_I_O` where I and O are 8, 16, or 32. For example, `icubaby::t16_32` is equivalent to `icubaby::transcoder<char16_t, char32_t>` and `icubaby::t8_16` means `icubaby::transcoder<char8_t, char16_t>`.
-
-3. Pass each code unit and the output iterator to the transcoder.
-
-    ~~~cpp
-    for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-      it = t (cu, it);
-    }
-    ~~~
-
-4. Tell the transcoder that we’ve reached the end of the input. This ensures that the sequence didn’t end part way through a code point.
-
-    ~~~cpp
-    it = t.end_cp (it);
-    ~~~
-
-    It’s only necessary to make a single call to `end_cp()` once *all* of the input has been fed to the transcoder.
-
-### An alternative: using icubaby::iterator
-
-The `icubaby::iterator<>` class is an output iterator to which code units in the source encoding can be assigned. This will produce equivalent code units in the output encoding which are written to a second output iterator. This make it straightforward to use standard library algorithms such as [`std::copy()`](https://en.cppreference.com/w/cpp/algorithm/copy) or [`std::ranges::copy()`](https://en.cppreference.com/w/cpp/algorithm/ranges/copy) with the library.
-
-For example:
-
-~~~cpp
-std::vector<char16_t> out;
-icubaby::t8_16 t;
-auto it = icubaby::iterator{&t, std::back_inserter (out)};
-for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-  *(it++) = cu;
-}
-it = t.end_cp (it);
-~~~
-
-This code creates an instance of `icubaby::interator<>` named `it` which holds two values: a pointer to trancoder `t` and output interator (`std::back_insert_iterator` in this case). Assigning a series of code units from the input encoding to `it` result in the `out` vector being filled with equivalent code units in the output encoding.
-
-The above code snippet loops over the contents of the `in` array one code unit at a time. We can use `std::ranges::copy()` to achieve the same effect:
-
-~~~cpp
-std::array const in {0xF0, 0x9F, 0x98, 0x80};
-std::vector<char16_t> out;
-icubaby::t8_16 t;
-auto it = std::ranges::copy (in, icubaby::iterator{&t, std::back_inserter (out)}).out;
-it = t.end_cp (it);
-~~~
-
+2. [An iterator interface](iterator-interface.md). Enables use of iterator-based algorithms.
+3. [Explicit Conversion](explicit-conversion.md). This drives the conversion one code-unit at a time.
 
 ## API
 
From 667022175e0783571a3765db58a328ef1e485089 Mon Sep 17 00:00:00 2001
From: Paul Bowen-Huggett
Date: Fri, 12 Jan 2024 18:22:54 +0100
Subject: [PATCH 2/2] Rephrase a sentence

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f6a390d2..e9c544d6 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ C++ 17 [deprecated the standard library's `<codecvt>` header file](https://www.o
 
 ## Usage
 
-There are broadly three ways to use the icubaby library:
+There are three ways to use the icubaby library depending on your needs:
 
 1. [C++ 20 Range Adaptor](#c-20-range-adaptor)
 2. An iterator interface