Merge pull request #15 from paulhuggett/explicit-conversion

Move explicit conversion docs to their own page.
paulhuggett · Jan 12, 2024 · c0c4318
2 parents f821631 + 6670221
commit c0c4318
Showing 3 changed files with 58 additions and 141 deletions.
diff --git a/README.md b/README.md
@@ -20,11 +20,11 @@ C++ 17 [deprecated the standard library's `<codecvt>` header file](https://www.o
 
 ## Usage
 
-There are broadly three ways to use the icubaby library:
+There are three ways to use the icubaby library depending on your needs:
 
 1. [C++ 20 Range Adaptor](#c-20-range-adaptor)
 2. An iterator interface
-3. Manually driving the conversion
+3. [Converting one code-unit at a time](#converting-one-code-unit-at-a-time)
 
 ### C++ 20 Range Adaptor
 
@@ -54,7 +54,7 @@ it = t.end_cp (it);
 
 The `icubaby::iterator<>` class offers a familiar output iterator for using a transcoder. Each code unit from the input encoding that is written to the iterator is transcoded, and the resulting code units in the output encoding are written to a second iterator. This enables us to use standard algorithms such as [`std::copy`](https://en.cppreference.com/w/cpp/algorithm/copy) with the library.
 
-### Manually Driving the Conversion
+### Converting One Code-Unit at a Time
 
 Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).
 
@@ -68,45 +68,7 @@ for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
 it = t.end_cp (it);
 ~~~
 
-The `out` vector will contain two UTF-16 code units: 0xD83D and 0xDE00.
-
-#### Dissecting this code
-
-1.  Define where and how the output should be written:
-
-    ~~~cpp
-    std::vector<char16_t> out;
-    auto it = std::back_inserter (out);
-    ~~~
-
-    For the purposes of this example, we write the encoded output to a `std::vector<char16_t>`. Use the container of your choice!
-
-2.  Create the transcoder instance:
-
-    ~~~cpp
-    icubaby::t8_16 t;
-    ~~~
-
-    [`transcoder<>`](#transcoder) is a template class which requires two arguments to define the input and output encoding. You may use `char8_t` (in C++ 20, or [`icubaby::char8`](#char8) in C++ 17 and later) for UTF-8, `char16_t` for UTF-16, and `char32_t` for UTF-32. For example, `icubaby::transcoder<char16_t, char32_t>` will convert from UTF-16 to UTF-32; `icubaby::transcoder<char8_t, char16_t>` will convert from UTF-8 to UTF-16.
-
-    There is a collection of [nine typedefs](#helper-types) to make this a little more compact. Each is named `icubaby::t_I_O` where I and O are 8, 16, or 32. For example, `icubaby::t16_32` is equivalent to `icubaby::transcoder<char16_t, char32_t>` and `icubaby::t8_16` means `icubaby::transcoder<char8_t, char16_t>`.
-
-3.  Pass each code unit and the output iterator to the transcoder.
-
-    ~~~cpp
-    for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-      it = t (cu, it);
-    }
-    ~~~
-
-4.  Tell the transcoder that we’ve reached the end of the input. This ensures that the sequence didn’t end part way through a code point.
-
-    ~~~cpp
-    it = t.end_cp (it);
-    ~~~
-
-    It’s only necessary to make a single call to `end_cp()` once *all* of the input has been fed to the transcoder.
-
+The `out` vector will contain two UTF-16 code units: 0xD83D and 0xDE00. See the [explicit conversion documentation](https://paulhuggett.github.io/icubaby/explicit-conversion.html) for more details.
 
 ## API
 

diff --git a/docs/explicit-conversion.md b/docs/explicit-conversion.md
@@ -0,0 +1,52 @@
+# Explicit Conversion
+
+Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).
+
+~~~cpp
+std::vector<char16_t> out;
+auto it = std::back_inserter (out);
+icubaby::t8_16 t;
+for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
+  it = t (cu, it);
+}
+it = t.end_cp (it);
+~~~
+
+The `out` vector will contain two UTF-16 code units: 0xD83D and 0xDE00.
+
+## Dissecting this code
+
+1.  Define where and how the output should be written:
+
+    ~~~cpp
+    std::vector<char16_t> out;
+    auto it = std::back_inserter (out);
+    ~~~
+
+    For the purposes of this example, we write the encoded output to a `std::vector<char16_t>`. Use the container of your choice!
+
+2.  Create the transcoder instance:
+
+    ~~~cpp
+    icubaby::t8_16 t;
+    ~~~
+
+    [`transcoder<>`](#transcoder) is a template class which requires two arguments to define the input and output encoding. You may use `char8_t` (in C++ 20, or [`icubaby::char8`](#char8) in C++ 17 and later) for UTF-8, `char16_t` for UTF-16, and `char32_t` for UTF-32. For example, `icubaby::transcoder<char16_t, char32_t>` will convert from UTF-16 to UTF-32; `icubaby::transcoder<char8_t, char16_t>` will convert from UTF-8 to UTF-16.
+
+    There is a collection of [nine typedefs](#helper-types) to make this a little more compact. Each is named `icubaby::tI_O` where I and O are 8, 16, or 32. For example, `icubaby::t16_32` is equivalent to `icubaby::transcoder<char16_t, char32_t>` and `icubaby::t8_16` means `icubaby::transcoder<char8_t, char16_t>`.
+
+3.  Pass each code unit and the output iterator to the transcoder.
+
+    ~~~cpp
+    for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
+      it = t (cu, it);
+    }
+    ~~~
+
+4.  Tell the transcoder that we’ve reached the end of the input. This ensures that the sequence didn’t end part way through a code point.
+
+    ~~~cpp
+    it = t.end_cp (it);
+    ~~~
+
+    It’s only necessary to make a single call to `end_cp()` once *all* of the input has been fed to the transcoder.
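
Note on the four steps above: the sketch below shows the general shape of a transcoder that is driven one code unit at a time. It is a minimal stand-in written for illustration and is not icubaby's implementation. The names `toy_t8_16` and `toy_convert` are hypothetical, continuation bytes are not validated, and emitting U+FFFD for a truncated sequence in `end_cp()` is an assumption of this sketch rather than documented library behavior.

```cpp
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for illustration only; NOT icubaby's implementation.
// Decodes UTF-8 one code unit at a time and emits UTF-16 code units.
class toy_t8_16 {
public:
  // Consume one UTF-8 code unit. Once a code point is complete, write its
  // UTF-16 encoding to dest. Returns the (possibly advanced) output iterator.
  template <typename OutputIt>
  OutputIt operator() (std::uint8_t cu, OutputIt dest) {
    if (remaining_ == 0) {
      // Lead byte: work out how many continuation bytes should follow.
      if (cu < 0x80) {
        cp_ = cu;
      } else if ((cu & 0xE0) == 0xC0) {
        cp_ = cu & 0x1F;
        remaining_ = 1;
      } else if ((cu & 0xF0) == 0xE0) {
        cp_ = cu & 0x0F;
        remaining_ = 2;
      } else if ((cu & 0xF8) == 0xF0) {
        cp_ = cu & 0x07;
        remaining_ = 3;
      } else {
        cp_ = 0xFFFD;  // Unexpected byte: substitute REPLACEMENT CHARACTER.
      }
    } else {
      // Continuation byte: append its low six bits (not validated here).
      cp_ = (cp_ << 6) | (cu & 0x3F);
      --remaining_;
    }
    if (remaining_ == 0) {
      dest = emit (cp_, dest);
    }
    return dest;
  }

  // Signal the end of the input. If the input stopped part way through a
  // code point, this sketch emits U+FFFD (an assumption of the sketch).
  template <typename OutputIt>
  OutputIt end_cp (OutputIt dest) {
    if (remaining_ != 0) {
      remaining_ = 0;
      dest = emit (0xFFFD, dest);
    }
    return dest;
  }

private:
  // Write one code point as either a single UTF-16 unit or a surrogate pair.
  template <typename OutputIt>
  static OutputIt emit (char32_t cp, OutputIt dest) {
    if (cp < 0x10000) {
      *dest++ = static_cast<char16_t> (cp);
    } else {
      cp -= 0x10000;
      *dest++ = static_cast<char16_t> (0xD800 + (cp >> 10));    // high surrogate
      *dest++ = static_cast<char16_t> (0xDC00 + (cp & 0x3FF));  // low surrogate
    }
    return dest;
  }

  char32_t cp_ = 0;         // Code point being assembled.
  unsigned remaining_ = 0;  // Continuation bytes still expected.
};

// Hypothetical helper that follows the same four steps as the example above.
std::vector<char16_t> toy_convert (std::vector<std::uint8_t> const& in) {
  std::vector<char16_t> out;
  auto it = std::back_inserter (out);  // 1. Where the output goes.
  toy_t8_16 t;                         // 2. The transcoder instance.
  for (auto cu : in) {
    it = t (cu, it);                   // 3. Feed each code unit.
  }
  it = t.end_cp (it);                  // 4. Signal the end of the input.
  return out;
}
```

Calling `toy_convert ({0xF0, 0x9F, 0x98, 0x80})` yields the surrogate pair 0xD83D, 0xDE00 described above. Because state is carried between calls, the caller can feed input in chunks of any size and only needs `end_cp()` once, after the final code unit.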
diff --git a/docs/index.md b/docs/index.md
@@ -13,105 +13,8 @@ C++ 17 [deprecated the standard library's `<codecvt>` header file](https://www.o
 There are broadly three ways to use the icubaby library:
 
 1. [C++ 20 Range Adaptor](cxx20-range-adaptor.md)
-2. An iterator interface
-3. Manually driving the conversion
-
-### The Iterator Interface
-
-~~~cpp
-auto const in = std::vector{char8_t{0xF0}, char8_t{0x9F}, char8_t{0x98}, char8_t{0x80}};
-std::vector<char16_t> out;
-icubaby::t8_16 t;
-auto it = icubaby::iterator{&t, std::back_inserter (out)};
-for (auto cu: in) {
-  *(it++) = cu;
-}
-it = t.end_cp (it);
-~~~
-
-The `icubaby::iterator<>` class offers a familiar output iterator for using a transcoder. Each code unit from the input encoding that is written to the iterator is transcoded, and the resulting code units in the output encoding are written to a second iterator. This enables us to use standard algorithms such as [`std::copy`](https://en.cppreference.com/w/cpp/algorithm/copy) with the library.
-
-### Manually Driving the Conversion
-
-Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).
-
-~~~cpp
-std::vector<char16_t> out;
-auto it = std::back_inserter (out);
-icubaby::t8_16 t;
-for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-  it = t (cu, it);
-}
-it = t.end_cp (it);
-~~~
-
-The `out` vector will contain two UTF-16 code units: 0xD83D and 0xDE00.
-
-#### Dissecting this code
-
-1.  Define where and how the output should be written:
-
-    ~~~cpp
-    std::vector<char16_t> out;
-    auto it = std::back_inserter (out);
-    ~~~
-
-    For the purposes of this example, we write the encoded output to a `std::vector<char16_t>`. Use the container of your choice!
-
-2.  Create the transcoder instance:
-
-    ~~~cpp
-    icubaby::t8_16 t;
-    ~~~
-
-    [`transcoder<>`](#transcoder) is a template class which requires two arguments to define the input and output encoding. You may use `char8_t` (in C++ 20, or [`icubaby::char8`](#char8) in C++ 17 and later) for UTF-8, `char16_t` for UTF-16, and `char32_t` for UTF-32. For example, `icubaby::transcoder<char16_t, char32_t>` will convert from UTF-16 to UTF-32; `icubaby::transcoder<char8_t, char16_t>` will convert from UTF-8 to UTF-16.
-
-    There is a collection of [nine typedefs](#helper-types) to make this a little more compact. Each is named `icubaby::t_I_O` where I and O are 8, 16, or 32. For example, `icubaby::t16_32` is equivalent to `icubaby::transcoder<char16_t, char32_t>` and `icubaby::t8_16` means `icubaby::transcoder<char8_t, char16_t>`.
-
-3.  Pass each code unit and the output iterator to the transcoder.
-
-    ~~~cpp
-    for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-      it = t (cu, it);
-    }
-    ~~~
-
-4.  Tell the transcoder that we’ve reached the end of the input. This ensures that the sequence didn’t end part way through a code point.
-
-    ~~~cpp
-    it = t.end_cp (it);
-    ~~~
-
-    It’s only necessary to make a single call to `end_cp()` once *all* of the input has been fed to the transcoder.
-
-### An alternative: using icubaby::iterator
-
-The `icubaby::iterator<>` class is an output iterator to which code units in the source encoding can be assigned. This will produce equivalent code units in the output encoding which are written to a second output iterator. This makes it straightforward to use standard library algorithms such as [`std::copy()`](https://en.cppreference.com/w/cpp/algorithm/copy) or [`std::ranges::copy()`](https://en.cppreference.com/w/cpp/algorithm/ranges/copy) with the library.
-
-For example:
-
-~~~cpp
-std::vector<char16_t> out;
-icubaby::t8_16 t;
-auto it = icubaby::iterator{&t, std::back_inserter (out)};
-for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
-  *(it++) = cu;
-}
-it = t.end_cp (it);
-~~~
-
-This code creates an instance of `icubaby::iterator<>` named `it` which holds two values: a pointer to the transcoder `t` and an output iterator (`std::back_insert_iterator` in this case). Assigning a series of code units from the input encoding to `it` results in the `out` vector being filled with equivalent code units in the output encoding.
-
-The above code snippet loops over the contents of the `in` array one code unit at a time. We can use `std::ranges::copy()` to achieve the same effect:
-
-~~~cpp
-std::array<char8_t, 4> const in {0xF0, 0x9F, 0x98, 0x80};
-std::vector<char16_t> out;
-icubaby::t8_16 t;
-auto it = std::ranges::copy (in, icubaby::iterator{&t, std::back_inserter (out)}).out;
-it = t.end_cp (it);
-~~~
-
+2. [An iterator interface](iterator-interface.md). Enables use of iterator-based algorithms.
+3. [Explicit Conversion](explicit-conversion.md). This drives the conversion one code-unit at a time.
 
 ## API