johnkerl · Feb 6, 2022
diff --git a/.vimrc b/.vimrc
@@ -1,4 +1,5 @@
 map \d :w<C-m>:!clear;echo Building ...; echo; make mlr<C-m>
 map \f :w<C-m>:!clear;echo Building ...; echo; make ut<C-m>
-map \r :w<C-m>:!clear;echo Building ...; echo; make ut-scan ut-mlv<C-m>
+"map \r :w<C-m>:!clear;echo Building ...; echo; make ut-scan ut-mlv<C-m>
+map \r :w<C-m>:!clear;echo Building ...; echo; make ut-lib<C-m>
 map \t :w<C-m>:!clear;go test github.com/johnkerl/miller/internal/pkg/transformers/...<C-m>
diff --git a/docs/src/file-formats.md b/docs/src/file-formats.md
@@ -104,36 +104,34 @@ NIDX: implicitly numerically indexed (Unix-toolkit style)
 
 When `mlr` is invoked with the `--csv` or `--csvlite` option, key names are found on the first record and values are taken from subsequent records.  This includes the case of CSV-formatted files.  See [Record Heterogeneity](record-heterogeneity.md) for how Miller handles changes of field names within a single data stream.
 
-Miller has record separator `RS` and field separator `FS`, just as `awk` does.  For TSV, use `--fs tab`; to convert TSV to CSV, use `--ifs tab --ofs comma`, etc.  (See also the [separators page](reference-main-separators.md).)
+Miller has record separator `RS` and field separator `FS`, just as `awk` does. (See also the [separators page](reference-main-separators.md).)
 
-**TSV (tab-separated values):** the following are synonymous pairs:
+**TSV (tab-separated values):** `FS` is tab and `RS` is newline (or carriage return + linefeed for
+Windows).  On input, if fields have `\r`, `\n`, `\t`, or `\\`, those are decoded as carriage return,
+newline, tab, and backslash, respectively. On output, the reverse is done -- for example, if a field
+has an embedded newline, that newline is replaced by `\n`.
 
-* `--tsv` and `--csv --fs tab`
-* `--itsv` and `--icsv --ifs tab`
-* `--otsv` and `--ocsv --ofs tab`
-* `--tsvlite` and `--csvlite --fs tab`
-* `--itsvlite` and `--icsvlite --ifs tab`
-* `--otsvlite` and `--ocsvlite --ofs tab`
+**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS `0x1f` and `0x1e`, respectively.
 
-**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS 0x1f and 0x1e, respectively.
-
-**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS U+241F (UTF-8 0x0xe2909f) and U+241E (UTF-8 0xe2909e), respectively.
+**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS `U+241F` (UTF-8 `0xe2909f`) and `U+241E` (UTF-8 `0xe2909e`), respectively.
 
 Miller's `--csv` flag supports [RFC-4180 CSV](https://tools.ietf.org/html/rfc4180). This includes CRLF line-terminators by default, regardless of platform.
 
 Here are the differences between CSV and CSV-lite:
 
+* CSV-lite naively splits lines on newline, and fields on comma -- embedded commas and newlines are not escaped in any way.
+
 * CSV supports [RFC-4180](https://tools.ietf.org/html/rfc4180)-style double-quoting, including the ability to have commas and/or LF/CRLF line-endings contained within an input field; CSV-lite does not.
 
 * CSV does not allow heterogeneous data; CSV-lite does (see also [Record Heterogeneity](record-heterogeneity.md)).
 
-* The CSV-lite input-reading code is fractionally more efficient than the CSV input-reader.
+* TSV-lite is simply CSV-lite with field separator set to tab instead of comma.
 
-Here are things they have in common:
+* CSV-lite allows changing FS and/or RS to any values, perhaps multi-character.
 
-* The ability to specify record/field separators other than the default, e.g. CR-LF vs. LF, or tab instead of comma for TSV, and so on.
+* In short, use-cases for CSV-lite and TSV-lite are often found when dealing with CSV/TSV files which are formatted in some non-standard way -- you have a little more flexibility available to you. (As an example of this flexibility: ASV and USV are nothing more than CSV-lite with different values for FS and RS.)
 
-* The `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
+CSV, TSV, CSV-lite, and TSV-lite have in common the `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
 
 ## JSON
 

diff --git a/docs/src/file-formats.md.in b/docs/src/file-formats.md.in
@@ -16,36 +16,34 @@ GENMD-EOF
 
 When `mlr` is invoked with the `--csv` or `--csvlite` option, key names are found on the first record and values are taken from subsequent records.  This includes the case of CSV-formatted files.  See [Record Heterogeneity](record-heterogeneity.md) for how Miller handles changes of field names within a single data stream.
 
-Miller has record separator `RS` and field separator `FS`, just as `awk` does.  For TSV, use `--fs tab`; to convert TSV to CSV, use `--ifs tab --ofs comma`, etc.  (See also the [separators page](reference-main-separators.md).)
+Miller has record separator `RS` and field separator `FS`, just as `awk` does. (See also the [separators page](reference-main-separators.md).)
 
-**TSV (tab-separated values):** the following are synonymous pairs:
+**TSV (tab-separated values):** `FS` is tab and `RS` is newline (or carriage return + linefeed for
+Windows).  On input, if fields have `\r`, `\n`, `\t`, or `\\`, those are decoded as carriage return,
+newline, tab, and backslash, respectively. On output, the reverse is done -- for example, if a field
+has an embedded newline, that newline is replaced by `\n`.
 
-* `--tsv` and `--csv --fs tab`
-* `--itsv` and `--icsv --ifs tab`
-* `--otsv` and `--ocsv --ofs tab`
-* `--tsvlite` and `--csvlite --fs tab`
-* `--itsvlite` and `--icsvlite --ifs tab`
-* `--otsvlite` and `--ocsvlite --ofs tab`
+**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS `0x1f` and `0x1e`, respectively.
 
-**ASV (ASCII-separated values):** the flags `--asv`, `--iasv`, `--oasv`, `--asvlite`, `--iasvlite`, and `--oasvlite` are analogous except they use ASCII FS and RS 0x1f and 0x1e, respectively.
-
-**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS U+241F (UTF-8 0x0xe2909f) and U+241E (UTF-8 0xe2909e), respectively.
+**USV (Unicode-separated values):** likewise, the flags `--usv`, `--iusv`, `--ousv`, `--usvlite`, `--iusvlite`, and `--ousvlite` use Unicode FS and RS `U+241F` (UTF-8 `0xe2909f`) and `U+241E` (UTF-8 `0xe2909e`), respectively.
 
 Miller's `--csv` flag supports [RFC-4180 CSV](https://tools.ietf.org/html/rfc4180). This includes CRLF line-terminators by default, regardless of platform.
 
 Here are the differences between CSV and CSV-lite:
 
+* CSV-lite naively splits lines on newline, and fields on comma -- embedded commas and newlines are not escaped in any way.
+
 * CSV supports [RFC-4180](https://tools.ietf.org/html/rfc4180)-style double-quoting, including the ability to have commas and/or LF/CRLF line-endings contained within an input field; CSV-lite does not.
 
 * CSV does not allow heterogeneous data; CSV-lite does (see also [Record Heterogeneity](record-heterogeneity.md)).
 
-* The CSV-lite input-reading code is fractionally more efficient than the CSV input-reader.
+* TSV-lite is simply CSV-lite with field separator set to tab instead of comma.
 
-Here are things they have in common:
+* CSV-lite allows changing FS and/or RS to any values, perhaps multi-character.
 
-* The ability to specify record/field separators other than the default, e.g. CR-LF vs. LF, or tab instead of comma for TSV, and so on.
+* In short, use-cases for CSV-lite and TSV-lite are often found when dealing with CSV/TSV files which are formatted in some non-standard way -- you have a little more flexibility available to you. (As an example of this flexibility: ASV and USV are nothing more than CSV-lite with different values for FS and RS.)
 
-* The `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
+CSV, TSV, CSV-lite, and TSV-lite have in common the `--implicit-csv-header` flag for input and the `--headerless-csv-output` flag for output.
 
 ## JSON
 

diff --git a/docs/src/keystroke-savers.md b/docs/src/keystroke-savers.md
@@ -92,11 +92,11 @@ If there's more than one input file, you can use `--mfrom`, then however many fi
 The following have even shorter versions:
 
 * `-c` is the same as `--csv`
-* `-t` is the same as `--tsvlite`
+* `-t` is the same as `--tsv`
 * `-j` is the same as `--json`
 
 I don't use these within these documents, since I want the docs to be self-explanatory on every page, and
-I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're there for you to use.
+I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're always there for you to use.
 
 ## .mlrrc file
 

diff --git a/docs/src/keystroke-savers.md.in b/docs/src/keystroke-savers.md.in
@@ -37,11 +37,11 @@ GENMD-EOF
 The following have even shorter versions:
 
 * `-c` is the same as `--csv`
-* `-t` is the same as `--tsvlite`
+* `-t` is the same as `--tsv`
 * `-j` is the same as `--json`
 
 I don't use these within these documents, since I want the docs to be self-explanatory on every page, and
-I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're there for you to use.
+I think `mlr --csv ...` explains itself better than `mlr -c ...`. Nonetheless, they're always there for you to use.
 
 ## .mlrrc file
 

diff --git a/docs/src/manpage.md b/docs/src/manpage.md
@@ -386,7 +386,7 @@ FILE-FORMAT FLAGS
        --oxtab                  Use XTAB format for output data.
        --pprint                 Use PPRINT format for input and output data.
        --tsv                    Use TSV format for input and output data.
-       --tsvlite or -t          Use TSV-lite format for input and output data.
+       --tsv or -t              Use TSV format for input and output data.
        --usv or --usvlite       Use USV format for input and output data.
        --xtab                   Use XTAB format for input and output data.
        -i {format name}         Use format name for input data. For example: `-i csv`
@@ -708,7 +708,6 @@ SEPARATOR FLAGS
          alignment impossible.
        * OPS may be multi-character for XTAB format, in which case alignment is
          disabled.
-       * TSV is simply CSV using tab as field separator (`--fs tab`).
        * FS/PS are ignored for markdown format; RS is used.
        * All FS and PS options are ignored for JSON format, since they are not relevant
          to the JSON format.
@@ -763,6 +762,7 @@ SEPARATOR FLAGS
                markdown " "    N/A    "\n"
                nidx     " "    N/A    "\n"
                pprint   " "    N/A    "\n"
+               tsv      "  "    N/A    "\n"
                xtab     "\n"   " "    "\n\n"
 
        --fs {string}            Specify FS for input and output.
@@ -3157,5 +3157,5 @@ SEE ALSO
 
 
 
-                                  2022-02-05                         MILLER(1)
+                                  2022-02-06                         MILLER(1)
 </pre>
diff --git a/docs/src/manpage.txt b/docs/src/manpage.txt
@@ -365,7 +365,7 @@ FILE-FORMAT FLAGS
        --oxtab                  Use XTAB format for output data.
        --pprint                 Use PPRINT format for input and output data.
        --tsv                    Use TSV format for input and output data.
-       --tsvlite or -t          Use TSV-lite format for input and output data.
+       --tsv or -t              Use TSV format for input and output data.
        --usv or --usvlite       Use USV format for input and output data.
        --xtab                   Use XTAB format for input and output data.
        -i {format name}         Use format name for input data. For example: `-i csv`
@@ -687,7 +687,6 @@ SEPARATOR FLAGS
          alignment impossible.
        * OPS may be multi-character for XTAB format, in which case alignment is
          disabled.
-       * TSV is simply CSV using tab as field separator (`--fs tab`).
        * FS/PS are ignored for markdown format; RS is used.
        * All FS and PS options are ignored for JSON format, since they are not relevant
          to the JSON format.
@@ -742,6 +741,7 @@ SEPARATOR FLAGS
                markdown " "    N/A    "\n"
                nidx     " "    N/A    "\n"
                pprint   " "    N/A    "\n"
+               tsv      "  "    N/A    "\n"
                xtab     "\n"   " "    "\n\n"
 
        --fs {string}            Specify FS for input and output.
@@ -3136,4 +3136,4 @@ SEE ALSO
 
 
 
-                                  2022-02-05                         MILLER(1)
+                                  2022-02-06                         MILLER(1)
diff --git a/docs/src/reference-main-flag-list.md b/docs/src/reference-main-flag-list.md
@@ -177,7 +177,7 @@ are overridden in all cases by setting output format to `format2`.
 * `--oxtab`: Use XTAB format for output data.
 * `--pprint`: Use PPRINT format for input and output data.
 * `--tsv`: Use TSV format for input and output data.
-* `--tsvlite or -t`: Use TSV-lite format for input and output data.
+* `--tsv`: Use TSV format for input and output data.
 * `--usv or --usvlite`: Use USV format for input and output data.
 * `--xtab`: Use XTAB format for input and output data.
 * `-i {format name}`: Use format name for input data. For example: `-i csv` is the same as `--icsv`.
@@ -405,7 +405,6 @@ Notes about all other separators:
   alignment impossible.
 * OPS may be multi-character for XTAB format, in which case alignment is
   disabled.
-* TSV is simply CSV using tab as field separator (`--fs tab`).
 * FS/PS are ignored for markdown format; RS is used.
 * All FS and PS options are ignored for JSON format, since they are not relevant
   to the JSON format.
@@ -460,6 +459,7 @@ Notes about all other separators:
         markdown " "    N/A    "\n"
         nidx     " "    N/A    "\n"
         pprint   " "    N/A    "\n"
+        tsv      "	"    N/A    "\n"
         xtab     "\n"   " "    "\n\n"
 
 

diff --git a/docs/src/reference-main-separators.md b/docs/src/reference-main-separators.md
@@ -261,8 +261,9 @@ a:4;b:5;c:6;d:>>>,|||;<<<
 
 Notes:
 
-* If CSV field separator is tab, we have TSV; see more examples (ASV, USV, etc.) at in the [CSV section](file-formats.md#csvtsvasvusvetc).
 * CSV IRS and ORS must be newline, and CSV IFS must be a single character. (CSV-lite does not have these restrictions.)
+* TSV IRS and ORS must be newline, and TSV IFS must be a tab. (TSV-lite does not have these restrictions.)
+* See the [CSV section](file-formats.md#csvtsvasvusvetc) for information about ASV and USV.
 * JSON: ignores all separator flags from the command line.
 * Headerless CSV overlaps quite a bit with NIDX format using comma for IFS. See also the page on [CSV with and without headers](csv-with-and-without-headers.md).
 * For XTAB, the record separator is a repetition of the field separator. For example, if one record has `x=1,y=2` and the next has `x=3,y=4`, and OFS is newline, then output lines are `x 1`, then `y 2`, then an extra newline, then `x 3`, then `y 4`. This means: to customize XTAB, set `OFS` rather than `ORS`.

diff --git a/docs/src/reference-main-separators.md.in b/docs/src/reference-main-separators.md.in
@@ -151,8 +151,9 @@ GENMD-EOF
 
 Notes:
 
-* If CSV field separator is tab, we have TSV; see more examples (ASV, USV, etc.) at in the [CSV section](file-formats.md#csvtsvasvusvetc).
 * CSV IRS and ORS must be newline, and CSV IFS must be a single character. (CSV-lite does not have these restrictions.)
+* TSV IRS and ORS must be newline, and TSV IFS must be a tab. (TSV-lite does not have these restrictions.)
+* See the [CSV section](file-formats.md#csvtsvasvusvetc) for information about ASV and USV.
 * JSON: ignores all separator flags from the command line.
 * Headerless CSV overlaps quite a bit with NIDX format using comma for IFS. See also the page on [CSV with and without headers](csv-with-and-without-headers.md).
 * For XTAB, the record separator is a repetition of the field separator. For example, if one record has `x=1,y=2` and the next has `x=3,y=4`, and OFS is newline, then output lines are `x 1`, then `y 2`, then an extra newline, then `x 3`, then `y 4`. This means: to customize XTAB, set `OFS` rather than `ORS`.
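
Review note on the TSV escaping described in the `docs/src/file-formats.md` hunk above: the decode/encode rule can be sketched in a few lines of Python. This is an illustration only, not Miller's Go implementation. The function names (`tsv_decode_field`, `tsv_encode_field`, `tsv_split_line`, `tsv_join_fields`) are hypothetical, and the sketch assumes that backslash itself is escaped on output, which is one reading of "the reverse is done".

```python
# Hypothetical sketch of the TSV escaping rule from the docs; not Miller's code.

# Two-character escape sequences (backslash + key) and what they decode to.
_DECODE = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}
# Reverse mapping used on output (assumption: backslash is escaped too).
_ENCODE = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def tsv_decode_field(text):
    """Decode \\t, \\n, \\r, and \\\\ in a single left-to-right pass."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in _DECODE:
            out.append(_DECODE[text[i + 1]])
            i += 2  # consume both characters of the escape sequence
        else:
            out.append(ch)  # ordinary character, or an unrecognized escape
            i += 1
    return "".join(out)


def tsv_encode_field(text):
    """Replace tab, newline, carriage return, and backslash with escapes."""
    return "".join(_ENCODE.get(ch, ch) for ch in text)


def tsv_split_line(line):
    """Split one input line on tab (FS), then decode each field."""
    return [tsv_decode_field(f) for f in line.rstrip("\r\n").split("\t")]


def tsv_join_fields(fields):
    """Encode each field, join on tab (FS), and terminate with newline (RS)."""
    return "\t".join(tsv_encode_field(f) for f in fields) + "\n"
```

The single pass in `tsv_decode_field` matters: decoding with chained string replacements would misread an escaped backslash followed by the letter `n` as a newline. Because encoded fields never contain a literal tab or newline, splitting on those characters stays unambiguous, which is the property the new TSV format relies on.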