Fix: multiple documentation tweaks (#949)
* Fix: multiple documentation tweaks

* Fix: undo two lines removal

* Fix: recover the lost Oxford commas
Poshi authored Feb 18, 2022
1 parent 3a7f67f commit 8aecba3
Showing 16 changed files with 41 additions and 34 deletions.
4 changes: 2 additions & 2 deletions docs/src/file-formats.md.in
Original file line number Diff line number Diff line change
@@ -317,15 +317,15 @@ GENMD-EOF
While you can do format conversion using `mlr --icsv --ojson cat myfile.csv`, there are also keystroke-savers for this purpose, such as `mlr --c2j cat myfile.csv`. For a complete list:

GENMD-RUN-COMMAND
mlr help format-conversion
mlr help format-conversion-keystroke-saver-flags
GENMD-EOF

## Comments in data

You can include comments within your data files, and either have them ignored, or passed directly through to the standard output as soon as they are encountered:

GENMD-RUN-COMMAND
mlr help comments-in-data
mlr help comments-in-data-flags
GENMD-EOF

Examples:
4 changes: 4 additions & 0 deletions docs/src/genmd-filter
@@ -37,6 +37,10 @@ def main
lines = read_until_genmd_eof(input_handle)
write_card([], lines, output_handle)

elsif content_line =~ /^GENMD-CARDIFY-HIGHLIGHT-ALL$/
lines = read_until_genmd_eof(input_handle)
write_card(lines, [], output_handle)

elsif content_line =~ /^GENMD-CARDIFY-HIGHLIGHT-ONE$/
lines = read_until_genmd_eof(input_handle)
line1 = lines.shift
28 changes: 17 additions & 11 deletions docs/src/glossary.md.in
@@ -125,7 +125,7 @@ _delimiter_ can be used as a synonym for [_separator_](#separator).

## division

Miller uses [pythonic division](http://127.0.0.1:8000/reference-main-arithmetic.md#pythonic-division)
Miller uses [pythonic division](reference-main-arithmetic.md#pythonic-division)
for quotients of integers, with the exception that integer divided by integer
is integer (not float) if the quotient can be represented exactly as an
integer.
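
The division rule in this glossary entry can be modeled in a few lines of Python. This is only a sketch of the rule as worded here, not Miller's implementation; the name `int_divide` is made up for illustration, and division by zero is left to Python's own behavior rather than modeled on Miller's.

```python
def int_divide(a: int, b: int):
    """Toy model: int / int stays int when the quotient is exact, else float."""
    if a % b == 0:
        return a // b   # exact quotient: keep it an integer
    return a / b        # inexact quotient: fall back to float


print(int_divide(6, 2))  # 3
print(int_divide(7, 2))  # 3.5
```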
@@ -196,7 +196,8 @@ See also the [emit-statements section](reference-dsl-output-statements.md#emit-s
## empty

Refers to the string with zero characters. For example, in a CSV file with [header line](#header)
`a,b,c` and data
`a,b,c` and data `,,` the three fields are empty; with data `1,2,` the first two fields (`a` and `b`)
are not empty and the third field `c` is empty.
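
The two CSV cases in this entry can be checked with Python's standard `csv` module. This is a sketch for illustration only: the helper name `empty_fields` is hypothetical, and it says nothing about how Miller itself parses CSV.

```python
import csv
import io


def empty_fields(csv_text: str):
    """Return, per data row, the names of fields whose value is the empty string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[key for key, value in row.items() if value == ""] for row in reader]


print(empty_fields("a,b,c\n,,\n"))    # [['a', 'b', 'c']]
print(empty_fields("a,b,c\n1,2,\n"))  # [['c']]
```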

## end

@@ -224,7 +225,7 @@ Same as [`printn`](#printn), except it prints to [stderr](#stderr) rather than [
## false

A [keyword](#keyword) in the [Miller programming language](miller-programming-language.md) for the
boolean literal; signified by `true` in Python; in some languages (such as C)
boolean literal; signified by `False` in Python; in some languages (such as C)
signified by the zero integer value.

## field
@@ -319,15 +320,15 @@ to write your own functions.

## function literal

A function without a name, like `func(a,b) { return a + 2*b + 7}`, assigned to
A function without a name, like `func(a,b) {return a + 2*b + 7}`, assigned to
a local variable or passed to a [higher-order
function](reference-dsl-higher-order-functions.md) like `apply` or `sort`. See
the [section on function literals](reference-dsl-user-defined-functions.md#function-literals).

## GZIP / .gz

A [data-compression format supported by Miller](reference-main-compressed-data.md).
Files compressed using GZIP compression normally end in`.gz`.
Files compressed using GZIP compression normally end in `.gz`.

## hashmap

@@ -346,8 +347,8 @@ and the [Miller CSV section](file-formats.md#csvtsvasvusvetc).

## heterogeneity

Referring to data where all records have the same keys, in the same order. See the
[record-heterogeneity page](record-heterogeneity.md#homogeneousrectangular-data).
Referring to data where not all records have the same keys, in the same order. See the
[record-heterogeneity page](record-heterogeneity.md#ragged-data).

## higher-order function

@@ -356,6 +357,11 @@ A function which takes another function as an argument, such as
[`apply`](reference-dsl-builtin-functions.md#apply). See the [page on
higher-order functions](reference-dsl-higher-order-functions.md).

## homogeneity

Referring to data where all records have the same keys, in the same order. See the
[record-heterogeneity page](record-heterogeneity.md#homogeneousrectangular-data).
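
As a rough illustration of the homogeneity/heterogeneity distinction, the check below treats records as Python dicts (which preserve insertion order) and compares their key lists. The function name `is_homogeneous` is an assumption made for this sketch and is not a Miller function.

```python
def is_homogeneous(records):
    """True when every record has the same keys, in the same order."""
    key_lists = [list(record.keys()) for record in records]
    return all(keys == key_lists[0] for keys in key_lists)


print(is_homogeneous([{"a": 1, "b": 2}, {"a": "", "b": 4}]))  # True
print(is_homogeneous([{"a": 1, "b": 2}, {"b": 4, "a": 3}]))   # False
print(is_homogeneous([{"a": 1}, {"a": 1, "b": 2}]))           # False
```

Note that the first example is still homogeneous even though one value is empty: only the keys and their order matter.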

## if

A [keyword](#keyword) which is used to indicate the start of an [if-statement](reference-dsl-control-structures.md)
@@ -631,7 +637,7 @@ See also the [Miller command structure page](reference-main-overview.md).
## rectangular

Referring to data where all records have the same keys, in the same order. Synonymous
with [heterogeneous](#heterogeneity). See the
with [homogeneous](#homogeneity). See the
[record-heterogeneity page](record-heterogeneity.md#homogeneousrectangular-data).

## REPL
@@ -678,14 +684,14 @@ page](record-heterogeneity.md#sparse-data).

A [keyword](#keyword) in the
[Miller programming language](miller-programming-language.md)
for [print, dump, and tee statements](http://127.0.0.1:8000/reference-dsl-output-statements.md#tee-statements)
indicating that data are to be sent to the [_standard output_](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_(stderr)).
for [print, dump, and tee statements](reference-dsl-output-statements.md#tee-statements)
indicating that data are to be sent to the [_standard error_](https://en.wikipedia.org/wiki/Standard_streams#Standard_error_(stderr)).

## stdout

A [keyword](#keyword) in the
[Miller programming language](miller-programming-language.md)
for [print, dump, and tee statements](http://127.0.0.1:8000/reference-dsl-output-statements.md#tee-statements)
for [print, dump, and tee statements](reference-dsl-output-statements.md#tee-statements)
indicating that data are to be sent to the [_standard output_](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_(stdout)).

## str
2 changes: 1 addition & 1 deletion docs/src/misc-examples.md.in
@@ -56,7 +56,7 @@ GENMD-EOF

Iterate over data using DSL expressions:

GENMD-CARDIFY-HIGHLIGHT-ONE
GENMD-CARDIFY-HIGHLIGHT-ALL
mlr --from estimates.tbl put '
for (k,v in $*) {
if (is_numeric(v) && k =~ "^[t-z].*$") {
2 changes: 1 addition & 1 deletion docs/src/new-in-miller-6.md.in
@@ -268,7 +268,7 @@ IFS and IPS can be regular expressions now. Please see the section on [multi-cha
### Type-inference

* The `-S` and `-F` flags to `mlr put` and `mlr filter` are ignored, since type-inference is no longer done in `mlr put` and `mlr filter`, but rather, when records are first read. You can use `mlr -S` and `mlr -A`, respectively, instead to control type-inference within the record-readers.
* Octal numbers like `0123` and `07` are type-inferred as string. Use `mlr -O` to infer them as octal integers. Note that `08` and `09` will then infer as deicmal integers.
* Octal numbers like `0123` and `07` are type-inferred as string. Use `mlr -O` to infer them as octal integers. Note that `08` and `09` will then infer as decimal integers.
* Any numbers prefix with `0o`, e.g. `0o377`, are already treated as octal regardless of `mlr -O` -- `mlr -O` only affects how leading-zero integers are handled.
* See also the [miscellaneous-flags reference](reference-main-flag-list.md#miscellaneous-flags).

1 change: 0 additions & 1 deletion docs/src/polyglot-dkvp-io/dkvp_io.py
@@ -51,7 +51,6 @@ def dkvpline2map(line, ips, ifs):
# ----------------------------------------------------------------
# ops and ofs (output pair separator and output field separator) are nominally '=' and ','.
def map2dkvpline(map , ops, ofs):
line = ''
pairs = []
for key in map:
pairs.append(str(key) + ops + str(map[key]))
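
The rest of this function is not visible in the hunk. A plausible completion, shown here purely as an assumption, joins the accumulated pairs with the output field separator; if so, the removed `line = ''` initialization would have been unused. The name `map2dkvpline_sketch` and the default separator values are chosen for this illustration.

```python
def map2dkvpline_sketch(record, ops="=", ofs=","):
    """Format a dict as one DKVP line, e.g. {'a': 1, 'b': 2} -> 'a=1,b=2'."""
    pairs = []
    for key in record:
        pairs.append(str(key) + ops + str(record[key]))
    # Assumed final step (not shown in the diff): join pairs with the field separator.
    return ofs.join(pairs)


print(map2dkvpline_sketch({"a": 1, "b": "xyz"}))  # a=1,b=xyz
```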
4 changes: 2 additions & 2 deletions docs/src/record-heterogeneity.md.in
@@ -50,7 +50,7 @@ GENMD-EOF
This example is still homogeneous, though: every row has the same keys, in the same order: `a,b,c`.
Empty values don't make the data heterogeneous.

Note however that we can use the [`fill-down`](reference-verbs.md#fill-empty) verb to make these
Note however that we can use the [`fill-empty`](reference-verbs.md#fill-empty) verb to make these
values non-empty, if we like:

GENMD-RUN-COMMAND
@@ -65,7 +65,7 @@ GENMD-RUN-COMMAND
cat data/het/ragged.csv
GENMD-EOF

If you `mlr csv cat` this, you'll get an error message:
If you `mlr --csv cat` this, you'll get an error message:

GENMD-RUN-COMMAND-TOLERATING-ERROR
mlr --csv cat data/het/ragged.csv
2 changes: 1 addition & 1 deletion docs/src/reference-dsl-operators.md.in
@@ -39,7 +39,7 @@ for information on how to examine operator precedence interactively.

* The [`min`](reference-dsl-builtin-functions.md#min) and [`max`](reference-dsl-builtin-functions.md#max) functions are different from other multi-argument functions which return null if any of their inputs are null: for [`min`](reference-dsl-builtin-functions.md#min) and [`max`](reference-dsl-builtin-functions.md#max), by contrast, if one argument is absent-null, the other is returned. Empty-null loses min or max against numeric or boolean; empty-null is less than any other string.

* Symmetrically with respect to the bitwise OR, XOR, and AND operators
* Symmetrically with respect to the bitwise OR, AND, and XOR operators
[`|`](reference-dsl-builtin-functions.md#bitwise-or),
[`&`](reference-dsl-builtin-functions.md#bitwise-and), and
[`^`](reference-dsl-builtin-functions.md#bitwise-xor), Miller has logical operators
5 changes: 2 additions & 3 deletions docs/src/reference-dsl-output-statements.md.in
@@ -4,14 +4,14 @@ You can **output** variable-values or expressions in **five ways**:

* **Assign** them to stream-record fields. For example, `$cumulative_sum = @sum`. For another example, `$nr = NR` adds a field named `nr` to each output record, containing the value of the built-in variable `NR` as of when that record was ingested.

* Use **emit1**/**emit**/**emitp**/**emitf** to send out-of-stream variables' current values to the output record stream, e.g. `@sum += $x; emit1 @sum` which produces an extra record such as `sum=3.1648382`. These records, just like records from input file(s), participate in downstream [then-chaining](reference-main-then-chaining.md) to other verbs.

* Use the **print** or **eprint** keywords which immediately print an expression *directly to standard output or standard error*, respectively. Note that `dump`, `edump`, `print`, and `eprint` don't output records which participate in `then`-chaining; rather, they're just immediate prints to stdout/stderr. The `printn` and `eprintn` keywords are the same except that they don't print final newlines. Additionally, you can print to a specified file instead of stdout/stderr.

* Use the **dump** or **edump** keywords, which *immediately print all out-of-stream variables as a JSON data structure to the standard output or standard error* (respectively).

* Use **tee** which formats the current stream record (not just an arbitrary string as with **print**) to a specific file.

* Use **emit1**/**emit**/**emitp**/**emitf** to send out-of-stream variables' current values to the output record stream, e.g. `@sum += $x; emit1 @sum` which produces an extra record such as `sum=3.1648382`. These records, just like records from input file(s), participate in downstream [then-chaining](reference-main-then-chaining.md) to other verbs.

For the first two options you are populating the output-records stream which feeds into the next verb in a `then`-chain (if any), or which otherwise is formatted for output using `--o...` flags.

For the last three options you are sending output directly to standard output, standard error, or a file.
@@ -342,4 +342,3 @@ mlr --from data/small --opprint put -q '
end{emit (@sum, @count),"a","b"}
'
GENMD-EOF

4 changes: 2 additions & 2 deletions docs/src/reference-dsl-user-defined-functions.md.in
@@ -95,7 +95,7 @@ the role of subroutine quite well.

If you have a file with UDFs you use frequently, say `my-udfs.mlr`, you can use
`--load` or `--mload` to define them for your Miller scripts. For example, in
your shell,
your shell,

GENMD-CARDIFY-HIGHLIGHT-ONE
alias mlr='mlr --load ~/my-functions.mlr'
@@ -111,7 +111,7 @@ See the [miscellaneous-flags page](reference-main-flag-list.md#miscellaneous-fla

## Function literals

You can define unmnamed functions and assign the to variables, or pass them to functions.
You can define unnamed functions and assign them to variables, or pass them to functions.

See also the [page on higher-order functions](reference-dsl-higher-order-functions.md)
for more information on
7 changes: 3 additions & 4 deletions docs/src/reference-dsl-variables.md.in
@@ -164,7 +164,7 @@ GENMD-EOF

## Local variables

Local variables are similar to out-of-stream variables, except that their extent is limited to the expressions in which they appear (and their basenames can't be computed using square brackets). There are three kinds of local variables: **arguments** to functions/subroutines, **variables bound within for-loops**, and **locals** defined within control blocks. They may be untyped using `var`, or typed using `num`, `int`, `float`, `str`, `bool`, and `map`.
Local variables are similar to out-of-stream variables, except that their extent is limited to the expressions in which they appear (and their basenames can't be computed using square brackets). There are three kinds of local variables: **arguments** to functions/subroutines, **variables bound within for-loops**, and **locals** defined within control blocks. They may be untyped using `var`, or typed using `num`, `int`, `float`, `str`, `bool`, `arr`, and `map`.

For example:

@@ -203,7 +203,7 @@ Things which are completely unsurprising, resembling many other languages:

Things which are perhaps surprising compared to other languages:

* Type declarations using `var`, or typed using `num`, `int`, `float`, `str`, and `bool` are not necessary to declare local variables. Function arguments and variables bound in for-loops over stream records and out-of-stream variables are *implicitly* declared using `var`. (Some examples are shown below.)
* Type declarations using `var`, or typed using `num`, `int`, `float`, `str`, `arr`, and `bool` are not necessary to declare local variables. Function arguments and variables bound in for-loops over stream records and out-of-stream variables are *implicitly* declared using `var`. (Some examples are shown below.)

* Type-checking is done at assignment time. For example, `float f = 0` is an error (since `0` is an integer), as is `float f = 0.0; f = 1`. For this reason I prefer to use `num` over `float` in most contexts since `num` encompasses integer and floating-point values. More information is at [Type-checking](reference-dsl-variables.md#type-checking).

@@ -352,7 +352,7 @@ See [Data-cleaning Examples](data-cleaning-examples.md) for examples of how to u

### Type-declarations for local variables, function parameter, and function return values

Local variables can be defined either untyped as in `x = 1`, or typed as in `int x = 1`. Types include **var** (explicitly untyped), **int**, **float**, **num** (int or float), **str**, **bool**, and **map**. These optional type declarations are enforced at the time values are assigned to variables: whether at the initial value assignment as in `int x = 1` or in any subsequent assignments to the same variable farther down in the scope.
Local variables can be defined either untyped as in `x = 1`, or typed as in `int x = 1`. Types include **var** (explicitly untyped), **int**, **float**, **num** (int or float), **str**, **bool**, **arr**, and **map**. These optional type declarations are enforced at the time values are assigned to variables: whether at the initial value assignment as in `int x = 1` or in any subsequent assignments to the same variable farther down in the scope.

The reason for `num` is that `int` and `float` typedecls are very precise:

@@ -472,4 +472,3 @@ GENMD-EOF
GENMD-RUN-COMMAND
mlr help usage-keywords # you can also use mlr -K
GENMD-EOF

2 changes: 1 addition & 1 deletion docs/src/reference-main-compressed-data.md.in
@@ -54,7 +54,7 @@ allowed as they could be used for unexpected code execution. You can use

Note that this feature is quite general and is not limited to decompression
utilities. You can use it to apply per-file filters of your choice: e.g. `mlr
--prepipe head -n 10 ...`, if you like.
--prepipe 'head -n 10' ...`, if you like.

There is a `--prepipe` and a `--prepipex`:

2 changes: 1 addition & 1 deletion docs/src/sorting.md.in
@@ -290,7 +290,7 @@ cat data/sortaf-example.csv
GENMD-EOF

In the following example we sort data in several ways -- the first two just
recaptiulate (for reference) what `sort` with default flags already does; the third is novel:
recapitulate (for reference) what `sort` with default flags already does; the third is novel:

GENMD-RUN-COMMAND
mlr --icsv --ojson --from data/sortaf-example.csv put '
2 changes: 1 addition & 1 deletion docs/src/streaming-and-memory.md.in
@@ -48,7 +48,7 @@ much data as needed. For example, the [`sort`](reference-verbs.md#sort) and
before emitting any -- the last input record may well end up being the first
one to be emitted.

[`stats1`](reference-verbs.md#stats1) Other verbs, such as
Other verbs, such as
[`tail`](reference-verbs.md#tail) and [`top`](reference-verbs.md#top), need to
retain only a fixed number of records -- 10, perhaps, even if the input data
has a million records.
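
The "fixed number of records" idea can be sketched with a bounded queue: however long the input, memory use stays proportional to the window size. This is an illustrative Python analogue, not Miller's Go implementation, and the function name `tail` here simply mirrors the verb.

```python
from collections import deque


def tail(records, n=10):
    """Keep only the last n records seen, regardless of input size."""
    window = deque(maxlen=n)  # older entries are discarded automatically
    for record in records:
        window.append(record)
    return list(window)


print(tail(range(1_000_000), n=3))  # [999997, 999998, 999999]
```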
2 changes: 1 addition & 1 deletion internal/pkg/dsl/cst/builtin_function_manager.go
@@ -1666,7 +1666,7 @@ information.`,
{
name: "get_values",
class: FUNC_CLASS_COLLECTIONS,
help: "Returns array of keys of map or array -- in the latter case, returns a copy of the array",
help: "Returns array of values of map or array -- in the latter case, returns a copy of the array",
unaryFunc: bifs.BIF_get_values,
},

4 changes: 2 additions & 2 deletions internal/pkg/dsl/cst/keyword_usage.go
@@ -182,8 +182,8 @@ func dumpKeywordUsage() {
`prints all currently defined out-of-stream variables immediately
to stdout as JSON.
With >, >>, or |, the data do not become part of the output record stream but
are instead redirected.
With >, >>, or |, the data do not go directly to stdout but are instead
redirected.
The > and >> are for write and append, as in the shell, but (as with awk) the
file-overwrite for > is on first write, not per record. The | is for piping to