Marcel Schmalzl edited this page Oct 11, 2024 · 8 revisions

AWK

Great resources:




This page is still a work in progress.

awk is a powerful line-by-line text processor.

There are several flavours:

  • AWK - original from AT&T
  • NAWK - A newer, improved version from AT&T
  • GAWK - GNU AWK (from the Free Software Foundation)

This article will cover gawk.

The documentation from GNU Awk is really good!

General

Basic syntax

pattern { action }
pattern { action }
pattern { action }
...

awk reads its input one record at a time; by default a record is a line. A pattern matches if any part of the record matches it, and the matching record can then be processed further in the action.

/Hello/
# ==
/Hello/ {print}
# ==
/Hello/ {print $0}

Default Behavior:

  • Pattern without action: every matching line is printed in full
  • Action without pattern: the action runs for every line
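The two defaults can be compared side by side in a small sketch. The sample input and the helper names `sample_input`, `only_pattern` and `only_action` are made up for illustration.

```shell
# Hypothetical sample input: three lines, two of which contain "Hello"
sample_input() {
    printf '%s\n' "Hello world" "Goodbye" "Hello again"
}

# Pattern without action: matching lines are printed unchanged
only_pattern() { awk '/Hello/'; }

# Action without pattern: the action runs for every line
only_action() { awk '{ print NR ": " $0 }'; }

sample_input | only_pattern   # prints the two "Hello" lines
sample_input | only_action    # prints all three lines, each prefixed with its line number
```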

Separation

Each record is split into fields. The default separator is whitespace (any run of spaces or tabs).

echo "hello world" | awk '{print $2}'
# prints: `world` ($0 = whole line, $1 = first column considering separation, ...)

Changing the separator:

echo "one|two|three" | awk -F'|' '{print $2}'
# or
echo "one|two|three" | awk 'BEGIN {FS="|"} {print $2}'

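Note: A separator longer than one character is treated as a RegEx (regular expression); if it contains reserved regex characters that should be matched literally, you must escape them (via \; e.g. for . -> \.). A single character other than a space, such as | above, is taken literally.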
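A regex separator might look like the following sketch. The helper names and sample strings are assumptions for illustration. The bracket expression `[|]` is used here as one way to match a literal `|` without backslashes.

```shell
# Split on a comma or semicolon followed by optional spaces (a regex separator)
split_mixed() { awk -F'[,;] *' '{ print $2 }'; }

# Split on the literal two-character separator "||";
# each "|" sits in a bracket expression so it is not read as regex alternation
split_double_pipe() { awk -F'[|][|]' '{ print $2 }'; }

echo "one, two;three" | split_mixed          # prints: two
echo "one||two||three" | split_double_pipe   # prints: two
```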

The FS variable works in script files as well:

# test_sep1.awk
# BEGIN block (actions before processing)
BEGIN {
    FS = "|"
}

# Main block
{
    print $2
}

# Shown for illustration; can be omitted since empty:
END {
}

Execute (i.e. run the awk file):

echo "one|two|three" | awk -f test_sep1.awk
# or
awk -f test_sep1.awk input.txt
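For a script where the END block actually does something, here is a minimal sketch that sums the second column. The file name `sum_col2.awk`, the temporary directory and the sample data are hypothetical.

```shell
# Hypothetical script file, written to a temporary directory
script_dir=$(mktemp -d)
cat > "$script_dir/sum_col2.awk" <<'EOF'
# Runs once, before any input is read
BEGIN { FS = "|" }

# Runs for every record: add up the second field
{ total += $2 }

# Runs once, after the last record
END { print "total:", total }
EOF

printf '%s\n' "a|1" "b|2" "c|3" | awk -f "$script_dir/sum_col2.awk"   # prints: total: 6
```

Variables such as `total` do not need to be declared; an unset variable counts as 0 in a numeric context.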

Variables

  • FS: Field separator (default: whitespace)
  • OFS: Output field separator (default: space)
  • RS: Record separator (default: newline)
  • ORS: Output record separator (default is a newline)
  • NR: Number of records processed so far
  • NF: Number of fields in the current record
  • $0: The entire current record
  • $1, $2, …: The individual fields of the current record
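The sketch below shows how some of these variables behave in practice. The helper names and the sample input are assumptions for illustration.

```shell
# OFS: assigning to a field ($1 = $1) makes awk rebuild $0 using OFS
to_csv() { awk 'BEGIN { OFS = "," } { $1 = $1; print }'; }

# NR, NF: print the record number, the field count and the last field ($NF)
describe() { awk '{ print NR, NF, $NF }'; }

# ORS: every print ends with ORS instead of a newline
join_lines() { awk 'BEGIN { ORS = ";" } { print } END { printf "\n" }'; }

printf '%s\n' "a b c" "d e" | to_csv       # a,b,c  then  d,e
printf '%s\n' "a b c" "d e" | describe     # 1 3 c  then  2 2 e
printf '%s\n' "a b c" "d e" | join_lines   # a b c;d e;
```

The comma in `print NR, NF, $NF` is what inserts OFS between the values; `print NR NF` without a comma concatenates them directly.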

TODO: add explanations for output parts

RegEx matching

match(string, regexp [, array])

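-> match() returns the position where the match starts (0 if there is no match). The optional array argument is a gawk extension; it is filled with the matched groups:

  • array[0] is the whole match,
  • array[1], the first group,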
  • ...

Example:

echo "one tw_#_o three" | awk '{match($2, /\w+(_#_)(\w+)/, ary)} { print ary[1] }'
# Both blocks are actions without a pattern, so both run for every line:
# the first one fills ary, the second one prints the first group.
# prints: `_#_`

# what would also work:
echo "one tw_#_o three" | awk '{match($2, /\w+(_#_)(\w+)/, ary)} { some_var = ary[1]; print some_var }'
# or:
echo "one tw_#_o three" | awk '{match($2, /\w+(_#_)(\w+)/, ary)} { $2 = ary[1]; print $2 }'
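When the array argument is not available (it is specific to gawk), match() still sets the built-in variables RSTART and RLENGTH, which can be combined with substr(). The sketch below also uses match() directly as the pattern. The helper name `first_number` and the sample line are assumptions.

```shell
# Print the first run of digits of each line that contains one.
# match() returns 0 (false) if nothing matches, so it can serve as the pattern.
first_number() {
    awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

echo "order 1234 shipped" | first_number   # prints: 1234
```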

Substitution/Replacement

gsub(regexp, replacement [, target])

-> target: the string in which to replace (default: $0)

echo "one tw_#_o three" | awk '{ gsub(/_#_/, "", $2); print $2 }'         # Here no pattern defined (takes every line)
# prints `two`

gsub always modifies the target in place (its return value is the number of replacements made). If you want to keep the original, copy it to a variable first and run gsub on the copy:

echo "one tw_#_o three" | awk '{ new_var = $2; gsub(/_#_/, "", new_var); print new_var }'
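The return value can be used to count replacements, and sub() is the variant that replaces only the first match. A small sketch, with made-up helper names and input:

```shell
# gsub returns how many replacements it made (here on $0, the default target)
count_replacements() { awk '{ n = gsub(/o/, "0"); print n, $0 }'; }

# sub replaces only the first match
first_only() { awk '{ sub(/o/, "0"); print }'; }

echo "foo boo" | count_replacements   # prints: 4 f00 b00
echo "foo boo" | first_only           # prints: f0o boo
```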

Interesting functions and applications

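Skip uninteresting lines

# next stops processing the current line; here it skips lines starting with `//`, all others are printed
awk '/^\/\// {next} { print }' ./someFile.txt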

Print text aligned

Prints columns 1 and 4 aligned: column 1 is left-justified and padded with spaces to 40 characters, followed by column 4 (the syntax is similar to C's printf):

awk '{ printf("%-40s%s\n", $1, $4) }'
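To see the padding in action, here is a sketch with narrower columns and visible `|` delimiters. The helper name `align_cols`, the widths and the sample data are assumptions.

```shell
# %-10s: left-justify in a 10-character column
# %5s:   right-justify in a 5-character column
align_cols() { awk '{ printf("%-10s|%5s|\n", $1, $2) }'; }

printf '%s\n' "apple 3" "kiwi 12" | align_cols
# apple     |    3|
# kiwi      |   12|
```

Unlike print, printf does not append ORS, so the `\n` in the format string is required.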

TODO

$ grep -R "CI_TYPE =" ../modules/* | grep -v dummy | awk 'match($1, /modules\/(.*)\/main\.tf/, ary) { $1 = ary[1]; gsub(/"/,"", $4); printf("%-40s %s %.5f\n", $1, $4, 5); }'