Have you ever had to parse a file line by line and do operations on it? Don't answer that - I know you have. awk is a wonderful little command line utility purpose-built for parsing and doing operations on a file line-by-line. The purpose of this tutorial is to convince you that you should be reaching for awk next time you need to read and transform a file instead of writing a python or js script.
The structure of an awk program is pattern {action}
.
That means that for each line, awk will evaluate the pattern and then take an action.
So for instances, if you want to print the 5th column (say the URL) of a log file when the 3rd column (maybe the status code) equals 500, you might do awk '$3 == 500 { print $5 }'
.
BEGIN
and END
are two special keywords that, when proceeded with an action, will take that action, before any lines are evaluated and after all lines are evaluated, respectively.
For instance:
awk 'BEGIN {print "hello"} END {print "goodbye"}' sample.txt
In the above example, notice how I was able to just print the 5th column by writing print $5
.
$5
refers to the 5th field, or column, in a line given a separator.
The default separator is whitespace.
You can refer to any column with $n
where n
is the 1-indexed column number.
$0
refers to the entire line.
awk also includes a number of built-in variables:
-
NR - This variable keeps track of the number of lines seen. For example, at the 5th line,
print NR
would print 5. mnemonic: Number of Records. -
NF - Number of fields in the current record. mnemonic: Number of Fields
-
FS - The field separator being used for this script. By default, it'll be whitespace. mnemonic: Field Separator
There are other special variables, but these give you the 80/20 you need to be effective with awk.
You could print the number of fields in each line like so.
awk '{print NF}' sample.txt
And to print the number of lines a file:
awk 'END {print NR}' sample.txt
The simplest conditional is no conditional. For instance if you want to just print the 1st field of every line, do the following.
awk '{ print $1 }' sample.txt
As mentioned above, you can print a line or a field if some field is equal to some value.
awk '$2 == "1" { print $1}' sample.txt
Or maybe you want to print lines whose length is greater than some value.
awk 'length($0) > 9' sample.txt
Notice how I didn't include an action in that last command? The default behavior when no action is specified is to just print the whole line.
A common task you might have is to print some lines of text where some field matches a regular expression.
With awk, you can test a string against a regex with the ~
operator.
For instance, if I want to print the line if the first field contains an a
, I'd do the following.
awk '$1 ~ /a/' sample.txt
To print the lines where the first field starts with a
, I'd do the following.
awk '$1 ~ /^a/' sample.txt
And finally, if I wanted to match against an entire line, I could do:
awk '/9/' sample.txt
Let's say you wanted to sum the values of a column in a file. With awk, that's easy peasy.
awk '{ sum += $4 } END { print sum }' sample.txt
You can define variables that can be used throughout the program using the -v
flag.
For instance, if you wanted to re-use some string, you could do the following.
awk -v search_str=a 'BEGIN { print "finding " search_str } $1 ~ search_str { print $0 }' sample.txt
I used a few resources for this tutorial.