litparse is a Bash program that extracts, concatenates, and executes code blocks from Markdown files.
#!/usr/bin/env bash
The intermingling of natural language with code enables a basic form of literate programming similar to Docco, Pycco, and codedown. Note that the purpose of litparse is to extract and run code rather than generate documentation.
This README contains the literate source code for litparse, and it can build itself. We extract all the code blocks in this file that have bash as the language (-l bash) and print the result instead of executing it (-p). You can read about how these arguments work below.
You can use this output to create a temporary file called litparse.sh and then install it in /usr/bin without an extension so that it can be called by name in the shell.
Note that the following code block is not fenced with a language so that these lines do not appear in the final source code.
litparse -l bash -p README.md > dist/litparse.sh
sudo install dist/litparse.sh /usr/bin/litparse
There are simple examples in the test directory. Here are some additional examples:
Python
#!/usr/bin/env python
def hello():
    print("hello world")
if __name__ == '__main__':
    hello() # this line also contains leading space
Ruby
#!/usr/bin/env ruby
puts "Hello World"
Licensed under the MIT License.
# Copyright 2017 Metaist LLC <http://metaist.com/>
# MIT License <https://opensource.org/licenses/mit-license>
There are many ways in which Bash programs are hard to debug. In order to make development easier, we enable certain Bash options that prevent many inadvertent errors:
- -u: treats the expansion of an unset variable as an error instead of silently substituting an empty string.
- -o pipefail: makes a pipeline fail if any command in it fails, not only the last one.
set -uo pipefail
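The effect of the two options can be observed in isolation. The following is a minimal sketch and not part of litparse; the variable name `undefined_var` and the sample pipeline are made up for illustration. It is fenced as `shell` rather than `bash` so that `litparse -l bash` leaves it out of the build.

```shell
#!/usr/bin/env bash
# Illustration only: each case runs in a subshell so the demo itself keeps going.

# Unset variable: expands to an empty string by default, is an error under -u.
unset undefined_var
without_u=0
( set +u; echo "value: ${undefined_var}" ) >/dev/null 2>&1 || without_u=$?
with_u=0
( set -u; echo "value: ${undefined_var}" ) >/dev/null 2>&1 || with_u=$?

# Failing pipeline: by default only the last command's status is reported;
# under pipefail a failure earlier in the pipeline is reported as well.
without_pipefail=0
( set +o pipefail; false | cat ) || without_pipefail=$?
with_pipefail=0
( set -o pipefail; false | cat ) || with_pipefail=$?

echo "unset variable:  status $without_u by default, $with_u with -u"
echo "failed pipeline: status $without_pipefail by default, $with_pipefail with pipefail"
```

In both cases the default status is zero, which is how such errors go unnoticed.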
Another common problem is Bash's notorious Internal Field Separator. This internal variable is sometimes used to split strings, but because it defaults to whitespace (including spaces), it is often more headache than it is worth. We set it to newline and tab to handle the common cases we care about.
IFS=$'\n\t'
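To see what this setting changes, the sketch below counts the words produced by an unquoted expansion under both values of IFS. It is an illustration only: `count_words` and the sample value are hypothetical, and the `shell` fence keeps the block out of the bash build.

```shell
#!/usr/bin/env bash
# Illustration only: count the words that an unquoted expansion produces.
count_words() { echo "$#"; }

sample='my file.txt'   # hypothetical value containing a space

# Default IFS (space, tab, newline): the value splits into two words.
IFS=$' \t\n'
default_count=$(count_words $sample)

# Newline and tab only: the space no longer splits the value.
IFS=$'\n\t'
strict_count=$(count_words $sample)

echo "default IFS: $default_count word(s); newline+tab IFS: $strict_count word(s)"
```

Note that this only affects word splitting. Unquoted expansions are still subject to pathname expansion, so quoting variables remains worthwhile.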
When we display the script usage we also display the name and version of the script. In general, we compute these values instead of hard-coding them to avoid errors when copying these lines into other scripts.
- SCRIPT_SOURCE holds the full path of the current script.
- SCRIPT_NAME is the name of the script.
- SCRIPT_VERSION is the semantic version of the script. Note that this value is manually updated before each release.
SCRIPT_SOURCE=$(readlink -f "${BASH_SOURCE[0]}")
SCRIPT_NAME=$(basename "$SCRIPT_SOURCE")
SCRIPT_VERSION='0.1.0'
The usage function displays a string with the description of the script and the possible options.
# Display the script usage.
usage() {
printf "\
$SCRIPT_NAME ($SCRIPT_VERSION) - Extract executable code from Markdown files.
Usage:
$SCRIPT_NAME [INPUT] [-h|--help] [-v|--version] [-l|--lang LANG] [-p|--print]
Options:
INPUT markdown file to parse (default: STDIN)
-h, --help show this message and exit
-v, --version print script version and exit
-l, --lang LANG language name to extract from fenced blocks
-p, --print print the script instead of executing
"
}
We use regular expressions (aka regex) to detect and extract code blocks. We also use a regex to detect the shebang, which names the program that will run the resulting output.
- REGEX_SHEBANG matches the #! and captures the command to execute at the end of processing.
- REGEX_BLOCK matches four consecutive whitespace characters at the start of a line (the indentation of a Markdown code block).
- REGEX_FENCE matches three backticks and captures the name of the language for the code fence.
REGEX_SHEBANG='^#!([ /a-zA-Z]*)'
REGEX_BLOCK='^\s\s\s\s'
REGEX_FENCE='^```([a-zA-Z]*)'
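The sketch below applies the three patterns to a few made-up lines to show what each one captures. It is an illustration only, fenced as `shell` so that it stays out of the bash build. The backticks are written as escapes (the hypothetical `ticks` variable) purely so that the example does not disturb its own fence; the resulting pattern is the same as REGEX_FENCE above.

```shell
#!/usr/bin/env bash
# Illustration only: try the patterns from this section on sample lines.
ticks=$'\x60\x60\x60'                  # three backticks, written as escapes
REGEX_SHEBANG='^#!([ /a-zA-Z]*)'
REGEX_BLOCK='^\s\s\s\s'
REGEX_FENCE="^${ticks}([a-zA-Z]*)"

# A shebang line: the capture group holds the interpreter command.
[[ '#!/usr/bin/env bash' =~ $REGEX_SHEBANG ]] && shebang_cmd="${BASH_REMATCH[1]}"

# An opening fence: the capture group holds the language name.
[[ "${ticks}python" =~ $REGEX_FENCE ]] && fence_lang="${BASH_REMATCH[1]}"

# A closing fence still matches, but the captured language is empty.
[[ "${ticks}" =~ $REGEX_FENCE ]] && closing_lang="${BASH_REMATCH[1]}"

# An indented line is tested against the block pattern; plain prose is not a match.
if [[ '    echo hi' =~ $REGEX_BLOCK ]]; then indented=yes; else indented=no; fi
if [[ 'plain prose' =~ $REGEX_BLOCK ]]; then prose=yes; else prose=no; fi

echo "shebang=[$shebang_cmd] fence=[$fence_lang] closing=[$closing_lang]"
echo "indented line matches: $indented; prose matches: $prose"
```

Two details follow from the patterns themselves. The shebang character class admits only letters, spaces, and slashes, so an interpreter such as python3 would be captured as python. The `\s` shorthand is an extension of the regex library rather than part of POSIX, so its availability depends on the platform.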
In order to extract code blocks we need to know the path to the file and which language (if any) to extract.
# Process a single file.
process_file() {
local in_path=${1:-''}
local in_lang=${2:-''}
As we process each line, we will also need to keep track of whether we are currently in a fenced code block and its associated language.
local fence_start=false
local fence_lang=''
We iterate over each line, reading from the given file without interpreting backslash escapes (-r). To handle files piped in via STDIN, we use file descriptor number ten (-u 10).
while read -ru 10 line; do
When we reach a fence, we extract the name of the language (present on the opening fence of a code block) and toggle whether we are in a fenced code block. We do not want to do anything else with this line, so we continue.
if [[ $line =~ $REGEX_FENCE ]]; then
fence_lang="${BASH_REMATCH[1]}"
if $fence_start; then fence_start=false; else fence_start=true; fi
continue
fi # fence detected
But which lines should we include in the output? If no language was specified, then all code blocks (fenced or otherwise) should be included. Otherwise, we only output those fenced code blocks with the given language.
For code blocks, but not fenced code blocks, we remove the leading 4 spaces using sed.
if [[ "$in_lang" == "" || "$in_lang" == "$fence_lang" ]]; then
  # Quote the line so that glob characters and tabs are emitted unchanged.
  if $fence_start; then
    printf '%s\n' "$line"
  elif [[ $line =~ $REGEX_BLOCK ]]; then
    printf '%s\n' "$line" | sed "s/$REGEX_BLOCK//"
  fi
fi
done 10<"$in_path"
}
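The fence-tracking logic can be tried on a small, made-up document. The sketch below is a condensed stand-in for process_file, not a replacement for it: it reads from standard input, always requires a language, and omits the indented-block branch. The names `extract_lang`, `ticks`, and the sample text are hypothetical, and the block is fenced as `shell` so that it is not extracted into the bash build.

```shell
#!/usr/bin/env bash
# Illustration only: a condensed version of the loop above.
ticks=$'\x60\x60\x60'   # three backticks, escaped to keep this example's own fence intact

extract_lang() {
  local want=$1 in_fence=false lang='' line
  local fence_re="^${ticks}([a-zA-Z]*)"
  while IFS= read -r line; do
    if [[ $line =~ $fence_re ]]; then
      # Remember the language and flip the "inside a fence" flag.
      lang="${BASH_REMATCH[1]}"
      if $in_fence; then in_fence=false; else in_fence=true; fi
      continue
    fi
    # Emit only lines inside a fence whose language matches the request.
    if $in_fence && [[ $lang == "$want" ]]; then
      printf '%s\n' "$line"
    fi
  done
}

# A made-up Markdown document with one bash block and one python block.
sample=$(printf '%s\n' \
  'Some prose.' \
  "${ticks}bash" \
  'echo "kept"' \
  "${ticks}" \
  'More prose.' \
  "${ticks}python" \
  'print("skipped")' \
  "${ticks}")

extracted=$(printf '%s\n' "$sample" | extract_lang bash)
echo "$extracted"
```

Only the line inside the bash fence is printed; the prose and the python block are dropped.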
We set default values for our arguments:
- ARG_INPUT: read from STDIN
- ARG_LANG: extract all blocks (i.e. don't specify a language)
- ARG_PRINT: try to execute the resulting script
ARG_INPUT='/dev/stdin'
ARG_LANG=''
ARG_PRINT=false
Now we iterate over each argument to check its value.
while [[ "$#" -gt 0 ]]; do
case "$1" in
For --help and --version we echo the appropriate output and exit.
-h|--help) usage; exit 0;;
-v|--version) echo "$SCRIPT_NAME $SCRIPT_VERSION"; exit 0;;
For optional arguments we consume the option as well as any corresponding arguments.
-l|--lang) ARG_LANG="$2"; shift 2;;
-p|--print) ARG_PRINT=true; shift 1;;
For unknown arguments, the first is treated as the input file while subsequent arguments cause an error.
*)
if [[ "$ARG_INPUT" == "/dev/stdin" ]]; then
ARG_INPUT="$1"
shift 1
else
echo "unknown option: [$1]" >&2
usage
exit 1
fi;;
esac
done # args parsed
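To check how different command lines are interpreted, the same loop can be wrapped in a function and called with several argument lists. This is a sketch under assumptions: `parse_args` is a hypothetical helper that does not exist in litparse, it leaves out --help and --version, and it returns an error instead of exiting. It is fenced as `shell` to keep it out of the bash build.

```shell
#!/usr/bin/env bash
# Illustration only: the argument loop wrapped in a hypothetical function.
parse_args() {
  local input='/dev/stdin' lang='' print=false
  while [[ "$#" -gt 0 ]]; do
    case "$1" in
      -l|--lang) lang="$2"; shift 2;;
      -p|--print) print=true; shift 1;;
      *)
        # The first unknown argument is the input file; a second one is an error.
        if [[ "$input" == "/dev/stdin" ]]; then
          input="$1"; shift 1
        else
          echo "unknown option: [$1]" >&2
          return 1
        fi;;
    esac
  done
  echo "input=$input lang=$lang print=$print"
}

first=$(parse_args -l bash -p README.md)
second=$(parse_args README.md)
third=$(parse_args README.md extra.md 2>/dev/null) || third='error'

echo "$first"
echo "$second"
echo "$third"
```

The second call shows the defaults for the language and print flag, and the third shows that a second positional argument is rejected.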
Once we have all of our arguments parsed, we can now pass those arguments to the extraction process. We save the content of the extraction in a single string so that we can do some post-processing on it.
CONTENT=$(process_file "$ARG_INPUT" "$ARG_LANG")
If the extracted blocks begin with a shebang, we save it separately.
SHEBANG=''
if [[ $CONTENT =~ $REGEX_SHEBANG ]]; then
SHEBANG="${BASH_REMATCH[1]}"
fi # extracted the shebang
Unless we are explicitly told to print the script, we will try to execute it using the shebang we extracted. If no shebang was found, there is nothing to execute the script with, so we print it as well.
if [[ "$ARG_PRINT" == "true" || "$SHEBANG" == "" ]]; then
echo "$CONTENT"
else
echo "$CONTENT" | eval $SHEBANG
fi # printed or executed the script
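The last two steps can be exercised on their own. In the sketch below, CONTENT is a made-up two-line script standing in for the extracted code, and /bin/sh is assumed to exist on the system. The block is an illustration only and is fenced as `shell` so that it does not end up in the bash build.

```shell
#!/usr/bin/env bash
# Illustration only: extract a shebang and hand the script to that interpreter.
REGEX_SHEBANG='^#!([ /a-zA-Z]*)'
CONTENT=$'#!/bin/sh\necho "hello from the extracted script"'

SHEBANG=''
if [[ $CONTENT =~ $REGEX_SHEBANG ]]; then
  SHEBANG="${BASH_REMATCH[1]}"   # the text after "#!" on the first line
fi

# The interpreter named by the shebang reads the script from standard input.
# The shebang line itself is only a comment as far as the interpreter is concerned.
output=$(echo "$CONTENT" | eval "$SHEBANG")
echo "$output"
```

Because the script arrives on standard input, the executed code cannot itself read from the terminal or from a pipe given to litparse.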
And that's all there is.