Pandoc is the Swiss Army knife of document conversion, especially when Markdown is involved. It supports filters that transform the Pandoc AST after the input is parsed and before the output is written.
This library helps you write filters in the Groovy programming language. It is heavily inspired by the Pandoc filters for Python.
Pandoc filters transform the Pandoc AST, which is passed to them in JSON format. They are executed like this:
pandoc -f SOURCEFORMAT -t TARGETFORMAT --filter ./Behead.groovy
The filter script Behead.groovy has to be executable. For more details on Pandoc filters, see the Pandoc scripting guide.
The GroovyPandoc library does not work on the plain JSON format; instead, it converts the JSON to an object tree. Each Pandoc element has a corresponding Groovy class. See the package com.github.dfrommi.pandoc.types in the library's GroovyDoc for details.
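For contrast, here is a rough sketch of what working on the plain JSON looks like without the object tree. It uses only Groovy's bundled groovy.json classes, not GroovyPandoc. The helper name upcaseStrings and the inline sample document are made up for illustration; a real filter would read the JSON from standard input and write the result to standard output.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical helper: recursively visit the parsed JSON and upper-case the
// content of every inline text node, which Pandoc encodes as {"t": "Str", "c": "..."}.
def upcaseStrings(node) {
    if (node instanceof Map) {
        if (node.t == 'Str' && node.c instanceof String) {
            node.c = node.c.toUpperCase()
        } else {
            node.values().each { upcaseStrings(it) }
        }
    } else if (node instanceof List) {
        node.each { upcaseStrings(it) }
    }
    node
}

// Sample document (assumed for this sketch): one paragraph with the word "hello"
def input = '{"pandoc-api-version":[1,22],"meta":{},"blocks":[{"t":"Para","c":[{"t":"Str","c":"hello"}]}]}'
def doc = new JsonSlurper().parseText(input)
def output = JsonOutput.toJson(upcaseStrings(doc))
println output
```

Every element has to be recognized by its "t" tag and unpacked from "c" by hand, which is the bookkeeping the typed classes are meant to take over.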
The filter entry point is the static method toJSONFilter of class Pandoc. It walks through the Pandoc AST using breadth-first search and calls the provided closure for each element. Typically, the filter closure first checks whether the element is of interest and then transforms it.
The modification made to the Pandoc AST depends on the closure's return value:
- null: No modification of the AST
- Empty list []: Remove the current element from the document
- Single element: Replace the current element with the returned element
- List of elements: Replace the current element with all elements of the list
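The four return-value rules can be tried out without Pandoc at all. The following is only a sketch: applyFilter is a hypothetical stand-in written for this example, not the library's implementation, and it works on a flat list of strings instead of a real AST.

```groovy
// Hypothetical stand-in: apply a filter closure to each item of a flat list
// and interpret the return value according to the four rules.
def applyFilter(List elements, Closure filter) {
    def result = []
    elements.each { element ->
        def returned = filter(element)
        if (returned == null) {
            result << element          // null: keep the element unchanged
        } else if (returned instanceof List) {
            result.addAll(returned)    // empty list removes, longer list replaces by all items
        } else {
            result << returned         // single element: replace
        }
    }
    result
}

def words = ['keep', 'drop', 'swap', 'split']
def filtered = applyFilter(words) { word ->
    switch (word) {
        case 'drop':  return []
        case 'swap':  return 'swapped'
        case 'split': return ['sp', 'lit']
        default:      return null
    }
}
println filtered   // [keep, swapped, sp, lit]
```

Note that the empty list and the list of elements are handled by the same branch: removing an element is simply replacing it with nothing.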
Two types of closures are supported.
If the transformation closure expects one parameter, the Pandoc element is passed.
If it expects two parameters, the first one is still the Pandoc element and the second is the metadata object (Meta), which represents the YAML metadata at the beginning of the document.
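Groovy closures expose how many parameters they declare, so a dispatch like this is easy to build. The sketch below shows one way it could be done; callFilter is a hypothetical helper, the plain map stands in for the Meta object, and none of this is taken from the library's source.

```groovy
// Hypothetical dispatcher: pass the metadata only to closures that declare
// a second parameter.
def callFilter(Closure filter, element, meta) {
    filter.maximumNumberOfParameters >= 2 ? filter(element, meta) : filter(element)
}

def oneArg  = { element -> "element=${element}".toString() }
def twoArgs = { element, meta -> "element=${element}, title=${meta.title}".toString() }

// A plain map standing in for the document metadata (assumption for this sketch)
def sampleMeta = [title: 'Demo']

println callFilter(oneArg, 'Header', sampleMeta)    // element=Header
println callFilter(twoArgs, 'Header', sampleMeta)   // element=Header, title=Demo
```

A closure that relies on the implicit it parameter counts as a one-parameter closure here, so it receives only the element.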
The source code contains some example filters in the directory src/examples. Let's have a closer look at the Behead example, which transforms all headers with a level greater than or equal to 2 into a paragraph with emphasized text.
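#!/usr/bin/env groovy
// Fetch the library from JitPack when the script starts
@GrabResolver('https://jitpack.io')
@Grab('com.github.dfrommi:groovy-pandoc')
import static Pandoc.*
import com.github.dfrommi.pandoc.types.*
toJSONFilter {
    // Replace every header of level 2 or higher with an emphasized paragraph
    if (it in Header && it.level >= 2) {
        new Para(new Emph(text: it.text))
    }
    // Any other element falls through, so the closure returns null
}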
The script requires a shebang on the first line. It tells the shell to use the Groovy interpreter for execution. The GroovyPandoc library has to be on the classpath; in this example, the @GrabResolver and @Grab annotations fetch it from JitPack. If you add GroovyPandoc.jar to the classpath in the Groovy call instead, the JAR file is searched relative to the directory where pandoc is executed, not relative to the directory of the filter script.
toJSONFilter is the starting point of the filter. We are interested only in Header elements with a level greater than or equal to 2, which is verified in the if clause. For all elements that do not match the condition, null is returned automatically and therefore no modification is performed.
If the condition is met, we return a new Para object with Emph as its only child. The text is copied from the Header element. In that case, only a single object is returned and therefore the Header object is replaced by the Para object in the Pandoc AST.
There is a second variant of toJSONFilter that expects a class as its first parameter. The closure is then only called if the element is of the given type. With the modified toJSONFilter call, the example above would look like this:
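#!/usr/bin/env groovy
// Fetch the library from JitPack when the script starts
@GrabResolver('https://jitpack.io')
@Grab('com.github.dfrommi:groovy-pandoc')
import static Pandoc.*
import com.github.dfrommi.pandoc.types.*
// The closure is only called for Header elements, so no type check is needed
toJSONFilter(Header) { Header h ->
    if (h.level >= 2) {
        new Para(new Emph(text: h.text))
    }
}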