Gregg Reynolds edited this page Feb 11, 2016 · 4 revisions

It's a series of tubes. - Ted Stevens (R, Alaska), explains the internet

The central organizing concept of Boot is the pipeline. Unlike Tasks, Filesets, and Pods, pipelines are not first-class objects in Boot; they are constructed using clojure.core/comp. But Tasks, Filesets, and Pods were designed with the pipeline in mind, so it really is central.

Pipeline is obviously a metaphor, and like most metaphors you can only take it so far. In fact boot pipelines are not much like real-world pipelines. Consider the "pipeline" that connects a derrick in an oil field to the gas pump in your local petrol station. Crude oil goes in one end, and refined gasoline comes out the other. In between lies a series of processing nodes that transform the former into the latter. As a metaphor, this is misleading:

  1. Real-world pipelines are one-way affairs. Crude oil goes in one end, and it doesn't come back. Boot pipelines, by contrast, are bi-directional. It's probably better to think of a boot pipeline as composed of two sub-pipelines going in opposite directions.

  2. Boot pipelines are circular - the processing node (Task) at the far end connects its input pipeline to its output pipeline, so the Fileset carried by the pipeline will pass through each node (Task) twice - once on the way out, and again on the way back. (If you've ever wondered where the well-known web app framework Ring got its name, now you know. Well, I don't know if that's what the author had in mind, but it sounds good. Boot's pipeline was inspired by Ring.)

  3. Real-world pipelines involve some kind of mechanism that serves as a transport channel connecting processing nodes. The oil derrick is connected to the next node by a steel pipe; the gasoline pump is connected to the previous node by a tanker truck. There is no such intermediating mechanism in boot pipelines. Instead, tasks are connected via ordinary function composition - but only under the constraint that Tasks must be structured in a particular way, as explained in Tasks.

The last point is so important it bears repeating: boot does not use a first-class channel mechanism to connect tasks. It just uses function composition and application. Compare this with another possible design: you could use core.async channels to connect Tasks. (TODO: motivate this design choice.)
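To make the "just function composition" point concrete, here is a minimal sketch in plain Clojure, with no Boot dependency. It assumes the Ring-like shape in which a task is a function that takes the next handler and returns a handler of one Fileset. The `add-file` task and the map standing in for a Fileset are hypothetical, invented for illustration.

```clojure
;; Assumption: a task is modeled as "middleware", i.e. a function that takes
;; the next handler and returns a handler of one fileset. A plain map stands
;; in for a real Fileset.
(defn add-file
  "Hypothetical task: adds `path` to the fileset on the way out."
  [path]
  (fn [next-handler]
    (fn [fileset]
      (next-handler (update fileset :files conj path)))))

;; The pipeline is nothing more than comp applied to the tasks.
(def pipeline
  (comp (add-file "a.txt")
        (add-file "b.txt")))

;; Applying the pipeline to a terminal handler (identity here) yields a
;; function of a fileset; calling it runs the tasks left to right.
(def result ((pipeline identity) {:files []}))
```

No channel object appears anywhere: the only "connection" between the two tasks is that the first one calls the handler built from the second.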

The "raw material" that moves through a boot pipeline is a Fileset. Each Task in the pipeline receives an incoming Fileset and passes it on to the next Task. The last Task in the chain passes it back to the preceding Task, which passes it back to its preceding Task, and so on until it reaches the initial Task.

Conceptually, each Task except the first and last thus has four "ports" - it receives a Fileset on the way out and passes it forward, and then it receives a Fileset on the way back, and passes it backward. But since we do not have a first-class channel mechanism, we do not have first-class ports - what really happens is that each Task, being a function, is applied to the incoming Fileset; it may do something with the Fileset (including transform it), and then it applies the next Task to the result of its processing. It then "waits" for that next Task to return its result (i.e. it receives the result on its return-trip input port), after which it may perform additional processing, and finally it returns its result - remember, it got started when it was "called" by its preceding Task.

Each Task thus has the opportunity to do some preprocessing before it passes on the Fileset, and some post-processing after it receives the Fileset on the return trip.
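The out-and-back flow described above can be traced with a small sketch. It is plain Clojure rather than Boot itself, and it assumes tasks have the Ring-like shape (next handler in, handler out). `tracing-task` and the `log` atom are hypothetical helpers, and an empty map stands in for the Fileset.

```clojure
(def log (atom []))

(defn tracing-task
  "Hypothetical task that records when it runs in each direction."
  [label]
  (fn [next-handler]
    (fn [fileset]
      (swap! log conj label)                  ; pre-processing, on the way out
      (let [returned (next-handler fileset)]  ; "wait" for the rest of the pipeline
        (swap! log conj (str label "'"))      ; post-processing, on the way back
        returned))))

(def pipeline
  (comp (tracing-task "A")
        (tracing-task "B")
        (tracing-task "C")))

;; identity plays the role of the far end that turns the fileset around.
((pipeline identity) {})

@log
;; => ["A" "B" "C" "C'" "B'" "A'"]
```

The "return-trip input port" is simply the return value of `next-handler`, and the "return-trip output port" is the task's own return value.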

For such a pipeline to do anything useful, at least one of the Tasks must have side-effects. This is where Filesets come in. From the perspective of the project directories in the file system, transformations of the Fileset do not have side-effects. For example, a Java compile step will transform .java files in the Fileset into .class files. But it will not have the side effect of actually writing those class files to the project's classes output directory - instead that side effect is "sequestered" by the Fileset mechanism. The output .class files will be written to hidden temporary directories under control of the mechanism - that means they will be in the Fileset, and thus passed on to the next Task. To put it another way, once the compile Task has compiled its files, its job is done. Actually writing the output of the compile to disk in the project classes directory is a separate Task.

Writing the output files from the Fileset (i.e. from the hidden temp files managed by the Fileset mechanism) to the project output directories is the job of the target task. So to compile your Java files you need a pipeline with two Tasks: a compile task to do the compiling, and a target task to write the results to a project directory. Note that this is a side-effect of the Target task; like any other Task, it takes an input Fileset, processes it (in this case, with side-effects that change the project directories), and then passes it back up the pipeline.
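The split between compiling and writing can be sketched without Boot. Everything in this example is a stand-in chosen for illustration: a map of path to content plays the Fileset, the `project-dir` atom plays the project output directory, and `fake-compile` and `fake-target` are hypothetical tasks, not Boot's real ones.

```clojure
(require '[clojure.string :as str])

;; Stand-in for the project's output directory on disk.
(def project-dir (atom {}))

(defn fake-compile
  "Hypothetical compile task: adds a .class entry to the fileset for each
  .java entry. It never touches project-dir."
  []
  (fn [next-handler]
    (fn [fileset]
      (next-handler
        (reduce-kv (fn [fs path content]
                     (if (str/ends-with? path ".java")
                       (assoc fs
                              (str/replace path #"\.java$" ".class")
                              (str "compiled:" content))
                       fs))
                   fileset
                   fileset)))))

(defn fake-target
  "Hypothetical target task: the only task that writes to project-dir."
  []
  (fn [next-handler]
    (fn [fileset]
      (reset! project-dir fileset)   ; the side effect on the "project"
      (next-handler fileset))))

(def source {"Hello.java" "class Hello {}"})

;; Compile alone: the result exists only in the returned fileset.
(def compiled-only (((fake-compile) identity) source))
(def disk-after-compile @project-dir)

;; Compile followed by target: now the output reaches the "project".
(def with-target (((comp (fake-compile) (fake-target)) identity) source))
```

After the first run `disk-after-compile` is still empty even though `compiled-only` contains `Hello.class`; only the second pipeline, which includes the target step, changes `project-dir`.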

Sub-pipelines

Because a Boot Task may have its own pipeline, a Boot pipeline will often have sub-pipelines. For example, if our pipeline is composed of tasks A - B - C, and Task B contains a pipeline B1 - B2 - B3, the order of processing will be as follows (using ' to indicate return-trip processing):

A -> B -> B1 -> B2 -> B3 -> B2' -> B1' -> C -> B' -> A'.
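One way to reproduce this ordering in plain Clojure is sketched below. It rests on two assumptions that the text does not spell out: Task B runs its inner pipeline to completion (with identity as the inner turnaround point) before handing the Fileset on to C, and the last task of each pipeline (B3 and C) records a single step because it turns around immediately. All helper names are hypothetical.

```clojure
(require '[clojure.string :as str])

(def trace (atom []))

(defn step! [label] (swap! trace conj label))

(defn task
  "Hypothetical tracing task with work in both directions."
  [label]
  (fn [next-handler]
    (fn [fileset]
      (step! label)
      (let [returned (next-handler fileset)]
        (step! (str label "'"))
        returned))))

(defn last-task
  "Hypothetical tracing task at the end of a (sub-)pipeline: one step only."
  [label]
  (fn [next-handler]
    (fn [fileset]
      (step! label)
      (next-handler fileset))))

(defn task-b
  "Assumed structure of Task B: run B1 - B2 - B3 to completion, then call
  the next task in the outer pipeline."
  []
  (let [inner ((comp (task "B1") (task "B2") (last-task "B3")) identity)]
    (fn [next-handler]
      (fn [fileset]
        (step! "B")
        (let [returned (next-handler (inner fileset))]
          (step! "B'")
          returned)))))

(def pipeline (comp (task "A") (task-b) (last-task "C")))

((pipeline identity) {})

(def order (str/join " -> " @trace))
;; => "A -> B -> B1 -> B2 -> B3 -> B2' -> B1' -> C -> B' -> A'"
```

If B instead spliced its sub-tasks directly into the outer composition, the return-trip steps of B1 and B2 would run after C rather than before it, so the ordering shown depends on the first assumption.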