This repository has been archived by the owner on Sep 4, 2024. It is now read-only.

Build cache

Agis Anastasopoulos edited this page Jul 3, 2019 · 7 revisions

mistry can cache contents between builds. Depending on the project, this may speed up jobs significantly. The cache is also what makes incremental builds possible.

Prerequisites:

Usage

The cache is enabled on a per-job basis by passing a value to the group option when scheduling a build. The actual value can be anything, as long as it's the same between jobs that want to use the same cache.

Enabling the cache means that the data directory of the last completed job of the same group will be used as the data directory for the new job.

Given the following jobs that are executed sequentially:

  • Job A {project: X, group: foo}
  • Job B {project: X, group: bar}
  • Job C {project: X, group: foo}
  • Job D {project: X, group: bar}

we have the following behavior:

  • Job A will start with an empty data directory, since it's the first job executed in project X with group foo.
  • Job B will start with an empty data directory, since it's the first job executed in project X with group bar.
  • Job C will start with the data directory from Job A, since it uses the same group as Job A.
  • Job D will start with the data directory from Job B, since it uses the same group as Job B.
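The selection rule above can be modelled with a few lines of shell. This is only a toy sketch of the behavior described here, not mistry's implementation: the `start_job` function and the marker files under a temporary directory are hypothetical and exist purely to track "the last completed job per project and group".

```shell
# Toy model of group-based cache selection; NOT mistry's actual code.
# One marker file per project/group remembers the last completed job.
state=$(mktemp -d)

# start_job JOB PROJECT GROUP: report where JOB's data directory comes from,
# then record JOB as the latest completed job of its group.
start_job() {
  job=$1 project=$2 group=$3
  marker="$state/$project.$group"
  if [ -f "$marker" ]; then
    echo "Job $job: starts with the data directory of Job $(cat "$marker")"
  else
    echo "Job $job: starts with an empty data directory"
  fi
  echo "$job" > "$marker"
}

start_job A X foo
start_job B X bar
start_job C X foo
start_job D X bar
```

Running it reports an empty data directory for Jobs A and B, the data directory of Job A for Job C, and the data directory of Job B for Job D, matching the list above.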

Sample usage

As an example, assume a project that writes the string "bar" to a file named foo.txt and also produces a file for cache purposes, bar.txt (its contents are irrelevant, but let's assume it holds cache data). We schedule a job for this project and pass it a group so that the cache is enabled:

$ mistry build --project projectx --group abc

After the build completes, the data path of this job on the host contains the following:

$ tree /var/lib/mistry/data/projectx/ready
/var/lib/mistry/data/projectx/ready
|-- 00f46193c660ccb36d355a7a1ba104b1e66da55c24bdca26e54c9843f79c18bd
|   |-- data # mounted as `/data` inside the container
|   |   |-- artifacts  # <-- build artifacts
|   |   |   |-- foo.txt
|   |   |-- cache      # <-- build cache contents
|   |   |   |-- bar.txt
|   |   `-- params     # <-- job parameters
|   |-- out.log
|   `-- result.json

Now let's assume that another job of the same project is scheduled:

$ mistry build --project projectx --group abc -- --cachebust=$RANDOM

Note: the cachebust parameter is passed only for demonstration purposes, so that the job is not considered identical to the previous one (i.e. it bypasses the result cache). Its name and value are both irrelevant.

When this new job starts, its /data path initially contains everything from the previous job's data directory. From inside the container, we would see this:

$ tree /data
|-- artifacts
|   |-- foo.txt # "bar"
|-- cache
|   |-- bar.txt
|-- params
|   |-- cachebust # e.g. 31245
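A build step can take advantage of this by checking the cache path before doing expensive work. The following is a minimal sketch for the sample project, assuming the /data layout shown above. The `build` function is hypothetical, and the demonstration runs against a temporary directory instead of /data so it can be tried outside a container.

```shell
# Hypothetical build step for the sample project (not part of mistry).
# build [DATA_DIR]: DATA_DIR defaults to /data, the path mounted in the container.
build() {
  data_dir=${1:-/data}
  mkdir -p "$data_dir/artifacts" "$data_dir/cache"

  if [ -f "$data_dir/cache/bar.txt" ]; then
    # A previous job of the same group left its cache behind.
    echo "cache hit: reusing bar.txt"
  else
    echo "cache miss: generating bar.txt"
    echo "cache data" > "$data_dir/cache/bar.txt"
  fi

  # The actual artifact of the sample project.
  printf 'bar' > "$data_dir/artifacts/foo.txt"
}

demo=$(mktemp -d)
build "$demo"   # first job of the group: cache miss
build "$demo"   # later job reusing the same data directory: cache hit
```

Inside a real job the call would simply be `build` with no argument, so that it operates on /data.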

FAQ

Q: Do I have to take care of cleaning up the artifacts path in case there are leftovers from previous artifacts?

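A: Yes. The data directory of the previous job is reused as-is, so leftover artifacts stay in place until the build removes them.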
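One simple way to do this is to empty the artifacts path at the start of the build while leaving the cache path alone. The sketch below assumes the /data layout from the sample above; `clean_artifacts` is a hypothetical helper, and it removes every artifact, so a project that relies on previous artifacts for incremental building would want a more selective cleanup.

```shell
# Hypothetical cleanup helper (not part of mistry).
# clean_artifacts [DATA_DIR]: empty the artifacts path, keep the cache.
clean_artifacts() {
  data_dir=${1:-/data}
  rm -rf "$data_dir/artifacts"
  mkdir -p "$data_dir/artifacts"
}

# Demonstration against a temporary directory instead of /data.
demo=$(mktemp -d)
mkdir -p "$demo/artifacts" "$demo/cache"
echo "stale" > "$demo/artifacts/old.txt"
echo "cache data" > "$demo/cache/bar.txt"

clean_artifacts "$demo"
ls "$demo/artifacts"   # prints nothing: leftovers are gone
ls "$demo/cache"       # bar.txt
```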

Q: Why are artifacts from previous builds also retained?

A: Because they may be used to achieve incremental building (many build tools do not re-build artifacts when they are already present).
