chore: build-system submodule=>subrepo (#1378)
# Description
Let's move build-system to a subrepo. The tradeoffs for submodules are
worse.
Subrepo has some edge cases, but for me they're now well understood.
**The major thing** is that it optimizes for the normal case by being just
normal files plus a metadata file. Even if subrepo messes up syncing to
the upstream repo, aztec keeps its base truth moving and we can sync
later.

Pros over submodules:
- No one will pull and end up with an out-of-sync build-system; right now,
accidentally pushing a build-system revert is a very common mistake
- The mirror action is already done; I just point it at build-system
and nothing should be lost

Cons:
- No automatic two-way mirroring. We can manually recover (or even
implement this), but for simplicity let's just work on build-system from
aztec where everyone can see it.

Do we still want the other repo? I say default yes, as long as it's
basically free. We can punt on folding it in if it causes pain.
ludamad authored Aug 2, 2023
1 parent 2f66de1 commit 29ab491
Showing 49 changed files with 1,334 additions and 8 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/mirror_build_system_repo.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
name: Mirror to build-system repo

on:
  push:
    branches:
      - master
    paths:
      - "build-system/**"
      - "!build-system/.gitrepo"

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
          token: ${{ secrets.AZTEC_BOT_GITHUB_TOKEN }}

      - name: Push to branch
        run: |
          # We push using git subrepo (https://github.com/ingydotnet/git-subrepo)
          # with some logic to recover from squashed parent commits.
          SUBREPO_PATH=build-system
          # Identify ourselves; needed to commit.
          git config --global user.name AztecBot
          git config --global user.email tech@aztecprotocol.com
          # Push to the subrepo and commit to master. The commit is needed
          # to continue to replay. If we still hit issues, such as this
          # action failing due to upstream changes, a manual resolution
          # PR with ./scripts/git_subrepo.sh pull will be needed.
          ./scripts/git_subrepo.sh push $SUBREPO_PATH --branch=master
          git push # update .gitrepo on master
8 changes: 1 addition & 7 deletions .gitmodules
@@ -1,10 +1,4 @@
-[submodule "build-system"]
-	path = build-system
-	url = https://github.com/AztecProtocol/build-system
-[submodule "legacy-nested-build-system1"]
-	path = circuits/build-system
-	url = https://github.com/AztecProtocol/build-system
-[submodule "legacy-nested-build-system2"]
+[submodule "legacy-barretenberg-build-system"]
 	path = circuits/cpp/barretenberg/build-system
 	url = https://github.com/AztecProtocol/build-system
 [submodule "l1-contracts/lib/openzeppelin-contracts"]
1 change: 0 additions & 1 deletion build-system
Submodule build-system deleted from 0fcc7d
12 changes: 12 additions & 0 deletions build-system/.gitrepo
@@ -0,0 +1,12 @@
; DO NOT EDIT (unless you know what you are doing)
;
; This subdirectory is a git "subrepo", and this file is maintained by the
; git-subrepo command. See https://github.com/ingydotnet/git-subrepo#readme
;
[subrepo]
remote = https://github.com/AztecProtocol/build-system
branch = master
commit = 0fcc7d16192a6d05831ab1662fa9d878f808f87e
parent = 00bef0cd0c81f10d8c3850e516621ecbf0c2dc4d
method = merge
cmdver = 0.4.6
75 changes: 75 additions & 0 deletions build-system/README.md
@@ -0,0 +1,75 @@
# Build System

The Aztec build system is agnostic to its underlying platform, but currently our builds run in Circle CI. There were several requirements to be met in its design.

## Requirements

- Monorepo support (or at least, multiple projects within one repository).
- Builds docker containers for simple deployments.
- Docker layer caching support to minimise rebuild times.
- Don't rebuild projects that haven't changed as part of a commit (analyse diffs between commits).
- Allow fine- or coarse-grained control over which file changes within a project trigger a rebuild.
- Stateless (apart from the source repository itself, and the target container registry).
- Enable building on powerful (up to 64 core) EC2 spot instances. They're extremely cheap and powerful relative to Circle CI offerings.
- Easy to follow build graph on Circle CI.
- Deploy updated services only on a fully successful build of entire project.
- No vendor lock-in (don't use vendor specific features).

## Overview

We will assume Circle CI is the orchestration platform.

There are scripts that are called from the `.circleci/config.yml` that could be fairly easily run elsewhere if needed. They are located in the `scripts` folder, and are added to `PATH` so they can be called from project directories. The actual building of the services and libraries is all done with Dockerfiles.

There are two ECR (Elastic Container Registry) instances used in two regions (`eu-west-2` and `us-east-2`). As containers are built, the results are stored in `us-east-2` (deemed to be generally close to Circle CI), and these are considered to be caches that can be reused in subsequent builds. In the event of a deploy, the containers are published in `eu-west-2`, where all infrastructure is currently hosted. These are considered our live production builds.

We do not use Circle CI's "docker layer caching" feature, because:

- There is no guarantee the cache will be available between workflow steps or builds.
- There is not one single cache, but multiple caches which are randomly attached to your job.

For these reasons it's nondeterministic in terms of both state and performance, and thus impossible to rely on for anything useful.

## Important Concepts

We avoid using any Circle CI specific features. They are very general purpose, and thus often flawed. Also, we don't want vendor lock-in, as Circle CI has caused us multiple problems in the past. We only use Circle CI to orchestrate the build sequence. We could relatively easily shift this orchestration to another vendor, or to a custom internal build service.

The build system leverages image names and tags in the docker image registry to keep track of its historical success or failure in terms of builds, tests, and deployments. It's otherwise stateless, meaning it only needs a container registry to track state.
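As a rough sketch of the idea (not the actual helper script), finding the last successfully built commit amounts to walking the history newest-first and returning the first commit whose cache image tag exists. Here `image_exists` is a hypothetical stand-in for a real registry query, and the commit hashes are illustrative:

```
# Sketch only: image_exists stands in for an ECR tag lookup.
image_exists() {
  # Pretend only commit "c2" was ever successfully built.
  [ "$1" = "cache-c2" ]
}

LAST_SUCCESSFUL_COMMIT=""
# Linear history, newest first (hashes illustrative).
for COMMIT in c4 c3 c2 c1; do
  if image_exists "cache-$COMMIT"; then
    LAST_SUCCESSFUL_COMMIT=$COMMIT
    break
  fi
done
echo "Last successful commit: $LAST_SUCCESSFUL_COMMIT"
```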

We work in terms of _commits_ and not branches. Branches are a higher level concept that are ignored. Given a commit hash, there is a linear history of commits we scan and compare to the docker registry to determine what's changed, and thus what needs to be rebuilt.
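A minimal, self-contained sketch of that comparison, using a throwaway repository (the file names are purely illustrative):

```
set -e
# Build a tiny two-commit history in a temporary repo.
REPO_DIR=$(mktemp -d)
cd "$REPO_DIR"
git init -q
git -c user.name=ci -c user.email=ci@example.com commit -q --allow-empty -m "base"
LAST_SUCCESSFUL_COMMIT=$(git rev-parse HEAD)
mkdir -p build-system/scripts
echo "tweak" > build-system/scripts/build
git add -A
git -c user.name=ci -c user.email=ci@example.com commit -qm "change build script"
COMMIT_HASH=$(git rev-parse HEAD)
# The list of files changed between the two commits drives the rebuild decision.
CHANGED_FILES=$(git diff --name-only "$LAST_SUCCESSFUL_COMMIT" "$COMMIT_HASH")
echo "$CHANGED_FILES"
```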

There is a `build_manifest.json` that describes various settings for each project (dependencies, rebuild patterns, etc.). The dependencies listed in the build manifest represent the graph, such that if project A changes, all projects that depend on A will also be rebuilt. This likely closely mirrors the workflow graph as defined in Circle CI's `config.yml`.
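For illustration, a manifest entry might look roughly like the following. The `dockerfile`, `projectDir` and `buildDir` keys are queried by the build scripts; the other field names shown here are assumptions, not the exact schema:

```
{
  "barretenberg-x86_64-linux-clang": {
    "projectDir": "circuits/cpp/barretenberg",
    "buildDir": "circuits/cpp/barretenberg",
    "dockerfile": "dockerfiles/Dockerfile.x86_64-linux-clang",
    "rebuildPatterns": ["^circuits/cpp/barretenberg/"],
    "dependencies": []
  }
}
```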

A rebuild pattern is a regular expression that is matched against a list of changed files. We use pretty broad regular expressions that trigger rebuilds if _any_ file in a project changes, but you can be more fine-grained, e.g. not triggering rebuilds if you change something inconsequential.
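For example, the rebuild decision reduces to matching the changed-file list against the project's pattern. The file paths and pattern below are hypothetical:

```
# Hypothetical changed files and rebuild pattern.
CHANGED_FILES="docs/README.md
yarn-project/foo/src/index.ts"
REBUILD_PATTERN="^yarn-project/foo/"
# If any changed file matches the pattern, the project must be rebuilt.
if echo "$CHANGED_FILES" | grep -qE "$REBUILD_PATTERN"; then
  RESULT="rebuild"
else
  RESULT="skip"
fi
echo "$RESULT"
```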

## Usage

Add the build system into your repository as a submodule located at `/build-system`. Circle CI expects a `.circleci/config.yml` file from which you can leverage the build scripts. After checking out your repository code, initialise this submodule, e.g.

```
git submodule update --init build-system
```

At the start of each job, it's necessary to set up the build environment, e.g.

```
./build-system/scripts/setup_env "$CIRCLE_SHA1" "$CIRCLE_TAG" "$CIRCLE_JOB" "$CIRCLE_REPOSITORY_URL" "$CIRCLE_BRANCH"
```

Once called, all scripts are available directly via a `PATH` update, and various other env vars expected by the scripts are set. You'll want to `source` the above script if you intend to use the build system within the calling shell.
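Putting the above together, a Circle CI job might look roughly like the following. This is a hypothetical sketch, not the repository's actual `config.yml` (the machine image and project name are illustrative):

```
jobs:
  my-project:
    machine:
      image: ubuntu-2204:current
    steps:
      - checkout
      - run: git submodule update --init build-system
      - run:
          name: Build
          command: |
            source ./build-system/scripts/setup_env "$CIRCLE_SHA1" "$CIRCLE_TAG" "$CIRCLE_JOB" "$CIRCLE_REPOSITORY_URL" "$CIRCLE_BRANCH"
            build my-project
```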

Jobs will usually leverage one of the following scripts. View the scripts themselves for further documentation:

- `build`
- `deploy`
- `deploy_global`
- `cond_spot_run_build`
- `cond_spot_run_tests`

There are more fine-grained scripts that may be used in some cases, such as:

- `deploy_ecr`
- `deploy_terraform`
- `deploy_npm`
- `deploy_s3`
- `deploy_dockerhub`
Binary file added build-system/bin/jq
Binary file not shown.
6 changes: 6 additions & 0 deletions build-system/build-image/Dockerfile
@@ -0,0 +1,6 @@
# This build image is used for launching the small docker executor in Circle CI.
# We only ever use this executor for launching powerful EC2 instances, as it's the cheapest option.
FROM alpine:latest
RUN apk add --no-cache python3 py3-pip git openssh-client ca-certificates jq bash curl \
&& pip3 install --upgrade pip \
&& pip3 install --no-cache-dir awscli
Binary file added build-system/lib/libjq.so.1
Binary file not shown.
Binary file added build-system/lib/libonig.so.5
Binary file not shown.
14 changes: 14 additions & 0 deletions build-system/remote/32core.json
@@ -0,0 +1,14 @@
{
  "ImageId": "ami-0e5df77ac318c7a18",
  "KeyName": "build-instance",
  "SecurityGroupIds": ["sg-0ccd4e5df0dcca0c9"],
  "InstanceType": "r5.8xlarge",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 16
      }
    }
  ]
}
14 changes: 14 additions & 0 deletions build-system/remote/64core.json
@@ -0,0 +1,14 @@
{
  "ImageId": "ami-0e5df77ac318c7a18",
  "KeyName": "build-instance",
  "SecurityGroupIds": ["sg-0ccd4e5df0dcca0c9"],
  "InstanceType": "r5.16xlarge",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 16
      }
    }
  ]
}
3 changes: 3 additions & 0 deletions build-system/remote/ssh_config
@@ -0,0 +1,3 @@
IdentityFile ~/.ssh/build_instance_key
StrictHostKeyChecking no
User ubuntu
28 changes: 28 additions & 0 deletions build-system/remote_build/remote_build
@@ -0,0 +1,28 @@
#!/bin/bash
set -e

ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts

echo "Initialising remote build..."

# IF YOU'RE CHANGING THIS, YOU ALSO WANT TO CHANGE: .circleci/config.yml
# Shallow checkout this commit.
mkdir -p project
cd project
git init
git remote add origin $GIT_REPOSITORY_URL
# Only download metadata when fetching.
git config remote.origin.promisor true
git config remote.origin.partialclonefilter blob:none
git fetch --depth 50 origin $COMMIT_HASH
git checkout FETCH_HEAD
# Check out the build-system submodule only.
git submodule update --init build-system

echo "Git checkout completed."

BASH_ENV=/tmp/bash_env
echo "Calling setup env..."
source ./build-system/scripts/setup_env "$COMMIT_HASH" "$COMMIT_TAG" "$JOB_NAME" "$GIT_REPOSITORY_URL"
echo "Calling build..."
build "$@"
148 changes: 148 additions & 0 deletions build-system/scripts/build
@@ -0,0 +1,148 @@
#!/bin/bash
#
# Builds a docker image and pushes it to its repository. Leverages caches where possible.
# Cached images include previous successfully built images (including multi-stages) built on this branch.
# The output images are cache images, meaning they will eventually get purged.
# The deploy phase will tag the images such that they become permanent.
#
# usage: ./build <repository>
# example: ./build aztec-connect-cpp-x86_64-linux-clang
# output image:
# 278380418400.dkr.ecr.us-east-2.amazonaws.com/aztec-connect-cpp-x86_64-linux-clang:cache-deadbeefcafebabe1337c0d3
#
# In more detail:
# - Init all submodules required to build this project.
# - Log into the cache ECR, and ensure the repository exists.
# - Check if the current project needs to be rebuilt; if not, retag the previous image with the current commit hash and early out.
# - Validate any terraform that may exist.
# - Pull down dependent images that we do not control (e.g. alpine etc).
# - For images we do control, pull the image we've built (or retagged) as part of this build.
# - For each "named stage" (usually intermediate builders before creating the final image), pull the previous build to prime the cache, then build and push the results.
# - Pull the previous project image to use as a layer cache if it exists.
# - Perform the build of the image itself. With the cache primed we should only have to rebuild the necessary layers.
# - Push the image tagged with the commit hash to the cache.

set -euo pipefail

REPOSITORY=$1
DOCKERFILE=$(query_manifest dockerfile $REPOSITORY)
PROJECT_DIR=$(query_manifest projectDir $REPOSITORY)

echo "Repository: $REPOSITORY"
echo "Working directory: $PWD"
echo "Dockerfile: $DOCKERFILE"

init_submodules $REPOSITORY

function fetch_image() {
  echo "Pulling: $1"
  if ! docker pull $1 > /dev/null 2>&1; then
    echo "Image not found: $1"
    return 1
  fi
  return 0
}

# Ensure ECR repository exists.
ensure_repo $REPOSITORY $ECR_REGION refresh_lifecycle

LAST_SUCCESSFUL_COMMIT=$(last_successful_commit $REPOSITORY)
echo "Last successful commit: $LAST_SUCCESSFUL_COMMIT"

cd $(query_manifest buildDir $REPOSITORY)

# If we have a previously successful commit, we can early out if nothing relevant has changed since.
if check_rebuild "$LAST_SUCCESSFUL_COMMIT" $REPOSITORY; then
  echo "No rebuild necessary. Retagging..."
  STAGES=$(cat $DOCKERFILE | sed -n -e 's/^FROM .* AS \(.*\)/\1/p')
  for STAGE in $STAGES; do
    tag_remote_image $REPOSITORY cache-$LAST_SUCCESSFUL_COMMIT-$STAGE cache-$COMMIT_HASH-$STAGE || true
  done
  tag_remote_image $REPOSITORY cache-$LAST_SUCCESSFUL_COMMIT cache-$COMMIT_HASH
  untag_remote_image $REPOSITORY tainted
  exit 0
fi

# Validate any terraform if it exists.
if [ -d $ROOT_PATH/$PROJECT_DIR/terraform ]; then
  ensure_terraform
  export TF_IN_AUTOMATION=1
  pushd $ROOT_PATH/$PROJECT_DIR/terraform
  for DIR in . $(find . -maxdepth 1 -type d); do
    pushd $DIR
    if [ -f ./main.tf ]; then
      terraform init -input=false -backend-config="key=dummy"
      terraform validate
    fi
    popd
  done
  popd
fi

# Pull latest parents that are not ours. We also do not want to pull images suffixed by _, as this is how we scope intermediate build images.
echo "$DOCKERHUB_PASSWORD" | docker login -u aztecprotocolci --password-stdin
PARENTS=$(cat $DOCKERFILE | sed -n -e 's/^FROM \([^[:space:]]\+\).*/\1/p' | sed '/_$/d' | grep -v $ECR_DEPLOY_URL | sort | uniq)
for PARENT in $PARENTS; do
  fetch_image $PARENT
done

# For each parent that's ours, pull in the latest image.
PARENTS=$(cat $DOCKERFILE | sed -n -e "s/^FROM $ECR_DEPLOY_URL\/\([^[:space:]]\+\).*/\1/p")
for PARENT in $PARENTS; do
  # Extract repository name (i.e. discard tag).
  PARENT_REPO=${PARENT%:*}
  PARENT_COMMIT_HASH=$(last_successful_commit $PARENT_REPO)
  # There must be a parent image to continue.
  if [ -z "$PARENT_COMMIT_HASH" ]; then
    echo "No parent image found for $PARENT_REPO"
    exit 1
  fi
  PARENT_IMAGE_URI=$ECR_URL/$PARENT_REPO:cache-$PARENT_COMMIT_HASH
  echo "Pulling dependency $PARENT_REPO..."
  fetch_image $PARENT_IMAGE_URI
  # Tag it to look like an official release as that's what we use in Dockerfiles.
  docker tag $PARENT_IMAGE_URI $ECR_DEPLOY_URL/$PARENT
done

# Pull, build and push each named stage to cache.
STAGE_CACHE_FROM=""
CACHE_FROM=""
STAGES=$(cat $DOCKERFILE | sed -n -e 's/^FROM .* AS \(.*\)/\1/p')
for STAGE in $STAGES; do
  # Get the last build of this stage to leverage layer caching.
  if [ -n "$LAST_SUCCESSFUL_COMMIT" ]; then
    echo "Pulling stage: $STAGE"
    STAGE_IMAGE_LAST_URI=$ECR_URL/$REPOSITORY:cache-$LAST_SUCCESSFUL_COMMIT-$STAGE
    if fetch_image $STAGE_IMAGE_LAST_URI; then
      STAGE_CACHE_FROM="--cache-from $STAGE_IMAGE_LAST_URI"
    fi
  fi

  echo "Building stage: $STAGE"
  STAGE_IMAGE_COMMIT_URI=$ECR_URL/$REPOSITORY:cache-$COMMIT_HASH-$STAGE
  docker build --target $STAGE $STAGE_CACHE_FROM -t $STAGE_IMAGE_COMMIT_URI -f $DOCKERFILE --build-arg ARG_COMMIT_HASH=$COMMIT_HASH .

  # We don't want to redo this stage's work when building the final image. Use it as a layer cache.
  CACHE_FROM="--cache-from $STAGE_IMAGE_COMMIT_URI $CACHE_FROM"

  echo "Pushing stage: $STAGE"
  docker push $STAGE_IMAGE_COMMIT_URI > /dev/null 2>&1
  echo
done

# Pull previous image to use it as a layer cache if it exists.
if [ -n "$LAST_SUCCESSFUL_COMMIT" ]; then
  LAST_SUCCESSFUL_URI=$ECR_URL/$REPOSITORY:cache-$LAST_SUCCESSFUL_COMMIT
  echo "Pulling previous build of $REPOSITORY..."
  fetch_image $LAST_SUCCESSFUL_URI || true
  CACHE_FROM="--cache-from $LAST_SUCCESSFUL_URI $CACHE_FROM"
  echo
fi

# Build the actual image and give it a commit tag.
IMAGE_COMMIT_URI=$ECR_URL/$REPOSITORY:cache-$COMMIT_HASH
echo "Building image: $IMAGE_COMMIT_URI"
docker build -t $IMAGE_COMMIT_URI -f $DOCKERFILE $CACHE_FROM --build-arg COMMIT_TAG=$COMMIT_TAG --build-arg ARG_COMMIT_HASH=$COMMIT_HASH .
echo "Pushing image: $IMAGE_COMMIT_URI"
docker push $IMAGE_COMMIT_URI > /dev/null 2>&1
untag_remote_image $REPOSITORY tainted