chore: build-system submodule=>subrepo (#1378)
# Description
Let's move build-system to a subrepo. The tradeoffs for submodules are
worse.
Subrepo has some edge cases, but for me they're now well understood.
**The major thing** is that it optimizes for the normal case by being just
normal files plus a metadata file. Even if subrepo messes up syncing to
the upstream repo, aztec keeps its base truth moving and we can sync
later.

Pros over submodules:
- No one will pull and end up with an out-of-sync build-system; right now,
accidentally pushing a build-system revert is a very common mistake
- The mirror action is already done; I just point it at build-system
and nothing should be lost

Cons:
- No automatic two-way mirroring. We can manually recover (or even
implement this), but for simplicity let's just work on build-system from
aztec where everyone can see it.

Do we still want the other repo? I say default yes, as long as it's
basically free. We can punt on folding it in if it causes pain.
ludamad authored Aug 2, 2023
1 parent 2f66de1 commit 29ab491
Showing 49 changed files with 1,334 additions and 8 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/mirror_build_system_repo.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
name: Mirror to build-system repo

on:
  push:
    branches:
      - master
    paths:
      - "build-system/**"
      - "!build-system/.gitrepo"

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
          token: ${{ secrets.AZTEC_BOT_GITHUB_TOKEN }}

      - name: Push to branch
        run: |
          # We push using git subrepo (https://github.com/ingydotnet/git-subrepo)
          # with some logic to recover from squashed parent commits.
          SUBREPO_PATH=build-system
          # Identify ourselves; needed to commit.
          git config --global user.name AztecBot
          git config --global user.email tech@aztecprotocol.com
          # Push to the subrepo and commit to master. The commit is needed
          # to continue to replay. If we still hit issues, such as this
          # action failing due to upstream changes, a manual resolution
          # PR with ./scripts/git_subrepo.sh pull will be needed.
          ./scripts/git_subrepo.sh push $SUBREPO_PATH --branch=master
          git push # update .gitrepo on master
8 changes: 1 addition & 7 deletions .gitmodules
@@ -1,10 +1,4 @@
-[submodule "build-system"]
-	path = build-system
-	url = https://github.com/AztecProtocol/build-system
-[submodule "legacy-nested-build-system1"]
-	path = circuits/build-system
-	url = https://github.com/AztecProtocol/build-system
-[submodule "legacy-nested-build-system2"]
+[submodule "legacy-barretenberg-build-system"]
 	path = circuits/cpp/barretenberg/build-system
 	url = https://github.com/AztecProtocol/build-system
 [submodule "l1-contracts/lib/openzeppelin-contracts"]
1 change: 0 additions & 1 deletion build-system
Submodule build-system deleted from 0fcc7d
12 changes: 12 additions & 0 deletions build-system/.gitrepo
@@ -0,0 +1,12 @@
; DO NOT EDIT (unless you know what you are doing)
;
; This subdirectory is a git "subrepo", and this file is maintained by the
; git-subrepo command. See https://github.com/ingydotnet/git-subrepo#readme
;
[subrepo]
remote = https://github.com/AztecProtocol/build-system
branch = master
commit = 0fcc7d16192a6d05831ab1662fa9d878f808f87e
parent = 00bef0cd0c81f10d8c3850e516621ecbf0c2dc4d
method = merge
cmdver = 0.4.6
75 changes: 75 additions & 0 deletions build-system/README.md
@@ -0,0 +1,75 @@
# Build System

The Aztec build system is agnostic to its underlying platform, but currently our builds run in Circle CI. There were several requirements to be met in its design.

## Requirements

- Monorepo support (or at least, multiple projects within one repository).
- Builds docker containers for simple deployments.
- Docker layer caching support to minimise rebuild times.
- Don't rebuild projects that haven't changed as part of a commit (analyse diffs between commits).
- Allow fine- or coarse-grained control over which file changes within a project trigger a rebuild.
- Stateless (apart from the source repository itself, and the target container registry).
- Enable building on powerful (up to 64 core) EC2 spot instances. They're extremely cheap and powerful relative to Circle CI offerings.
- Easy to follow build graph on Circle CI.
- Deploy updated services only on a fully successful build of entire project.
- No vendor lock-in (don't use vendor specific features).

## Overview

We will assume Circle CI is the orchestration platform.

There are scripts that are called from the `.circleci/config.yml` that could be fairly easily run elsewhere if needed. They are located in the `scripts` folder, and are added to `PATH` so they can be called from project directories. The actual building of the services and libraries is all done with Dockerfiles.

There are two ECR (Elastic Container Registry) instances used in two regions (`eu-west-2` and `us-east-2`). As containers are built, the results are stored in `us-east-2` (deemed to be generally close to Circle CI), and these are considered to be caches that can be reused in subsequent builds. In the event of a deploy, the containers are published in `eu-west-2`, where all infrastructure is currently hosted. These are considered our live production builds.

We do not use Circle CI's "docker layer caching" feature, because:

- There is no guarantee the cache will be available between workflow steps or builds.
- There is not one single cache, but multiple caches which are randomly attached to your job.

For these reasons it's nondeterministic in terms of both state and performance, and thus impossible to rely on for anything useful.

## Important Concepts

We avoid using any Circle CI specific features. They are very general purpose, and thus often flawed. Also, we don't want vendor lock-in, as Circle CI has caused us multiple problems in the past. We only use Circle CI to orchestrate the build sequence. We could relatively easily shift this orchestration to another vendor, or to a custom internal build service.

The build system leverages image names and tags in the docker image registry to keep track of its historical success or failure in terms of builds, tests, and deployments. It's otherwise stateless, meaning it only needs a container registry to track state.
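As a rough sketch of the idea (not the actual helper script), finding the last successfully built commit amounts to walking the history newest-first and returning the first commit whose cache image tag exists. Here `image_exists` is a hypothetical stand-in for a real registry query, and the commit hashes are illustrative:

```
# Sketch only: image_exists stands in for an ECR tag lookup.
image_exists() {
  # Pretend only commit "c2" was ever successfully built.
  [ "$1" = "cache-c2" ]
}

LAST_SUCCESSFUL_COMMIT=""
# Linear history, newest first (hashes illustrative).
for COMMIT in c4 c3 c2 c1; do
  if image_exists "cache-$COMMIT"; then
    LAST_SUCCESSFUL_COMMIT=$COMMIT
    break
  fi
done
echo "Last successful commit: $LAST_SUCCESSFUL_COMMIT"
```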

We work in terms of _commits_ and not branches. Branches are a higher level concept that are ignored. Given a commit hash, there is a linear history of commits we scan and compare to the docker registry to determine what's changed, and thus what needs to be rebuilt.
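A minimal, self-contained sketch of that comparison, using a throwaway repository (the file names are purely illustrative):

```
set -e
# Build a tiny two-commit history in a temporary repo.
REPO_DIR=$(mktemp -d)
cd "$REPO_DIR"
git init -q
git -c user.name=ci -c user.email=ci@example.com commit -q --allow-empty -m "base"
LAST_SUCCESSFUL_COMMIT=$(git rev-parse HEAD)
mkdir -p build-system/scripts
echo "tweak" > build-system/scripts/build
git add -A
git -c user.name=ci -c user.email=ci@example.com commit -qm "change build script"
COMMIT_HASH=$(git rev-parse HEAD)
# The list of files changed between the two commits drives the rebuild decision.
CHANGED_FILES=$(git diff --name-only "$LAST_SUCCESSFUL_COMMIT" "$COMMIT_HASH")
echo "$CHANGED_FILES"
```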

There is a `build_manifest.json` that describes various settings for each project (dependencies, rebuild patterns, etc.). The dependencies listed in the build manifest represent the graph, such that if project A changes, all projects that depend on A will also be rebuilt. This likely closely mirrors the workflow graph as defined in Circle CI's `config.yml`.
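For illustration, a manifest entry might look roughly like the following. The `dockerfile`, `projectDir` and `buildDir` keys are queried by the build scripts; the other field names shown here are assumptions, not the exact schema:

```
{
  "barretenberg-x86_64-linux-clang": {
    "projectDir": "circuits/cpp/barretenberg",
    "buildDir": "circuits/cpp/barretenberg",
    "dockerfile": "dockerfiles/Dockerfile.x86_64-linux-clang",
    "rebuildPatterns": ["^circuits/cpp/barretenberg/"],
    "dependencies": []
  }
}
```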

A rebuild pattern is a regular expression that is matched against a list of changed files. We use pretty broad regular expressions that trigger rebuilds if _any_ file in a project changes, but you can be more fine-grained, e.g. not triggering rebuilds if you change something inconsequential.
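For example, the rebuild decision reduces to matching the changed-file list against the project's pattern. The file paths and pattern below are hypothetical:

```
# Hypothetical changed files and rebuild pattern.
CHANGED_FILES="docs/README.md
yarn-project/foo/src/index.ts"
REBUILD_PATTERN="^yarn-project/foo/"
# If any changed file matches the pattern, the project must be rebuilt.
if echo "$CHANGED_FILES" | grep -qE "$REBUILD_PATTERN"; then
  RESULT="rebuild"
else
  RESULT="skip"
fi
echo "$RESULT"
```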

## Usage

Add the build system into your repository as a submodule located at `/build-system`. Circle CI expects a `.circleci/config.yml` file from which you can leverage the build scripts. After checking out your repository code, initialise this submodule, e.g.

```
git submodule update --init build-system
```

At the start of each job, it's necessary to set up the build environment, e.g.

```
./build-system/scripts/setup_env "$CIRCLE_SHA1" "$CIRCLE_TAG" "$CIRCLE_JOB" "$CIRCLE_REPOSITORY_URL" "$CIRCLE_BRANCH"
```

Once called, all scripts are available directly via a `PATH` update, and various other env vars expected by the scripts are set. You'll want to `source` the above script if you intend to use the build system within the calling shell.
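Putting the above together, a Circle CI job might look roughly like the following. This is a hypothetical sketch, not the repository's actual `config.yml` (the machine image and project name are illustrative):

```
jobs:
  my-project:
    machine:
      image: ubuntu-2204:current
    steps:
      - checkout
      - run: git submodule update --init build-system
      - run:
          name: Build
          command: |
            source ./build-system/scripts/setup_env "$CIRCLE_SHA1" "$CIRCLE_TAG" "$CIRCLE_JOB" "$CIRCLE_REPOSITORY_URL" "$CIRCLE_BRANCH"
            build my-project
```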

Jobs will usually leverage one of the following scripts. View the scripts themselves for further documentation:

- `build`
- `deploy`
- `deploy_global`
- `cond_spot_run_build`
- `cond_spot_run_tests`

There are more fine-grained scripts that may be used in some cases, such as:

- `deploy_ecr`
- `deploy_terraform`
- `deploy_npm`
- `deploy_s3`
- `deploy_dockerhub`
Binary file added build-system/bin/jq
Binary file not shown.
6 changes: 6 additions & 0 deletions build-system/build-image/Dockerfile
@@ -0,0 +1,6 @@
# This build image is used for launching the small docker executor in Circle CI.
# We only ever use this executor for launching powerful EC2 instances, as it's the cheapest option.
FROM alpine:latest
RUN apk add --no-cache python3 py3-pip git openssh-client ca-certificates jq bash curl \
&& pip3 install --upgrade pip \
&& pip3 install --no-cache-dir awscli
Binary file added build-system/lib/libjq.so.1
Binary file not shown.
Binary file added build-system/lib/libonig.so.5
Binary file not shown.
14 changes: 14 additions & 0 deletions build-system/remote/32core.json
@@ -0,0 +1,14 @@
{
  "ImageId": "ami-0e5df77ac318c7a18",
  "KeyName": "build-instance",
  "SecurityGroupIds": ["sg-0ccd4e5df0dcca0c9"],
  "InstanceType": "r5.8xlarge",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 16
      }
    }
  ]
}
14 changes: 14 additions & 0 deletions build-system/remote/64core.json
@@ -0,0 +1,14 @@
{
  "ImageId": "ami-0e5df77ac318c7a18",
  "KeyName": "build-instance",
  "SecurityGroupIds": ["sg-0ccd4e5df0dcca0c9"],
  "InstanceType": "r5.16xlarge",
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/sda1",
      "Ebs": {
        "VolumeSize": 16
      }
    }
  ]
}
3 changes: 3 additions & 0 deletions build-system/remote/ssh_config
@@ -0,0 +1,3 @@
IdentityFile ~/.ssh/build_instance_key
StrictHostKeyChecking no
User ubuntu
28 changes: 28 additions & 0 deletions build-system/remote_build/remote_build
@@ -0,0 +1,28 @@
#!/bin/bash
set -e

ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts

echo "Initialising remote build..."

# IF YOU'RE CHANGING THIS, YOU ALSO WANT TO CHANGE: .circleci/config.yml
# Shallow checkout this commit.
mkdir -p project
cd project
git init
git remote add origin $GIT_REPOSITORY_URL
# Only download metadata when fetching.
git config remote.origin.promisor true
git config remote.origin.partialclonefilter blob:none
git fetch --depth 50 origin $COMMIT_HASH
git checkout FETCH_HEAD
# Check out the build-system submodule only.
git submodule update --init build-system

echo "Git checkout completed."

BASH_ENV=/tmp/bash_env
echo "Calling setup env..."
source ./build-system/scripts/setup_env "$COMMIT_HASH" "$COMMIT_TAG" "$JOB_NAME" "$GIT_REPOSITORY_URL"
echo "Calling build..."
build "$@"
148 changes: 148 additions & 0 deletions build-system/scripts/build
@@ -0,0 +1,148 @@
#!/bin/bash
#
# Builds a docker image and pushes it to its repository. Leverages caches where possible.
# Cached images include previous successfully built images (including multi-stages) built on this branch.
# The output images are cache images, meaning they will eventually get purged.
# The deploy phase will tag the images such that they become permanent.
#
# usage: ./build <repository>
# example: ./build aztec-connect-cpp-x86_64-linux-clang
# output image:
# 278380418400.dkr.ecr.us-east-2.amazonaws.com/aztec-connect-cpp-x86_64-linux-clang:cache-deadbeefcafebabe1337c0d3
#
# In more detail:
# - Init all submodules required to build this project.
# - Log into the cache ECR, and ensure the repository exists.
# - Check if the current project needs to be rebuilt; if not, retag the previous image with the current commit hash and early out.
# - Validate any terraform that may exist.
# - Pull down dependent images that we do not control (e.g. alpine etc).
# - For images we do control, pull the image we've built (or retagged) as part of this build.
# - For each "named stage" (usually intermediate builders before creating the final image), pull the previous build to prime the cache, then build and push the results.
# - Pull the previous project image to use as a layer cache if it exists.
# - Perform the build of the image itself. With the cache primed we should only have to rebuild the necessary layers.
# - Push the image tagged with the commit hash to the cache.

set -euo pipefail

REPOSITORY=$1
DOCKERFILE=$(query_manifest dockerfile $REPOSITORY)
PROJECT_DIR=$(query_manifest projectDir $REPOSITORY)

echo "Repository: $REPOSITORY"
echo "Working directory: $PWD"
echo "Dockerfile: $DOCKERFILE"

init_submodules $REPOSITORY

function fetch_image() {
  echo "Pulling: $1"
  if ! docker pull $1 > /dev/null 2>&1; then
    echo "Image not found: $1"
    return 1
  fi
  return 0
}

# Ensure ECR repository exists.
ensure_repo $REPOSITORY $ECR_REGION refresh_lifecycle

LAST_SUCCESSFUL_COMMIT=$(last_successful_commit $REPOSITORY)
echo "Last successful commit: $LAST_SUCCESSFUL_COMMIT"

cd $(query_manifest buildDir $REPOSITORY)

# If we have a previously successful commit, we can early out if nothing relevant has changed since.
if check_rebuild "$LAST_SUCCESSFUL_COMMIT" $REPOSITORY; then
  echo "No rebuild necessary. Retagging..."
  STAGES=$(cat $DOCKERFILE | sed -n -e 's/^FROM .* AS \(.*\)/\1/p')
  for STAGE in $STAGES; do
    tag_remote_image $REPOSITORY cache-$LAST_SUCCESSFUL_COMMIT-$STAGE cache-$COMMIT_HASH-$STAGE || true
  done
  tag_remote_image $REPOSITORY cache-$LAST_SUCCESSFUL_COMMIT cache-$COMMIT_HASH
  untag_remote_image $REPOSITORY tainted
  exit 0
fi

# Validate any terraform if it exists.
if [ -d $ROOT_PATH/$PROJECT_DIR/terraform ]; then
  ensure_terraform
  export TF_IN_AUTOMATION=1
  pushd $ROOT_PATH/$PROJECT_DIR/terraform
  for DIR in . $(find . -maxdepth 1 -type d); do
    pushd $DIR
    if [ -f ./main.tf ]; then
      terraform init -input=false -backend-config="key=dummy"
      terraform validate
    fi
    popd
  done
  popd
fi

# Pull latest parents that are not ours. We also do not want to pull images suffixed by _, as this is how we scope intermediate build images.
echo "$DOCKERHUB_PASSWORD" | docker login -u aztecprotocolci --password-stdin
PARENTS=$(cat $DOCKERFILE | sed -n -e 's/^FROM \([^[:space:]]\+\).*/\1/p' | sed '/_$/d' | grep -v $ECR_DEPLOY_URL | sort | uniq)
for PARENT in $PARENTS; do
  fetch_image $PARENT
done

# For each parent that's ours, pull in the latest image.
PARENTS=$(cat $DOCKERFILE | sed -n -e "s/^FROM $ECR_DEPLOY_URL\/\([^[:space:]]\+\).*/\1/p")
for PARENT in $PARENTS; do
  # Extract repository name (i.e. discard tag).
  PARENT_REPO=${PARENT%:*}
  PARENT_COMMIT_HASH=$(last_successful_commit $PARENT_REPO)
  # There must be a parent image to continue.
  if [ -z "$PARENT_COMMIT_HASH" ]; then
    echo "No parent image found for $PARENT_REPO"
    exit 1
  fi
  PARENT_IMAGE_URI=$ECR_URL/$PARENT_REPO:cache-$PARENT_COMMIT_HASH
  echo "Pulling dependency $PARENT_REPO..."
  fetch_image $PARENT_IMAGE_URI
  # Tag it to look like an official release as that's what we use in Dockerfiles.
  docker tag $PARENT_IMAGE_URI $ECR_DEPLOY_URL/$PARENT
done

# Pull, build and push each named stage to cache.
STAGE_CACHE_FROM=""
CACHE_FROM=""
STAGES=$(cat $DOCKERFILE | sed -n -e 's/^FROM .* AS \(.*\)/\1/p')
for STAGE in $STAGES; do
  # Get the last build of this stage to leverage layer caching.
  if [ -n "$LAST_SUCCESSFUL_COMMIT" ]; then
    echo "Pulling stage: $STAGE"
    STAGE_IMAGE_LAST_URI=$ECR_URL/$REPOSITORY:cache-$LAST_SUCCESSFUL_COMMIT-$STAGE
    if fetch_image $STAGE_IMAGE_LAST_URI; then
      STAGE_CACHE_FROM="--cache-from $STAGE_IMAGE_LAST_URI"
    fi
  fi

  echo "Building stage: $STAGE"
  STAGE_IMAGE_COMMIT_URI=$ECR_URL/$REPOSITORY:cache-$COMMIT_HASH-$STAGE
  docker build --target $STAGE $STAGE_CACHE_FROM -t $STAGE_IMAGE_COMMIT_URI -f $DOCKERFILE --build-arg ARG_COMMIT_HASH=$COMMIT_HASH .

  # We don't want to redo this stage's work when building the final image. Use it as a layer cache.
  CACHE_FROM="--cache-from $STAGE_IMAGE_COMMIT_URI $CACHE_FROM"

  echo "Pushing stage: $STAGE"
  docker push $STAGE_IMAGE_COMMIT_URI > /dev/null 2>&1
  echo
done

# Pull previous image to use it as a layer cache if it exists.
if [ -n "$LAST_SUCCESSFUL_COMMIT" ]; then
  LAST_SUCCESSFUL_URI=$ECR_URL/$REPOSITORY:cache-$LAST_SUCCESSFUL_COMMIT
  echo "Pulling previous build of $REPOSITORY..."
  fetch_image $LAST_SUCCESSFUL_URI || true
  CACHE_FROM="--cache-from $LAST_SUCCESSFUL_URI $CACHE_FROM"
  echo
fi

# Build the actual image and give it a commit tag.
IMAGE_COMMIT_URI=$ECR_URL/$REPOSITORY:cache-$COMMIT_HASH
echo "Building image: $IMAGE_COMMIT_URI"
docker build -t $IMAGE_COMMIT_URI -f $DOCKERFILE $CACHE_FROM --build-arg COMMIT_TAG=$COMMIT_TAG --build-arg ARG_COMMIT_HASH=$COMMIT_HASH .
echo "Pushing image: $IMAGE_COMMIT_URI"
docker push $IMAGE_COMMIT_URI > /dev/null 2>&1
untag_remote_image $REPOSITORY tainted