Performing and documenting regular maintenance. (#7)
This PR combines the following updates:
- update of dependencies used in the AMIs built by Packer.
- update of Node.js dependencies.

Changes in this PR have **not** been tested on the AWS account used by Envoy's CI. As discussed offline, testing and deployment will be done during our meeting. I have verified that Packer builds the AMIs and the instances can start on a separate AWS account.

Done here:
- Updated to the latest AZP agent as per https://github.com/microsoft/azure-pipelines-agent/releases.
- Updated the source image for the AMIs to be Ubuntu Server 20.04 LTS. We cannot use 22.04 LTS yet due to hashicorp/packer#11733.
- Switched to binary images for `bazel-remote` instead of building it from source, since the build process now also requires `build-essential` due to the addition of `cgo` in buchgr/bazel-remote#559.
- Updated Node.js dependencies.
- Documented the update process.

Signed-off-by: Jakub Sobon <mumak@google.com>
mum4k authored Nov 3, 2022
1 parent 89839d0 commit 1bc9389
Showing 11 changed files with 302 additions and 51 deletions.
124 changes: 124 additions & 0 deletions MAINTENANCE.md
@@ -0,0 +1,124 @@
# Regular maintenance

## Goals

The goals of the regular maintenance process are to update dependencies used by
the CI infrastructure, to pull in bug fixes and improvements, and to keep the
deployed infrastructure from falling so far behind that later updates become
costly.

## Overview

These are the steps taken when performing the regular maintenance:

1. Update binaries and dependencies used in the AMIs (Amazon Machine Images),
the disk images used to start the VMs that run the CI infrastructure.
[Packer](https://www.packer.io/)
is used to create these images, so this step is referred to as the Packer update.
1. Update Node.js dependencies used by AWS Lambdas that perform cleanup tasks
like AMI de-registration.
1. Update the infrastructure using [Terraform](https://www.terraform.io/), so
that the VMs use the newly built images.

## Example update

See https://github.com/envoyproxy/ci-infra/pull/7 for an example of a PR that
performed this update.

## Packer update

All Packer configuration files and scripts are in the [ami-build](ami-build/)
directory.

### Update the AZP agent version

Edit the [ami-build/agent-setup.sh](ami-build/agent-setup.sh) file and update
the `AGENT_VERSION` variable to the [latest released
version](https://github.com/microsoft/azure-pipelines-agent/releases) of the AZP
agent.
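
The edit above can also be scripted. The following is a minimal sketch, not
part of this repository: `set_agent_version` is a hypothetical helper that
rewrites the `AGENT_VERSION=` line in a file and leaves every other line
unchanged.

```shell
# Hypothetical helper: set AGENT_VERSION in the given setup script.
# Usage: set_agent_version <file> <version>
set_agent_version() {
  file="$1"
  version="$2"
  # Accept only digits and dots so the value is safe to splice into sed.
  case "$version" in
    *[!0-9.]* | "") echo "bad version: $version" >&2; return 1 ;;
  esac
  # Write to a temporary file first so a failed sed leaves the script untouched.
  tmp="$(mktemp)" || return 1
  sed "s/^AGENT_VERSION=.*/AGENT_VERSION=${version}/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example: set_agent_version ami-build/agent-setup.sh 2.211.0
```

Review the resulting diff before committing, since the helper does not check
that the requested version exists on the releases page.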

### Update the Ubuntu OS version

Packer is used to build two AMIs, one for the x64 architecture (Intel/AMD) and
one for the arm64 architecture. The Packer configuration for these two AMIs is
in these files:

- [ami-build/azp-x64.json](ami-build/azp-x64.json)
- [ami-build/azp-arm64.json](ami-build/azp-arm64.json)

Refer to this
[howto](https://learn.hashicorp.com/tutorials/packer/aws-get-started-build-image?in=packer/aws-get-started)
for details on how to build AMIs with Packer. You can also review the
[documentation](https://www.packer.io/plugins/builders/amazon/ebs) for the
Amazon EBS Packer builder.

Edit each of the Packer configuration files and update the `name` under the
`source_ami_filter` to the latest LTS (long-term support) version of the Ubuntu
server image. This
[tutorial](https://ubuntu.com/tutorials/search-and-launch-ubuntu-22-04-in-aws-using-cli#2-search-for-the-right-ami)
outlines how to list images available in AWS.
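
As a rough sketch of that lookup, the helpers below build a name pattern in the
same form as the `name` filter and ask the AWS CLI for the newest matching
image. The function names are hypothetical, and the owner ID is Canonical's
publicly documented AWS account rather than a value taken from this
repository; check it against the `owners` list in the Packer configuration
files.

```shell
# Canonical's AWS account ID (assumption: matches `owners` in the Packer configs).
CANONICAL_OWNER_ID=099720109477

# Print an AMI name pattern in the form used by the `name` filter.
# Arguments: release codename, version, architecture (amd64 or arm64).
ubuntu_ami_pattern() {
  printf 'ubuntu/images/hvm-ssd/ubuntu-%s-%s-%s-server-*\n' "$1" "$2" "$3"
}

# Print the ID and name of the most recently created AMI matching the pattern.
# Requires the AWS CLI with credentials and a default region configured.
latest_ubuntu_ami() {
  aws ec2 describe-images \
    --owners "$CANONICAL_OWNER_ID" \
    --filters "Name=name,Values=$(ubuntu_ami_pattern "$1" "$2" "$3")" \
    --query 'sort_by(Images, &CreationDate)[-1].[ImageId, Name]' \
    --output text
}

# Example: latest_ubuntu_ami focal 20.04 arm64
```

If the query returns an image, the pattern is safe to paste into the `name`
field; Packer then resolves the concrete AMI at build time.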

### Update the bazel-remote version

Edit the
[ami-build/scripts/install-bazel-remote.sh](ami-build/scripts/install-bazel-remote.sh)
file and modify the target of the `wget` command to the latest released
`bazel-remote` version from https://github.com/buchgr/bazel-remote/tags.
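
For illustration, the download URL follows the pattern already used in the
install script, so it can be derived from a version number. This is a sketch
with hypothetical helper names; it assumes future releases keep the same asset
naming as v2.3.9.

```shell
# Map the Debian architecture name to the one used in bazel-remote asset names.
bazel_remote_arch() {
  case "$1" in
    amd64) echo x86_64 ;;
    *) echo "$1" ;;
  esac
}

# Print the release download URL for a version (without the leading "v")
# and a Debian architecture name such as amd64 or arm64.
bazel_remote_url() {
  version="$1"
  arch="$(bazel_remote_arch "$2")"
  echo "https://github.com/buchgr/bazel-remote/releases/download/v${version}/bazel-remote-${version}-linux-${arch}"
}

# Example: check that the asset exists before editing the script.
#   curl -fsSIL "$(bazel_remote_url 2.3.9 amd64)" > /dev/null && echo found
```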

### Build updated AMIs with Packer

Once the updates are performed, build and push the new AMIs to AWS by running:

- `packer build azp-x64.json`
- `packer build azp-arm64.json`

Note that this step should be done shortly before updating the infrastructure
using Terraform, since the `azp-dereg-lambda` runs daily and removes all but
the latest AMI. If the infrastructure isn't updated to use the latest AMI, the
lambda may delete an AMI that is in use.
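
The build step can be wrapped so that both templates are validated before
either AMI is built. This is only a sketch: `build_amis` is a hypothetical
helper, it assumes it is run from the `ami-build` directory, and the `PACKER`
variable exists only so the binary can be overridden.

```shell
# Packer binary to use; override for testing or a non-default install path.
PACKER="${PACKER:-packer}"

# Validate both templates first, then build them in order.
# Stops at the first failure so a broken template never produces a partial set.
build_amis() {
  for template in azp-x64.json azp-arm64.json; do
    "$PACKER" validate "$template" || return 1
  done
  for template in azp-x64.json azp-arm64.json; do
    "$PACKER" build "$template" || return 1
  done
}

# Example (from the ami-build directory): build_amis
```

Validating up front avoids pushing one new AMI and then failing on the second,
which matters given the daily `azp-dereg-lambda` cleanup described above.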

## Node.js dependencies update

The directories
[instances/azp-cleanup-snapshots](instances/azp-cleanup-snapshots) and
[instances/azp-dereg-lambda](instances/azp-dereg-lambda) contain two [AWS
Lambdas](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) written in
Node.js.

To update the dependencies, first make sure you have
[npm-check-updates](https://www.npmjs.com/package/npm-check-updates) installed.

Then go to each of the two directories and run `ncu -u`.
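
A minimal sketch of that loop, assuming it is run from the repository root and
that `ncu` is on the `PATH` (for example after
`npm install -g npm-check-updates`). `update_lambda_deps` is a hypothetical
helper, not part of the repository.

```shell
# npm-check-updates binary; override if it is installed under another name.
NCU="${NCU:-ncu}"

# The two Lambda directories named above, relative to the repository root.
LAMBDA_DIRS="instances/azp-cleanup-snapshots instances/azp-dereg-lambda"

# Run `ncu -u` in each Lambda directory, stopping at the first failure.
update_lambda_deps() {
  for dir in $LAMBDA_DIRS; do
    # Subshell so the caller's working directory is left unchanged.
    ( cd "$dir" && "$NCU" -u ) || return 1
  done
}

# Example (from the repository root): update_lambda_deps
```

Note that `ncu -u` only rewrites `package.json`; running `npm install` in each
directory afterwards refreshes `package-lock.json` to match.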

## Terraform update

### Build Node.js zip files

Go to the directories
[instances/azp-cleanup-snapshots](instances/azp-cleanup-snapshots) and
[instances/azp-dereg-lambda](instances/azp-dereg-lambda) and run:

- `npm run build`

This will produce two zip files in the [instances](instances) directory that
will be used by Terraform.

### Apply Terraform configs

You can refer to [this
documentation](https://learn.hashicorp.com/tutorials/terraform/aws-build?in=terraform/aws-get-started)
for details on how to manage AWS infrastructure using Terraform.

First get the AZP token used between AZP and the CI Agent:

- `export TF_VAR_azp_token=$(aws s3 cp s3://cncf-envoy-token/azp_token -)`


Then run the Terraform update step. This should only be done after the PR is
reviewed and approved. In short, execute:

- `terraform init` - to initialize the local Terraform installation.
- `terraform fmt` - to format any Terraform configuration files that were
modified.
- `terraform apply` - to update the AWS infrastructure applying local changes
and switching to the new AMIs.
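
The three commands above can be chained with a guard on the token. This is a
sketch under assumptions: `apply_infra` is a hypothetical helper, it must be
run from whichever directory holds the Terraform configuration, and the
`TERRAFORM` variable exists only so the binary can be overridden.

```shell
# Terraform binary to use; override for testing or a non-default install path.
TERRAFORM="${TERRAFORM:-terraform}"

# Run init, fmt, and apply in order, refusing to start without the AZP token.
apply_infra() {
  if [ -z "${TF_VAR_azp_token:-}" ]; then
    echo "TF_VAR_azp_token is not set; export it before applying" >&2
    return 1
  fi
  "$TERRAFORM" init || return 1
  "$TERRAFORM" fmt || return 1
  # apply prints the planned changes and asks for confirmation before acting.
  "$TERRAFORM" apply
}

# Example: apply_infra
```

Because `terraform apply` prompts for confirmation by default, the helper
still leaves a final manual check before the infrastructure changes.
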
5 changes: 5 additions & 0 deletions README.md
@@ -31,3 +31,8 @@ The general idea is:

- A Lambda watches the EC2 CloudWatch instance termination event,
and properly deregisters the agent from AZP.

# Regular maintenance

The regular maintenance process is documented in
[MAINTENANCE.md](MAINTENANCE.md).
2 changes: 1 addition & 1 deletion ami-build/agent-setup.sh
@@ -33,7 +33,7 @@ sudo mkdir -p /srv/azure-pipelines
 sudo chown -R azure-pipelines:azure-pipelines /srv/azure-pipelines/

 [[ "${ARCH}" == "amd64" ]] && ARCH=x64
-AGENT_VERSION=2.185.1
+AGENT_VERSION=2.211.0
 AGENT_FILE=vsts-agent-linux-${ARCH}-${AGENT_VERSION}

 sudo -u azure-pipelines /bin/bash -c "wget -q -O - https://vstsagentpackage.azureedge.net/agent/${AGENT_VERSION}/${AGENT_FILE}.tar.gz | tar zx -C /srv/azure-pipelines"
2 changes: 1 addition & 1 deletion ami-build/azp-arm64.json
@@ -7,7 +7,7 @@
 "source_ami_filter": {
 "filters": {
 "virtualization-type": "hvm",
-"name": "ubuntu/images/*ubuntu-bionic-18.04-arm64-server-*",
+"name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-arm64-server-*",
 "root-device-type": "ebs"
 },
 "owners": [
2 changes: 1 addition & 1 deletion ami-build/azp-x64.json
@@ -7,7 +7,7 @@
 "source_ami_filter": {
 "filters": {
 "virtualization-type": "hvm",
-"name": "ubuntu/images/*ubuntu-bionic-18.04-amd64-server-*",
+"name": "ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*",
 "root-device-type": "ebs"
 },
 "owners": [
17 changes: 4 additions & 13 deletions ami-build/scripts/install-bazel-remote.sh
@@ -2,17 +2,8 @@

 set -eu -o pipefail

-export GOARCH=$(dpkg --print-architecture)
+export ARCH=$(dpkg --print-architecture)
+[[ "${ARCH}" == "amd64" ]] && ARCH="x86_64"

-BUILD_TMP="$(mktemp -d)"
-trap "chmod +w -R ${BUILD_TMP} && rm -rf ${BUILD_TMP}" EXIT
-
-cd "${BUILD_TMP}"
-
-curl -fsSL https://golang.org/dl/go1.16.3.linux-${GOARCH}.tar.gz | tar zx
-export GOPATH="${BUILD_TMP}/gopath"
-
-git clone https://github.com/buchgr/bazel-remote
-cd bazel-remote
-PATH="${BUILD_TMP}/go/bin:${PATH}" ./linux-build.sh
-sudo cp ./bazel-remote /usr/local/bin/bazel-remote
+sudo wget -O /usr/local/bin/bazel-remote https://github.com/buchgr/bazel-remote/releases/download/v2.3.9/bazel-remote-2.3.9-linux-${ARCH}
+sudo chmod 0755 /usr/local/bin/bazel-remote
2 changes: 1 addition & 1 deletion instances/azp-build-asg/init.sh.tpl
@@ -41,7 +41,7 @@ Type=simple
 Restart=always
 RestartSec=1
 User=bazel-remote
-ExecStart=/usr/local/bin/bazel-remote --experimental_remote_asset_api --s3.endpoint s3.$AWS_DEFAULT_REGION.amazonaws.com --s3.bucket ${bazel_cache_bucket} --s3.prefix ${cache_prefix} --s3.iam_role_endpoint http://169.254.169.254 --max_size 30 --dir /dev/shm/bazel-remote-cache
+ExecStart=/usr/local/bin/bazel-remote --experimental_remote_asset_api --s3.endpoint s3.$AWS_DEFAULT_REGION.amazonaws.com --s3.bucket ${bazel_cache_bucket} --s3.prefix ${cache_prefix} --s3.iam_role_endpoint http://169.254.169.254 --s3.auth_method=iam_role --max_size 30 --dir /dev/shm/bazel-remote-cache
 [Install]
 WantedBy=multi-user.target
26 changes: 18 additions & 8 deletions instances/azp-cleanup-snapshots/package-lock.json


2 changes: 1 addition & 1 deletion instances/azp-cleanup-snapshots/package.json
@@ -10,6 +10,6 @@
 },
 "license": "MIT",
 "devDependencies": {
-"prettier": "^2.0.5"
+"prettier": "^2.7.1"
 }
 }