[WIP]Add packer support for building AWS AMI #441

Merged on Feb 3, 2024 (21 commits)
Changes from 20 commits
78 changes: 78 additions & 0 deletions .github/workflows/build-ami.yml
@@ -0,0 +1,78 @@
name: Build AWS Neuron AMI

on:
  push:
    branches:
      - main
    paths:
      - 'infrastructure/ami/**'
  pull_request:
    branches:
      - main
    paths:
      - 'infrastructure/ami/**'
  workflow_dispatch:
    inputs:
      tag:
        description: 'Tag to use for the AMI build'
        default: 'main'
  schedule:
    # Schedule the workflow to run every second day at midnight UTC
    - cron: '0 0 */2 * *'

jobs:
  build-ami:
    defaults:
      run:
        working-directory: infrastructure/ami
    runs-on: ubuntu-latest
    env:
      AWS_REGION: us-east-1
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          ref: ${{ github.event.inputs.tag || github.ref }}

      - name: Setup Packer
        uses: hashicorp/setup-packer@main

      - name: configure aws credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID_BUILD_AMI }}
          aws-secret-access-key: ${{ secrets.AWS_ACCESS_KEY_SECRET_BUILD_AMI }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Packer format
        id: format
        run: packer fmt hcl2-files
        continue-on-error: true

      - name: Packer Init
        id: init
        run: packer init hcl2-files
        continue-on-error: true

      - name: Packer Validate
        id: validate
        run: packer validate -var "optimum_version=${{ github.event.inputs.tag || github.event.repository.default_branch }}" -var "region=${{ env.AWS_REGION }}" hcl2-files
Member

Why is `github.event.repository.default_branch` used here and not `github.ref`? Wouldn't this mean that when we open a PR with changes, we would still install the main branch?

Let's align this. Otherwise it can be confusing for others when they need to make changes to the AMI and the main branch.

Contributor Author @shub-kris, Feb 2, 2024

Yes, it would then install the main branch.

The workflow is triggered either when someone changes files under `infrastructure/ami/**`, in which case `optimum_version` is `main`, or by the scheduler or manually through the CLI, in which case it is either `main` or the tag that was passed.

If we want to build this image on any change in the repo at a given ref, which makes sense, then I need to remove the path filter `infrastructure/ami/**`.

Member

Okay, let's keep it, but add a comment so that others understand why.

Contributor Author

@philschmid Added the comments

Contributor Author

So shall I go ahead and merge it?

Member

Yes feel free to merge
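
The fallback discussed in this thread can be sketched in plain shell. This is only an illustration of how `github.event.inputs.tag || github.event.repository.default_branch` resolves for each trigger; the function `resolve_optimum_version` and its two arguments are hypothetical and not part of the workflow.

```shell
#!/bin/bash
# Hypothetical helper mirroring the workflow expression
#   ${{ github.event.inputs.tag || github.event.repository.default_branch }}
# $1: the `tag` input (empty for push, pull_request, and schedule events)
# $2: the repository default branch
resolve_optimum_version() {
  input_tag="$1"
  default_branch="$2"
  # `||` in the expression falls through on an empty value;
  # ${var:-fallback} does the same for an unset or empty variable.
  printf '%s\n' "${input_tag:-$default_branch}"
}

resolve_optimum_version "" "main"         # push / pull_request / schedule
resolve_optimum_version "v0.0.17" "main"  # workflow_dispatch with a tag
```

Under these assumptions, a pull request that only touches `infrastructure/ami/**` still resolves to the default branch, which is the behavior confirmed above.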

        continue-on-error: true

      - name: Packer Build
        id: build
        run: |
          packer build -var "optimum_version=${{ github.event.inputs.tag || github.event.repository.default_branch }}" -var "region=${{ env.AWS_REGION }}" hcl2-files

      - name: Slack Notification on Failure
        id: slack
        uses: slackapi/slack-github-action@v1.25.0
        if: ${{ failure() && github.event_name == 'schedule' }}
        with:
          channel-id: 'C06GAEQJLNN' #copied from slack channel
          payload: |
            {
              "text": "GitHub Action HuggingFace Neuron AMI Build result: ${{job.status}}"
            }
        env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
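
One detail of the `schedule` trigger in this workflow is worth spelling out. In standard cron syntax, `*/2` in the day-of-month field steps from day 1, so it matches days 1, 3, 5, and so on. After a 31-day month the workflow would therefore run on the 31st and again on the 1st. The sketch below only enumerates the matched days; `days_matched` is a hypothetical helper, not part of the workflow.

```shell
#!/bin/bash
# Hypothetical helper: print the days of the month matched by a `*/2`
# day-of-month field for a month with $1 days (the step starts at day 1).
days_matched() {
  day=1
  while [ "$day" -le "$1" ]; do
    printf '%s ' "$day"
    day=$((day + 2))
  done
  printf '\n'
}

days_matched 31   # last matched day is the 31st, and the next run is on the 1st
days_matched 30   # last matched day is the 29th
```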
78 changes: 78 additions & 0 deletions infrastructure/ami/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Building AMI with Packer

This directory contains the files for building an AMI with [Packer](https://github.com/hashicorp/packer). The AMI is later published as an AWS Marketplace asset.


## Folder Structure

- [hcl2-files](./hcl2-files/): contains the files that the Packer pipeline uses to build an AMI:
  - [build.pkr.hcl](./hcl2-files/build.pkr.hcl): contains the [build](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build) block, which defines the builders to start, provisions them using [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) blocks, and specifies what to do with the built artifacts using `post-processor` blocks.
  - [variables.pkr.hcl](./hcl2-files/variables.pkr.hcl): contains the [variable](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/variable) blocks, which define the variables used in the Packer configuration.
  - [sources.pkr.hcl](./hcl2-files/sources.pkr.hcl): contains the [source](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/source) block, which defines reusable builder configuration.
  - [packer.pkr.hcl](./hcl2-files/packer.pkr.hcl): contains the [packer](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/packer) block, which configures Packer itself, such as the required plugins and their minimum versions.
- [scripts](./scripts): contains the scripts used by the [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) blocks to install additional packages and software.


### Prerequisites
- [Packer](https://developer.hashicorp.com/packer/docs/intro): Packer is an open source tool for creating identical machine images for multiple platforms from a single source configuration.

- AWS Credentials: You need to have AWS credentials configured on your machine. You can configure AWS credentials using [AWS CLI](https://github.com/aws/aws-cli) or by setting environment variables.

#### Install Packer on Ubuntu/Debian
```bash
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install packer
```

For other operating systems, see the [Packer installation guide](https://developer.hashicorp.com/packer/tutorials/docker-get-started/get-started-install-cli).

#### Configure AWS Credentials

Using Environment Variables:
```bash
export AWS_ACCESS_KEY_ID=<access_key>
export AWS_SECRET_ACCESS_KEY=<secret_key>
```

Using AWS CLI:
```bash
aws configure sso
```

There are other ways to configure AWS credentials. You can read more about it [here](https://github.com/aws/aws-cli?tab=readme-ov-file#configuration).
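
As a rough pre-flight check before running Packer, the environment-variable method above can be verified with a few lines of shell. This is a minimal sketch: `have_aws_env_credentials` is a hypothetical helper, and it only covers credentials exported as environment variables, not profiles or SSO sessions.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeeds only if both environment variables
# from the "Using Environment Variables" method are set and non-empty.
have_aws_env_credentials() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

if have_aws_env_credentials; then
  echo "AWS credentials found in the environment"
else
  echo "AWS credentials not set in the environment (a profile or SSO session may still be configured)"
fi
```

If the credentials come from a profile or an SSO session instead, running `aws sts get-caller-identity` is a common way to confirm that the CLI can authenticate.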

### Build AMI

#### Format Packer blocks
You can format your HCL2 files locally. This command will update your files in place.

Format a single file:
```bash
packer fmt ./hcl2-files/build.pkr.hcl
```

Format all files in a directory:
```bash
packer fmt ./hcl2-files
```

#### Validate Packer blocks
You can validate the syntax and configuration of your files locally. This command will return a zero exit status on success, and a non-zero exit status on failure.

```bash
packer validate -var 'region=us-west-2' -var 'optimum_version=v0.0.17' ./hcl2-files
```

#### Run Packer build
You can run Packer locally. This command will build the AMI and upload it to AWS.

Variables without a default value (`region` and `optimum_version`) must be set with the `-var` flag. For example:
```bash
packer build -var 'region=us-west-2' -var 'optimum_version=v0.0.17' ./hcl2-files
```
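
The local steps can also be chained in the order the CI workflow uses (`fmt`, `init`, `validate`, `build`). The script below is a sketch under stated assumptions: the `run` and `packer_pipeline` helpers and the `DRY_RUN`, `REGION`, `OPTIMUM_VERSION`, and `TEMPLATE_DIR` variables are illustrative names, and the default values are taken from the examples above. With `DRY_RUN=1` (the default here) it only prints the commands.

```shell
#!/bin/bash
set -eu

# Illustrative defaults; override them from the environment as needed.
REGION="${REGION:-us-west-2}"
OPTIMUM_VERSION="${OPTIMUM_VERSION:-v0.0.17}"
TEMPLATE_DIR="${TEMPLATE_DIR:-./hcl2-files}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same order as the workflow: format, install plugins, validate, build.
packer_pipeline() {
  run packer fmt "$TEMPLATE_DIR"
  run packer init "$TEMPLATE_DIR"
  run packer validate -var "region=${REGION}" -var "optimum_version=${OPTIMUM_VERSION}" "$TEMPLATE_DIR"
  run packer build -var "region=${REGION}" -var "optimum_version=${OPTIMUM_VERSION}" "$TEMPLATE_DIR"
}

packer_pipeline
```

Setting `DRY_RUN=0` runs the real commands. Because of `set -eu`, the script stops at the first failing step, unlike the workflow, which marks the format, init, and validate steps with `continue-on-error: true`.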

To trigger the GitHub Actions workflow manually, use the GitHub CLI:
```bash
gh workflow run build-ami.yml -f tag=<tag>
```
29 changes: 29 additions & 0 deletions infrastructure/ami/hcl2-files/build.pkr.hcl
@@ -0,0 +1,29 @@
build {
  name = "build-hf-dl-neuron"
  sources = [
    "source.amazon-ebs.ubuntu"
  ]
  provisioner "shell" {
    script = "scripts/validate-neuron.sh"
  }
  provisioner "shell" {
    script = "scripts/install-huggingface-libraries.sh"
    environment_vars = [
      "TRANSFORMERS_VERSION=${var.transformers_version}",
      "OPTIMUM_VERSION=${var.optimum_version}",
    ]
  }
  provisioner "shell" {
    inline = ["echo 'source /opt/aws_neuron_venv_pytorch/bin/activate' >> /home/ubuntu/.bashrc"]
  }
  provisioner "file" {
    source = "scripts/welcome-msg.sh"
    destination = "/tmp/99-custom-message"
  }
  provisioner "shell" {
    inline = [
      "sudo mv /tmp/99-custom-message /etc/update-motd.d/",
      "sudo chmod +x /etc/update-motd.d/99-custom-message",
    ]
  }
}
8 changes: 8 additions & 0 deletions infrastructure/ami/hcl2-files/packer.pkr.hcl
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
packer {
  required_plugins {
    amazon = {
      version = ">= 1.2.8"
      source = "github.com/hashicorp/amazon"
    }
  }
}
15 changes: 15 additions & 0 deletions infrastructure/ami/hcl2-files/sources.pkr.hcl
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
source "amazon-ebs" "ubuntu" {
  ami_name = "huggingface-neuron-{{isotime \"2006-01-02T15-04-05Z\"}}"
  instance_type = var.instance_type
  region = var.region
  source_ami = var.source_ami
  ssh_username = var.ssh_username
  launch_block_device_mappings {
    device_name = "/dev/sda1"
    volume_size = 512
    volume_type = "gp2"
    delete_on_termination = true
  }
  ami_users = var.ami_users
  ami_regions = var.ami_regions
}
54 changes: 54 additions & 0 deletions infrastructure/ami/hcl2-files/variables.pkr.hcl
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
variable "region" {
  description = "The AWS region"
  type = string
}

variable "instance_type" {
  default = "trn1.2xlarge"
  description = "EC2 machine type for building AMI"
  type = string
}

variable "source_ami" {
  default = "ami-0fbea04d7389bcd4e"
  description = "Base Image"
  type = string
  /*
  To get latest value, run the following command:
    aws ec2 describe-images \
      --region us-east-1 \
      --owners amazon \
      --filters 'Name=name,Values=Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) ????????' 'Name=state,Values=available' \
      --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' \
      --output text
  */
}

variable "ssh_username" {
  default = "ubuntu"
  description = "Username to connect to SSH with"
  type = string
}

variable "optimum_version" {
  description = "Optimum Neuron version to install"
  type = string
}

variable "transformers_version" {
  default = "4.36.2"
  description = "Transformers version to install"
  type = string
}

variable "ami_users" {
  default = ["754289655784", "558105141721"]
  description = "AWS accounts to share AMI with"
Collaborator

Are we sure we want to share this in a public repository?

Contributor Author

Umm, then we add it in secrets somewhere I guess.
@philschmid wdyt?

Member

Yes, AWS account IDs are not sensitive.

Member

While account IDs, like any identifying information, should be used and shared carefully, they are not considered secret, sensitive, or confidential information.

https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#:~:text=While%20account%20IDs%2C%20like%20any,Canonical%20user%20ID

  type = list(string)
}

variable "ami_regions" {
  default = ["eu-west-1"]
  description = "AWS regions to share AMI with"
  type = list(string)
}
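
Instead of repeating `-var` flags, the variables declared above can be collected in a variable definitions file and passed with Packer's `-var-file` option. The sketch below writes such a file from shell. The file name `overrides.pkrvars.hcl`, the `write_vars_file` helper, and the `VARS_FILE` variable are assumptions for illustration, and the values are the README examples and the defaults from this file.

```shell
#!/bin/bash
# Hypothetical location for the variable definitions file.
VARS_FILE="${VARS_FILE:-${TMPDIR:-/tmp}/overrides.pkrvars.hcl}"

# Write values for the two variables without defaults, plus two overrides
# that simply repeat the defaults declared in variables.pkr.hcl.
write_vars_file() {
  cat > "$1" <<'EOF'
region = "us-west-2"
optimum_version = "v0.0.17"
instance_type = "trn1.2xlarge"
ami_regions = ["eu-west-1"]
EOF
}

write_vars_file "$VARS_FILE"

# The build would then be started with (printed here, not executed):
echo "packer build -var-file=$VARS_FILE ./hcl2-files"
```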
39 changes: 39 additions & 0 deletions infrastructure/ami/scripts/install-huggingface-libraries.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
#!/bin/bash

# Activate the neuron virtual environment
source /opt/aws_neuron_venv_pytorch/bin/activate

echo "Step: install-hugging-face-libraries"

echo "TRANSFORMERS_VERSION: $TRANSFORMERS_VERSION"
echo "OPTIMUM_VERSION: $OPTIMUM_VERSION"

pip install --upgrade --no-cache-dir \
"transformers[sklearn,sentencepiece,vision]==$TRANSFORMERS_VERSION" \
Collaborator

Why not let optimum-neuron pull the right version?

Member

Would this install the extras? I don't think that's possible.

Contributor Author @shub-kris, Feb 1, 2024

Member

We require the transformers extras `sklearn`, `sentencepiece`, and `vision` to make sure training works correctly for all models, e.g. models that need SentencePiece or Pillow.

"datasets==2.16.1" \
"accelerate==0.23.0" \
"diffusers==0.25.0" \
"evaluate==0.4.1" \
"requests==2.31.0" \
"notebook==7.0.6" \
"markupsafe==2.1.1" \
"jinja2==3.1.2" \
"attrs==23.1.0"

echo 'export PATH="${HOME}/.local/bin:$PATH"' >> "${HOME}/.bashrc"

echo "Step: install-and-copy-optimum-neuron-examples"
git clone -b $OPTIMUM_VERSION https://github.com/huggingface/optimum-neuron.git

cd optimum-neuron
python setup.py install
cd ..

mkdir /home/ubuntu/huggingface-neuron-samples/ /home/ubuntu/huggingface-neuron-notebooks/
mv optimum-neuron/examples/* /home/ubuntu/huggingface-neuron-samples/
mv optimum-neuron/notebooks/* /home/ubuntu/huggingface-neuron-notebooks/
rm -rf optimum-neuron
chmod -R 777 /home/ubuntu/huggingface-neuron-samples /home/ubuntu/huggingface-neuron-notebooks

echo "Step: validate-imports-of-huggingface-libraries"
bash -c 'python -c "import transformers;import datasets;import accelerate;import evaluate;import tensorboard; import torch;"'
13 changes: 13 additions & 0 deletions infrastructure/ami/scripts/validate-neuron.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#!/bin/bash
echo "Step: validate-neuron-devices"
neuron-ls

# Activate the neuron virtual environment
source /opt/aws_neuron_venv_pytorch/bin/activate

python -c 'import torch'
python -c 'import torch_neuronx'

echo "Installing Tensorboard Plugin for Neuron"
pip install --upgrade --no-cache-dir \
"tensorboard-plugin-neuronx"
11 changes: 11 additions & 0 deletions infrastructure/ami/scripts/welcome-msg.sh
@@ -0,0 +1,11 @@
#!/bin/bash
printf "=============================================================================\n"
printf " __| __|_ )\n"
printf " _| ( / HuggingFace Deep Learning Neuron AMI (Ubuntu 20.04)\n"
printf " ___|\___|___|\n"
printf "=============================================================================\n"
printf "Welcome to the HuggingFace Deep Learning Neuron AMI (Ubuntu 20.04)\n"
printf "* Examples: /home/ubuntu/huggingface-neuron-samples \n"
printf "* Notebooks: /home/ubuntu/huggingface-neuron-notebooks \n"
printf "* Documentation: https://huggingface.co/docs/optimum-neuron/ \n"
printf "=============================================================================\n"