Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]Add packer support for building AWS AMI #441

Merged
merged 21 commits into from
Feb 3, 2024

Conversation

shub-kris
Copy link
Contributor

@shub-kris shub-kris commented Jan 25, 2024

This PR aims to fix issue #433.

@shub-kris shub-kris linked an issue Jan 25, 2024 that may be closed by this pull request
6 tasks
@shub-kris shub-kris changed the title Add packer support for building AWS AMI [WIP]Add packer support for building AWS AMI Jan 25, 2024
Copy link
Member

@philschmid philschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments!

Do you know if we also can add a "print" when the instance is launched with some instructions, e.g. where to find examples, docs and how to run something. Like a welcome messages.

.github/workflows/build-ami.yml Outdated Show resolved Hide resolved
.github/workflows/build-ami.yml Outdated Show resolved Hide resolved
aws-ami-creation/README.md Outdated Show resolved Hide resolved
aws-ami-creation/README.md Outdated Show resolved Hide resolved
aws-ami-creation/README.md Outdated Show resolved Hide resolved
aws-ami-creation/README.md Outdated Show resolved Hide resolved
aws-ami-creation/hcl2-files/sources.pkr.hcl Outdated Show resolved Hide resolved
aws-ami-creation/hcl2-files/variables.pkr.hcl Outdated Show resolved Hide resolved
aws-ami-creation/scripts/install_huggingface_libraries.sh Outdated Show resolved Hide resolved
echo "Step: install-hugging-face-libraries"

echo "Activating the virtual-env"
source /opt/aws_neuron_venv_pytorch/bin/activate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a big breaking change to the existing AMI. Where you can straight up use "python". Can we either activate it by default or not use a virtual env?

Copy link
Contributor Author

@shub-kris shub-kris Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base AMI let's us use the neuron package and other pre-installed packages only if we activate this environment. If we plan to not use a virtual env then we need to install everything just like the earlier repo.
Let me check if I can still access the neuron stuff atleast without activating the venv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in that case we should use https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-ubuntu-20-04/ as mentioned in the documentation that

The Base AMI comes with a foundational platform of GPU drivers and acceleration libraries to deploy your own customized deep learning environment. 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those are only installed in venv then we should defaulty activate those on instance start and ssh connection.

Copy link
Contributor Author

@shub-kris shub-kris Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, will add it in bashrc which should solve it in the final AMI

@shub-kris
Copy link
Contributor Author

Left some comments!

Do you know if we also can add a "print" when the instance is launched with some instructions, e.g. where to find examples, docs and how to run something. Like a welcome messages.

Hi @philschmid regarding this, I tried making a file called 99-custom-message in /etc/update-motd.d, it doesn't work when ssshing for the first time but works in the second time and I have no clue why?

I also tried doing something like this

  provisioner "shell" {
    inline = [
      "sudo mv /tmp/welcome.sh /etc/ssh/welcome.sh",
      "sudo chmod +x /etc/ssh/welcome.sh",
      "sudo sed -i 's/#Banner/Banner/' /etc/ssh/sshd_config",
      "sudo sed -i '/^Banner/c\\Banner /etc/ssh/welcome.sh' /etc/ssh/sshd_config",
      "sudo systemctl restart sshd"
    ]
  }

This works but the script is not seen as a script but it rather prints out everything.

So if my /tmp/welcome.sh was like this

#!/bin/bash

"Welcome to your instance!"
"You can find examples and documentation at <link>"
"To run something, use <command>"

Then everything is printed as it is

@philschmid
Copy link
Member

Left some comments!
Do you know if we also can add a "print" when the instance is launched with some instructions, e.g. where to find examples, docs and how to run something. Like a welcome messages.

Hi @philschmid regarding this, I tried making a file called 99-custom-message in /etc/update-motd.d, it doesn't work when ssshing for the first time but works in the second time and I have no clue why?

I also tried doing something like this

  provisioner "shell" {
    inline = [
      "sudo mv /tmp/welcome.sh /etc/ssh/welcome.sh",
      "sudo chmod +x /etc/ssh/welcome.sh",
      "sudo sed -i 's/#Banner/Banner/' /etc/ssh/sshd_config",
      "sudo sed -i '/^Banner/c\\Banner /etc/ssh/welcome.sh' /etc/ssh/sshd_config",
      "sudo systemctl restart sshd"
    ]
  }

This works but the script is not seen as a script but it rather prints out everything.

So if my /tmp/welcome.sh was like this

#!/bin/bash

"Welcome to your instance!"
"You can find examples and documentation at <link>"
"To run something, use <command>"

Then everything is printed as it is

Please try look into what it is not shown. Having a message for the second ssh is not helpful. Have you tried searching for it?, e.g. https://askubuntu.com/questions/1394600/motd-not-showing-up-on-ubuntu-21-10
Or does the base AMI already have something which can lead to issues etc. It should work since i used other ubuntu based ami in the past which had this.

Copy link
Member

@philschmid philschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did another review. Please look at my comments there are a lot of things unclear to me. Also we want to use the existing Neuron AMI. We don't want to install any neuronx dependencies.

.github/workflows/build-ami.yml Outdated Show resolved Hide resolved
.github/workflows/build-ami.yml Outdated Show resolved Hide resolved
.github/workflows/build-ami.yml Outdated Show resolved Hide resolved
.github/workflows/build-ami.yml Outdated Show resolved Hide resolved
.github/workflows/build-ami.yml Outdated Show resolved Hide resolved
infrastructure/ami/scripts/setup-neuron.sh Outdated Show resolved Hide resolved
infrastructure/ami/scripts/setup-neuron.sh Outdated Show resolved Hide resolved
infrastructure/ami/scripts/setup-neuron.sh Outdated Show resolved Hide resolved
infrastructure/ami/scripts/setup-neuron.sh Outdated Show resolved Hide resolved
infrastructure/ami/scripts/setup-neuron.sh Outdated Show resolved Hide resolved
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to create a channel and an app in order to set this up.
I was thinking of naming it optimum-neuron-ci-daily
@philschmid

Copy link
Member

@philschmid philschmid Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check with the infra team. They have some conventions for alerts, and they can create a channel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

Copy link
Member

@philschmid philschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two minor comments, but overall looks good!

script = "scripts/install-huggingface-libraries.sh"
environment_vars = [
"TRANSFORMERS_VERSION=${var.transformers_version}",
"OPTIMUM_NEURON_TAG=${var.optimum_neuron_tag}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets unify this an call it OPTIMUM_VERSION

}

variable "source_ami" {
default = "ami-0fbea04d7389bcd4e"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment where/how to get new ami ids?

infrastructure/ami/scripts/welcome-msg.sh Outdated Show resolved Hide resolved
Copy link
Member

@philschmid philschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

## Folder Structure

- [hcl2-files](./hcl2-files/) - Includes different files which are used by a Packer pipeline to build an AMI. The files are:
- [build.pkr.hcl](./hcl2-files/build.pkr.hcl): contains the [build](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build) defines what builders are started, how to provision them using [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) and if necessary what to do with their artifacts using `post-process`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [build.pkr.hcl](./hcl2-files/build.pkr.hcl): contains the [build](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build) defines what builders are started, how to provision them using [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) and if necessary what to do with their artifacts using `post-process`.
- [build.pkr.hcl](./hcl2-files/build.pkr.hcl): contains the [build](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build), defines what builders are started, how to provision them using [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) and if necessary what to do with their artifacts using `post-process`.

}

variable "source_ami" {
default = "ami-0fbea04d7389bcd4e" # To get latest AMI use: aws ec2 describe-images --region us-east-1 --owners amazon --filters 'Name=name,Values=Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider splitting this comment on multiple lines: not all editors wrap lines automatically (mine doesn't)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay


variable "ami_users" {
default = ["754289655784", "558105141721"]
description = "AWS accounts to share AMI with"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to share this on a public repository ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, then we add it in secrets somewhere I guess.
@philschmid wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes AWS Account IDs are not sensitive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While account IDs, like any identifying information, should be used and shared carefully, they are not considered secret, sensitive, or confidential information.

https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-identifiers.html#:~:text=While%20account%20IDs%2C%20like%20any,Canonical%20user%20ID

echo "OPTIMUM_VERSION: $OPTIMUM_VERSION"

pip install --upgrade --no-cache-dir \
"transformers[sklearn,sentencepiece,vision]==$TRANSFORMERS_VERSION" \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not letting optimum-neuron pull the right version ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this install the extras? i don't think thats possible.

Copy link
Contributor Author

@shub-kris shub-kris Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require the transformers extras with sklearn,sentencepiece,vision to make sure training for all model works correctly, e.g. models needing sentence piece or pillow.

Copy link
Collaborator

@dacorvo dacorvo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the pull-request !

@dacorvo
Copy link
Collaborator

dacorvo commented Feb 1, 2024

The CI is failing: is this expected ?

@shub-kris
Copy link
Contributor Author

The CI is failing: is this expected ?

@dacorvo fails because of not having the aws account credentials yet in the repo.

Copy link
Member

@philschmid philschmid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets align the ref and then we are good.

@@ -56,13 +56,13 @@ jobs:

- name: Packer Validate
id: validate
run: packer validate -var "optimum_version=${{ github.event.inputs.tag }}" -var "region=${{ env.AWS_REGION }}" hcl2-files
run: packer validate -var "optimum_version=${{ github.event.inputs.tag || github.event.repository.default_branch }}" -var "region=${{ env.AWS_REGION }}" hcl2-files
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it here github.event.repository.default_branch and not github.ref? Wouldn't this mean that when we open a PR with changes, we would still install the main branch?

Lets align this. Otherwise i can be confusing for others when the need to makes changes for the ami and the main branch.

Copy link
Contributor Author

@shub-kris shub-kris Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would then install the main branch.

The workflow either gets triggered when someone makes changes to infrastructure/ami/**' then in that case the optimum_version would be main and if it is triggered automatically using scheduler or manually through cli then it would either be main or the tag passed.

If we want to build this image at any particular change in ref repo, which makes sense then I need to remove the path: infrastructure/ami/**

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay let's keep it but add a comment that others understand why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philschmid Added the comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So shall I go ahead and merge it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes feel free to merge

@shub-kris shub-kris merged commit 7c7d508 into main Feb 3, 2024
1 check passed
@shub-kris shub-kris deleted the feature/aws-neuron-internal-migrate branch February 3, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate Hugging Face DLAMI into optimum neuron
4 participants