-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP]Add packer support for building AWS AMI #441
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments!
Do you know if we also can add a "print" when the instance is launched with some instructions, e.g. where to find examples, docs and how to run something. Like a welcome messages.
echo "Step: install-hugging-face-libraries" | ||
|
||
echo "Activating the virtual-env" | ||
source /opt/aws_neuron_venv_pytorch/bin/activate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a big breaking change to the existing AMI. Where you can straight up use "python". Can we either activate it by default or not use a virtual env?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base AMI let's us use the neuron package and other pre-installed packages only if we activate this environment. If we plan to not use a virtual env then we need to install everything just like the earlier repo.
Let me check if I can still access the neuron stuff atleast without activating the venv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in that case we should use https://aws.amazon.com/releasenotes/aws-deep-learning-ami-base-neuron-ubuntu-20-04/ as mentioned in the documentation that
The Base AMI comes with a foundational platform of GPU drivers and acceleration libraries to deploy your own customized deep learning environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If those are only installed in venv
then we should defaulty activate those on instance start and ssh connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, will add it in bashrc which should solve it in the final AMI
Hi @philschmid regarding this, I tried making a file called 99-custom-message in I also tried doing something like this
This works but the script is not seen as a script but it rather prints out everything. So if my
Then everything is printed as it is |
Please try look into what it is not shown. Having a message for the second ssh is not helpful. Have you tried searching for it?, e.g. https://askubuntu.com/questions/1394600/motd-not-showing-up-on-ubuntu-21-10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did another review. Please look at my comments there are a lot of things unclear to me. Also we want to use the existing Neuron AMI. We don't want to install any neuronx dependencies.
.github/workflows/build-ami.yml
Outdated
env: | ||
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }} | ||
SLACK_CHANNEL: ${{ secrets.SLACK_CHANNEL }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to create a channel and an app in order to set this up.
I was thinking of naming it optimum-neuron-ci-daily
@philschmid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check with the infra team. They have some conventions for alerts, and they can create a channel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two minor comments, but overall looks good!
script = "scripts/install-huggingface-libraries.sh" | ||
environment_vars = [ | ||
"TRANSFORMERS_VERSION=${var.transformers_version}", | ||
"OPTIMUM_NEURON_TAG=${var.optimum_neuron_tag}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets unify this an call it OPTIMUM_VERSION
} | ||
|
||
variable "source_ami" { | ||
default = "ami-0fbea04d7389bcd4e" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a comment where/how to get new ami ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
infrastructure/ami/README.md
Outdated
## Folder Structure | ||
|
||
- [hcl2-files](./hcl2-files/) - Includes different files which are used by a Packer pipeline to build an AMI. The files are: | ||
- [build.pkr.hcl](./hcl2-files/build.pkr.hcl): contains the [build](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build) defines what builders are started, how to provision them using [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) and if necessary what to do with their artifacts using `post-process`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [build.pkr.hcl](./hcl2-files/build.pkr.hcl): contains the [build](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build) defines what builders are started, how to provision them using [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) and if necessary what to do with their artifacts using `post-process`. | |
- [build.pkr.hcl](./hcl2-files/build.pkr.hcl): contains the [build](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build), defines what builders are started, how to provision them using [provisioner](https://developer.hashicorp.com/packer/docs/templates/hcl_templates/blocks/build/provisioner) and if necessary what to do with their artifacts using `post-process`. |
} | ||
|
||
variable "source_ami" { | ||
default = "ami-0fbea04d7389bcd4e" # To get latest AMI use: aws ec2 describe-images --region us-east-1 --owners amazon --filters 'Name=name,Values=Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) ????????' 'Name=state,Values=available' --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider splitting this comment on multiple lines: not all editors wrap lines automatically (mine doesn't)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
|
||
variable "ami_users" { | ||
default = ["754289655784", "558105141721"] | ||
description = "AWS accounts to share AMI with" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to share this on a public repository ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, then we add it in secrets somewhere I guess.
@philschmid wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes AWS Account IDs are not sensitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While account IDs, like any identifying information, should be used and shared carefully, they are not considered secret, sensitive, or confidential information.
echo "OPTIMUM_VERSION: $OPTIMUM_VERSION" | ||
|
||
pip install --upgrade --no-cache-dir \ | ||
"transformers[sklearn,sentencepiece,vision]==$TRANSFORMERS_VERSION" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not letting optimum-neuron pull the right version ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this install the extras? i don't think thats possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will install extras, https://github.com/huggingface/optimum-neuron/blob/main/setup.py#L16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We require the transformers extras with sklearn,sentencepiece,vision
to make sure training for all model works correctly, e.g. models needing sentence piece or pillow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the pull-request !
The CI is failing: is this expected ? |
@dacorvo fails because of not having the aws account credentials yet in the repo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets align the ref and then we are good.
.github/workflows/build-ami.yml
Outdated
@@ -56,13 +56,13 @@ jobs: | |||
|
|||
- name: Packer Validate | |||
id: validate | |||
run: packer validate -var "optimum_version=${{ github.event.inputs.tag }}" -var "region=${{ env.AWS_REGION }}" hcl2-files | |||
run: packer validate -var "optimum_version=${{ github.event.inputs.tag || github.event.repository.default_branch }}" -var "region=${{ env.AWS_REGION }}" hcl2-files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it here github.event.repository.default_branch
and not github.ref
? Wouldn't this mean that when we open a PR with changes, we would still install the main branch?
Lets align this. Otherwise i can be confusing for others when the need to makes changes for the ami and the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would then install the main branch.
The workflow either gets triggered when someone makes changes to infrastructure/ami/**'
then in that case the optimum_version
would be main
and if it is triggered automatically using scheduler or manually through cli then it would either be main
or the tag
passed.
If we want to build this image at any particular change in ref repo, which makes sense then I need to remove the path: infrastructure/ami/**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay let's keep it but add a comment that others understand why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philschmid Added the comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So shall I go ahead and merge it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes feel free to merge
This PR aims to fix issue #433.