Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Added custom AMI support for managed node groups #1473

Merged
merged 13 commits into from
Sep 8, 2021
Merged

feat: Added custom AMI support for managed node groups #1473

merged 13 commits into from
Sep 8, 2021

Conversation

AndriiBarabash
Copy link
Contributor

@AndriiBarabash AndriiBarabash commented Jul 2, 2021

PR o'clock

Description

Added support for custom AMI inside Managed Node Groups to keep up with AWS feature - https://aws.amazon.com/blogs/containers/introducing-launch-template-and-custom-ami-support-in-amazon-eks-managed-node-groups/.

Please note, that for the correct work of custom AMI the user should supply a bootstrap script to the machine. It can be done via user data or built-in inside AMI itself.

Changes:

  • Added support for custom AMI inside Managed Node Groups.
  • Updated Readme to reflect the changes.
  • Updated docs via terraform-docs in pre-commit since they was outdated

Checklist

@jjhidalgar
Copy link
Contributor

jjhidalgar commented Jul 2, 2021

Help me understand it. When you don't specify a custom AMI, then EKS merges your template with something else that calls bootstrap.sh, but if you do specify a custom AMI it doesn't, right? That is why you add /etc/eks/bootstrap.sh ${cluster_name} ?

It that case, shouldn't that line be conditionally added only if image_id is present?

(Also, if it is not really required, I'd prefer not to make changes that trigger changes to existing clusters, so adding something like %{if image_id} ..... %{ endif } could be a good idea, although, not sure, maybe "improving the template" is better than doing tricky conditionals, in any case, this is not the point of my question in this specific case)

@AndriiBarabash
Copy link
Contributor Author

Ok, I'll try to explain. About bootstrapping managed worker node with default AMI from AWS blog:

Note that user data is used by EKS to insert the EKS bootstrap invocation for your managed nodes. EKS will automatically merge this in for you, unless a custom AMI is specified. In that case, you’ll need to add that in.

So as we can see, if we are going to use custom AMI in Managed Node Group, then provisioning of the bootstrap script and execution of it are solely on us. Before my PR there was no invocation of bootstrap.sh, but the code contained a line that substitutes kubelet_extra_args inside bootstrap.sh script, which is weird because the script will never be called.

I got your idea, thank you. It makes perfect sense to call bootstrap.sh only if custom AMI is provided. This way we will not ruin AWS merge-in logic with default AMI. I will make changes to the code, thank you.

@AndriiBarabash
Copy link
Contributor Author

I added bootstrap execution on condition - only when custom AMI is specified. Sorry for the delay, review will be much appreciated.

@AndriiBarabash
Copy link
Contributor Author

@barryib @jaimehrubiks may I ask you for a review, please?

@jjhidalgar
Copy link
Contributor

jjhidalgar commented Jul 16, 2021

I am not a maintainer, although I may test it in a near future. I do have another question though. With this approach, I guess you are expecting that your custom AMI does have a file called /etc/eks/bootstrap.sh don't you? I wonder if it could be better to read the content of that file from the original AMI, and then put it between those IFs.

@AndriiBarabash
Copy link
Contributor Author

@jaimehrubiks, yes, you are right, at this moment I expect that the custom AMI must have a bootstrap.sh file to join the cluster. I don't want to invent a bicycle and I'd like to stick to the AWS standards as they noted in their blog:

Through the support of custom AMIs in managed node groups, EKS can now provision and managed the lifecycle of your Kubernetes nodes running any OS, in nearly any configuration. Again, it’s your responsibility to ensure that the required software is in place for your node to join an EKS cluster and fulfill its duties.
...
There is one additional requirement, in that you need to supply the EKS bootstrap invocation in your launch template user data. Since using a custom AMI is a full customization use case, EKS will not merge its default bootstrap information into your user data as discussed above. You are free to fully configure your OS environment, and providing the bootstrap is now your responsibility.

So I think it's better to leave the bootstrap script provisioning for a customer, since the definition of the custom AMI itself means that the AMI should be created and it can be customized as one needs. We just give guidelines, such as "Provide the script in the /etc/eks/bootstrap.sh file", but the content of the file is on them.

@AndriiBarabash
Copy link
Contributor Author

@barryib @jaimehrubiks hey guys, could I have a review please?

@AndriiBarabash
Copy link
Contributor Author

@barryib @jaimehrubiks @antonbabenko guys, I'm sorry for pushing, but can we get it going? In any direction at least 😅
It's been almost two months...

@mvoitko
Copy link

mvoitko commented Aug 25, 2021

Great job! Could any of maintainers review this PR?

@AndriiBarabash
Copy link
Contributor Author

@mvoitko thank you for approving! I see that one of a jobs failed, I'll fix it and hopefully we merge this.

@audriusb
Copy link

Oh i need this.. Can i have it now? Pretty please :)

@AndriiBarabash
Copy link
Contributor Author

@mvoitko @antonbabenko hey guys, I've merged master into my branch and now I can't run the workflows. Could you please trigger them? On my local machine pre-commit works fine.
Screenshot 2021-08-28 at 18 01 01

@antonbabenko
Copy link
Member

@andreyBar I clicked the "Approve and run" button here in this PR, so it should be fine now.

@AndriiBarabash
Copy link
Contributor Author

@antonbabenko thank you. Nevertheless, I still get the same error on the terraform_docs pre-commit hook. Trying to find out why...

@antonbabenko
Copy link
Member

@andreyBar You need to use terraform-docs 0.13.0 and not the latest one. Unfortunately, this is a limitation we have for now. No need to fix it, if you can't. I will fix it during the review/merge a bit later.

@AndriiBarabash
Copy link
Contributor Author

AndriiBarabash commented Aug 28, 2021

@antonbabenko I find out the issue. My local terraform init created lock files with different versions than the pre-commit hook creates in CI/CD, probably because I've checked out a repository a long time ago and the lock files are old. After deleting all of them, and also .terraform directories, the pre-commit hook works the same as in CI/CD, creating desired docs change. I've committed them, should be fine now.

Edit: and yes, I've also switched to terraform-docs 0.13.0. It also matters.

@AndriiBarabash
Copy link
Contributor Author

AndriiBarabash commented Aug 29, 2021

@antonbabenko @barryib @mvoitko I believe the PR is ready to be merged, could you please take a look at it and review it?
Sorry for pushing. Really looking forward to merge this, want to finally switch back to the community version and stop using the fork 😅

@AndriiBarabash
Copy link
Contributor Author

@antonbabenko may I please ask you to have a look?

modules/node_groups/launch_template.tf Outdated Show resolved Hide resolved
modules/node_groups/templates/userdata.sh.tpl Outdated Show resolved Hide resolved
Co-authored-by: Anton Babenko <anton@antonbabenko.com>
@AndriiBarabash
Copy link
Contributor Author

@antonbabenko thank you for the suggestions. You are right! I've added them to PR.

@AndriiBarabash
Copy link
Contributor Author

@antonbabenko am I blacklisted? 😅
6 days ago I applied your suggestions, could you please elaborate on why this PR is not merged yet?

@antonbabenko
Copy link
Member

@andreyBar You are far from being blacklisted for making such a great contribution :)

The problem or rather the state of fact is that we have been working through 70+ issues and PRs over the last 3 weeks when the active work resumed in addition to other things we have (e.g., full-time work).

Let's see if @daroga0002 or @lisfo4ka approve this PR. If they have no objections, I will merge it and release it.

@AndriiBarabash
Copy link
Contributor Author

@antonbabenko thank you very much, I'm glad to hear it 😃

Yeah, it's a tough time as I can see. A lot of PR's and issues. I'm not aware of the news, but probably increasing the number of maintainers could help?

P.S. Sorry for pushing with PR.

@@ -4,3 +4,6 @@
${pre_userdata}

sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" ${kubelet_extra_args}"' /etc/eks/bootstrap.sh
%{ if run_bootstrap_script }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is optional and rely on ami_id?

how this differ from amazon delivered AMI (how they launching bootstrap part?), asking as I never checked this and just using AWS AMIs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't now exactly how it works, but I think that if you don't specify a custom AMI, then EKS merges your init scripts with their snippet that does the bootstrap process. However, when you use a custom AMI, they expect you to provide a fully-working init script that includes the bootstrapping section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @jaimehrubiks is totally right. Here is my comment about this topic - #1473 (comment).

@antonbabenko
Copy link
Member

@andreyBar You said: "I'm not aware of the news, but probably increasing the number of maintainers could help?"

Well, everybody is welcome to join the discussions happening in the issues and PRs to help us with triage and close existing issues. It is a lot of work that requires EKS and Terraform knowledge.

@antonbabenko antonbabenko changed the title feat: Added custom AMI support for MNG feat: Added custom AMI support for managed node groups Sep 8, 2021
@antonbabenko antonbabenko merged commit 1d322eb into terraform-aws-modules:master Sep 8, 2021
@antonbabenko
Copy link
Member

v17.17.0 has been just released.

@AndriiBarabash AndriiBarabash deleted the feat_custom_ami_mng branch September 8, 2021 20:44
@stevehipwell
Copy link
Contributor

@antonbabenko as mentioned by @ArchiFleKs in another PR #1577 (comment), it looks like this work is incomplete as the module isn't adding all of the correct bootstrap.sh arguments. Either we should replicate the bootstrap.sh call correctly, my preference, or we should leave it to the user to call it in their pre_userdata; the current pattern is the worst of both worlds.

@ArchiFleKs I don't have any MNGs running as without ASG tagging or an alternative they're not usable to us, do you have the full merged userdata from a default one, ideally one with taints and labels set? With this we should be able to see how to structure the bootstrap.sh call.

@ArchiFleKs
Copy link
Contributor

I think now with taint being supported they are set at runtime by an external controller, so there is no need to replicate them here, I dont see them in usedata but I see the custom labels set with labels {}

Here is the launchtemplate userdata created by this terraform module for example:

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

#!/bin/bash -e

# Allow user supplied pre userdata code


sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" "' /etc/eks/bootstrap.sh

--//--

Then another launchtemplate is created by EKS with this userdata:

--//
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
set -ex
B64_CLUSTER_CA=REDACTED
API_SERVER_URL=https://REDACTED.eks.amazonaws.com
K8S_CLUSTER_DNS_IP=172.20.0.10
/etc/eks/bootstrap.sh mycluster --kubelet-extra-args '--node-labels=eks.amazonaws.com/nodegroup-image=ami-0473d43e37891261a,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/sourceLaunchTemplateVersion=1,eks.amazonaws.com/nodegroup=default-a-20210831090454066900000005,eks.amazonaws.com/sourceLaunchTemplateId=lt-0aa2abf08a23e8ddc,topology.ebs.csi.aws.com/zone=eu-west-1a,role=default --max-pods=58' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP --use-max-pods false


--//--

My guess is that when using custom AMI the launchtemplate clone is not being created ? I'm really a guess here because the doc does not explain much about how it works internally and how label and taints are implemented:

  • labels seems to be passed to kubelet
  • taints do not so they may be managed by an external cloud controller

I think we would also need some insights from people who actually are using MNG with custom AMIs

@stevehipwell
Copy link
Contributor

@ArchiFleKs the docs (see Specifying an AMI > Need to provide user data to pass arguments to the bootstrap.sh file included with an Amazon EKS optimized AMI and the User data in a launch template tab) specifically call out the use of manually setting the bootstrap arguments. So for me the outstanding question is how the taints are set, everything else is well documented.

@stevehipwell
Copy link
Contributor

The readme for ami_id is also incorrect as the current behaviour expects it to be based on an AWS EKS optimised AMI or at least exposing bootstrap.sh. I'll open a PR to resolve these issues.

@stevehipwell
Copy link
Contributor

I've added #1580 to resolve this. @ArchiFleKs could you take a look for me?

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 12, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants