Replies: 11 comments 1 reply
-
That's going to depend on the way you're deploying (and scaling) your clusters. If it's with Terraform, you need to ensure you have the right Bottlerocket-specific UserData passed in. If you're using Karpenter for scaling worker nodes on EKS, you'll also want to make sure you have the same UserData in your NodeTemplate(s).

We've been migrating more and more of our new infrastructure from AL2 to Bottlerocket, and one of the biggest differences is UserData. In AL2, you can pass a script to UserData and have it run when the machine boots. In Bottlerocket, a good way to think about things is that everything is jailed inside a container. This is by design, for the sake of security, but it also helps make the system easier to reason about.

You're going to want to read about bootstrap containers: https://github.com/bottlerocket-os/bottlerocket#bootstrap-containers-settings

The shell scripts or Ansible playbooks you wish to run as part of your UserData bootstrap process will have to be packaged up as container images, with the correct ENTRYPOINT and/or CMD in the Dockerfile. You'll then declare and configure a bootstrap container for Bottlerocket, which pulls and runs that container on the Bottlerocket host. Bootstrap containers are privileged, and should provide the filesystem-level access you need to the host in order to set everything up.

"Converting" your brain to think about things in a "Bottlerocket-friendly way" takes a little time. The whole OS exposes an API for itself (which takes some getting used to), and it essentially requires that you use Docker containers for any host-related interactions with the underlying system, similar to the way you'd use old-school chroot-ed jails on a traditional server to do specific things on that host without exposing the whole host to your jailed scripts or executables.
I defer to the Bottlerocket maintainers in case there's something I've misrepresented, or something that I myself have misunderstood.
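To make the UserData shape concrete: on Bottlerocket, worker-node UserData is TOML that feeds the settings API rather than a boot script. A minimal sketch, where the container name and image URI are placeholders, might look like:

```toml
# Hypothetical Bottlerocket UserData (TOML); name and image URI are placeholders.
[settings.bootstrap-containers.bootstrap]
source = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-bootstrap:v1"
mode = "always"     # "once" would run it only on first boot
essential = true    # fail the boot if this container exits non-zero
```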
-
@armenr Thank you very much for sharing your thoughts. It matches what I have understood from reading the documentation as well, and I am pretty comfortable with that approach. I bet that looking at the default bootstrap container image, its entry point, and its definition would give an idea of how to inject our commands before the "official" behaviour kicks in. If we implement the integration this way, we can still base our container image on the official bootstrap image and enjoy our automation pieces, but continue with the official configuration method so as not to miss any critical configuration made by the default one.
-
Hi @ajdelaguila-rbro, there isn't any default / official bootstrap container image, although there are a few examples that different users have posted across a few discussions. Bottlerocket doesn't define any default bootstrapping through bootstrap containers; they are a feature to let users set up their hosts for cases that aren't solved through the API (e.g., mounting an ephemeral disk).

Keep in mind that, even though bootstrap containers have some privilege, they don't run with full privilege, and the things they are allowed to do are limited. Please refer to the documentation to see which capabilities(7) they were granted.

@armenr, that was an excellent answer! Although I want to make a small clarification: both host and bootstrap containers aren't Docker containers. They don't run with Docker as the container runtime; rather, Bottlerocket ships with a small container client called host-ctr.
-
@arnaldo2792 Could you please lend us a hand with the bootstrap container? We have built an image to use as our bootstrap container; the entry point of the image is set to run a shell script that, right now, just echoes a message to the standard output:

```bash
#!/bin/bash
echo "Initializing Bottlerocket Bootstrap..."
```

User data for the node is configured as:

If we set "essential" to true, the node never comes up nor joins the EKS cluster. Following some other threads here on GitHub, we have set it to "false" and gathered the logs from inside the node. In journalctl.log, among other initialization steps, we can see this:

We understand from the log that the bootstrap container has been loaded and that its entry point completed successfully. What are we doing wrong? Thanks in advance!
-
Let me dig into this and try to reproduce.
-
Hey @ajdelaguila-rbro, I wasn't able to replicate what you are experiencing. I created a container with this Dockerfile:

```dockerfile
FROM alpine
# Just for debugging purposes
RUN apk add coreutils
ENTRYPOINT ["echo", "HI, I'm bootstraping!"]
```

Then, I updated my API settings to configure a new bootstrap container:

```json
{
  "settings": {
    "bootstrap-containers": {
      "bootstrap": {
        "essential": true,
        "mode": "always",
        "source": "<my-image>/bootstrap:latest"
      }
    }
  }
}
```

My instance came back up after reboot, and the bootstrap container was executed:

```
# From the admin container
[root@admin]# sheltie journalctl -u bootstrap-containers@bootstrap.service
Jun 28 20:47:19 <> host-ctr[1764]: time="2023-06-28T20:47:19Z" level=info msg="successfully started container task"
Jun 28 20:47:19 <> host-ctr[1764]: HI, I'm bootstraping!
Jun 28 20:47:19 <> host-ctr[1764]: time="2023-06-28T20:47:19Z" level=info msg="container task exited" code=0
Jun 28 20:47:20 <> bootstrap-containers[1833]: 20:47:20 [INFO] bootstrap-containers started
Jun 28 20:47:20 <> bootstrap-containers[1833]: 20:47:20 [INFO] Mode for 'bootstrap' is 'always'
Jun 28 20:47:20 <> systemd[1]: Finished bootstrap container bootstrap.
```

Could you please share how you are creating your bootstrap container? Also, if you can get the logs of an instance that doesn't join the cluster after you apply your changes, that will help! 👍
-
Good afternoon @arnaldo2792,

Here is my Dockerfile:

and this is the content of the configure.sh script:

```bash
#!/bin/bash
echo "Initializing Bottlerocket Bootstrap..."
# Load content of the settings file into an environment variable
# BOTTLEROCKET_JSON_SETTINGS=$(jq -cr . /etc/bottlerocket-settings.json)
# Send configuration to Bottlerocket through its CLI
# apiclient set --json "$BOTTLEROCKET_JSON_SETTINGS"
```

I've tried running this image using just Docker and from Kubernetes, and in both cases it prints the message inside the configure.sh script. We're now trying to check again whether we can gain access through SSH when the node is in that state; the previous time we tried, we couldn't connect.

Is there something in the Dockerfile that catches your attention? Is there something you would recommend testing/trying?

Thank you in advance,
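As a side note for anyone adapting the script above: a version of this entry point can be written so it degrades gracefully when the settings file is absent and, importantly, uses double quotes so the JSON actually expands when handed to `apiclient`. This is a minimal sketch; the `SETTINGS_FILE` and `APICLIENT` override variables are hypothetical conveniences (for testing outside Bottlerocket), not Bottlerocket names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical bootstrap entry point sketch. SETTINGS_FILE and APICLIENT
# are overridable only so the script can be exercised outside Bottlerocket.
apply_settings() {
  local settings_file="${SETTINGS_FILE:-/etc/bottlerocket-settings.json}"
  local apiclient="${APICLIENT:-apiclient}"

  echo "Initializing Bottlerocket Bootstrap..."

  if [ -f "$settings_file" ]; then
    # Compact the JSON and hand it to the Bottlerocket settings API.
    # Double quotes matter here: single quotes would pass the literal
    # string '$BOTTLEROCKET_JSON_SETTINGS' instead of the JSON itself.
    "$apiclient" set --json "$(jq -cr . "$settings_file")"
  else
    echo "No settings file at $settings_file; nothing to apply."
  fi
}

apply_settings
```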
-
Today we tried again and it worked right away: we got the message in the log and the node joined the cluster with no issue. We believe the earlier behaviour is due to some cache timeout that prevents the image from always being pulled during boot. If the image is present on the node, it never checks whether the SHA digest of the image in the remote repository matches the one pulled locally, and it does not download it again.

Last Monday, when we ran this procedure, we realized that our bootstrap image had an issue during its initialization, and we built a second version of the image (the one that worked right away today). Today, after getting this working, we updated our bootstrap image to perform actual configuration changes, and they were not executed on any of the existing nodes; however, they were performed on a new node added to the cluster.

Since we are talking about a bootstrap image that needs to be configured when the cluster is first created, we are pointing to a mutable tag (latest) that we move as we release newer versions of the image. We have checked that if we modify the node group to change the source of the bootstrap image to point to a different version, it recreates all nodes that belong to the node group, causing a major outage in the cluster, and we want to avoid that situation.

Is there any configuration value we can set so the image is always pulled, similar to the imagePullPolicy flag in Kubernetes deployments? If not, what approach would you recommend?

Thank you again for your time and support,
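Not a Bottlerocket-specific mechanism as far as I know, but one common way around mutable-tag staleness in general is to reference the image by digest, so every node runs exactly the same bytes and a change is explicit. Note that this does not solve the node-group recreation concern, since changing the digest still changes the launch template. A sketch, where the registry and digest are placeholders:

```toml
# Hypothetical: pin the bootstrap image by digest instead of a mutable tag.
[settings.bootstrap-containers.bootstrap]
source = "registry.example.com/eks/bootstrap@sha256:<digest>"
mode = "always"
essential = true
```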
-
Good morning @arnaldo2792,

We use Artifactory as a container registry, and we have confirmed it always serves the actual image behind the tag, with no caching. I wonder if this behaviour might be related to the fact that we are testing this implementation on version 1.13.0; we'll give this a try on 1.14.1 as well, just to confirm.

On the other hand, when running "apiclient" commands from the bootstrap container we receive "Permission denied." Is there any particular user ID or group ID we need to use in our bootstrap image? Following standard good practice when creating containers, we are not using the "root" user but user 1001; however, for this particular use case, should we run the commands as the container's root user?

Have a wonderful day,
-
I can confirm that using the root user within the bootstrap container fixes the permissions issue, and commands now go through as expected.

Regarding the cache... there might be some corner case when the bootstrap image fails to run, or on OS v1.13.0, or something similar, because today we get the new container pulled every time.

I think that, at this point, we are good to close this conversation; everything is working as intended. @arnaldo2792, we cannot thank you enough for all the time you've dedicated and the help provided.

Have a great weekend ahead,
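For reference, a bootstrap image along the lines discussed in this thread, running its entry point as root per the finding above, might look like the following sketch (the base image, installed packages, and script path are assumptions, not an official layout):

```dockerfile
# Hypothetical bootstrap image; base image and script path are assumptions.
FROM alpine
RUN apk add --no-cache bash jq
COPY configure.sh /configure.sh
RUN chmod +x /configure.sh
# Per the finding above, apiclient calls from the bootstrap container
# succeeded only as root, so don't switch to a non-root USER here.
USER root
ENTRYPOINT ["/configure.sh"]
```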
-
@ajdelaguila-rbro, I'm glad you folks got your setup working! 🎉 I'm sorry I missed your update about the user ID; that's unexpected. Let me try to confirm this, since in host containers the user isn't UID 0 and they can execute apiclient commands:

```
[ssm-user@control]$ id -u
1000
[ssm-user@control]$ apiclient get os
[ssm-user@control]$ id -u
1000
[ssm-user@control]$ apiclient get settings.host-containers.control
{
  "settings": {
    "host-containers": {
      "control": {
        "enabled": true,
        "source": "328549459982.dkr.ecr.us-west-2.amazonaws.com/bottlerocket-control:v0.7.2",
        "superpowered": false
      }
    }
  }
}
```

Regarding the 1.13.0 version, there were several patches to that particular version. Would it be possible for you folks to move to the latest version of that release? Or, even better, to 1.14.X?
-
Good morning,
We are looking for a way to standardize configuration across all our EKS clusters and use this script to pull configurations such as container registry credentials, mirroring configuration, SSH keys for automation processes through Ansible, the initial "version-lock" that should apply to the node to match the current upgrade state of all other nodes in the cluster, etc... from a git repository.
What are the options to execute a shell script as part of the user data injected by AWS from the launch template? Could you please advise on a recommended procedure or best practices?
Thank you in advance,
ajdelaguila
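The approach described in the comment above — pulling configuration from a git repository and applying it on boot — could be sketched roughly as follows. Everything here is a hypothetical layout: the `CONFIG_REPO` and `APICLIENT` variables and the `settings/*.json` directory convention are assumptions, not Bottlerocket or EKS conventions:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch: clone a config repo and apply each JSON settings
# fragment through apiclient. CONFIG_REPO, APICLIENT, and the
# settings/*.json layout are all assumptions for illustration.
fetch_and_apply() {
  local repo="${CONFIG_REPO:?set CONFIG_REPO to the git URL of your config repo}"
  local workdir
  workdir="$(mktemp -d)"

  git clone --quiet --depth 1 "$repo" "$workdir/config"

  # Apply every JSON settings fragment found in the repo.
  local f
  for f in "$workdir/config"/settings/*.json; do
    [ -e "$f" ] || continue
    "${APICLIENT:-apiclient}" set --json "$(cat "$f")"
  done
}
```

In a real bootstrap container this would run as the image's entry point; credentials for the git remote would still need to be injected some other way (e.g., baked into the image or fetched from a secrets store).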