From b90ded7d7947103b7d1c9a0af02d4f82021ae81c Mon Sep 17 00:00:00 2001
From: Suraj Deshmukh
Date: Tue, 9 Jun 2020 16:57:41 +0530
Subject: [PATCH] docs: How to upgrade bootstrap kubelet?

Signed-off-by: Suraj Deshmukh
---
 .../upgrade-bootstrap-kubelet.md | 113 ++++++++++++++++++
 1 file changed, 113 insertions(+)
 create mode 100644 docs/how-to-guides/upgrade-bootstrap-kubelet.md

diff --git a/docs/how-to-guides/upgrade-bootstrap-kubelet.md b/docs/how-to-guides/upgrade-bootstrap-kubelet.md
new file mode 100644
index 000000000..a0ae94163
--- /dev/null
+++ b/docs/how-to-guides/upgrade-bootstrap-kubelet.md
@@ -0,0 +1,113 @@

# Upgrading bootstrap kubelet

## Contents

- [Introduction](#introduction)
- [Steps](#steps)
  - [Step 1: Drain the node](#step-1-drain-the-node)
  - [Step 2: Find out the IP and SSH](#step-2-find-out-the-ip-and-ssh)
  - [Step 3: Upgrade kubelet on node](#step-3-upgrade-kubelet-on-node)
  - [Step 4: Verify](#step-4-verify)
- [Caveats](#caveats)

## Introduction

[Kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) is a daemon
that runs on every node and is responsible for managing Pods on that node.

A Lokomotive cluster runs two different sets of kubelet processes. Initially, the **bootstrap**
kubelet, configured on the node as a `rkt` pod, joins the cluster; then a kubelet pod managed by a
DaemonSet (the self-hosted kubelet) takes over from the bootstrap kubelet. The self-hosted kubelet
allows seamless updates between Kubernetes patch versions and changes to node configuration using
tools like `kubectl`.

Currently `lokoctl` cannot update the bootstrap kubelet, so this document explains how to perform
the operation manually.

## Steps

Perform the following steps on each node, one node at a time.

### Step 1: Drain the node

> **Caution:** If a workload uses a local directory on the node for storage, it will be disrupted
> by this operation. To avoid this, move the workload to another node and let the application
> replicate its data. If the application does not support data replication across instances, then
> expect downtime.

```bash
kubectl drain <node name> --ignore-daemonsets
```

### Step 2: Find out the IP and SSH

Find the IP address of the node in the cloud provider dashboard. Then connect to the selected
machine using SSH.

```bash
ssh core@<node IP>
```

### Step 3: Upgrade kubelet on node

Run the following commands:

> **NOTE**: Before running the other commands, set the `latest_kube` variable to the latest
> Kubernetes version. The latest Kubernetes version can be found by running this command after a
> cluster upgrade: `kubectl version -ojson | jq -r '.serverVersion.gitVersion'`.

```bash
export latest_kube=<latest kubernetes version>
sudo sed -i "s|$(grep -i kubelet_image_tag /etc/kubernetes/kubelet.env)|KUBELET_IMAGE_TAG=${latest_kube}|g" /etc/kubernetes/kubelet.env
sudo systemctl restart kubelet
sudo journalctl -fu kubelet
```
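Optionally, you can confirm that the file now contains the expected tag; a quick check, for
example:

```bash
# Should print the KUBELET_IMAGE_TAG line with the version you exported as latest_kube.
grep KUBELET_IMAGE_TAG /etc/kubernetes/kubelet.env
```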
Check the logs carefully. If the kubelet fails to restart and instructs you to take some action
(e.g. deleting a file), follow the instructions and then reboot the node:

```bash
sudo reboot
```

### Step 4: Verify

**When `disable_self_hosted_kubelet` is `true`**:

Once the kubelet restarts (and the node reboots, if that was required) and rejoins the cluster,
the output of the following command shows the new version next to the node name:

```bash
kubectl get nodes
```

**When `disable_self_hosted_kubelet` is `false`**:

Verify that the kubelet service is in the active (running) state:

```bash
sudo systemctl status --no-pager kubelet
```

Run the following command to see the logs of the process since the last restart:

```bash
sudo journalctl _SYSTEMD_INVOCATION_ID=$(sudo systemctl \
  show -p InvocationID --value kubelet)
```

The kubelet daemon tries to rejoin the cluster and is then taken over by the self-hosted kubelet
pod. Once you see log lines like the following, the kubelet daemon has come up without errors:

```
Version: <latest kubernetes version>
acquiring file lock on "/var/run/lock/kubelet.lock"
```

Once the node is healthy again, uncordon it so that Pods can be scheduled on it
(`kubectl uncordon <node name>`), and then move on to the next node.

## Caveats

- When upgrading the kubelet on nodes running Rook Ceph, verify that the Ceph cluster is in the
  **`HEALTH_OK`** state. If it is in any other state, **do not proceed with the upgrade**, as
  doing so could lead to data loss. When the cluster is in the `HEALTH_OK` state, it can tolerate
  the downtime caused by rebooting nodes. One way to check this is shown in the example below.
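A minimal sketch of one way to check the Ceph health, assuming a Rook toolbox pod labelled
`app=rook-ceph-tools` is running in the `rook` namespace (adjust the namespace and label to match
your deployment):

```bash
# Find the Rook toolbox pod (assumed label and namespace) and query the Ceph health.
TOOLS_POD=$(kubectl -n rook get pod -l app=rook-ceph-tools \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook exec "${TOOLS_POD}" -- ceph health
# Proceed with the upgrade only if this prints HEALTH_OK.
```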