etcd 3.5.3+ supports checking the health of a specific member, as opposed to the cluster as a whole. Upstream kubeadm is switching to this for static pod health checks, since there is no point in restarting the pod when the cluster as a whole is unhealthy; in fact, doing so may make things worse. We should do the same.
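The difference is visible directly against etcd's metrics listener; a quick sketch, assuming it is bound to localhost:2381 as in the probe output below:

# Cluster-wide health: backed by a quorum (linearizable) read, so it fails on
# every member while the cluster lacks quorum
$ curl http://localhost:2381/health
# Member-local health: backed by a serializable (local) read, so it reflects
# only this member's own state
$ curl "http://localhost:2381/health?serializable=true"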
$ uname -a
Linux ip-172-31-41-231 5.13.0-1029-aws #32~20.04.1-Ubuntu SMP Thu Jun 9 13:03:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Cluster Configuration:
1 server
Config.yaml:
N/A
Additional files:
N/A
Testing Steps:
1. Install RKE2 using v1.24.2-rc1+rke2r1 (a sketch of this step follows the list)
2. Get the pod info using describe, get, and exec.
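For reference, a minimal sketch of step 1, using the standard RKE2 install script with the INSTALL_RKE2_VERSION variable:

# Install the specific release candidate, then start the server service
$ curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION='v1.24.2-rc1+rke2r1' sh -
$ systemctl enable --now rke2-server.service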
Replication Results:
New feature, so N/A
Validation Results:
# Exec into the pod, curl the new endpoint, see that it is successful
$ k -n kube-system exec -it pod/etcd-ip-172-31-41-231 -- /bin/bash
bash-4.2# curl localhost:2381/health?serializable=true
{"health":"true","reason":""}
# Describe the pod and ensure there are no errors in the events:
$ k -n kube-system describe pod/etcd-ip-172-31-41-231
...
Liveness: http-get http://localhost:2381/health%3Fserializable=true delay=15s timeout=15s period=10s #success=1 #failure=8
...
Events:
Type    Reason   Age    From     Message
----    ------   ----   ----     -------
Normal  Pulled   3m19s  kubelet  Container image "index.docker.io/rancher/hardened-etcd:v3.5.4-k3s1-build20220504" already present on machine
Normal  Created  3m19s  kubelet  Created container etcd
Normal  Started  3m19s  kubelet  Started container etcd
# Get the pod yaml for cleaner output
$ k -n kube-system get pod/etcd-ip-172-31-41-231 -o yaml
...
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: localhost
    path: /health?serializable=true
    port: 2381
    scheme: HTTP
  initialDelaySeconds: 15
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 15
...
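To pull out just the probe instead of scanning the full YAML, jsonpath works as well; a sketch assuming etcd is the first container in the pod spec:

$ k -n kube-system get pod/etcd-ip-172-31-41-231 -o jsonpath='{.spec.containers[0].livenessProbe}'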