etcd 3.5.3+ supports checking the health of a specific member, as opposed to the cluster as a whole. Upstream kubeadm is switching to this for static pod health checks, since there is no point in restarting the pod when the cluster as a whole is unhealthy; in fact, doing so may make things worse. We should do the same.
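The difference is visible directly against etcd's metrics listener; a quick sketch, assuming it is bound to localhost:2381 as in the probe output below:

# Cluster-wide health: backed by a quorum (linearizable) read, so it fails on
# every member while the cluster lacks quorum
$ curl http://localhost:2381/health
# Member-local health: backed by a serializable (local) read, so it reflects
# only this member's own state
$ curl "http://localhost:2381/health?serializable=true"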
$ uname -a
Linux ip-172-31-41-231 5.13.0-1029-aws #32~20.04.1-Ubuntu SMP Thu Jun 9 13:03:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Cluster Configuration:
1 server
Config.yaml:
N/A
Additional files:
N/A
Testing Steps:
1. Install RKE2 using v1.24.2-rc1+rke2r1 (a sketch of this step follows the list)
2. Get the pod info using describe, get, and exec.
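For reference, a minimal sketch of step 1, using the standard RKE2 install script with the INSTALL_RKE2_VERSION variable:

# Install the specific release candidate, then start the server service
$ curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION='v1.24.2-rc1+rke2r1' sh -
$ systemctl enable --now rke2-server.service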
Replication Results:
New feature, so N/A
Validation Results:
# Exec into the pod, curl the new endpoint, see that it is successful
$ k -n kube-system exec -it pod/etcd-ip-172-31-41-231 -- /bin/bash
bash-4.2# curl localhost:2381/health?serializable=true
{"health":"true","reason":""}
# Describe the pod and ensure there are no errors in the events:
$ k -n kube-system describe pod/etcd-ip-172-31-41-231
...
Liveness: http-get http://localhost:2381/health%3Fserializable=true delay=15s timeout=15s period=10s #success=1 #failure=8
...
Events:
Type    Reason   Age    From     Message
----    ------   ----   ----     -------
Normal  Pulled   3m19s  kubelet  Container image "index.docker.io/rancher/hardened-etcd:v3.5.4-k3s1-build20220504" already present on machine
Normal  Created  3m19s  kubelet  Created container etcd
Normal  Started  3m19s  kubelet  Started container etcd
# Get the pod yaml for cleaner output
$ k -n kube-system get pod/etcd-ip-172-31-41-231 -o yaml
...
livenessProbe:
  failureThreshold: 8
  httpGet:
    host: localhost
    path: /health?serializable=true
    port: 2381
    scheme: HTTP
  initialDelaySeconds: 15
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 15
...
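To pull out just the probe instead of scanning the full YAML, jsonpath works as well; a sketch assuming etcd is the first container in the pod spec:

$ k -n kube-system get pod/etcd-ip-172-31-41-231 -o jsonpath='{.spec.containers[0].livenessProbe}'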