Use serializable health checks for etcd probes #3076

Closed
brandond opened this issue Jun 16, 2022 · 1 comment

@brandond
Member

etcd 3.5.3+ supports checking the health of a specific member, as opposed to the cluster as a whole. Upstream kubeadm is switching to this for static pod health checks, since there is no point in restarting the pod when the cluster as a whole is unhealthy; in fact, a restart may make things worse. We should do the same.
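
To illustrate the difference, here is a minimal sketch of probing one member both ways. It assumes the member serves `/health` over plain HTTP on `localhost:2381` and that `curl` is available. The function names `health_is_true` and `check_etcd_health` are hypothetical helpers, not part of etcd or RKE2.

```shell
#!/bin/sh
# Hypothetical helper: reads an etcd /health response body on stdin and
# succeeds only if it reports "health":"true".
health_is_true() {
    grep -Eq '"health"[[:space:]]*:[[:space:]]*"true"'
}

# Hypothetical wrapper: query one member's /health endpoint.
#   $1 = base URL, e.g. http://localhost:2381 (assumed address)
#   $2 = optional query string, e.g. serializable=true
# curl -f turns an HTTP error status into a failure, so the pipeline
# fails when the member reports itself unhealthy.
check_etcd_health() {
    curl -fsS --max-time 15 "$1/health${2:+?$2}" | health_is_true
}

# Usage against a running member:
#   check_etcd_health http://localhost:2381                    # default check, depends on cluster state
#   check_etcd_health http://localhost:2381 serializable=true  # answered by the local member only
```

With the default check, a healthy member can still fail its probe when the cluster has no leader. With `serializable=true`, the probe reflects only whether this member can serve a local read.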

References:

@rancher-max
Contributor

Validated on v1.24.2-rc1+rke2r1

Environment Details

Infrastructure

  • Cloud (AWS)
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ uname -a
Linux ip-172-31-41-231 5.13.0-1029-aws #32~20.04.1-Ubuntu SMP Thu Jun 9 13:03:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Cluster Configuration:

1 server

Config.yaml:

N/A

Additional files

N/A

Testing Steps

  1. Install RKE2 using v1.24.2-rc1+rke2r1
  2. Get the pod info using describe, get, and exec.

Replication Results:

N/A (new feature)

Validation Results:

# Exec into the pod, curl the new endpoint, see that it is successful
$ k -n kube-system exec -it pod/etcd-ip-172-31-41-231 -- /bin/bash
bash-4.2# curl localhost:2381/health?serializable=true
{"health":"true","reason":""}

# Describe the pod and ensure there are no errors in the events:
$ k -n kube-system describe pod/etcd-ip-172-31-41-231
...
Liveness:  http-get http://localhost:2381/health%3Fserializable=true delay=15s timeout=15s period=10s #success=1 #failure=8
...
Events:
  Type    Reason   Age    From     Message
  ----    ------   ----   ----     -------
  Normal  Pulled   3m19s  kubelet  Container image "index.docker.io/rancher/hardened-etcd:v3.5.4-k3s1-build20220504" already present on machine
  Normal  Created  3m19s  kubelet  Created container etcd
  Normal  Started  3m19s  kubelet  Started container etcd

# Get the pod yaml for cleaner output
$ k -n kube-system get pod/etcd-ip-172-31-41-231 -o yaml
...
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: localhost
        path: /health?serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
...
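
The manual inspection of the pod YAML above could also be scripted. This is a rough sketch, assuming `kubectl` is on the path and that the etcd container is the first container in the pod spec. `probe_is_serializable` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads a probe path on stdin and succeeds only if it
# targets /health with the serializable=true query parameter.
probe_is_serializable() {
    grep -Eq '^/health\?(.*&)?serializable=true(&.*)?$'
}

# Usage against a live cluster (pod name taken from the validation above;
# container index 0 is an assumption):
#   kubectl -n kube-system get pod/etcd-ip-172-31-41-231 \
#     -o jsonpath='{.spec.containers[0].livenessProbe.httpGet.path}' \
#     | probe_is_serializable && echo "liveness probe uses serializable health check"
```

Reading the path from the pod spec avoids the `%3F` encoding that appears in the `describe` output.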
