Replies: 2 comments
-
Hey, the only method I can think of is to drain all the nodes in the cluster using a command like this:
kubectl get nodes -o json | jq -r '.items[].metadata.name' | xargs -I {} kubectl drain --ignore-daemonsets --delete-emptydir-data {}
This will gracefully terminate the pods, so there shouldn't be any file system corruption in the persistent volumes. Out of curiosity, how do you reuse existing volumes in a new cluster? I haven't tried that myself. I don't need it often, but when I do, I usually restore volumes along with other data using Velero.
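Roughly, the Velero flow looks like this (a sketch only: the backup name and namespace are placeholders, and volume snapshots require a snapshot or CSI plugin to be configured):
# back up selected namespaces before tearing the cluster down
velero backup create pre-delete-backup --include-namespaces my-app
# verify the backup completed before deleting the cluster
velero backup describe pre-delete-backup
# later, in the freshly created cluster, restore from that backup
velero restore create --from-backup pre-delete-backup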
-
I tried your command, and while it indeed drains all the nodes, I see that pods are being restarted during the eviction (which takes quite some time). I first suspected that it was ArgoCD recreating the missing resources. Apparently draining a cluster is not a one-liner, so I created a script which seems to be working fine for me:
#!/bin/bash
# exit code 0, if the given namespace should not be drained
is_excluded_ns()
{
    # do not mess with "kube-system" as CSI management lives there
    if [ "$1" = "kube-system" ]; then
        return 0
    else
        return 1
    fi
}

# calls `kubectl delete` on relevant resources of the given (single) namespace
drain_namespace()
{
    local ns="$1"
    echo ""
    echo "NAMESPACE: $ns"
    # Delete deployments and similar. Kubernetes will automatically
    # delete/downscale associated pods in a graceful way.
    # We ignore manually created pods as we don't expect them to be anything
    # relevant.
    for kind in deployment daemonset statefulset replicaset ; do
        for name in $(kubectl -n "$ns" get $kind -o json | jq -r '.items[].metadata.name') ; do
            kubectl -n "$ns" delete $kind "$name"
        done
    done
}

# drains the given namespaces and waits for all pods to terminate
drain_and_wait()
{
    local ns_list="$*"
    local temp
    for ns in $ns_list ; do
        drain_namespace "$ns"
    done
    # wait for pods to terminate
    started=$(date +%s)
    timeout=60
    while :; do
        remain=""
        for ns in $ns_list ; do
            temp="$(kubectl get pods -n "$ns" -o json | jq -r '.items[].metadata.name')"
            if [ -n "$temp" ]; then
                remain="$remain $temp"
            fi
        done
        if [ -z "$remain" ]; then
            return 0
        fi
        secs_left=$(expr $started + $timeout - $(date +%s))
        if [ "$secs_left" -le 0 ]; then
            echo "Giving up on these pods."
            return 1
        fi
        echo ""
        echo "Remaining pods (giving up in ${secs_left}s):" $remain
        sleep 3
    done
}

# returns a list of all namespaces that should be drained
get_relevant_namespaces()
{
    for ns in $(kubectl get namespaces -o json | jq -r '.items[].metadata.name') ; do
        if ! is_excluded_ns "$ns"; then
            echo "$ns"
        fi
    done
}

# drain ArgoCD first, so that it does not recreate resources
if drain_and_wait argocd ; then
    # then drain all remaining namespaces
    if drain_and_wait $(get_relevant_namespaces) ; then
        echo ""
        echo "All pods terminated."
        exit 0
    fi
else
    echo "ArgoCD not fully terminated. Not safe to continue draining. Giving up!"
fi
exit 1
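A possible way to wire this into the teardown (the script name drain-cluster.sh is a placeholder, and I'm assuming the usual --config flag for hetzner-k3s):
# hypothetical usage: delete the cluster only if draining succeeded
chmod +x drain-cluster.sh
./drain-cluster.sh && hetzner-k3s delete --config cluster_config.yaml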
Do you see any pitfalls/problems with this approach?
Regarding your question about reusing existing volumes in a new cluster: I manually create a PersistentVolume that references the existing Hetzner volume by its ID, plus a matching PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: whatever-pv
spec:
  storageClassName: hcloud-volumes
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    fsType: ext4
    driver: csi.hetzner.cloud
    volumeHandle: "1234567" # <-- the volume ID here
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: whatever-pvc
spec:
  volumeName: whatever-pv
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi
Works like a charm.
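The volume ID for volumeHandle can be looked up with the hcloud CLI, for example (the manifest filename is a placeholder; the CLI needs a token for the project):
# list existing volumes in the Hetzner project to find the ID
hcloud volume list
# create the PV/PVC in the new cluster
kubectl apply -f whatever-pv.yaml
# confirm the claim binds to the pre-existing volume
kubectl get pv whatever-pv
kubectl get pvc whatever-pvc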
-
I know that hetzner-k3s delete can be used to completely destroy a cluster. In my understanding (and since it is so quick), this directly deletes the Hetzner VMs, meaning that any workloads are killed the hard way. For PVs that survive the cluster deletion and are intended to be reused, this probably means a somewhat corrupted filesystem.
I regularly re-create my cluster, and although the filesystems usually recover just fine from such a harsh cluster shutdown, it's probably not a sane approach.
Is there a good way to gracefully shut down all workloads and especially disconnect all PVs (Hetzner Volumes) in a safe way before running hetzner-k3s delete? I'd like to create a script that does exactly that.
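As a starting point, such a script would probably need to wait until the CSI driver has detached all volumes before the delete; VolumeAttachment objects disappear once a volume is detached from its node. A rough sketch of that check (the 5-minute timeout is arbitrary):
# after all pods using PVCs are gone, wait until no VolumeAttachment objects remain,
# i.e. the CSI driver has detached every volume from the nodes
for i in $(seq 1 60); do
    count=$(kubectl get volumeattachments -o json | jq '.items | length')
    [ "$count" -eq 0 ] && break
    echo "still $count attached volume(s), waiting..."
    sleep 5
done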