Replies: 2 comments
-
Hey, the only method I can think of is to drain all the nodes in the cluster using a command like this:
kubectl get nodes -o json | jq -r '.items[].metadata.name' | xargs -I {} kubectl drain --ignore-daemonsets --delete-emptydir-data {}
This will gracefully terminate the pods, so there shouldn't be any file system corruption in the persistent volumes. Out of curiosity, how do you reuse existing volumes in a new cluster? I haven't tried that myself. I don't need it often, but when I do, I usually restore volumes along with other data using Velero.
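Roughly, the Velero flow looks like this (a sketch only: the backup name and namespace are placeholders, and volume snapshots require a snapshot or CSI plugin to be configured):
# back up selected namespaces before tearing the cluster down
velero backup create pre-delete-backup --include-namespaces my-app
# verify the backup completed before deleting the cluster
velero backup describe pre-delete-backup
# later, in the freshly created cluster, restore from that backup
velero restore create --from-backup pre-delete-backup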
-
I tried your command, and while it indeed drains all the nodes, I see that pods are being restarted during the eviction (which takes quite some time). I first suspected that it was ArgoCD recreating the missing resources. Apparently draining a cluster is not a one-liner, so I created a script which seems to be working fine for me:
#!/bin/bash
# exit code 0, if the given namespace should not be drained
is_excluded_ns()
{
    # do not mess with "kube-system" as CSI management lives there
    if [ "$1" = "kube-system" ]; then
        return 0
    else
        return 1
    fi
}

# calls `kubectl delete` on relevant resources of the given (single) namespace
drain_namespace()
{
    local ns="$1"
    echo ""
    echo "NAMESPACE: $ns"
    # Delete deployments and similar. Kubernetes will automatically
    # delete/downscale associated pods in a graceful way.
    # We ignore manually created pods as we don't expect them to be anything
    # relevant.
    for kind in deployment daemonset statefulset replicaset ; do
        for name in $(kubectl -n "$ns" get $kind -o json | jq -r '.items[].metadata.name') ; do
            kubectl -n "$ns" delete $kind "$name"
        done
    done
}

# drains the given namespaces and waits for all pods to terminate
drain_and_wait()
{
    local ns_list="$*"
    local temp
    for ns in $ns_list ; do
        drain_namespace "$ns"
    done
    # wait for pods to terminate
    started=$(date +%s)
    timeout=60
    while :; do
        remain=""
        for ns in $ns_list ; do
            temp="$(kubectl get pods -n "$ns" -o json | jq -r '.items[].metadata.name')"
            if [ -n "$temp" ]; then
                remain="$remain $temp"
            fi
        done
        if [ -z "$remain" ]; then
            return 0
        fi
        secs_left=$(expr $started + $timeout - $(date +%s))
        if [ "$secs_left" -le 0 ]; then
            echo "Giving up on these pods."
            return 1
        fi
        echo ""
        echo "Remaining pods (giving up in ${secs_left}s):" $remain
        sleep 3
    done
}

# returns a list of all namespaces that should be drained
get_relevant_namespaces()
{
    for ns in $(kubectl get namespaces -o json | jq -r '.items[].metadata.name') ; do
        if ! is_excluded_ns "$ns"; then
            echo "$ns"
        fi
    done
}

# drain ArgoCD first, so that it does not recreate resources
if drain_and_wait argocd ; then
    # then drain all remaining namespaces
    if drain_and_wait $(get_relevant_namespaces) ; then
        echo ""
        echo "All pods terminated."
        exit 0
    fi
else
    echo "ArgoCD not fully terminated. Not safe to continue draining. Giving up!"
fi
exit 1
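A possible way to wire this into the teardown (the script name drain-cluster.sh is a placeholder, and I'm assuming the usual --config flag for hetzner-k3s):
# hypothetical usage: delete the cluster only if draining succeeded
chmod +x drain-cluster.sh
./drain-cluster.sh && hetzner-k3s delete --config cluster_config.yaml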
Do you see any pitfalls/problems with this approach?
Regarding your question about reusing existing volumes in a new cluster: I manually create a PersistentVolume that references the existing Hetzner volume by its ID, plus a matching PersistentVolumeClaim:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: whatever-pv
spec:
  storageClassName: hcloud-volumes
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    fsType: ext4
    driver: csi.hetzner.cloud
    volumeHandle: "1234567" # <-- the volume ID here
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: whatever-pvc
spec:
  volumeName: whatever-pv
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi
Works like a charm.
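The volume ID for volumeHandle can be looked up with the hcloud CLI, for example (the manifest filename is a placeholder; the CLI needs a token for the project):
# list existing volumes in the Hetzner project to find the ID
hcloud volume list
# create the PV/PVC in the new cluster
kubectl apply -f whatever-pv.yaml
# confirm the claim binds to the pre-existing volume
kubectl get pv whatever-pv
kubectl get pvc whatever-pvc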
-
I know that hetzner-k3s delete can be used to completely destroy a cluster. In my understanding (and since it is so quick), this directly deletes the Hetzner VMs, meaning that any workloads are killed the hard way. For PVs that survive the cluster deletion and are intended to be reused, this probably means a somewhat corrupted filesystem.
I regularly re-create my cluster, and although the filesystems usually recover just fine from such a harsh cluster shutdown, it's probably not a sane approach.
Is there a good way to gracefully shut down all workloads and especially disconnect all PVs (Hetzner Volumes) in a safe way before running hetzner-k3s delete? I'd like to create a script that does exactly that.
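As a starting point, such a script would probably need to wait until the CSI driver has detached all volumes before the delete; VolumeAttachment objects disappear once a volume is detached from its node. A rough sketch of that check (the 5-minute timeout is arbitrary):
# after all pods using PVCs are gone, wait until no VolumeAttachment objects remain,
# i.e. the CSI driver has detached every volume from the nodes
for i in $(seq 1 60); do
    count=$(kubectl get volumeattachments -o json | jq '.items | length')
    [ "$count" -eq 0 ] && break
    echo "still $count attached volume(s), waiting..."
    sleep 5
done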