Skip to content

Commit

Permalink
Restart kubelet if it does not report an internal/external ip address
Browse files Browse the repository at this point in the history
On kubelet start-up some functions to set the node status are generated. One of those functions propagates the node addresses into the `Node` object the kubelet is responsible for (`.status.addresses`).
The kube-apiserver uses these addresses to talk to the actual node.

To identify the IP address of the node the kubelet communicates with the cloud provider. kubernetes/kubernetes#62543 introduced a timeout of 10s when trying to connect to the cloud. In case the IP cannot
be determined within 10s, the `Node` object does not report an `InternalIP` address.

Consequently, the kube-apiserver will never be able to talk to that node; particularly VPN won't work in case the vpn-shoot pod is scheduled on it.

Once the connection failed, it is never retried, and only a kubelet process restart can trigger it again. Hence, our kubelet monitoring script will now do the same when it cannot find an `InternalIP` or an `ExternalIP`
address on the `Node` object.

closes gardener#283
  • Loading branch information
rfranzke authored and yupeng.richard committed Mar 26, 2019
1 parent 71bcef6 commit 38370fd
Showing 1 changed file with 13 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,20 @@
continue
fi

node_status="$(kubectl get nodes -l kubernetes.io/hostname=$(hostname) -o json | jq -r '.items[0].status')"

# Check whether the kubelet does report an InternalIP node address
if node_ip_addresses="$(echo $node_status | jq -r '.addresses[] | select(.type=="InternalIP" or .type=="ExternalIP") | .address')"; then
if [[ -z "$node_ip_addresses" ]]; then
echo "Kubelet has not reported an InternalIP nor an ExternalIP node address yet. Restarting kubelet!";
restart_kubelet
sleep 20
continue
fi
fi

# Check whether kubelet ready status toggles between true and false and reboot VM if happened too often.
if status="$(kubectl get nodes -l kubernetes.io/hostname=$(hostname) -o json | jq -r '.items[0].status.conditions[] | select(.type=="Ready") | .status')"; then
if status="$(echo $node_status | jq -r '.conditions[] | select(.type=="Ready") | .status')"; then
if [[ "$status" != "True" ]]; then
if [[ $time_kubelet_not_ready_first_occurrence == 0 ]]; then
time_kubelet_not_ready_first_occurrence=$(date +%s)
Expand Down

0 comments on commit 38370fd

Please sign in to comment.