Remove node self-deletion behavior on GCP and DO
* Node self-deletion is now only performed on AWS clusters
* Differentiating an unhealthy/NotReady node from a node that
has shut down (temporary reboot or permanent preemption) is
tricky. To cut down on noisy alerts, we've favored having nodes
delete themselves when shut down gracefully.
* Though node self-deletion is useful, we may now be able to
remove this behavior on some platforms to alert in more cases
where an alert is warranted.
* On Digital Ocean, there is no managed instance group / ASG to
replace a deleted node. Losing a node for a long enough period
(reboots are fine) probably merits an alert that's not sent today
with the self-deletion design. As a tradeoff, admins performing
a terraform scale-down must use kubectl to manually delete nodes
that are removed.
* On Google Cloud, node self-deletion prevents alerting when a
significant number of nodes are preempted. Fortunately, Google
Cloud uses consistent naming of a node between preemptions so
reboots and the daily preemption shouldn't trigger an alert.
A node that's preempted and doesn't get replaced for a long
enough period does merit an alert. As a tradeoff, admins
performing a terraform scale-down must use kubectl to manually
delete nodes that are removed.
* On AWS, spot instances lack Google Cloud's consistent naming
feature. We must keep using the node self-deletion behavior to
keep preempted spot instances from accumulating and causing
an alert. Self-deletion does mean that when many spot workers
are preempted, no alert is sent (undesired).
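
The manual cleanup mentioned above for DigitalOcean and Google Cloud could look roughly like the sketch below. It is only an illustration and not part of this commit: `stale_nodes` is a hypothetical helper, and the two input files (registered node names and the names that should remain after the scale-down) are assumptions about how an admin might track the desired set.

```shell
#!/bin/bash
# Sketch: print nodes that are still registered with the API server but are
# absent from the desired set (for example, after a terraform scale-down).
# stale_nodes is a hypothetical helper name, not something this commit adds.
#   $1: file listing registered node names, one per line
#   $2: file listing node names that should remain, one per line
stale_nodes() {
  # -v inverts the match, -x matches whole lines, -F treats patterns as
  # fixed strings, -f reads the patterns (the desired names) from a file.
  # grep exits 1 when nothing is printed, so mask that for `set -e` callers.
  grep -vxFf "$2" "$1" || true
}
```

The registered list can come from `kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name`. Review the names printed by `stale_nodes` before passing each one to `kubectl delete node`.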
dghubble committed May 10, 2018
1 parent 8b8e364 commit b36da9c
Showing 2 changed files with 0 additions and 56 deletions.
28 changes: 0 additions & 28 deletions digital-ocean/container-linux/kubernetes/cl/worker.yaml.tmpl
@@ -77,18 +77,6 @@ systemd:
         RestartSec=5
         [Install]
         WantedBy=multi-user.target
-    - name: delete-node.service
-      enable: true
-      contents: |
-        [Unit]
-        Description=Waiting to delete Kubernetes node on shutdown
-        [Service]
-        Type=oneshot
-        RemainAfterExit=true
-        ExecStart=/bin/true
-        ExecStop=/etc/kubernetes/delete-node
-        [Install]
-        WantedBy=multi-user.target
 storage:
   files:
     - path: /etc/kubernetes/kubelet.env
@@ -103,19 +91,3 @@ storage:
       contents:
         inline: |
           fs.inotify.max_user_watches=16184
-    - path: /etc/kubernetes/delete-node
-      filesystem: root
-      mode: 0744
-      contents:
-        inline: |
-          #!/bin/bash
-          set -e
-          exec /usr/bin/rkt run \
-            --trust-keys-from-https \
-            --volume config,kind=host,source=/etc/kubernetes \
-            --mount volume=config,target=/etc/kubernetes \
-            --insecure-options=image \
-            docker://k8s.gcr.io/hyperkube:v1.10.2 \
-            --net=host \
-            --dns=host \
-            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
Original file line number Diff line number Diff line change
@@ -66,18 +66,6 @@ systemd:
         RestartSec=5
         [Install]
         WantedBy=multi-user.target
-    - name: delete-node.service
-      enable: true
-      contents: |
-        [Unit]
-        Description=Waiting to delete Kubernetes node on shutdown
-        [Service]
-        Type=oneshot
-        RemainAfterExit=true
-        ExecStart=/bin/true
-        ExecStop=/etc/kubernetes/delete-node
-        [Install]
-        WantedBy=multi-user.target
 storage:
   files:
     - path: /etc/kubernetes/kubeconfig
@@ -98,22 +86,6 @@ storage:
       contents:
         inline: |
           fs.inotify.max_user_watches=16184
-    - path: /etc/kubernetes/delete-node
-      filesystem: root
-      mode: 0744
-      contents:
-        inline: |
-          #!/bin/bash
-          set -e
-          exec /usr/bin/rkt run \
-            --trust-keys-from-https \
-            --volume config,kind=host,source=/etc/kubernetes \
-            --mount volume=config,target=/etc/kubernetes \
-            --insecure-options=image \
-            docker://k8s.gcr.io/hyperkube:v1.10.2 \
-            --net=host \
-            --dns=host \
-            --exec=/kubectl -- --kubeconfig=/etc/kubernetes/kubeconfig delete node $(hostname)
 passwd:
   users:
     - name: core
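
With the delete-node unit removed on these platforms, a worker that shuts down stays registered and should eventually report NotReady, which is the condition the commit message wants alerts to catch. As a rough sketch only (not part of this commit), the hypothetical helper `not_ready_nodes` below filters `kubectl get nodes --no-headers` output for such nodes. It assumes the default column layout, with the node name first and STATUS second.

```shell
#!/bin/bash
# Sketch: read `kubectl get nodes --no-headers` output on stdin and print the
# names of nodes whose STATUS column does not start with "Ready".
# not_ready_nodes is a hypothetical helper name used for illustration.
not_ready_nodes() {
  # $1 is the node name, $2 is the STATUS column. Values such as
  # "Ready,SchedulingDisabled" still count as Ready here.
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

An admin could run `kubectl get nodes --no-headers | not_ready_nodes` to see which nodes remain registered but unhealthy before deciding whether to delete them.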