Skip to content

Commit

Permalink
Fix Kubelet starting before hostname set on FCOS AWS
Browse files Browse the repository at this point in the history
* Fedora CoreOS `kubelet.service` can start before the hostname
is set. Kubelet reads the hostname to determine the node name to
register. If the hostname was read as localhost, Kubelet will
continue trying to register as localhost (problem)
* This race manifests as a node that appears NotReady, the Kubelet
is trying to register as localhost, while the host itself (by then)
has an AWS provided hostname. Restarting kubelet.service is a
manual fix so Kubelet re-reads the hostname
* This race could only be shown on AWS, not on Google Cloud or
Azure despite attempts. Bare-metal and DigitalOcean differ and
use hostname-override (e.g. afterburn) so they're not affected
* Wait for nodes to have a non-localhost hostname in the oneshot
that awaits /etc/resolve.conf. Typhoon has no valid cases for a
node hostname being localhost (not even single-node clusters)

Related Openshift: openshift/machine-config-operator#1813
Close #765
  • Loading branch information
dghubble committed Jun 19, 2020
1 parent 90e23f5 commit 4cfafea
Show file tree
Hide file tree
Showing 3 changed files with 9 additions and 2 deletions.
5 changes: 5 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,11 @@ Notable changes between versions.

* Use `strict` Fedora CoreOS Config (FCC) snippet parsing ([#755](https://github.com/poseidon/typhoon/pull/755))

#### AWS

* Fix Kubelet service race with hostname update ([#766](https://github.com/poseidon/typhoon/pull/766))
* Wait for a hostname to avoid Kubelet trying to register as `localhost`

### Flatcar Linux

* Use `strict` Container Linux Config (CLC) snippet parsing ([#755](https://github.com/poseidon/typhoon/pull/755))
Expand Down
3 changes: 2 additions & 1 deletion aws/fedora-coreos/kubernetes/fcc/controller.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -38,11 +38,12 @@ systemd:
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Description=Wait for DNS and hostname
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/bin/sh -c 'while [ `hostname -s` == "localhost" ]; do sleep 1; done;'
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
Expand Down
3 changes: 2 additions & 1 deletion aws/fedora-coreos/kubernetes/workers/fcc/worker.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,12 @@ systemd:
enabled: true
contents: |
[Unit]
Description=Wait for DNS entries
Description=Wait for DNS and hostname
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/bin/sh -c 'while [ `hostname -s` == "localhost" ]; do sleep 1; done;'
ExecStart=/bin/sh -c 'while ! /usr/bin/grep '^[^#[:space:]]' /etc/resolv.conf > /dev/null; do sleep 1; done'
[Install]
RequiredBy=kubelet.service
Expand Down

0 comments on commit 4cfafea

Please sign in to comment.