From 5fbbda50bd9251ad963fd99bd62ddce3ebd7024d Mon Sep 17 00:00:00 2001
From: Jakob Meng
Date: Thu, 26 Oct 2023 16:42:26 +0200
Subject: [PATCH 1/3] OCPBUGS-22453: Fixed systemd-resolved's split DNS config in OKD/FCOS

OCP requires DNS records api.<cluster_domain> and *.apps.<cluster_domain>
to be externally resolvable (<cluster_domain> is
<cluster_name>.<base_domain>). For SNO this list also includes DNS record
api-int.<cluster_domain>. However, OCP does not enforce ownership of all
subdomains of <cluster_domain>. For example, it is allowed to host a
disconnected image registry at <registry>.<cluster_domain> and OCP shall
be able to resolve it using the user-supplied external DNS resolver.

PR #7516 changed the systemd-resolved config of the bootstrap node /
rendezvous host to associate the complete <cluster_domain> with the DNS
server at 127.0.0.1 where CoreDNS is supposed to be listening.

When a disconnected image registry is used for cluster installation, the
registry is hosted at <registry>.<cluster_domain> and the bootstrap node /
rendezvous host does not retrieve its domain from the DHCP server, then
the registry's DNS name cannot be resolved. That is because the
disconnected registry must be reachable in order to pull the CoreDNS
image, but the split DNS mechanism of systemd-resolved sends DNS requests
for <registry>.<cluster_domain> to 127.0.0.1, where CoreDNS is expected
to be running but is not yet.

When a bootstrap node / rendezvous host retrieves its domain from a DHCP
server (e.g. dnsmasq's '--domain' option) then systemd-resolved would
associate <cluster_domain> not only with 127.0.0.1 but also with the
physical network interface, causing DNS requests for
<registry>.<cluster_domain> to be sent to 127.0.0.1 as well as the
external DNS resolver.

This patch mitigates the DNS issue for network setups that do not supply
the domain via DHCP. It changes the systemd-resolved config to forward
DNS requests to CoreDNS only for domains which are resolvable by CoreDNS:
* api.<cluster_domain>
* api-int.<cluster_domain>
* apps.<cluster_domain>

DNS requests for <registry>.<cluster_domain> and other subdomains of
<cluster_domain> will be sent to the external DNS resolver.

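
For illustration, the drop-in that the dispatcher script ends up writing
can be sketched as below. This is a minimal sketch, not the installer's
code: render_kni_dropin is a hypothetical helper and
mycluster.example.com is a made-up cluster domain. Only the three
Domains= entries and the layout of 60-kni.conf are taken from this patch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the installer): print the content that
# 30-local-dns-prepender writes to /etc/systemd/resolved.conf.d/60-kni.conf.
# $1 = address CoreDNS listens on, $2 = cluster domain (assumed example value).
render_kni_dropin() {
    local dns_ip="$1" cluster_domain="$2"
    printf '[Resolve]\n'
    printf 'DNS=%s\n' "$dns_ip"
    # Only the names CoreDNS can answer are associated with it; other
    # subdomains of the cluster domain are left to the external resolver.
    printf 'Domains=api.%s api-int.%s apps.%s\n' \
        "$cluster_domain" "$cluster_domain" "$cluster_domain"
}

render_kni_dropin 127.0.0.1 mycluster.example.com
```

With this drop-in, a name such as a registry hosted under the cluster
domain no longer matches any entry in Domains= and is therefore not tied
to 127.0.0.1 by this file.
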
Fixes #7516
---
 .../dispatcher.d/30-local-dns-prepender.template | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/data/data/bootstrap/baremetal/files/etc/NetworkManager/dispatcher.d/30-local-dns-prepender.template b/data/data/bootstrap/baremetal/files/etc/NetworkManager/dispatcher.d/30-local-dns-prepender.template
index 85c136fb08..192895335c 100755
--- a/data/data/bootstrap/baremetal/files/etc/NetworkManager/dispatcher.d/30-local-dns-prepender.template
+++ b/data/data/bootstrap/baremetal/files/etc/NetworkManager/dispatcher.d/30-local-dns-prepender.template
@@ -25,7 +25,8 @@ EOF
     mkdir -p /etc/systemd/resolved.conf.d
     echo "[Resolve]" > /etc/systemd/resolved.conf.d/60-kni.conf
     echo "DNS=$DNS_IP" >> /etc/systemd/resolved.conf.d/60-kni.conf
-    echo "Domains={{.ClusterDomain}}" >> /etc/systemd/resolved.conf.d/60-kni.conf
+    echo "Domains=api.{{.ClusterDomain}} api-int.{{.ClusterDomain}} apps.{{.ClusterDomain}}" >> \
+        /etc/systemd/resolved.conf.d/60-kni.conf
     if systemctl -q is-active systemd-resolved; then
         >&2 echo "NM resolv-prepender: restarting systemd-resolved"
         systemctl restart systemd-resolved

From 715bb1c5bcab815e76938038e1c7155c48cbe334 Mon Sep 17 00:00:00 2001
From: Jakob Meng
Date: Wed, 13 Sep 2023 14:56:11 +0200
Subject: [PATCH 2/3] OCPBUGS-19303: Changed OKD/FCOS workaround to also support Agent-based Installer

OKD/FCOS uses FCOS as its bootimage, i.e. the image cluster nodes boot
the first time during installation. FCOS does not provide tools such as
OpenShift Client (oc) or crio.service which Agent-based Installer uses on
the rendezvous host, e.g. to launch the bootstrap control plane. RHCOS
and SCOS include these tools, but FCOS has to pivot the root fs [1] to
okd-machine-os [2] first in order to make those tools available. Pivoting
uses 'rpm-ostree rebase', but on its first boot the rendezvous host runs
from a FCOS Live ISO where the root fs and /sysroot are mounted read-only.
Thus 'rpm-ostree rebase' fails and the necessary tools will not be
available, causing the setup to stall.

Until rpm-ostree has implemented support for rebasing Live ISOs [3], this
patch adapts the workaround for SNO installations [4] to also support
Agent-based Installer. In particular, the Go conditional
{{- if .BootstrapInPlace }}, which is used to mark a SNO install, has
been replaced with a shell if-else which checks at runtime whether the
system was booted from a Live ISO.

Most code in the OpenShift ecosystem is written with RHCOS in mind and
often assumes that tools like oc or crio.service are available. These
assumptions can be satisfied by applying this workaround to all Live ISO
boots. It will not remove functionality or overwrite configuration files
in /etc and thus side effects should be minimal.

The Go conditional {{- if .BootstrapInPlace }} in
release-image-pivot.service has been dropped completely. This service is
used in OKD only, so OCP will not be impacted at all. The 'Before='
option will not cause systemd to fail if a service does not exist. So, in
case bootkube.service or kubelet.service do not exist, the option will
have no effect. When bootkube.service or kubelet.service do exist, it
must always be ensured that release-image-pivot.service is started first
because it might reboot the system or change /usr in the Live ISO use
case. So it is safe to drop the Go conditional and ask systemd to always
launch release-image-pivot.service before bootkube.service and
kubelet.service.

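
The runtime check that replaces the Go conditional can be sketched as
follows. This is a minimal sketch assuming that the coreos.liveiso=
kernel argument is what marks a Live ISO boot, as in this patch;
booted_from_live_iso and its optional file argument are hypothetical and
exist only to make the check easy to try out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed if the kernel command line stored in file $1
# (default: /proc/cmdline) contains the coreos.liveiso= argument.
booted_from_live_iso() {
    grep -q 'coreos.liveiso=' "${1:-/proc/cmdline}"
}

if booted_from_live_iso; then
    echo "Live ISO boot: overlay the Machine OS image instead of rebasing"
else
    echo "Installed system: take the regular pre-pivot.sh path"
fi
```

Because the decision is made on the booted host rather than when the
template is rendered, the same script serves SNO, Agent-based Installer
and regular bootstrap nodes.
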
[0] https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
[1] https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
[2] https://github.com/openshift/okd-machine-os
[3] https://github.com/coreos/rpm-ostree/issues/4547
[4] https://github.com/openshift/installer/pull/7445
---
 .../usr/local/bin/bootstrap-pivot.sh.template | 60 +++++++++++++------
 .../common/units/kubelet.service.template     |  4 +-
 .../release-image-pivot.service.template      |  4 --
 3 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template b/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
index 7ef780be48..18d9be2be5 100644
--- a/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
+++ b/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
@@ -42,25 +42,51 @@ if [ ! -f /opt/openshift/.pivot-done ]; then
   record_service_stage_start "rebase-to-okd-os-image"
 {{if .IsFCOS -}}
   mnt="$(podman image mount "${MACHINE_OS_IMAGE}")"
-{{- if or (.BootstrapInPlace) (eq .Invoker "agent-installer") }}
-  # SNO setup boots into Live ISO which cannot be rebased
-  # https://github.com/coreos/rpm-ostree/issues/4547
-  mkdir /var/mnt/{upper,worker}
-  mount -t overlay overlay -o "lowerdir=/usr:$mnt/usr" /usr
-  mount -t overlay overlay -o "lowerdir=/etc:$mnt/etc,upperdir=/var/mnt/upper,workdir=/var/mnt/worker" /etc
-  systemctl daemon-reload
-  # Workaround for SELinux denials when launching crio.service from overlayfs
-  setenforce Permissive
+  # The bootstrap host during SNO installation and the rendezvous host of Agent-based Installer both boot into a Live
+  # ISO which cannot be rebased. Until rpm-ostree supports this live rebase [0], the following workaround will mount the
+  # proper OKD/FCOS Machine OS image over the existing mount at /usr and copy new config files to /etc.
+  # [0] https://github.com/coreos/rpm-ostree/issues/4547
+  if grep -q coreos.liveiso= /proc/cmdline; then
+    mount -t tmpfs -o size=50% none /var/mnt/
+    rsync -aHAXx "$mnt/" /var/mnt/
+    mount -t overlay overlay -o lowerdir=/usr:/var/mnt/usr /usr
+    rsync -rlt --ignore-existing /var/mnt/etc/ /etc/

-  systemctl start crio.service
-  # No reboot necessary because SNO setup will reboot system
-{{ else }}
-  pushd "${mnt}/bootstrap"
-  # shellcheck disable=SC1091
-  . ./pre-pivot.sh
-  popd
-{{ end -}}
+    # Agent-based Installer will launch an ephemeral control plane at the rendezvous host which will create and publish
+    # Ignition configs for the other master nodes. These Ignition configs must match what the in-cluster control plane
+    # would generate else machine config operator will fail [0]. Because the rendezvous host is booted with a FCOS Live
+    # ISO without any OKD/FCOS related changes, we have to copy the manifests from OKD Machine OS manually to the
+    # bootstrap manifests folder of the rendezvous host.
+    # [0] https://access.redhat.com/solutions/4970731
+    mkdir -p /var/opt/openshift/manifests
+    cp -av /var/mnt/manifests/*.* /var/opt/openshift/manifests/
+
+    # Load new systemd unit files and configuration such as crio.service after mounting the content of OKD/FCOS Machine
+    # OS over /usr and copying new files to /etc
+    systemctl daemon-reload
+
+    # CoreDNS fails to listen to 127.0.0.53:53 when Agent-based Installer boots its the rendezvous host with a Fedora
+    # CoreOS bootimage because by default FCOS' systemd-resolved already listens to this port. OKD/FCOS disables
+    # resolved's stub listener [0] but the resolved must be restarted for this setting to take effect.
+    # [0] https://github.com/openshift/okd-machine-os/blob/master/overlay.d/99okd/etc/systemd/resolved.conf.d/okd-no-dns-stub.conf
+    systemctl restart systemd-resolved.service
+
+    # Workaround for SELinux denials when launching crio.service from overlayfs
+    setenforce Permissive
+
+    # crio.service is not part of FCOS but of OKD Machine OS. It will be loaded after systemctl daemon-reload above but has
+    # to be started manually
+    systemctl start crio.service
+
+    # No reboot necessary because setup will reboot the system automatically
+  else
+    pushd "${mnt}/bootstrap"
+    # shellcheck disable=SC1091
+    . ./pre-pivot.sh
+    popd
+  fi
+
   record_service_stage_success
 {{else if .IsSCOS -}}
   chmod 0644 /etc/containers/registries.conf
   rpm-ostree rebase --experimental "ostree-unverified-registry:${MACHINE_OS_IMAGE}"
diff --git a/data/data/bootstrap/systemd/common/units/kubelet.service.template b/data/data/bootstrap/systemd/common/units/kubelet.service.template
index 092d4c8e6e..a19c998c60 100644
--- a/data/data/bootstrap/systemd/common/units/kubelet.service.template
+++ b/data/data/bootstrap/systemd/common/units/kubelet.service.template
@@ -1,7 +1,7 @@
 [Unit]
 Description=Kubernetes Kubelet
-Wants=rpc-statd.service crio.service release-image.service
-After=crio.service release-image.service
+Wants=rpc-statd.service crio.service release-image.service{{if .IsOKD}} release-image-pivot.service{{end}}
+After=crio.service release-image.service{{if .IsOKD}} release-image-pivot.service{{end}}

 [Service]
 Type=notify
diff --git a/data/data/bootstrap/systemd/common/units/release-image-pivot.service.template b/data/data/bootstrap/systemd/common/units/release-image-pivot.service.template
index 2a12fdde25..fd3763f44c 100644
--- a/data/data/bootstrap/systemd/common/units/release-image-pivot.service.template
+++ b/data/data/bootstrap/systemd/common/units/release-image-pivot.service.template
@@ -3,11 +3,7 @@
 Description=Pivot bootstrap to the OpenShift Release Image
 Wants=release-image.service
 After=release-image.service
-{{- if or (.BootstrapInPlace) (eq .Invoker "agent-installer") }}
 Before=bootkube.service kubelet.service
-{{ else }}
-Before=bootkube.service
-{{ end -}}

 [Service]
 Type=oneshot

From f8d29c5a5057de63655052d40cb8906a15fa4c5e Mon Sep 17 00:00:00 2001
From: Jakob Meng
Date: Mon, 30 Oct 2023 19:38:51 +0100
Subject: [PATCH 3/3] [DNM] Debug output

---
 .../bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template b/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
index 18d9be2be5..250b93c9e2 100644
--- a/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
+++ b/data/data/bootstrap/files/usr/local/bin/bootstrap-pivot.sh.template
@@ -1,6 +1,7 @@
 {{if .IsOKD -}}
 #!/usr/bin/env bash
 set -euo pipefail
+set -x

 # Exit early if pivot is attempted on SCOS Live ISO
 {{if .IsSCOS -}}
@@ -66,6 +67,8 @@ if [ ! -f /opt/openshift/.pivot-done ]; then
     # OS over /usr and copying new files to /etc
     systemctl daemon-reload

+    systemctl status systemd-resolved.service || true
+
     # CoreDNS fails to listen to 127.0.0.53:53 when Agent-based Installer boots its the rendezvous host with a Fedora
     # CoreOS bootimage because by default FCOS' systemd-resolved already listens to this port. OKD/FCOS disables
     # resolved's stub listener [0] but the resolved must be restarted for this setting to take effect.