resolvConf value ignored if systemd-resolved active - override value exhibits race condition between kubelet and systemd-networkd #2111
Comments
the kubelet is removing deprecated flags soon, so we need to stop adding xref #949
/assign
Indeed. The original logic, however, still needs to be kept the same. That is trying to set However, if no stub overwrite is provided, the original race between the kubelet and systemd-resolved would still remain.
So it seems that we are missing these from the systemd service file for the kubelet.
@hickeng about the race between kubelet and systemd-resolved, PhotonOS 3.0 uses systemd v239. Digging through the source code of that version of systemd, I came to the conclusion that it's unlikely for a partial Relevant systemd source code can be found here:
Completely agree - this isn’t a partially written file being accessed, it’s a file with contents that are not what we expect.
I don’t think the issue is that it cannot detect DNS, but that it clears the DNS entries and then reapplies them. Quite possibly this window doesn’t exist with a static IP/DNS config. The problem is that kubelet can create a container in that time frame, and that container gets an empty resolv.conf which is not fixed up when systemd updates the file. While this isn’t an easy thing to fix, I think it’s still an issue, particularly as a network restart is likely to induce restarts of workloads that use DNS and networking, meaning a probable surge of newly created containers over the period of interest.
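To make that window concrete, here is a minimal sketch of how a snapshot of resolv.conf could be classified as "header only" versus "complete". The function names and the sample file contents are hypothetical, chosen for illustration; this is not what kubelet or systemd actually do.

```python
def nameservers(resolv_conf_text: str) -> list:
    """Return the addresses found on `nameserver` lines, skipping comments and blanks."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def has_no_dns_servers(resolv_conf_text: str) -> bool:
    """True when the snapshot carries no DNS servers, e.g. only the comment block."""
    return not nameservers(resolv_conf_text)


# Hypothetical snapshots: a placeholder comment header, with and without DNS entries.
header_only = "# placeholder comment header\n#\n"
complete = header_only + "nameserver 192.0.2.53\nsearch example.internal\n"

print(has_no_dns_servers(header_only))  # snapshot taken during regeneration
print(has_no_dns_servers(complete))     # snapshot taken after regeneration
```

A container created from the first snapshot would end up with the symptom described in the report: a comment block and no DNS entries.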
The code here does not seem correct: https://github.com/kubernetes/kubernetes/blob/v1.28.2/pkg/kubelet/network/dns/dns.go#L225-L274 If the Then, if the https://github.com/kubernetes/kubernetes/blob/v1.28.2/pkg/kubelet/network/dns/dns.go#L316-L317 Then, containerd will copy the Finally, that will cause a forwarding loop in CoreDNS.
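For context on the forwarding loop: a resolv.conf that lists only a loopback resolver (systemd-resolved's stub listens on 127.0.0.53) is the usual trigger, because a DNS pod that inherits it ends up forwarding queries back to itself. A rough sketch of a check for that condition follows; the function name is hypothetical and this is not taken from the kubelet code linked above.

```python
import ipaddress


def loopback_nameservers(resolv_conf_text: str) -> list:
    """Return nameserver entries that point at a loopback address."""
    found = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            try:
                addr = ipaddress.ip_address(fields[1])
            except ValueError:
                # Not a plain IP literal; skip it in this sketch.
                continue
            if addr.is_loopback:
                found.append(fields[1])
    return found


# Illustrative contents only; 192.0.2.1 is a documentation address.
stub_style = "nameserver 127.0.0.53\noptions edns0\n"
upstream_style = "nameserver 192.0.2.1\n"

print(loopback_nameservers(stub_style))
print(loopback_nameservers(upstream_style))
```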
kubelet bugs must be reported in kubernetes/kubernetes.
Okay, I created: kubernetes/kubernetes#120748
What keywords did you search in kubeadm issues before filing this one?
resolvConf, resolv-conf, resolved, dns
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version):
Environment:
kubectl version:
uname -a:
What happened?
When systemd-resolved is enabled, kubeadm ignores the value specified in resolvConf in favour of the systemd-managed file /run/systemd/resolve/resolv.conf.
The specific value used in this case was:
This is a problem for two reasons:
/run/systemd/resolve/resolv.conf introduces a race between systemd and kubelet. We have observed intermittent instances of containers being created with /etc/resolv.conf (inside the container) containing only the leading comment block but no DNS entries. The hypothesis is that kubelet is racing with systemd regenerating the file.
On an environment with DHCP-configured DNS, running systemctl restart systemd-networkd in a separate shell generates the following output. It can be seen that there are multiple (7 in this case) steps in regenerating this file, and all but the last are missing the DNS servers.
What you expected to happen?
kubeadm honours the explicit value when present in the config.
kubeadm documents the race with systemd-networkd, or chooses a different means of supplying DNS.
How to reproduce it (as minimally and precisely as possible)?
In a system with systemd-resolved enabled, specify resolvConf in kubeadm.yaml.
The generated /var/lib/kubelet/kubeadm-flags.env file contains:
--resolv-conf=/run/systemd/resolve/resolv.conf
instead of:
--resolv-conf=/run/systemd/resolve/stub-resolv.conf
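The expected precedence could be sketched roughly as below: an explicit resolvConf wins, and the systemd-managed path is only a fallback. This is an illustration of the behaviour requested in this report, not kubeadm's actual implementation; the function and parameter names are hypothetical.

```python
SYSTEMD_RESOLVED_CONF = "/run/systemd/resolve/resolv.conf"


def resolv_conf_flag(explicit_resolv_conf, systemd_resolved_active):
    """Pick the --resolv-conf flag value, or None when no flag is needed."""
    if explicit_resolv_conf:
        # Honour the value from the config when one is present.
        return "--resolv-conf=" + explicit_resolv_conf
    if systemd_resolved_active:
        # Fall back to the systemd-managed file only when nothing was specified.
        return "--resolv-conf=" + SYSTEMD_RESOLVED_CONF
    return None


# With systemd-resolved active and an explicit value, the explicit value is kept.
print(resolv_conf_flag("/run/systemd/resolve/stub-resolv.conf", True))
# With no explicit value, the systemd-managed path is used.
print(resolv_conf_flag(None, True))
```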
Anything else we need to know?
https://github.com/kubernetes/kubernetes/blob/8d8aa39598534325ad77120c120a22b3a990b5ea/cmd/kubeadm/app/phases/kubelet/flags.go#L113
This behaviour was added in kubernetes/kubernetes#64665