kube-proxy pod fails to run due to unmounted volumes #4864
You're correctly using … The kubelet log indicates that it's unable to complete a synchronization of pods, either from the apiserver or from the static pod manifests. Note that the seenSources map is empty:
It doesn't ever seem to complete synchronizing the pod sources, so it doesn't know what to create or delete. Are you able to reproduce this if you don't change the cpu manager policy, but just run with the default? I see it changing several times in the logs. Can you provide an updated copy of your config.yaml, as well as your containerd.config.toml if you're customizing the containerd config?
kubelet.log.gz: I uploaded everything, including a fresh kubelet.log. The node kub-b9 is still broken, so I can test everything.
I had something like this happen this morning with a 12-node cluster. It turns out I had set the event-config according to the CIS guide, and my misconfigured Longhorn deployment was spamming the scheduler, so it never got around to restarting it (at least that's my guess). After I removed the Longhorn deployment and deleted the proxies, I was able to get everything up and running by restarting the rke2 service through the node shell. Can you relate to that? You would see something spamming the apiserver logs on the master nodes.
I do not have Longhorn installed.
I have been doing some tracing of this issue, and it appears that there is a race condition in the kubelet startup, where it will fail to start a static pod if the static pod's mirror pod is discovered from the apiserver at the same time volumes are being mounted. I don't have a good fix for it at the moment, other than removing the kube-proxy static pod from disk prior to starting rke2 - this seems to delay the static pod creation long enough to avoid triggering the race condition. Unfortunately I haven't been able to reproduce the issue myself, so I am relying on some teammates to provide logs from an environment where it does seem to reoccur intermittently.
@brandond as one of our nodes is still in this state (kube-proxy does not start), I can try/trace anything needed here (I could even provide remote SSH access).
Can you try deleting …
Yes, this helps on a different node, but not on kub-b9. I noticed that if I remove the manifest, kube-proxy starts fine on other nodes; I thought it is because a new UID is generated. By 'remove from disk', did you mean removing content from containerd, or the abovementioned manifest? I could try the former, but if it fixes the node, we lose the test case.
Just do exactly as I said and delete the file. If that does not resolve it, you may also try running …
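For anyone following along, the manifest-removal workaround being discussed here amounts to roughly the following on an affected node (a sketch only; the path assumes RKE2's default data directory and an agent node):

```bash
# Remove the stale kube-proxy static pod manifest so the kubelet drops the pod,
# then restart the service so RKE2 rewrites the manifest with a fresh UID.
# Path assumes the default data directory /var/lib/rancher/rke2.
rm -f /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml
systemctl restart rke2-agent   # rke2-server on server nodes
```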
Even after a few minutes, after …
I'm pretty sure that if your event-config is set up to be super low and you have a daemonset failing rapidly, this happens by itself after 24h. There was nothing else "wrong" with my cluster. As for the solution for me:
I'm sorry, you're saying that for some reason the kube-proxy static pod wasn't running because of an issue with an unrelated daemonset? Or are you talking about one of your daemonset pods not running?
I'm sorry if I did not make myself clear, but what happened for me was that the apiserver was spammed and full of logs from that daemonset (Longhorn), and I had the exact same problem (kube-proxy failed, not coming back up) with the same Kubernetes version as the OP. My current guess is that it has something to do with the admission config, more specifically the event limit that comes with CIS conformity. Or in short: apiserver spammed, kube-proxy goes poof and does not come back. Further info:
After that everything just worked without issue, while before I was trying to restart the server (with a reboot too) and nothing was running.
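For context, the "event limit that comes with CIS conformity" refers to the EventRateLimit admission plugin, which is configured through the file passed to the kube-apiserver via --admission-control-config-file. A generic example of such a configuration looks roughly like this; the values are purely illustrative, not what any particular CIS profile actually ships:

```bash
# Illustrative EventRateLimit admission configuration; qps/burst are placeholders.
# The file would be referenced by kube-apiserver's --admission-control-config-file flag.
cat <<'EOF' > /tmp/admission-config-example.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: EventRateLimit
    configuration:
      apiVersion: eventratelimit.admission.k8s.io/v1alpha1
      kind: Configuration
      limits:
        - type: Server
          qps: 5000
          burst: 20000
EOF
```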
Hey, I'm new to Rancher, and running into issues like this was not a really smooth experience.
what rke2 version do you use? |
I'm using v1.26.9+rke2r1 on an Ubuntu 22.04 cloud image.
Version 1.26.7+rke2r1 should work fine. But remove everything from … and perhaps you can remove the …
@TravDev01 I have never seen this issue on a fresh install. I have seen what LOOKS like a similar issue on nodes that do not have sufficient CPU or memory resources available - the control-plane fails to start completely due to lack of resources, which blocks kube-proxy from running. Can you provide the rke2-server log from journald, as well as information on the core count and memory size on your node?
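For anyone else gathering the same information, something along these lines collects what's being asked for (use rke2-agent instead of rke2-server on agent nodes):

```bash
# Export the RKE2 service log from journald
journalctl -u rke2-server --no-pager > rke2-server.log

# Core count and memory size of the node
nproc
free -h
```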
Oh wow. I double-checked my resource configuration and noticed that Proxmox was resetting it when I restored my snapshot between my various tests. Might it be beneficial to have a warning message in the log during the first bootstrap, when a user tries to create a cluster with fewer than the recommended resources? Thanks a lot!
I don't know whether there is an issue tracking this upstream, but the kubelet doesn't give any feedback if static pods fail to schedule. You have to turn the log up to v=4 or higher if I remember correctly, and then there is a message complaining about scheduling and resources.
Interesting! So this should solve it from the config?
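For reference, raising the kubelet verbosity through the RKE2 config would look something like the sketch below; it only makes the scheduling/resource message visible in the kubelet log, it does not fix anything by itself:

```bash
# Pass a verbosity flag to the kubelet via RKE2's kubelet-arg passthrough,
# then restart the service. Assumes kubelet-arg is not already set in config.yaml;
# remove the flag again once done debugging.
cat <<'EOF' >> /etc/rancher/rke2/config.yaml
kubelet-arg:
  - "v=4"
EOF
systemctl restart rke2-agent   # rke2-server on server nodes
```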
I just had it happen again today on 4 nodes out of 12. What strikes me as odd is that the first of the two times I could delete the kube-proxy from the node without issue, but when I restart rke2, the proxy still remains pending. So I killed it with:
But I still see no change; it's still terminating. Further investigation:
On the same node, interestingly, I see a restart in the middle of the night:
The service was explicitly stopped; can you tell why that happened?
Well, nothing really; I had everything up and running that night. That day I upgraded the masters for more CPU and RAM, so that wasn't it. My current estimate is etcd memory or networking pressure. Even now I have only been able to recover one node. I have two with the proxy "pending" and two with the proxy "terminating". The masters are explicitly CriticalAddonsOnly=true:NoExecute though, so the only thing actually running on them is rke2-server. I have around 110 pods on those 12 servers and plenty of RAM and CPUs to go around for all. My current guesstimate is that I have bad neighbours on my droplets. But having three of them is like winning the lottery... On the etcd pods I find these entries:
So my requests also seem to take too long, even though they are running on 8 GB / 4 vCPU droplets.
I still have not found a way around this, even though I upgraded to masters with more CPU and dedicated vCPUs instead of shared cores. Again this happened to two out of 12 nodes. New notes:
Logs:
containerd.log around the same time:
After that, Calico just never comes back. I don't actually know what is to blame. These are specific node groups with specific load types, so running out of memory or CPU should be out of the question. There is a scheduled kured reboot, but it's configured to be on Sunday, not Monday night. I'm currently investigating whether using Calico and BGP will allow me to deactivate kube-proxy altogether, but I had tried that during the initial setup and could not get it working. Other noteworthy things:
containerd logs for the one where it did not work:
When that happens I get out the big hammer: … and restart the service for kube-proxy to come back.
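Presumably the "big hammer" here is something like the bundled rke2-killall.sh script followed by a service restart; a rough sketch:

```bash
# Tear down everything RKE2 started on this node (containers, network, mounts),
# then restart the service so kube-proxy and the other static pods come back.
rke2-killall.sh
systemctl restart rke2-agent   # rke2-server on server nodes
```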
So I have been able to narrow this down further:
Because kube-proxy does not come back, the rest of the system remains pending, restarting, or crashing, since pods can't reach the kubernetes service: its IP is managed by the proxy, and hence nothing really works. The uptime of the systems suggests they were properly restarted, and all my nodes are also configured for a graceful shutdown of 60 seconds. Solutions so far:
Sometimes this still doesn't work, and afterwards kube-proxy remains pending. Also, deleting the pod without force does not work:
That is stuck until I cancel it (and the pod is then not failed but stuck terminating). When I do it with force, the command runs, but it is still not able to actually access the volume mounts on its local disk and will go back to pending. At that point (which does not always happen), only force-deleting the pod, deleting the container with crictl, and actually restarting again seems to fix it. But there does not seem to be a reasonable way to script all of that, short of an extra operator :(
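For what it's worth, the local part of that sequence can be scripted; a rough sketch, assuming RKE2's default data directory and its bundled crictl (the kubectl force-delete would still be run from wherever an admin kubeconfig lives):

```bash
#!/bin/sh
# Rough sketch of the manual recovery: clear stale kube-proxy sandboxes from
# containerd, drop the static pod manifest, and restart the agent.
export PATH="$PATH:/var/lib/rancher/rke2/bin"
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml

# Stop and remove any leftover kube-proxy pod sandboxes
for sandbox in $(crictl pods --name kube-proxy -q); do
  crictl stopp "$sandbox" || true
  crictl rmp "$sandbox" || true
done

# Drop the manifest so it gets regenerated, then restart the service
rm -f /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml
systemctl restart rke2-agent
```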
We are experiencing the same issue: kube-proxy (v1.26.8-rke2r1-build20230824) is stuck, and only this will fix it (temporarily):
Happens across different types of nodes, all running Ubuntu 22.04. Sometimes we see these events on the kube-proxy:
@stephanbertl can you provide information on anything that might be unique about your environment? Being able to reproduce this on demand would be very helpful in isolating the root cause. What is going on prior to kube-proxy getting stuck? How many cpus / how much memory do these nodes have? Is this happening only on agents, or also on servers?
It should only get a new UID when the configuration is changed, RKE2 is upgraded, or an etcd restore is done (on servers running etcd). The pod manifest should NOT change following a restart of the service or reboot of the node with no other change to state. The UID change is necessary to update the mirror pod in the apiserver - if the UID does not change, the mirror pod displayed in the apiserver will not be updated. The kubelet SHOULD NOT be trying to run the pod based on the mirror pod spec. The desired state of the pod 100% comes from the static pod manifest, and the pod resource displayed in the apiserver is a read-only "mirror" of that. I agree with you that there appears to be a timing element to this, that's what I was referring to in #4864 (comment)
This now happens every Sunday for me. Removing the manifest and rebooting works most of the time. Overall, I do believe that kube-proxy should be a daemonset again. Edit:
does not even work. Personally I blame #3752, which is simply overcautious and wrong. Either have a daemonset or do proper cleanup. You are optimizing for the case where you need to restart rke2 without having run updates, and you could easily provide a flag for that. The potential service disruption is also limited to maybe 30 seconds.
It looks like kubernetes/kubernetes#122211 won't be available until the January patch releases.
Thanks? I don't believe it's either. It should be possible to update the static pod manifest at any time. If the kubelet can be forced out of sync by writing the file at the wrong time, then there's a bug in the kubelet. We don't want random rke2 service restarts triggering static pod restarts if the config has not changed; our goal is to leave the components undisturbed to the largest extent possible.
First of all, if you feel personally attacked, that was not my intention; I very much value your work and can only aspire to contribute as much to this project as you do someday. I am only trying to voice my opinion to get around this problem, as it has literally haunted me through Christmas.

My pods are not stuck in "terminating"; they simply fail and don't come back. They only get stuck in terminating once I try to delete the failed pod through kubectl. The solution of deleting the static manifest and then restarting the service has worked for me without fail over the course of the last two days.

I believe this is two issues interacting with each other, and that it should be expected behaviour that when I restart rke2, the static pods it provides restart as well. Correct me if I'm wrong, but the only use case where the pods should not restart when the service restarts is minor configuration changes, which may occur yearly? Proper pod budgeting might even completely work around the possible downtime (I have not tried that with static pods).

I don't believe that upstream has this problem in the same way, as the proxy is operated by a daemonset, so a pod with another container ID is simply spawned and that's that. I do understand the decision making as to why rke2 uses a static pod, and I actually support it, because having different data directories for server and agent makes it really obvious what I'm currently working on too; but then again, the proxy should simply be recycled on restart.

I also don't think that the kubelet fails, because it provisions CSI and networking daemonsets just fine. It just happens that those of course depend on the kube-proxy and therefore crashloop. If we were not talking about running production clusters, I would simply switch to a kube-proxy-less Calico setup, but it is actually still best practice to use both. Or maybe I simply don't understand why there would be "random rke2 service restarts" at all if they do not point to a larger problem.
I'm now trying a new strategy using kured where I delete the static pod pre-reboot; I'll let you know how it works out for me next Sunday:
So we've run into this in one of my customer's environments, and my suggestion was to add a script via bootstrap that runs every few minutes (per cronjob) and checks whether kube-proxy is running. If not, it would just run the config removal and agent restart (roughly as sketched below):
@jhoelzel I'm not sure I fully understand the problem, so please forgive me if this is a stupid suggestion, but could this work for you (or am I missing something obviously wrong with the idea that I should note to my customer)?
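A minimal sketch of that kind of watchdog, assuming the default RKE2 data directory and an agent node (the script itself and its crude check are hypothetical, not something RKE2 ships):

```bash
#!/bin/sh
# Hypothetical kube-proxy watchdog, run every few minutes from cron or a
# systemd timer: if no running kube-proxy container is found, remove the
# static pod manifest and restart the agent so it gets recreated.
export PATH="$PATH:/var/lib/rancher/rke2/bin"
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml

if ! crictl ps --name kube-proxy --state running -q | grep -q .; then
  rm -f /var/lib/rancher/rke2/agent/pod-manifests/kube-proxy.yaml
  systemctl restart rke2-agent
fi
```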
To add some information: after updating the RKE2 minor version from 1.26.8 to 1.26.11, all pods get updated except the kube-proxy pods on agent nodes (server node pods do get updated). They stay on their old version. They also show this:
The only fix is to manually remove the static pod manifest. Then the new image is used.
FYI - just ran into exactly this problem as well during the upgrade of RKE2 1.25.16 to 1.26.11. This workaround helped for a moment:
-> In my case this issue did not show up on the "rke2-server" side of things, just on the "rke2-agent" side (all agents). Would it make sense to add this issue to the list of known issues in RKE2 1.26?
So, just an update: the addition of the simple bash script that checks whether kube-proxy is running and, if not, runs the config removal and agent restart successfully worked around the problem, and I've not been made aware of any negative repercussions.
I think it's the best solution we currently have. My kured pre-execution script only made it worse, and I am back to manual updates for now. We will see; hopefully brandond is right and this will show up with the next patch release: kubernetes/kubernetes#122211
Hi everyone! I can reproduce the problem with v1.25.16+rke2r1 and v1.26.13+rke2r1. Workaround: rke2-killall and service restart.
Hi, we are also facing the problem when upgrading from v1.25.14 to v1.26.14.
Community, please feel free to try and test these fixes (not in prod), as I am unable to reproduce the original errors. I can see that the agent process is pulling and rewriting the kube-proxy.yaml on reboots, but we already have this behavior on existing releases, so it isn't net new. Are you all able to divulge how many volumes you have mounted in these large clusters? Are you able to expand on how resource-constrained the nodes may be? I'm trying to figure out where the bottleneck is exactly so I can reproduce it, but can't do so with my limited testing environments. https://www.suse.com/fr-fr/support/kb/doc/?id=000021284 // existing behavior - I basically have followed the steps associated with getting the various PVs in a cluster (trying to orphan them...), then intentionally wrecking the kube-proxy.yaml, then immediately rebooting. // after the reboot
$ kg pv,pvc -o wide -A
Those of you who can, just use Cilium in kube-proxy-less mode. I didn't really have any luck with kube-proxy-less Calico, but Cilium worked right out of the box.
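For anyone who wants to try that route, the rough shape of a kube-proxy-less Cilium setup on RKE2 is sketched below; the apiserver address, port, and exact Helm values vary by environment and Cilium version, so treat this as a starting point rather than a drop-in config:

```bash
# Tell RKE2 to ship Cilium as the CNI and to skip deploying kube-proxy.
cat <<'EOF' >> /etc/rancher/rke2/config.yaml
cni: cilium
disable-kube-proxy: true
EOF

# Override the bundled rke2-cilium chart so Cilium takes over kube-proxy's job.
# <apiserver-ip> is a placeholder for an address the agents can always reach.
cat <<'EOF' > /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: true
    k8sServiceHost: <apiserver-ip>
    k8sServicePort: 6443
EOF
```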
I'm going to close this out based on the fixes made and repro attempts, but we can reopen if it is still noticed after these releases: v1.29.4, v1.28.9, and v1.27.13. |
I upgraded my cluster to 1.27.14 and the issue is gone. |
To add to the conversation, I have recently tried Cilium in kube-proxy-less mode with a Rancher v2.7.6 environment (and all the rke2 and other versions that results in), and I found Cilium extremely unstable. It would crash itself every single time (100% of the time) within a random time range between about 5 minutes and 1h30. It would never fix itself, and to "fix" it would take extreme effort, and I'm not even sure what the consistent steps to take would be. As such, I actually highly recommend against Cilium in kube-proxy-less mode, as it seems to be unreliable. I also want to add that there were zero consistencies in the logs whenever it happened, so I was completely unable to determine the root cause, let alone what to do about it.

Hence I've recently switched back to trying to use Calico + eBPF to solve my SourceIP problem. But I am having issues with that reaching the kubernetes apiserver VIP, and the somewhat-maybe-working workaround is to use localhost/127.0.0.1, but then calico-kube-controllers never succeeds (despite everything else seeming to work). So I'm trying to figure out the solution to that.

I might just bump my Rancher install to v2.8.x, try what @stephanbertl mentioned, and upgrade the test cluster nodes to 1.27.14 (or higher). Because higher numbers are always better, right? Anyway... my three-year struggle to solve SourceIP problems continues. I feel I'm almost at the end of it, but hoo boy is the actual Calico and Rancher RKE2 official documentation not producing the results they say they would in this regard!
Environmental Info:
RKE2 Version: 1.26.9+rke2r1
Node(s) CPU architecture, OS, and Version:
Linux kub-b9.priv.cerit-sc.cz 6.2.0-33-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 10:33:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
5 servers, 30 agents
Describe the bug:
Sometimes it happens that kube-proxy is unable to start. I can see that the sandbox should be present:
However, it is not actually running, as it cannot be stopped:
It can be force removed:
However, it does not run again. service rke2-agent restart does not spawn the kube-proxy pod again. The kubelet log is attached.
kubelet.log.gz