k0s and swap: Pods got swapped but memory-pressure taint triggers. #3830
Comments
Yes, the memory-pressure taint is managed by kubelet. It's futile to try to remove it. Moreover, when kubelet deems the node is under pressure, it will start to evict pods, no matter if the taint is on the node or not, or if the pods tolerate it.
Not sure what you mean by this. Memory/RAM is not the same as swap space; simply adding more swap doesn't mean there's more RAM overall. In what regard would k0s recognize this?
I've never tried to enable swapping for a Kubernetes node, but I can easily imagine that this will be extremely tricky to configure. First of all, after a quick glance at the Kubernetes docs, I think the fundamental steps are the ones you already did: enabling the
The culprit here is that 1 GiB is not much memory in the first place to run a Kubernetes control plane plus workloads. For one, I assume you don't plan to run an HA control plane, so you might want to use kine instead of etcd. This will be lighter on resources. Then you probably need to tell the kernel to swap more aggressively, and the kubelet that it should be even more conservative about its eviction thresholds. Otherwise swapping will kick in too late to save the pods from being evicted. Also, you might want to limit the workloads that are run on the controller by re-enabling the master taints (remove

Also have a look at kubernetes/kubernetes#120800. There's some discussion around this, including some examples of how to test it.
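A minimal sketch of what these suggestions could look like in a k0s config, assuming the k0s `ClusterConfig` schema with a worker profile that feeds `KubeletConfiguration` fields. The concrete values (`LimitedSwap`, `50Mi`) are illustrative assumptions to experiment with, not tested recommendations from this thread:

```shell
# Write a k0s config sketch wiring up the suggestions above: kine instead
# of etcd, swap-tolerant kubelet settings, and a lower eviction threshold
# so eviction kicks in later and swap has time to absorb the pressure.
cat > /tmp/k0s.yaml <<'EOF'
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  storage:
    type: kine              # lighter than etcd on a 1 GiB node
  workerProfiles:
    - name: low-mem-swap
      values:
        failSwapOn: false            # let kubelet start with swap enabled
        featureGates:
          NodeSwap: true             # beta in v1.28
        memorySwap:
          swapBehavior: LimitedSwap  # allow Burstable pods to use swap
        evictionHard:
          memory.available: "50Mi"   # evict later than the default threshold
EOF
grep -q 'type: kine' /tmp/k0s.yaml && echo "config written"
```

Workers would then be started with `--profile low-mem-swap`; whether every one of these fields can be set through a worker profile is something to double-check against the k0s docs.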
You definitely want to run a single k0s process. The enable-worker flag is exactly for that purpose. Having two processes will only add a lot of annoying extra configuration trouble without any benefit. Besides that, it will need more memory, too.
Hello @twz123
Right. So it's by design of Kubernetes.
Physically they aren't the same, for sure, but for the kernel they're allocatable in the same way. Kubernetes apparently reads only physical memory. Since newer Kubernetes allows swap usage, its readings should rely on swap too when NodeSwap is enabled, for example. But thinking twice, I agree it's not a k0s issue but an upstream one.
Done
This one I hadn't found before. I've changed my parameters to:
And in fact swap started to be used more. I think the eviction settings are more related to low physical RAM (since Kubernetes does not look at physical + swap) than to the swap enablement itself.
After adjusting the eviction thresholds, services started to come up.
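For anyone following along, "telling the kernel to swap more aggressively" usually comes down to raising `vm.swappiness`. A minimal sketch assuming a sysctl.d drop-in; the value 80 is an illustrative assumption, not the setting actually used in this thread:

```shell
# Write an illustrative sysctl fragment raising vm.swappiness so the kernel
# starts swapping anonymous pages out earlier on a low-RAM node.
cat > /tmp/99-k8s-swap.conf <<'EOF'
vm.swappiness = 80
EOF
# On the node itself this would be installed and applied with:
#   sudo cp /tmp/99-k8s-swap.conf /etc/sysctl.d/ && sudo sysctl --system
cat /tmp/99-k8s-swap.conf
```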
Already checked
I agree. I've removed
This is a two-sided coin: this cluster specifically exists to get a running cluster on Rpi3 (running + something reasonable, just to adjust expectations) and to experiment with what works and what doesn't, so I tend to keep etcd to test HA at some point. But I also agree that kine plus some lightweight backend is the best choice. For now, a setup with MetalLB + Jiva behaves like this: So I kept this record and agree with you, but for science, for now I'll keep etcd.
I've applied the following to the entire cluster:
Do you consider the hard memory eviction configuration I applied sufficient?
Did it, and in fact things got better. Also, keeping the master alone is a more realistic scenario, and if more capacity is needed, Rpi3s are cheap today :)
I'll try to replicate it sometime; I agree with @iholder101. In summary: this is in fact an upstream question. Since there is a way to configure it in k0s, would it be possible to document it at https://docs.k0sproject.io/? If you don't mind, I can also draft this documentation (as a dedicated page for swap configuration, in the rpi section as notes for rpi3 and low-RAM devices, or wherever else you think is better).
I'm not an expert on this. I think it's up to you to experiment with these settings until you find something that works reasonably well.
That would be awesome, of course. It's always a great help to write down things like that.
Trying to follow this: so once you edited your evictionHard settings, you were no longer seeing any problems?
The issue is marked as stale since no activity has been recorded in 30 days
@leleobhz I'm closing this for now. If you ever find the time and inclination to write up a little tutorial with your findings, that would be splendid, of course.
I'm still testing all the configurations (and I even moved away from DietPi because of some networking issues). I'll keep in touch and post here if I make any progress. Thanks!
Before creating an issue, make sure you've checked the following:
Platform
Version
v1.28.4+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
Hello!
I'm running k0s on a 3-node Raspberry Pi 3 cluster (3 RPis: 1 controller with enable-worker + 2 workers) and I'm seeing strange behavior regarding swapping.
Swap enablement was discussed in #1524 (and, typos aside, it works). Pods running on the workers are allowed to use swap, but the controller node, with or without enable-worker, still gets tainted:
root@pi0:~# k0s kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
And this is crashing pods also:
root@pi0:~# k0s kubectl get pods -n metallb-system controller-786f9df989-tb58j -o json
Trying to remove this taint is not effective because the taint gets reapplied instantly after removal, and I can't figure out how to allow pods to be scheduled on the master+worker node, since k0s recognizes the entire RAM (physical + swap) but the container and taint engines do not:
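For reference, the removal attempt described above would look something like this (node name taken from the describe command below; as noted, the kubelet reapplies the taint on its next status sync, so this is only a diagnostic step, not a fix):

```shell
# Attempt to delete the memory-pressure taint; the trailing '-' removes it.
# The kubelet re-adds the taint almost immediately, so this doesn't stick.
k0s kubectl taint nodes pi0.fqdn node.kubernetes.io/memory-pressure:NoSchedule-
```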
root@pi0:~# k0s kubectl describe nodes/pi0.fqdn
Also, controller node is using the following systemd config:
root@pi0:~# systemctl cat k0scontroller.service
How can I tell Kubernetes on the controller to consider swap memory on the controller node too? Is it better to run a controller unit separate from a worker unit on the same host, or can I keep using
--enable-worker
?