Set soft memory limits to 80% of hard limits #4284
Conversation
Trigger Eden tests.
I'm afraid the kernel's soft limit is not what we can expect from the similar "soft limit" of the Golang runtime GC. Memory reclaim always happens prior to the drastic measures (OOM killer) taken when the hard limit is hit, and if memory was reclaimed successfully, then the OOM killer is not invoked (because we still stay below the hard limit, we just reused a page taken from another place). The soft limit is more about efficient memory utilization: for example, when the soft limit is reached, that cgroup will be chosen for memory reclamation in case of global OOM under memory pressure (and not when its hard limit is hit). Or there are several cgroups competing for memory and one has a lower soft limit, which again makes it a target for memory reclamation on behalf of another cgroup that is under memory pressure. But I'm afraid we can't expect that when the soft limit is reached the kernel does some magic and suddenly more free pages appear (which is the case for the Golang garbage collector). Yes, we can expect memory to be reclaimed from file-backed pages, but that should happen anyway when the hard limit is hit. So if your PR aims to help the kernel decide which cgroup yields memory first in case of OOM (no more memory on the host), then perfect. If your PR aims to fix OOMs triggered by the hard limit, then it won't help, unfortunately.
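For reference, the Golang runtime soft memory limit that this comment contrasts against is a GC target the runtime enforces itself; it can be set with the GOMEMLIMIT environment variable or at run time via runtime/debug.SetMemoryLimit. A minimal sketch, with a purely illustrative 512 MiB hard limit (not taken from PR #4273 or EVE's defaults):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Hypothetical hard limit for the process, e.g. mirroring the cgroup
	// hard limit; 512 MiB is only an illustrative number.
	const hardLimit = 512 << 20 // bytes

	// Set the Go GC soft memory limit to 80% of the hard limit. When the
	// live heap approaches this value, the GC runs more aggressively to
	// stay under it (best effort, unlike the kernel's OOM killer).
	prev := debug.SetMemoryLimit(hardLimit * 80 / 100)
	fmt.Printf("Go soft memory limit applied, previous value: %d bytes\n", prev)
}
```

Unlike the cgroup soft limit, this one really does make more free pages appear, because the garbage collector runs more often and the runtime releases unused memory back to the OS while trying to stay below the target.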
I understand it. The idea of the PR was not to replace the direct Golang GC setting, like your PR does, but rather to help the kernel trigger similar functionality in other processes.
Yeah, I got it. But I guess the problem is that the kernel starts to reclaim the memory too late, for example, in case a lot of new allocations start to happen when we are already close to the limit. Reclaiming memory close to the hard limit is fine when it's not a repetitive allocation of a lot of chunks. Otherwise it can be too late. From the documentation:
So, the soft limit does trigger memory reclamation. And I think it would be helpful to do it a little bit in advance, before we are close to the hard limit.
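A minimal sketch of what "doing it a little bit in advance" could look like with the cgroup v1 memory controller: read the hard limit of a cgroup and write 80% of it into memory.soft_limit_in_bytes. The cgroup path, the helper name, and the 80% ratio are illustrative assumptions, not the exact code from this PR:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// setSoftLimit is a hypothetical helper: it sets memory.soft_limit_in_bytes
// of a cgroup v1 memory cgroup to 80% of its memory.limit_in_bytes.
func setSoftLimit(cgroupPath string) error {
	raw, err := os.ReadFile(filepath.Join(cgroupPath, "memory.limit_in_bytes"))
	if err != nil {
		return err
	}
	hard, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		return err
	}
	soft := hard / 100 * 80 // 80% of the hard limit
	return os.WriteFile(
		filepath.Join(cgroupPath, "memory.soft_limit_in_bytes"),
		[]byte(strconv.FormatUint(soft, 10)), 0644)
}

func main() {
	// Example cgroup path; adjust for the actual hierarchy on the host.
	if err := setSoftLimit("/sys/fs/cgroup/memory/eve/services"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Note that, as discussed above, the kernel only uses this value as a reclaim target under memory pressure; it does not act as an early hard limit.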
We can test the approach to understand what exactly it will mean for us.
Now I'm confused...
Only if there is memory contention, i.e. no memory on the host (global OOM) and multiple cgroups competing for memory (this is what the doc says). If one cgroup bloats and reaches its soft limit, nothing happens until there is a system-wide OOM. If there is no system-wide OOM, then the hard limit is reached, and then a reclaim attempt happens anyway. The soft limit is all about saying "I promise not to allocate above this value; if I lie, please reclaim memory from me in case of global OOM", which is not the hard-limit case.
Are you 100% sure it happens only when global OOM is coming? While testing the memory monitor, I saw many memory pressure events generated by the kernel even when the system was far from OOM. I had to adapt my threshold settings accordingly. These events are generated even when a regular reclaim of caches happens. If memory balancing according to soft limits can be triggered by these events (which is what I would expect), it can still be helpful. And yeah, it is also helpful to set an expectation for how much to reclaim. In any case, I want the soft limits decreased: it will not hurt and may help. But I want to understand what exactly it means, so I can reflect this in the document properly.
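For context on the memory pressure events mentioned here: the cgroup v1 memory controller lets user space subscribe to pressure notifications through memory.pressure_level and cgroup.event_control using an eventfd. A minimal sketch of such a subscription; the cgroup path and the "medium" level are illustrative assumptions, not the memory monitor's actual configuration:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

func main() {
	// Example cgroup v1 path; adjust for the actual hierarchy on the host.
	cg := "/sys/fs/cgroup/memory/eve/services"

	// Open the pressure_level file and create an eventfd to receive events.
	pf, err := os.Open(filepath.Join(cg, "memory.pressure_level"))
	if err != nil {
		panic(err)
	}
	defer pf.Close()

	efd, err := unix.Eventfd(0, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(efd)

	// Register "<eventfd> <pressure_level fd> <level>" in cgroup.event_control.
	ctl := fmt.Sprintf("%d %d medium", efd, pf.Fd())
	if err := os.WriteFile(filepath.Join(cg, "cgroup.event_control"), []byte(ctl), 0644); err != nil {
		panic(err)
	}

	// Block until the kernel signals "medium" memory pressure for this cgroup.
	buf := make([]byte, 8)
	if _, err := unix.Read(efd, buf); err != nil {
		panic(err)
	}
	fmt.Println("memory pressure event, counter:", binary.LittleEndian.Uint64(buf))
}
```

Whether these notifications fire only close to global OOM or also during routine cache reclaim is exactly the question raised above.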
This is what the doc says (as you've posted) and this is what I see in the sources:
It is kind of explicit and aligns with what the doc says.
I assume you can set up what actually triggers those. Cache reclaim can happen on a timer, depending on what you mean by cache. What you are saying is all valid, but not for the hard-limit case. That's my point.
Also this callstack is possible:
Which means you get
Off-topic: another option. It's called when someone writes to the "memory.force_empty" file of the cgroup:
There is also a bunch of
Ah, it's cgroup v2. Not our case.
It's also interesting when it happens. As far as I remember, the daemon runs when the "high" watermark is reached. But it would be interesting to understand the details.
I just found that the soft limit is the one that Pillar uses to count its memory requirement for EVE (see eve/pkg/pillar/types/locationconsts.go, line 104, at 2aa31d2).
So, before merging this PR, it's better to change that logic to use the hard limit.
Updated the soft limit for the kubevirt flavor.
By default, we now set the soft memory limits to 80% of the hard memory limits for EVE cgroups. This adjustment sets the target values for memory reclamation when it is triggered by the kernel. Updated the default values for dom0_mem, eve_mem, and ctrd_mem in the documentation and configuration files to reflect this change.

Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
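For illustration (the numbers are hypothetical, not the actual EVE defaults): with a 512 MiB hard limit for one of these cgroups, the corresponding soft limit becomes 512 MiB × 0.8 ≈ 410 MiB.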
@rouming are you OK with this? I didn't understand all the details in your discussion.
Run the tests
By default, we now set the soft memory limits to 80% of the hard memory limits for EVE cgroups. This adjustment allows the kernel to start reclaiming memory earlier, giving processes a chance to free up memory before reaching the hard limit. Updated the default values for dom0_mem, eve_mem, and ctrd_mem in the documentation and configuration files to reflect this change.
This change is inspired by PR #4273 by @rouming.
To be merged after #4300