Add detailed memory metrics #3197

Open
smcgivern opened this issue Nov 11, 2022 · 2 comments

@smcgivern

Currently, cAdvisor reports the following memory stats:

ret.Memory.Usage = s.MemoryStats.Usage.Usage
ret.Memory.MaxUsage = s.MemoryStats.Usage.MaxUsage
ret.Memory.Failcnt = s.MemoryStats.Usage.Failcnt
if cgroups.IsCgroup2UnifiedMode() {
	ret.Memory.Cache = s.MemoryStats.Stats["file"]
	ret.Memory.RSS = s.MemoryStats.Stats["anon"]
	ret.Memory.Swap = s.MemoryStats.SwapUsage.Usage - s.MemoryStats.Usage.Usage
	ret.Memory.MappedFile = s.MemoryStats.Stats["file_mapped"]
} else if s.MemoryStats.UseHierarchy {
	ret.Memory.Cache = s.MemoryStats.Stats["total_cache"]
	ret.Memory.RSS = s.MemoryStats.Stats["total_rss"]
	ret.Memory.Swap = s.MemoryStats.Stats["total_swap"]
	ret.Memory.MappedFile = s.MemoryStats.Stats["total_mapped_file"]
} else {
	ret.Memory.Cache = s.MemoryStats.Stats["cache"]
	ret.Memory.RSS = s.MemoryStats.Stats["rss"]
	ret.Memory.Swap = s.MemoryStats.Stats["swap"]
	ret.Memory.MappedFile = s.MemoryStats.Stats["mapped_file"]
}
if v, ok := s.MemoryStats.Stats["pgfault"]; ok {
	ret.Memory.ContainerData.Pgfault = v
	ret.Memory.HierarchicalData.Pgfault = v
}
if v, ok := s.MemoryStats.Stats["pgmajfault"]; ok {
	ret.Memory.ContainerData.Pgmajfault = v
	ret.Memory.HierarchicalData.Pgmajfault = v
}
inactiveFileKeyName := "total_inactive_file"
if cgroups.IsCgroup2UnifiedMode() {
	inactiveFileKeyName = "inactive_file"
}
workingSet := ret.Memory.Usage
if v, ok := s.MemoryStats.Stats[inactiveFileKeyName]; ok {
	if workingSet < v {
		workingSet = 0
	} else {
		workingSet -= v
	}
}
ret.Memory.WorkingSet = workingSet

  1. Usage
  2. MaxUsage
  3. FailCnt
  4. Cache
  5. RSS
  6. Swap
  7. MappedFile
  8. WorkingSet - this is usage minus inactive_file, floored at zero

In our case, we'd like more detailed metrics. We find that working set is often higher than what we actually want to account for (#3081 is one example), because, to quote a colleague:

container_memory_working_set_bytes is not what the OOM killer uses, but it is a better leading indicator of OOM risk than just the plain container_memory_usage_bytes. As long as the container's cgroup still has evictable filesystem cache pages, it will try hard to avoid killing processes, and container_memory_working_set_bytes subtracts some (but not all) of those pages.

A bit more about "evictable":

File pages in the "active" list are not evictable... until they get demoted back down to the "inactive" list. When the cgroup is starving for memory and needs to free a page (e.g. to satisfy a process requesting anonymous memory), it can shrink the total number of filesystem cache pages, and then the normal mechanism of demoting pages from the "active" list to the "inactive" list allows those previously unevictable pages to become eviction candidates the next time. There are only a few special cases where file-backed pages tend to not be evictable, which is why when we see an OOM kill event, the kernel's verbose logs for that kill typically show that most of the memory was anonymous, not file-backed.

So from the perspective of the container_memory_working_set_bytes metric, as memory pressure causes the container to shrink its number of file-backed pages to make room for more anonymous memory, both the "active" and "inactive" lists of file-backed pages will tend to shrink. So before reaching OOMK, the metric should be dominated by anonymous memory, and the lead-up to that point should be more or less gradual depending on the relative sizes of the active vs. inactive lists of filesystem cache pages.

I lean towards treating just the anonymous memory by itself as a saturation metric, since on swapless hosts it is guaranteed to be unevictable.

memory.stat includes active_anon, inactive_anon, active_file, and inactive_file (among others), but these are not currently exposed by cAdvisor: https://docs.kernel.org/admin-guide/cgroup-v1/memory.html#stat-file

Would you accept a patch to add those?
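
For illustration, here's roughly what that could look like inside the same block as the snippet above. This is only a sketch: ActiveAnon, InactiveAnon, ActiveFile, and InactiveFile are assumed new fields on cAdvisor's memory stats struct, not existing API.

// Sketch only: these ret.Memory.* fields are assumed additions, not current cAdvisor API.
prefix := ""
if !cgroups.IsCgroup2UnifiedMode() && s.MemoryStats.UseHierarchy {
	// cgroup v1 hierarchical accounting reports the total_-prefixed keys.
	prefix = "total_"
}
ret.Memory.ActiveAnon = s.MemoryStats.Stats[prefix+"active_anon"]
ret.Memory.InactiveAnon = s.MemoryStats.Stats[prefix+"inactive_anon"]
ret.Memory.ActiveFile = s.MemoryStats.Stats[prefix+"active_file"]
ret.Memory.InactiveFile = s.MemoryStats.Stats[prefix+"inactive_file"]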

(Side note: we'd like to use RSS instead of WSS, but then we run into issues with programs that use MADV_FREE. Go dropped this in golang/go#42330, but programs in other languages may still use it, which inflates RSS above what might be expected.)

@smcgivern
Author

This seems similar to #2634; in our case, if we had LazyFree exposed, we could also take RSS - LazyFree to get the value we're interested in. It looks like #2767 went stale, though.
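
As far as I can tell, LazyFree isn't reported in memory.stat; the kernel exposes it per process in /proc/<pid>/smaps_rollup (and system-wide in /proc/meminfo). Just to illustrate the arithmetic, here's a small standalone sketch of RSS - LazyFree read from smaps_rollup; the helper is hypothetical and not cAdvisor code:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// smapsRollupKB reads a single field (e.g. "Rss" or "LazyFree"), in kB, from
// /proc/<pid>/smaps_rollup. Hypothetical helper for illustration only.
func smapsRollupKB(pid int, field string) (uint64, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/smaps_rollup", pid))
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, field+":") {
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				return strconv.ParseUint(fields[1], 10, 64)
			}
		}
	}
	return 0, fmt.Errorf("field %q not found for pid %d", field, pid)
}

func main() {
	pid := os.Getpid()
	rss, _ := smapsRollupKB(pid, "Rss")
	lazyFree, _ := smapsRollupKB(pid, "LazyFree")
	// RSS minus MADV_FREE'd pages: closer to the memory that is actually unevictable.
	adjusted := rss
	if lazyFree <= rss {
		adjusted = rss - lazyFree
	}
	fmt.Printf("rss=%d kB lazyfree=%d kB rss-lazyfree=%d kB\n", rss, lazyFree, adjusted)
}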

@smcgivern
Author

master...smcgivern:cadvisor:add-detailed-memory-stats does this, but I'm assuming we'd want to do it conditionally, as it adds four metric series everywhere we collect memory metrics.
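
A sketch of what "conditionally" might look like, assuming a new MetricKind (called container.MemoryDetailedMetrics here, which doesn't exist today) and assuming the handler's includedMetrics set, or something equivalent, is available where the memory stats are set (prefix as in the earlier sketch):

// Hypothetical gate: MemoryDetailedMetrics is an assumed new MetricKind, so
// deployments that don't opt in don't pay for the four extra series.
if includedMetrics.Has(container.MemoryDetailedMetrics) {
	ret.Memory.ActiveAnon = s.MemoryStats.Stats[prefix+"active_anon"]
	ret.Memory.InactiveAnon = s.MemoryStats.Stats[prefix+"inactive_anon"]
	ret.Memory.ActiveFile = s.MemoryStats.Stats[prefix+"active_file"]
	ret.Memory.InactiveFile = s.MemoryStats.Stats[prefix+"inactive_file"]
}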
