
Metric container_memory_working_set_bytes includes slab reclaimable memory #3081

Open
cyrus-mc opened this issue Mar 17, 2022 · 7 comments

@cyrus-mc

I ran into a somewhat unique situation in which a pod had very high slab memory - high as in 1.1GB worth. In terms of anonymous and active file memory, usage was only around 25MB. The working set calculation shows 1.1GB because slab reclaimable memory isn't subtracted when computing workingSet.

Working set calculation

  ret.Memory.Usage = s.MemoryStats.Usage.Usage
  // Working set = total usage minus inactive file cache; slab memory is not subtracted.
  workingSet := ret.Memory.Usage
  if v, ok := s.MemoryStats.Stats[inactiveFileKeyName]; ok {
    if workingSet < v {
      workingSet = 0
    } else {
      workingSet -= v
    }
  }
  ret.Memory.WorkingSet = workingSet

Where MemoryStats.Usage.Usage is the value from memory.current (cgroup v2) or memory.usage_in_bytes (cgroup v1). The memory statistics file (memory.stat) contains the following fields:

anon 663552
file 10313728
kernel_stack 49152
...
inactive_anon 573440
active_anon 32768
inactive_file 5066752
active_file 5246976
unevictable 0
slab_reclaimable 1232589368
slab_unreclaimable 128408
slab 1232717776
...

Here slab_reclaimable is memory that can be reclaimed by the OS when needed. Should we be subtracting this value when calculating workingSet?
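
For illustration, a minimal sketch of what the calculation could look like if slab reclaimable memory were also subtracted. slabReclaimableKeyName is a hypothetical constant here (cgroup v2 reports the field as slab_reclaimable), not an existing cAdvisor identifier:

  ret.Memory.Usage = s.MemoryStats.Usage.Usage
  workingSet := ret.Memory.Usage
  // Subtract both inactive file cache and reclaimable slab, clamping at zero.
  for _, key := range []string{inactiveFileKeyName, slabReclaimableKeyName} {
    if v, ok := s.MemoryStats.Stats[key]; ok {
      if workingSet < v {
        workingSet = 0
      } else {
        workingSet -= v
      }
    }
  }
  ret.Memory.WorkingSet = workingSet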

@bwplotka

Good question! I don't want to overcrowd this issue, but why don't we subtract inactive_anon as well?

Rationale: https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt#:~:text=inactive_anon%09%2D%20%23%20of%20bytes%20of%20anonymous%20and%20swap%20cache%20memory%20on%20inactive%0A%09%09LRU%20list.

@cyrus-mc
Author

cyrus-mc commented Apr 1, 2022

@bwplotka I can understand why inactive_anon isn't subtracted: most container clusters don't run with swap, so that memory can't be swapped out and is therefore part of the working set.

Is slab_reclaimable the same? Since it is more of a cache, it can be reclaimed by the OS when it needs memory.

@bwplotka

bwplotka commented Aug 8, 2022

Yeah, agreed, something is off, but for me it's not really slab. I can reproduce this problem with a large number of open file descriptors. The WSS shows quite large memory usage:

WSS: [screenshot]

(file_mapped = 0)

RSS: [screenshot]

Stat file:

sudo cat /sys/fs/cgroup/system.slice/docker-40dc294092fde3c01f9c715c20a224aa34ff13e1efdb99526f93ec70c25533c7.scope/memory.stat
anon 20172800
file 3391561728
kernel_stack 311296
pagetables 282624
percpu 504
sock 4096
vmalloc 8192
shmem 0
file_mapped 0
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 30097408
active_anon 4096
inactive_file 3391561728
active_file 0
unevictable 0
slab_reclaimable 102486032
slab_unreclaimable 501088
slab 102987120
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 177528
pgmajfault 0
pgrefill 0
pgscan 0
pgsteal 0
pgactivate 0
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 58
thp_collapse_alloc 44

Now, what's interesting: dropping all cache pages on the host machine using sudo sysctl -w vm.drop_caches=1 brings WSS down to almost RSS 🙃 Which kind of tells us it's reclaimable, no?

[screenshot]

Simply dropping the cache from the WSS calculation is a no-go, as the cache is extremely large yet still kind of affects the WSS:

[screenshot]

Stats after:

sudo cat /sys/fs/cgroup/system.slice/docker-40dc294092fde3c01f9c715c20a224aa34ff13e1efdb99526f93ec70c25533c7.scope/memory.stat
anon 20291584
file 0
kernel_stack 311296
pagetables 282624
percpu 504
sock 4096
vmalloc 8192
shmem 0
file_mapped 0
file_dirty 0
file_writeback 0
swapcached 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 30216192
active_anon 4096
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 1661888
slab_unreclaimable 496376
slab 2158264
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgfault 211983
pgmajfault 0
pgrefill 0
pgscan 0
pgsteal 0
pgactivate 0
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 58
thp_collapse_alloc 44
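
For reference, a minimal standalone sketch (not cAdvisor code) that reads a cgroup v2 memory.current and memory.stat and prints the working set as it is computed today (usage minus inactive_file) next to the variant discussed in this issue, with slab_reclaimable also subtracted. The cgroup path is a placeholder:

// Standalone sketch: compare the current working-set calculation with a
// variant that also subtracts slab_reclaimable, for one cgroup v2 cgroup.
package main

import (
  "bufio"
  "fmt"
  "os"
  "strconv"
  "strings"
)

// sub clamps at zero, mirroring the cAdvisor calculation quoted above.
func sub(a, b uint64) uint64 {
  if a < b {
    return 0
  }
  return a - b
}

func readUint(path string) uint64 {
  b, err := os.ReadFile(path)
  if err != nil {
    panic(err)
  }
  v, err := strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
  if err != nil {
    panic(err)
  }
  return v
}

func readStat(path string) map[string]uint64 {
  f, err := os.Open(path)
  if err != nil {
    panic(err)
  }
  defer f.Close()
  stats := map[string]uint64{}
  sc := bufio.NewScanner(f)
  for sc.Scan() {
    fields := strings.Fields(sc.Text())
    if len(fields) != 2 {
      continue
    }
    if v, err := strconv.ParseUint(fields[1], 10, 64); err == nil {
      stats[fields[0]] = v
    }
  }
  return stats
}

func main() {
  cg := "/sys/fs/cgroup/system.slice/docker-<id>.scope" // placeholder path
  usage := readUint(cg + "/memory.current")
  stats := readStat(cg + "/memory.stat")

  wss := sub(usage, stats["inactive_file"])           // current calculation
  wssMinusSlab := sub(wss, stats["slab_reclaimable"]) // variant discussed here
  fmt.Printf("usage=%d inactive_file=%d slab_reclaimable=%d\n",
    usage, stats["inactive_file"], stats["slab_reclaimable"])
  fmt.Printf("working_set=%d working_set_minus_slab_reclaimable=%d\n",
    wss, wssMinusSlab)
}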

@saswatac

Why is the Active(file) memory not subtracted as well? I observe on my containers that the Active(file) memory stays high even after a memory-heavy job ends, and container_memory_working_set_bytes remains at a high value.

From what I understand, even the Active(file) memory is reclaimable, although at a lower priority than Inactive(file).

@cyrus-mc
Author

@saswatac you don't want to subtract active file, because the working set is meant to give you a metric for the active memory your container needs. Every app needs some file cache; if you just ignore that when assigning memory requests for your app, performance will degrade.

@astronaut0131

astronaut0131 commented Aug 28, 2024

[screenshot]
After the pod accepted a large number of socket connections, container_memory_working_set_bytes remained around 2GB and never dropped, even when there were no new requests. Meanwhile, container_memory_rss was only a bit over 200MB.

Before dropping the cache, I examined the memory stats:

# cat /sys/fs/cgroup/memory/memory.stat 
cache 0
rss 218734592
rss_huge 109051904
shmem 0
mapped_file 0
dirty 0
writeback 0
swap 0
pgpgin 14037837
pgpgout 14021095
pgfault 14053809
pgmajfault 0
inactive_anon 2554662912
active_anon 0
inactive_file 0
active_file 0
unevictable 0
hierarchical_memory_limit 17179869184
hierarchical_memsw_limit 17179869184
total_cache 0
total_rss 218734592
total_rss_huge 109051904
total_shmem 0
total_mapped_file 0
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 14037837
total_pgpgout 14021095
total_pgfault 14053809
total_pgmajfault 0
total_inactive_anon 2554662912
total_active_anon 0
total_inactive_file 0
total_active_file 0
total_unevictable 0
I also checked the current memory usage:
# cat /sys/fs/cgroup/memory/memory.usage_in_bytes 
2556129280

To free reclaimable slab objects (which include dentries and inodes):

echo 2 > /proc/sys/vm/drop_caches

After executing the drop_caches command:

# cat /sys/fs/cgroup/memory/memory.stat 
cache 1081344
rss 219250688
rss_huge 109051904
shmem 0
mapped_file 540672
dirty 0
writeback 0
swap 0
pgpgin 14038926
pgpgout 14592264
pgfault 14055426
pgmajfault 0
inactive_anon 219336704
active_anon 0
inactive_file 540672
active_file 675840
unevictable 0
hierarchical_memory_limit 17179869184
hierarchical_memsw_limit 17179869184
total_cache 1081344
total_rss 219250688
total_rss_huge 109051904
total_shmem 0
total_mapped_file 540672
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 14038926
total_pgpgout 14592264
total_pgfault 14055426
total_pgmajfault 0
total_inactive_anon 219336704
total_active_anon 0
total_inactive_file 540672
total_active_file 675840
total_unevictable 0

I checked the memory usage again:

# cat /sys/fs/cgroup/memory/memory.usage_in_bytes 
221339648

This issue is affecting the decisions made by the Horizontal Pod Autoscaler (HPA) regarding memory.
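
For reference, plugging the numbers above into the working-set formula quoted at the top of this issue (usage minus inactive file, i.e. total_inactive_file on cgroup v1) gives roughly:

before drop_caches: 2556129280 - 0      = 2556129280 bytes ≈ 2.38 GiB  (total_rss ≈ 209 MiB)
after  drop_caches:  221339648 - 540672 =  220798976 bytes ≈ 211 MiB

So almost the entire gap between the working set and RSS here was memory the kernel could reclaim on demand, and that gap is what the HPA ends up reacting to.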

@CharlieR-o-o-t

CharlieR-o-o-t commented Nov 21, 2024

I think that "container_memory_working_set_bytes" should contain only unreclaimable mem, only in this way it'll be possible to effectively use this metric for scaling and as alert before OOM.

In my example, I have 1GB of memory in slab_reclaimable, while the application itself consumes only 100MB.

Is there any update on this? Can I make a PR to fix it?
