You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The Linux kernel supports the following overcommit handling modes
0 - Heuristic overcommit handling. Obvious overcommits of
address space are refused. Used for a typical system. It
ensures a seriously wild allocation fails while allowing
overcommit to reduce swap usage. root is allowed to
allocate slightly more memory in this mode. This is the
default.
1 - Always overcommit. Appropriate for some scientific
applications. Classic example is code using sparse arrays
and just relying on the virtual memory consisting almost
entirely of zero pages.
2 - Don't overcommit. The total address space commit
for the system is not permitted to exceed swap + a
configurable amount (default is 50%) of physical RAM.
Depending on the amount you use, in most situations
this means a process will not be killed while accessing
pages but will receive errors on memory allocation as
appropriate.
Useful for applications that want to guarantee their
memory allocations will be available in the future
without having to initialize every page.
int__vm_enough_memory(structmm_struct*mm, longpages, intcap_sys_admin)
{
longfree, allowed, reserve;
VM_WARN_ONCE(percpu_counter_read(&vm_committed_as) <-(s64)vm_committed_as_batch*num_online_cpus(),
"memory commitment underflow");
vm_acct_memory(pages);
/* * Sometimes we want to use more memory than we have */if (sysctl_overcommit_memory==OVERCOMMIT_ALWAYS)
return0;
if (sysctl_overcommit_memory==OVERCOMMIT_GUESS) {
free=global_page_state(NR_FREE_PAGES);
free+=global_page_state(NR_FILE_PAGES);
/* * shmem pages shouldn't be counted as free in this * case, they can't be purged, only swapped out, and * that won't affect the overall amount of available * memory in the system. */free-=global_page_state(NR_SHMEM);
free+=get_nr_swap_pages();
/* * Any slabs which are created with the * SLAB_RECLAIM_ACCOUNT flag claim to have contents * which are reclaimable, under pressure. The dentry * cache and most inode caches should fall into this */free+=global_page_state(NR_SLAB_RECLAIMABLE);
/* * Leave reserved pages. The pages are not for anonymous pages. */if (free <= totalreserve_pages)
goto error;
elsefree-=totalreserve_pages;
/* * Reserve some for root */if (!cap_sys_admin)
free-=sysctl_admin_reserve_kbytes >> (PAGE_SHIFT-10);
if (free>pages)
return0;
goto error;
}
allowed=vm_commit_limit();
/* * Reserve some for root */if (!cap_sys_admin)
allowed-=sysctl_admin_reserve_kbytes >> (PAGE_SHIFT-10);
/* * Don't let a single process grow so big a user can't recover */if (mm) {
reserve=sysctl_user_reserve_kbytes >> (PAGE_SHIFT-10);
allowed-=min_t(long, mm->total_vm / 32, reserve);
}
if (percpu_counter_read_positive(&vm_committed_as) <allowed)
return0;
error:
vm_unacct_memory(pages);
return-ENOMEM;
}
short answer for redis -> echo 1 > /proc/sys/vm/overcommit_memory :)
for PostgresSQL or Greenplum -> echo 2 > /proc/sys/vm/overcommit_memory :)
why
redis:
The Redis background saving schema relies on the copy-on-write semantic of the fork system call in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will share the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the overcommit_memory setting is set to zero the fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages. If you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.
Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
Why Use Strategy 2?
The issue with the default overcommits strategy, or using strategy 1 is the unpredictable nature of the failure. In either case we are not guaranteed that the memory is available at time of allocation and that results in unpredictable termination of processes. In particular, the nature of the failure with strategy 1 can result in corruption of either datafiles or transaction logs as a memory failure can occur mid-transaction resulting in the immediate shutdown of the database process without any cleanup.
关于 overcommit_memory 学习的一些记录,其中一些为网上前辈文章的整理记录
什么是 vm.overcommit
但在实际情况下,进程请求的内存可能比实际需要的内存少很多,而且很多进程请求的内存并不会全部使用。在这种情况下,如果系统采用严格的内存申请机制,将会导致系统的内存利用率低下,一些能够使用的内存被浪费。
因此,为了提高系统的内存利用率和灵活性,Linux 提供了 vm.overcommit_memory 机制。当这个参数的值为 0 或 1 时,内核允许进程请求超过系统实际可用内存的内存量。这种行为被称为 "overcommit"。
vm.overcommit 的三个值
https://github.com/torvalds/linux/blob/v4.6/mm/util.c#L481
当 vm.overcommit_memory = 2 时
grep -i commit /proc/meminfo
/proc/meminfo 中的 Committed_AS 表示所有进程已经申请的内存总大小,(注意是已经申请的,不是已经分配的),如果 Committed_AS 超过 CommitLimit 就表示发生了 overcommit,超出越多表示 overcommit 越严重 -> 触发 OOM
那么这个 CommitLimit 是怎么算出来的呢?
红帽犯过这个错误 :https://access.redhat.com/solutions/665023
OOM-killer
OMM killer 机制:linux 会为每个进程算一个 score,当内存不足时候它会根据 score kill
score 越大表示越容易杀死
/proc/${pid}/oom_score_adj, 取值范围为-1000到1000, 如果将该值设置为-1000,则进程永远不会被杀死 --> pg
/proc/${pid}/oom_adj, 取值是-17到+15,取值越高,越容易被干掉。如果是-17,则表示不能被kill
/proc/${pid}/oom_score, 是系统综合进程的内存消耗量、CPU 时间(utime + stime)、存活时间(uptime - start time)和 oom_adj 计算出的,消耗内存越多分越高。
https://elixir.bootlin.com/linux/v5.4.58/source/mm/oom_kill.c
数据库如何设置,如何避免被 oom-killer
why
redis:
Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.
PostgresSQL or Greenplum:
那如果 redis 和 pg 跑在一台机器上有什么好的建议么?
当设置 overcommit = 2 时有什么需要注意的
The text was updated successfully, but these errors were encountered: