[PATCH] mm: Protect clean file pages under memory pressure to prevent thrashing, avoid high latency and prevent livelock in near-OOM conditions #218
Comments
Thanks, this patchset looks safe and straightforward to add. I ended up using the 5.14 release candidate version for the 5.13 branch, since 5.13's memory subsystem changed quite a bit since rc2. Also, for both 5.12 and 5.13 I set default values of effectively 256 MB soft and 64 MB hard cache protection. That should make it easier for package maintainers to pick up this change without having to research it too much.
Note that hard protection may cause issues with DRM/i915: https://github.com/hakavlad/le9-patch#warnings. Soft protection should be safe, and 256 MB is OK.
Per @hakavlad [1], enabling hard cache protection breaks i915. Keep it at zero so we don't run into the same issues previously reported. [1] #218 (comment)
@hakavlad thanks for the warning, dropped CLEAN_MIN_KBYTES to zero.
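For anyone wiring up these defaults by hand, here is a minimal sketch, assuming the vm.clean_low_kbytes / vm.clean_min_kbytes sysctl names from the le9-patch README (the names may differ in other builds of the patch):

```sh
# Minimal sketch, assuming the le9-patch sysctl names.
# 262144 KiB = 256 MiB of soft protection; hard protection stays at zero
# per the i915 warning above.
sysctl -w vm.clean_low_kbytes=262144
sysctl -w vm.clean_min_kbytes=0
```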
Also note that le9 doesn't work with mg-LRU [1]: if you are using mg-LRU, then le9 will not work (mg-LRU doesn't use ...). [1] https://forum.xanmod.org/thread-4102-post-7599.html#pid7599
Per @hakavlad [1], enabling hard cache protection breaks i915. Keep it at zero so we don't run into the same issues previously reported. [1] zen-kernel#218 (comment) Signed-off-by: kernelOfTruth <kerneloftruth@gmail.com>
@yuzhaogoogle "Protecting file pages with mgLRU will require a completely different approach than in le9. I do not yet know how to implement protection of file pages when using mgLRU." Do you plan to implement this for mgLRU?
Yes, sometime next week.
Circling back on this: now ...
Although not identical to the le9 patches that protect a byte-amount of cache through tunables, multigenerational LRU now supports protecting cache accessed in the last X milliseconds. In #218, Yu recommends starting with 1000ms and tuning as needed. This looks like a safe default and turning on this feature should help users that don't know they need it.
I will try this out soon, thanks. What are the trade-offs when tweaking this (larger and smaller values) via sysctl?
With a larger value, the OOM killer fires more aggressively (more processes get killed), but you get a more responsive user experience; a smaller value trades the other way. Does that make sense?
Yes, I understand. So userspace OOM daemons, like earlyoom, are no longer needed with mgLRU?
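For reference, a minimal sketch of enabling this at runtime, assuming the upstream multigenerational-LRU sysfs interface under /sys/kernel/mm/lru_gen/ and Yu's suggested 1000 ms starting point (a patched kernel may already enable lru_gen by default):

```sh
# Minimal sketch, assuming the upstream lru_gen sysfs knobs.
echo y    > /sys/kernel/mm/lru_gen/enabled     # turn on multigenerational LRU
echo 1000 > /sys/kernel/mm/lru_gen/min_ttl_ms  # protect cache accessed in the last 1000 ms
```

Per the trade-off described above, raising min_ttl_ms protects a larger recent working set (more responsive under pressure, but earlier OOM kills), while lowering it does the opposite.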
Please look at this https://github.com/hakavlad/le9-patch
https://github.com/hakavlad/le9-patch/blob/main/le9db_patches/le9db-5.13-rc2-mg-LRU-v3.patch may be applied to zen 5.12-5.13. I suggest applying this patch to linux-zen.
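If it helps, a minimal sketch of applying that patch to a linux-zen source tree, with the filename taken from the link above (download the patch first and adjust for the branch you actually build):

```sh
# Minimal sketch: apply the linked le9 patch to a linux-zen checkout.
cd linux-zen
patch -p1 < le9db-5.13-rc2-mg-LRU-v3.patch
```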
See also
https://www.phoronix.com/scan.php?page=news_item&px=le9-Linux-Low-RAM and comments
https://forum.xanmod.org/thread-4102-post-7562.html
hakavlad/le9-patch#4 and https://notes.valdikss.org.ru/linux-for-old-pc-from-2007/en/