cephcsi-cephfs nodeplugin pod got OOM killing when using FUSE mounter #554
Yes, the pods will also perform I/O and filesystem operations.
@ajarr @poornimag PTAL
@yydzhou, what are the versions of the Ceph cluster and the FUSE client (ceph-fuse)? The exact versions would be helpful to know (14.2.x?). After how many PVCs consumed by the pods and doing I/O do you hit the OOM?
@ShyamsundarR, are 256M and 1G typical memory limit settings for the CephFS and RBD node plugins?
I am not aware of the memory constraints or usage of RBD; I think we need to understand this better. For example, switching to kernel CephFS and/or krbd instead of rbd-nbd may not charge the memory overhead to the container namespace (in this case the nodeplugin), and would further let the kernel manage space reclamation based on usage. I think this may be a better direction in the longer run (unless I understood rbd-nbd incorrectly, i.e. where the blocks are cached). The issue may come down to how to let the CephFS FUSE client know how much space it has to operate with for cached data (beyond the usual must-have memory footprint), whether we can share this across the various FUSE mounts, or whether we need to think about a single FUSE mount to control this consumption (i.e. #476). cc @dillaman
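As a rough illustration of the "tell ceph-fuse how much cache it may use" idea above, the standard Ceph client cache options could be lowered in the ceph.conf that the ceph-fuse processes read. This is only a sketch: client_oc_size and client_cache_size are regular Ceph client options, but the ConfigMap name and the assumption that the plugin container picks up /etc/ceph/ceph.conf are not confirmed in this thread and depend on the deployment.

```yaml
# Sketch only: a ConfigMap carrying a ceph.conf that caps the ceph-fuse client caches.
# The ConfigMap name and the mount path into the nodeplugin are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-config
data:
  ceph.conf: |
    [client]
    # object-cacher size per mount, 50 MiB here (Ceph default is ~200 MiB)
    client_oc_size = 52428800
    # maximum number of cached inodes per mount (Ceph default is 16384)
    client_cache_size = 4096
```

Smaller caches trade some read performance for a lower, more predictable RSS per mount, which is what the pod's cgroup memory limit ultimately enforces.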
@ajarr The Ceph version is 13.2.4. The OOM happens when mounting the 3rd PV on the node. I am using the cephfs-csi chart release 1.0, so I assume the ceph-fuse version is 14.2.x? Once a PV is mounted, the related pods issue some I/O against it, but not that much, because the failure happens during the deployment of our cluster.
JFYI, using ceph-fuse mounting is to support CentOS with an old kernel (3.10). The external cephfs provisioner has an option to disable RADOS pool namespace isolation to allow old kernel mounts (ref kubernetes-retired/external-storage@4fefaf6#diff-3ccb4687fb599e0453570308087f8252), but it seems cephfs-csi does not support that, so we have to use FUSE mounting until we have upgraded all our kernels.
@batrick FYI
This is surprising. Not sure how other CephFS CSI v1.0.0 users didn't hit this.
This is helpful information. I'll try tracking down the issue.
I have seen this issue as well: with more I/O, the memory consumption of ceph-fuse increases.
@joscollin can you also take a look?
Any news on this one? We're seeing the same issue: even nodes with no PVs mounted go OOM after a couple of days. If we drop the resource requests, the nodes themselves eventually die.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
Describe the bug
When using the cephfs FUSE mounter in cephfs-csi, the cephcsi-cephfs nodeplugin pod very easily gets OOM-killed. I have tried a 256M and then a 1G memory limit, but the issue still happens.
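For context, the limit being adjusted lives on the nodeplugin container that spawns the ceph-fuse processes. A minimal sketch of the relevant DaemonSet fragment is below; the container name and the exact values are assumptions and will differ between chart versions.

```yaml
# Sketch of the relevant fragment of the cephfs nodeplugin DaemonSet (not a full manifest).
# Container name and values are assumptions; adjust to your chart.
spec:
  template:
    spec:
      containers:
        - name: csi-cephfsplugin      # container that spawns the ceph-fuse mounts
          resources:
            requests:
              memory: 256Mi
            limits:
              memory: 1Gi             # this cgroup limit is what triggers the OOM kill
```

Since every ceph-fuse process for a FUSE-mounted PV is charged to this same cgroup, the limit effectively has to cover all mounts on the node; the OOM log later in this issue shows several ceph-fuse processes in one cgroup.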
Environment details
image: quay.io/cephcsi/cephfsplugin:v1.0.0
Kubernetes cluster version: 1.13.4
Steps to reproduce
Steps to reproduce the behavior:
Deploy Ceph and then the cephfs CSI driver plus a StorageClass with mounter: fuse (a sketch of such a StorageClass is shown after these steps).
Then create multiple CephFS PVs/PVCs and consume them with pods.
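A minimal sketch of the StorageClass referenced above follows. Only the mounter: fuse parameter is taken from this issue; the provisioner name, monitor address, pool, and the omitted secret parameters are placeholders that vary between cephcsi releases.

```yaml
# Sketch only: StorageClass selecting the FUSE mounter.
# Provisioner name, monitors and pool are placeholders; secret parameters are omitted.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs
provisioner: cephfs.csi.ceph.com    # assumed driver name; older releases used a different name
parameters:
  monitors: 10.0.0.1:6789           # placeholder monitor address
  pool: cephfs_data                 # placeholder data pool
  mounter: fuse                     # forces ceph-fuse instead of the kernel client
reclaimPolicy: Delete
```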
Actual results
[167475.834265] Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
[167475.837017] Call Trace:
[167475.838397] [<ffffffffb390e78e>] dump_stack+0x19/0x1b
[167475.840546] [<ffffffffb390a110>] dump_header+0x90/0x229
[167475.842691] [<ffffffffb34d805b>] ? cred_has_capability+0x6b/0x120
[167475.845070] [<ffffffffb3397c44>] oom_kill_process+0x254/0x3d0
[167475.847349] [<ffffffffb34d813e>] ? selinux_capable+0x2e/0x40
[167475.849595] [<ffffffffb340f326>] mem_cgroup_oom_synchronize+0x546/0x570
[167475.852109] [<ffffffffb340e7a0>] ? mem_cgroup_charge_common+0xc0/0xc0
[167475.854595] [<ffffffffb33984d4>] pagefault_out_of_memory+0x14/0x90
[167475.856997] [<ffffffffb3908232>] mm_fault_error+0x6a/0x157
[167475.859178] [<ffffffffb391b8c6>] __do_page_fault+0x496/0x4f0
[167475.861395] [<ffffffffb391ba06>] trace_do_page_fault+0x56/0x150
[167475.863692] [<ffffffffb391af92>] do_async_page_fault+0x22/0xf0
[167475.865966] [<ffffffffb39177b8>] async_page_fault+0x28/0x30
[167475.868162] Task in /kubepods/burstable/pod6c6d1804-bd30-11e9-9fb1-fa163e63ddad/725f282aadbd6b6c9b540a8d64ce719560cca6dd9db685e52538f4012f08061f killed as a result of limit of /kubepods/burstable/pod6c6d1804-bd30-11e9-9fb1-fa163e63ddad/725f282aadbd6b6c9b540a8d64ce719560cca6dd9db685e52538f4012f08061f
[167475.877505] memory: usage 1048576kB, limit 1048576kB, failcnt 54
[167475.879928] memory+swap: usage 1048576kB, limit 1048576kB, failcnt 0
[167475.882395] kmem: usage 10440kB, limit 9007199254740988kB, failcnt 0
[167475.884797] Memory cgroup stats for /kubepods/burstable/pod6c6d1804-bd30-11e9-9fb1-fa163e63ddad/725f282aadbd6b6c9b540a8d64ce719560cca6dd9db685e52538f4012f08061f: cache:0KB rss:1038136KB rss_huge:51200KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:1038064KB inactive_file:0KB active_file:0KB unevictable:0KB
[167475.975145] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[167475.977958] [30026] 0 30026 32644 3917 24 0 994 cephcsi-cephfs
[167475.980763] [32479] 0 32479 407518 83034 252 0 994 ceph-fuse
[167475.983480] [19628] 0 19628 525737 176645 467 0 994 ceph-fuse
[167475.986162] Memory cgroup out of memory: Kill process 23228 (ceph-fuse) score 1648 or sacrifice child
[167475.989086] Killed process 19628 (ceph-fuse) total-vm:2102948kB, anon-rss:700684kB, file-rss:5896kB, shmem-rss:0kB
Expected behavior
The nodeplugin should be more stable and should have an option to set how much memory ceph-fuse may use.
Additional context