cephcsi-cephfs nodeplugin pod got OOM killing when using FUSE mounter #554

Closed
yydzhou opened this issue Aug 14, 2019 · 15 comments

Labels: bug (Something isn't working), component/cephfs (Issues related to CephFS), wontfix (This will not be worked on)

Comments


yydzhou commented Aug 14, 2019

Describe the bug

When using the CephFS FUSE mounter in the CephFS CSI driver, the cephcsi-cephfs nodeplugin pod very easily gets killed due to OOM. I have tried a 256M and then a 1G memory limit (as sketched below), but the issue still happens.
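For reference, this is the kind of memory limit I mean on the nodeplugin container that runs ceph-fuse. The container name and the request value are assumptions following the upstream cephfs nodeplugin DaemonSet, so treat it as an illustrative sketch rather than a recommendation:

  # Illustrative resources stanza for the cephfs nodeplugin DaemonSet container.
  # Container name and request value are assumptions; only the limit values reflect what I tried.
  containers:
    - name: csi-cephfsplugin            # name as in the upstream DaemonSet (assumed)
      image: quay.io/cephcsi/cephfsplugin:v1.0.0
      resources:
        limits:
          memory: 1Gi                   # tried 256Mi first, then 1Gi; the OOM kill happens either way
        requests:
          memory: 512Mi                 # illustrative value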


Environment details

  • Image/version of Ceph CSI driver
    image: quay.io/cephcsi/cephfsplugin:v1.0.0
  • helm chart version
  • Kubernetes cluster version
    1.13.4
  • Logs

Steps to reproduce

Steps to reproduce the behavior:
Deploy Ceph and then the CephFS CSI driver plus a StorageClass with mounter: fuse (a sketch of such a StorageClass follows below).
Then create multiple CephFS PVs/PVCs and consume them with pods.

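As mentioned above, here is a sketch of the StorageClass used. Everything except mounter: fuse is a placeholder, and the provisioner/secret parameter names follow the ceph-csi cephfs examples, so they may differ slightly per chart version:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: csi-cephfs                    # placeholder name
  provisioner: cephfs.csi.ceph.com      # driver name as in the ceph-csi examples (assumed for this chart)
  parameters:
    monitors: <mon1:6789>,<mon2:6789>   # placeholder
    pool: cephfs_data                   # placeholder
    mounter: fuse                       # the relevant bit: forces ceph-fuse instead of the kernel client
    csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret       # placeholder
    csi.storage.k8s.io/provisioner-secret-namespace: default
    csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret        # placeholder
    csi.storage.k8s.io/node-stage-secret-namespace: default
  reclaimPolicy: Delete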

Actual results

[167475.834265] Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
[167475.837017] Call Trace:
[167475.838397] [<ffffffffb390e78e>] dump_stack+0x19/0x1b
[167475.840546] [<ffffffffb390a110>] dump_header+0x90/0x229
[167475.842691] [<ffffffffb34d805b>] ? cred_has_capability+0x6b/0x120
[167475.845070] [<ffffffffb3397c44>] oom_kill_process+0x254/0x3d0
[167475.847349] [<ffffffffb34d813e>] ? selinux_capable+0x2e/0x40
[167475.849595] [<ffffffffb340f326>] mem_cgroup_oom_synchronize+0x546/0x570
[167475.852109] [<ffffffffb340e7a0>] ? mem_cgroup_charge_common+0xc0/0xc0
[167475.854595] [<ffffffffb33984d4>] pagefault_out_of_memory+0x14/0x90
[167475.856997] [<ffffffffb3908232>] mm_fault_error+0x6a/0x157
[167475.859178] [<ffffffffb391b8c6>] __do_page_fault+0x496/0x4f0
[167475.861395] [<ffffffffb391ba06>] trace_do_page_fault+0x56/0x150
[167475.863692] [<ffffffffb391af92>] do_async_page_fault+0x22/0xf0
[167475.865966] [<ffffffffb39177b8>] async_page_fault+0x28/0x30
[167475.868162] Task in /kubepods/burstable/pod6c6d1804-bd30-11e9-9fb1-fa163e63ddad/725f282aadbd6b6c9b540a8d64ce719560cca6dd9db685e52538f4012f08061f killed as a result of limit of /kubepods/burstable/pod6c6d1804-bd30-11e9-9fb1-fa163e63ddad/725f282aadbd6b6c9b540a8d64ce719560cca6dd9db685e52538f4012f08061f
[167475.877505] memory: usage 1048576kB, limit 1048576kB, failcnt 54
[167475.879928] memory+swap: usage 1048576kB, limit 1048576kB, failcnt 0
[167475.882395] kmem: usage 10440kB, limit 9007199254740988kB, failcnt 0
[167475.884797] Memory cgroup stats for /kubepods/burstable/pod6c6d1804-bd30-11e9-9fb1-fa163e63ddad/725f282aadbd6b6c9b540a8d64ce719560cca6dd9db685e52538f4012f08061f: cache:0KB rss:1038136KB rss_huge:51200KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:1038064KB inactive_file:0KB active_file:0KB unevictable:0KB
[167475.975145] [ pid ]   uid  tgid total_vm    rss nr_ptes swapents oom_score_adj name
[167475.977958] [30026]     0 30026    32644   3917      24        0           994 cephcsi-cephfs
[167475.980763] [32479]     0 32479   407518  83034     252        0           994 ceph-fuse
[167475.983480] [19628]     0 19628   525737 176645     467        0           994 ceph-fuse
[167475.986162] Memory cgroup out of memory: Kill process 23228 (ceph-fuse) score 1648 or sacrifice child
[167475.989086] Killed process 19628 (ceph-fuse) total-vm:2102948kB, anon-rss:700684kB, file-rss:5896kB, shmem-rss:0kB

Expected behavior

The nodeplugin should be more stable and should provide an option to control how much memory the FUSE mounts are allowed to use (see the sketch below for the kind of knob I mean).
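For example, a knob that lets the FUSE clients pick up cache limits from a ceph.conf would already help. A minimal sketch, assuming the nodeplugin reads a mounted /etc/ceph/ceph.conf (the ConfigMap name and mount are illustrative and not verified for the v1.0.0 chart); client_cache_size and client_oc_size are standard Ceph client options:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: ceph-config                   # illustrative name
  data:
    ceph.conf: |
      [client]
      client_cache_size = 8192          # cap the metadata (inode) cache; Ceph default is 16384
      client_oc_size = 104857600        # cap the object-cacher data cache to 100 MiB; default is ~200 MiB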


yydzhou changed the title from "cephcsi-cephfs nodeplugin pod got killed when using FUSE mounter" to "cephcsi-cephfs nodeplugin pod got OOM killing when using FUSE mounter" Aug 14, 2019
ShyamsundarR (Contributor) commented:

@yydzhou Is the test only to create pods that consume the PVCs, or do the pods also perform I/O or other filesystem operations?

cc @ajarr, can you help understand the behavior here?

yydzhou (Author) commented Aug 14, 2019

Yes, the pods will also perform I/O and filesystem operations.

Madhu-1 (Collaborator) commented Aug 16, 2019

@ajarr @poornimag PTAL

ajarr added the bug (Something isn't working) and component/cephfs (Issues related to CephFS) labels Aug 16, 2019
ajarr (Contributor) commented Aug 16, 2019

@yydzhou, what is the version of the Ceph cluster and of the FUSE client (ceph-fuse)? The exact version would be helpful to know; is it 14.2.x?

After how many PVCs consumed by pods doing I/O do you hit the OOM?

ajarr (Contributor) commented Aug 16, 2019

@ShyamsundarR, are 256M and 1G typical memory limit settings for the CephFS and RBD node plugins?

ShyamsundarR (Contributor) commented:

> @ShyamsundarR, are 256M and 1G typical memory limit settings for the CephFS and RBD node plugins?

I am not aware of the memory constraints or usage of RBD. I think we need to understand this better.

For example, switching to kernel cephfs and/or krbd instead of rbd-nbd may not charge the memory overhead to the container namespace (in this case the nodeplugin), and would further let the kernel manage space reclamation based on usage. I think this may be a better direction in the longer run (unless I have misunderstood rbd-nbd, i.e. where the blocks are cached).

The issue may come down to how to let CephFS FUSE know how much memory it has available for cached data (beyond the usual must-have memory footprint), whether we can share that budget across the various FUSE mounts, or whether we need to think about a single FUSE mount to control this consumption (i.e. #476).

cc @dillaman
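(For reference: the switch to the kernel CephFS client discussed above is selected with the same StorageClass parameter shown earlier; a minimal sketch with the other fields unchanged, keeping in mind that the reporter later explains a 3.10 kernel is the reason FUSE is used here:)

  parameters:
    mounter: kernel                     # in-kernel CephFS client; its page cache is managed by the
                                        # kernel rather than charged to the nodeplugin pod, as discussed above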

yydzhou (Author) commented Aug 16, 2019

@ajarr The Ceph version is 13.2.4. The OOM happens when mounting the 3rd PV on the node. I am using the cephfs-csi chart release 1.0, so I assume the ceph-fuse version is 14.2.x? Once a PV is mounted, the related pods will issue some I/O against it, but not that much, because the failure happens during the deployment of our cluster.

yydzhou (Author) commented Aug 16, 2019

JFYI, using ceph-fuse mounting is to support CentOS with an old kernel (3.10). The external cephfs provisioner has an option to disable RADOS pool namespace isolation to allow old-kernel mounts (ref kubernetes-retired/external-storage@4fefaf6#diff-3ccb4687fb599e0453570308087f8252), but cephfs-csi does not seem to support that. So we have to use FUSE mounting until we have upgraded all our kernel versions.

ajarr (Contributor) commented Aug 16, 2019

> [quoting @ShyamsundarR's comment above in full]

@batrick FYI

ajarr (Contributor) commented Aug 16, 2019

> @ajarr The Ceph version is 13.2.4. The OOM happens when mounting the 3rd PV on the node.

This is surprising. Not sure how other CephFS CSI v1.0.0 users didn't hit this.

> I am using the cephfs-csi chart release 1.0, so I assume the ceph-fuse version is 14.2.x? Once a PV is mounted, the related pods will issue some I/O against it, but not that much, because the failure happens during the deployment of our cluster.

This is helpful information. I'll try tracking down the issue.

Madhu-1 (Collaborator) commented Sep 11, 2019

I have also seen this issue: with more I/O, the memory consumption of ceph-fuse increases.
@poornimag can you confirm?

ajarr (Contributor) commented Sep 11, 2019

@joscollin can you also take a look?

rochaporto commented:

Any news on this one? We're seeing the same issue; even nodes with no PVs mounted go OOM after a couple of days. If we drop the resource requests, the nodes eventually die.

stale bot commented Oct 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix (This will not be worked on) label Oct 4, 2020
stale bot commented Oct 12, 2020

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
