remount old mount point when csi plugin unexpect exit #282

huaizong · 2019-03-25T14:49:04Z

Goal

We are trying to solve the following problem: when the CSI plugin exits unexpectedly, a pod that uses a CephFS PV cannot recover automatically, because the mount relationship is lost until the pod is killed and rescheduled to another node. I think this may be a problem. Perhaps the CSI plugin can do more and remount the old path, so that when the pod exits and restarts, the old mount path is usable again and the pod recovers automatically.

Non-goal

The pod should still exit and restart when the CSI plugin pod exits and the mount point is lost. If the pod does not exit, it will get the error: transport endpoint is not connected.

Implementation logic

csi-plugin start:

1. Load all MountCachEntry records from the node-local dir.
2. Check whether the volID exists in the cluster; if not, ignore this entry, otherwise continue.
3. Check whether the stagingPath exists; if yes, mount the path.
4. Check whether each targetPath exists; if yes, bind-mount it to the staging path.
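
The four startup steps can be sketched as a small planning function. This is only an illustration in Python, not the plugin's Go code: `CacheEntry`, `plan_remounts`, and the two callbacks are hypothetical names, and the sketch assumes that targets are skipped when their staging path is gone. It returns the list of mounts to perform instead of calling mount itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CacheEntry:
    """Hypothetical stand-in for one cached mount record."""
    vol_id: str
    staging_path: str
    target_paths: List[str] = field(default_factory=list)


def plan_remounts(
    entries: List[CacheEntry],
    volume_exists: Callable[[str], bool],
    path_exists: Callable[[str], bool],
) -> List[Tuple[str, str, str]]:
    """Return the (action, source, destination) steps to run at plugin start."""
    plan = []
    for entry in entries:
        # Step 2: ignore entries whose volume no longer exists in the cluster.
        if not volume_exists(entry.vol_id):
            continue
        # Step 3: remount the staging path only if the directory is still there.
        # Assumption: without a staging path, the targets are skipped as well.
        if not path_exists(entry.staging_path):
            continue
        plan.append(("mount", entry.vol_id, entry.staging_path))
        # Step 4: bind-mount the staging path onto each target path that still exists.
        for target in entry.target_paths:
            if path_exists(target):
                plan.append(("bind", entry.staging_path, target))
    return plan


# Usage: one live volume with one surviving target, and one deleted volume.
entries = [
    CacheEntry("vol-a", "/staging/a", ["/pods/1/a", "/pods/2/a"]),
    CacheEntry("vol-gone", "/staging/gone", ["/pods/3/gone"]),
]
existing_paths = {"/staging/a", "/pods/1/a", "/staging/gone"}
plan = plan_remounts(
    entries,
    volume_exists=lambda vol_id: vol_id != "vol-gone",
    path_exists=lambda path: path in existing_paths,
)
```

Separating the plan from the mount calls keeps the decision logic testable without root privileges or a Ceph cluster.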

NodeServer:

NodeStageVolume: add a MountCachEntry to the local dir, including the readonly attribute and the ceph secret
NodePublishVolume: add the pod bind-mount path to the MountCachEntry and persist it to the local dir
NodeUnpublishVolume: remove the pod bind-mount path from the MountCachEntry and persist it to the local dir
NodeUnstageVolume: remove the MountCachEntry from the local dir
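
The cache lifecycle across the four node calls might look roughly like the sketch below. It is an illustration in Python under stated assumptions, not the PR's Go implementation: the `MountCache` class, its method names, the one-JSON-file-per-volume layout, and the field names are all hypothetical, and it assumes the volume ID is safe to use as a file name.

```python
import json
import os
import tempfile


class MountCache:
    """Hypothetical node-local cache: one JSON file per staged volume."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, vol_id):
        return os.path.join(self.cache_dir, vol_id + ".json")

    def load(self, vol_id):
        with open(self._path(vol_id)) as f:
            return json.load(f)

    def _save(self, vol_id, entry):
        # Write to a temp file and rename, so a crash cannot leave a half-written entry.
        tmp = self._path(vol_id) + ".tmp"
        with open(tmp, "w") as f:
            json.dump(entry, f)
        os.replace(tmp, self._path(vol_id))

    def stage(self, vol_id, staging_path, read_only, secrets):
        # NodeStageVolume: create the entry with the readonly attribute and the secret.
        # Note: this sketch stores the secret in plain text.
        self._save(vol_id, {
            "staging_path": staging_path,
            "read_only": read_only,
            "secrets": secrets,
            "target_paths": [],
        })

    def publish(self, vol_id, target_path):
        # NodePublishVolume: record the pod bind-mount path and persist.
        entry = self.load(vol_id)
        if target_path not in entry["target_paths"]:
            entry["target_paths"].append(target_path)
        self._save(vol_id, entry)

    def unpublish(self, vol_id, target_path):
        # NodeUnpublishVolume: drop the pod bind-mount path and persist.
        entry = self.load(vol_id)
        entry["target_paths"] = [p for p in entry["target_paths"] if p != target_path]
        self._save(vol_id, entry)

    def unstage(self, vol_id):
        # NodeUnstageVolume: remove the whole entry.
        try:
            os.remove(self._path(vol_id))
        except FileNotFoundError:
            pass


# Usage: walk one volume through the full lifecycle in a throwaway directory.
with tempfile.TemporaryDirectory() as cache_dir:
    cache = MountCache(cache_dir)
    cache.stage("vol-a", "/staging/a", read_only=False, secrets={"userID": "placeholder"})
    cache.publish("vol-a", "/pods/1/a")
    after_publish = cache.load("vol-a")["target_paths"]
    cache.unpublish("vol-a", "/pods/1/a")
    after_unpublish = cache.load("vol-a")["target_paths"]
    cache.unstage("vol-a")
    remaining_files = os.listdir(cache_dir)
```

Because unstage deletes the entry, a plugin restart finds nothing to remount for volumes that no longer have a pod on the node.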

Issue #217

huaizong · 2019-03-26T08:31:21Z

@rootfs @gman0

humblec · 2019-03-26T10:21:11Z

@huaizong First of all, thanks for the PR. One question I have is about the steps below:

> 3. check if stagingPath exist, if yes we mount the path
> 4. check if all targetPath exist, if yes we binmount to staging path

To make sure I understand correctly: at the time of plugin restart we are doing volumeConstruction of the existing mount with a minimal operation, which is "remount". Is it good enough to just retry it as a "remount", or do we need to remount it with the rw option explicitly? One other thought: is there a chance that we remount a staging volume without an active pod on the node?

huaizong · 2019-03-26T12:28:21Z

> @huaizong First of all, thanks for the PR. One question I have is about the steps below:
> 3. check if stagingPath exist, if yes we mount the path
> 4. check if all targetPath exist, if yes we binmount to staging path
> To make sure I understand correctly: at the time of plugin restart we are doing volumeConstruction of the existing mount with a minimal operation, which is "remount".

Yes. The csi plugin may restart after an unexpected exit and then needs to remount the existing mount paths; otherwise the pods that use the PV lose their connection to the mount path.

> Is it good enough to just retry it as a "remount", or do we need to remount it with the rw option explicitly?

The staging path is simply remounted, and each pod targetPath is then bind-mounted with the rw option.

> One other thought: is there a chance that we remount a staging volume without an active pod on the node?

Kubernetes calls unstage when there is no active pod on the node, which cleans the mount cache, so after a restart the plugin will not mount a staging path that has no active pod on the node.

gman0 · 2019-03-26T15:51:57Z

@huaizong the idea itself is fine, but there are issues with the implementation that need to be fixed before we can get this merged.

gman0 · 2019-03-26T15:57:53Z

Another thing, just putting it out there (this can be done after the code reviews): the deployment manifests need to be updated to mount the emptyDir by default, and the documentation should mention the possibility of caching mount info.

2. support a user-defined cache dir 3. if mountcachedir is not defined, disable the mount cache

huaizong · 2019-03-27T08:15:48Z

> Another thing, just putting it out there (this can be done after the code reviews): the deployment manifests need to be updated to mount the emptyDir by default, and the documentation should mention the possibility of caching mount info.

This PR now supports a user-defined directory for saving the mount cache info, and the deployment manifests are updated to mount the emptyDir by default.
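
The "disabled unless a cache dir is given" behaviour could be sketched like this. It is a Python illustration, not the plugin's actual flag handling: the option name `--mountcachedir` is taken from the commit note in this thread, while `parse_args`, `mount_cache_enabled`, and the example path are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse a hypothetical plugin command line."""
    parser = argparse.ArgumentParser()
    # An empty default means no cache dir was configured.
    parser.add_argument("--mountcachedir", default="",
                        help="directory used to save mount cache info")
    return parser.parse_args(argv)


def mount_cache_enabled(args):
    """The mount cache is only active when a cache dir was defined."""
    return args.mountcachedir != ""


# Usage: without the option the cache stays off; with it, the cache is on.
disabled = mount_cache_enabled(parse_args([]))
enabled = mount_cache_enabled(parse_args(["--mountcachedir", "/mount-cache-dir"]))
```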

huaizong · 2019-03-27T18:14:17Z

@humblec @gman0 PTAL

gman0 · 2019-03-27T18:18:14Z

@huaizong my apologies, i'm busy today...will continue with the review tomorrow

huaizong · 2019-03-29T15:26:20Z

@gman0 @Madhu-1 @humblec

This PR is useful for scenarios that use ceph-fuse, and there is no obvious benefit for kernel-based scenarios, so the question is whether it is worth adding this complexity just to support ceph-fuse scenarios.

humblec · 2019-04-01T12:03:44Z

@huaizong indeed this is useful. I have a couple of quick comments. PTAL.

humblec · 2019-04-02T06:21:48Z

@huaizong Thanks !! LGTM.

@gman0 can you take a final look at this PR? This is indeed good to have!

gman0

LGTM, thanks @huaizong for the work and thank you guys for help with the review

…n-exit-v2 remount old mount point when csi plugin unexpect exit

jianglingxia · 2020-01-07T06:38:19Z

Hi all,
Has the "Transport endpoint is not connected" problem been resolved?
I use ceph csi driver v1.2.0 with kubernetes v1.13. When a pod uses a cephfs pvc and I stop the cephfsplugin container of the daemonset, running df -h inside my pod's container fails with the error: Transport endpoint is not connected
I have to restart the pod to resolve the problem. Is there a way to resolve it without restarting the pod? Thanks all.

df -h

df: `/mnt': Transport endpoint is not connected
Filesystem Size Used Avail Use% Mounted on
rootfs 745G 50G 695G 7% /
overlay 745G 50G 695G 7% /
tmpfs 63G 0 63G 0% /dev
tmpfs 63G 0 63G 0% /sys/fs/cgroup
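
For reference, the df error above corresponds to the ENOTCONN errno on Linux, so a broken FUSE mount can be detected from code. The helper below is a minimal Python sketch; the function name `mount_is_disconnected` is hypothetical and it only detects the condition, it does not repair the mount.

```python
import errno
import os


def mount_is_disconnected(path):
    """Return True if stat() on path fails with ENOTCONN.

    On Linux, ENOTCONN is reported as "Transport endpoint is not connected",
    which is what a FUSE mount returns after its userspace daemon has died.
    """
    try:
        os.stat(path)
    except OSError as err:
        return err.errno == errno.ENOTCONN
    return False


# Usage: a healthy directory is not reported as disconnected.
healthy = mount_is_disconnected(os.getcwd())
```

A missing path fails with a different errno (ENOENT), so it is not reported as a disconnected mount.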

Madhu-1 · 2020-01-07T06:55:11Z

This issue is present if you are using the cephfs fuse client.

Madhu-1 · 2020-01-07T06:56:17Z

@jianglingxia why did you stop the daemonset pods? They should keep running as long as the storage is needed.

jianglingxia · 2020-01-07T07:15:42Z

I tested the scenario of powering the node down and back on: the ds and sts get restarted, so the problem may occur. Another scenario is upgrading the ceph csi driver: application pods that use cephfs through the cephfs fuse client will also hit the problem.

"if you are using cephfs fuse client this issue is present" --> can this problem be resolved, or does the community plan to resolve it?

Madhu-1 · 2020-01-07T07:35:16Z

> I tested the scenario of powering the node down and back on: the ds and sts get restarted, so the problem may occur. Another scenario is upgrading the ceph csi driver: application pods that use cephfs through the cephfs fuse client will also hit the problem.

We recommend that you use the cephfs kernel client. If you use the cephfs fuse client, you need to drain the nodes before the PVC can be used again.

I think the upgrade is covered in doc #770.

> "if you are using cephfs fuse client this issue is present" --> can this problem be resolved, or does the community plan to resolve it?

Yes, we will fix this issue, but there is no ETA yet.

jianglingxia · 2020-01-10T08:35:53Z

The daemonset was not upgraded; I only restarted the csi-cephfsplugin container, as follows:

ae33cf5ab2d2 ed6f186ec08a "/usr/local/bin/cephc" 21 hours ago Up 21 hours k8s_csi-cephfsplugin_ceph1-csi-cephfsplugin-czl9r_default_3fd90197-32d4-11ea-9d2e-744aa4028226_0

[root@paas-controller-172-20-0-3:/home/ubuntu]$ docker restart ae33cf5ab2d2
ae33cf5ab2d2

The mount path on the minion is:
ceph-fuse 1.0G 0 1.0G 0% /paasdata/docker/plugins/kubernetes.io/csi/pv/pvc-8bdd24b9-3383-11ea-8500-744aa4028242/globalmount

The app nginx11-1-dp6lh uses the cephfs volume pvc-8bdd24b9-3383-11ea-8500-744aa4028242:
nginx11-1-dp6lh 1/1 Running 0 9m2s 100.100.0.9 172.20.0.3

But the container of the app pod nginx11-1-dp6lh reports this error:
df -h
df: `/test': Transport endpoint is not connected
Filesystem Size Used Avail Use% Mounted on
rootfs 745G 50G 695G 7% /
overlay 745G 50G 695G 7% /
tmpfs 63G 0 63G 0% /dev
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/mapper/ncl-paasdata 745G 50G 695G 7% /dev/termination-log
/dev/mapper/ncl-paasdata 745G 50G 695G 7% /etc/resolv.conf
/dev/mapper/ncl-paasdata 745G 50G 695G 7% /etc/hostname

ls -ih /

ls: cannot access /test: Transport endpoint is not connected

Why can the /test mount path in the app pod no longer be read or written until the app container is restarted? Thanks for your reply!

jianglingxia · 2020-01-14T07:25:05Z

@Madhu-1
To resolve the problem, do I need to change the csi driver code or the k8s code, and how would I do that? Thanks.
This problem may leave some apps unable to run.

Madhu-1 · 2020-01-14T08:18:28Z

@jianglingxia please open a separate issue with reproducer steps

PR ceph#282 introduces the mount cache to solve the cephfs fuse mount issue when the cephfs plugin pod restarts. This is not working as intended. This PR removes the code for maintainability. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
