
When the csi-plugin needs to exit and restart for an upgrade or after a panic, pods receive the error 'Transport endpoint is not connected' #91

Closed
huaizong opened this issue Oct 17, 2018 · 9 comments

Comments

huaizong commented Oct 17, 2018

When the csi-plugin exits and restarts for an upgrade or after a panic, the pod receives the error 'Transport endpoint is not connected'. Does cephfs-csi plan to support remounting previously mounted paths when the csi-plugin starts?

df: ‘/var/lib/kubelet/pods/d132d662-d1d1-11e8-8297-28d24488ad30/volumes/kubernetes.io~csi/pvc-d123868ad1d111e8/mount’: Transport endpoint is not connected
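For context: once the ceph-fuse process backing the mount is gone, any access to the mountpoint fails with ENOTCONN, which is exactly the error df reports above. Below is a minimal Go sketch of how such a stale mountpoint can be detected; the helper name is illustrative and the path is just the one from the df output, this is not ceph-csi code.

```go
// Minimal sketch (not ceph-csi code): detect a FUSE mountpoint whose daemon
// has exited. Accessing such a path fails with ENOTCONN, i.e. the
// "Transport endpoint is not connected" error shown above.
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// isStaleFuseMount reports whether stat()ing the path fails with ENOTCONN.
func isStaleFuseMount(path string) bool {
	_, err := os.Stat(path)
	return errors.Is(err, syscall.ENOTCONN)
}

func main() {
	// Path copied from the df output in this issue; purely an example.
	path := "/var/lib/kubelet/pods/d132d662-d1d1-11e8-8297-28d24488ad30/volumes/kubernetes.io~csi/pvc-d123868ad1d111e8/mount"
	if isStaleFuseMount(path) {
		fmt.Println("stale FUSE mount:", path)
	}
}
```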
gman0 commented Oct 17, 2018

Please attach the plugin logs.

huaizong changed the title from "when csi-plugin exit and restart for upgrade or painc, pod will recive error msg that 'Transport endpoint is not connected'," to "when csi-plugin need exit and restart for upgrade or painc, pod will recive error msg that 'Transport endpoint is not connected'," Oct 18, 2018
huaizong (Author) commented
What I mean is that ceph-csi may need a feature to remount previously mounted paths.

rootfs commented Oct 18, 2018

The mount path is given by kubelet. If the pod is deleted, the mountpoint will be gone too.

huaizong (Author) commented

> The mount path is given by kubelet. If the pod is deleted, the mountpoint will be gone too.

If we need to upgrade ceph-csi now, we have to taint every node and drain every pod that uses the ceph-csi plugin. If the ceph-csi plugin supported remounting the last mounted paths, it could support rolling updates. (A sketch of the current drain workaround follows below.)
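For reference, a rough sketch of that cordon/drain-then-upgrade workaround; it simply shells out to kubectl, the node name is illustrative, and this is not an official ceph-csi procedure.

```go
// Hedged sketch of the upgrade workaround discussed in this issue: drain a
// node before restarting the cephfs csi-plugin on it, then uncordon it.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run invokes kubectl with the given arguments, streaming its output.
func run(args ...string) error {
	cmd := exec.Command("kubectl", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	node := "worker-1" // illustrative node name

	// Evict pods (including those using CephFS PVs) before the plugin
	// restarts, so no pod is left holding a mount whose FUSE daemon dies.
	if err := run("drain", node, "--ignore-daemonsets"); err != nil {
		fmt.Fprintln(os.Stderr, "drain failed:", err)
		os.Exit(1)
	}

	// ... upgrade/restart the csi-plugin pod on this node here ...

	// Allow pods to schedule back onto the node.
	if err := run("uncordon", node); err != nil {
		fmt.Fprintln(os.Stderr, "uncordon failed:", err)
		os.Exit(1)
	}
}
```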

huaizong reopened this Oct 20, 2018
tangle329 commented
I hit the same issue. Is there any solution for it? Do we need to monitor the plugin and drain the node when it restarts, panics, or is updated?

rootfs commented Nov 20, 2018

Yes, drain the node before the update. It is not the best solution, but it gives you some protection.

Madhu-1 commented Mar 18, 2019

@rootfs do we need to do something in the code to fix this issue? If not, can we close this one?

rootfs commented Mar 19, 2019

@Madhu-1 would you add an upgrade process to the README? For CephFS mounts, drain the node before the upgrade. I believe this process applies to other FUSE-mount drivers too.

huaizong commented Mar 20, 2019

@rootfs as mentioned in #217, if the CSI plugin exits unexpectedly, a pod using a CephFS PV cannot recover automatically until the pod is killed and rescheduled. I think this may be a problem. Maybe the CSI plugin can do more to remount the old path, so that when the plugin exits and restarts, the old mount path is usable again and the pod can recover automatically.

ShyamsundarR pushed a commit to ShyamsundarR/ceph-csi that referenced this issue Apr 25, 2019
issue ceph#217

Goal

We want to handle the case where the CSI plugin exits unexpectedly: a pod using a CephFS PV cannot recover automatically, because the mount relation is lost, until the pod is killed and rescheduled to another node. This may be a problem. The CSI plugin can do more to remount the old path, so that when the plugin exits and restarts, the old mount path is usable again and the pod can recover automatically.

Non-goal

The pod should exit and restart when the CSI plugin pod exits and the mount point is lost. If the pod does not exit, it will get the error **transport endpoint is not connected**.

Implementation logic

csi-plugin start:

1. Load all MountCachEntry records from the node-local directory.
2. Check whether the volID still exists in the cluster; if not, ignore the entry, otherwise continue.
3. Check whether the stagingPath exists; if yes, mount the path.
4. Check whether each targetPath exists; if yes, bind-mount it to the staging path.

NodeServer:

1. NodeStageVolume: add a MountCachEntry to the local dir, including the read-only attribute and the Ceph secret.
2. NodePublishVolume: add the pod's bind-mount path to the MountCachEntry and persist it to the local dir.
3. NodeUnpublishVolume: remove the pod's bind-mount path from the MountCachEntry and persist it to the local dir.
4. NodeUnstageVolume: remove the MountCachEntry from the local dir.
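A rough, self-contained Go sketch of the recovery flow laid out above. Every name here (mountCacheEntry, the cache directory, remountCephFS) is an illustrative assumption, not the actual ceph-csi implementation.

```go
// Sketch of a plugin-start recovery pass: reload persisted mount-cache
// entries, re-mount each staging path, and re-create per-pod bind mounts.
package main

import (
	"encoding/json"
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

// mountCacheEntry is the per-volume record the NodeServer would persist on
// NodeStageVolume/NodePublishVolume and delete on NodeUnpublishVolume/
// NodeUnstageVolume. Field names are illustrative.
type mountCacheEntry struct {
	VolumeID    string   `json:"volumeID"`
	StagingPath string   `json:"stagingPath"`
	TargetPaths []string `json:"targetPaths"`
	ReadOnly    bool     `json:"readOnly"`
}

// cacheDir is an assumed node-local location for the persisted entries.
const cacheDir = "/var/lib/kubelet/plugins/cephfs.csi.ceph.com/mount-cache"

// recoverMounts runs once at plugin start.
func recoverMounts(volumeExists func(volID string) bool) error {
	files, err := filepath.Glob(filepath.Join(cacheDir, "*.json"))
	if err != nil {
		return err
	}
	for _, file := range files {
		data, err := os.ReadFile(file)
		if err != nil {
			return err
		}
		var e mountCacheEntry
		if err := json.Unmarshal(data, &e); err != nil {
			return err
		}
		// Steps 1-2: skip entries whose volume no longer exists in the cluster.
		if !volumeExists(e.VolumeID) {
			continue
		}
		// Step 3: re-mount the staging path (actual ceph-fuse/kernel mount elided).
		if err := remountCephFS(e.VolumeID, e.StagingPath, e.ReadOnly); err != nil {
			log.Printf("staging remount failed for %s: %v", e.VolumeID, err)
			continue
		}
		// Step 4: re-create the bind mount into each pod's target path. The old
		// bind mount is stale (ENOTCONN) once the FUSE daemon is gone, so detach
		// it lazily before binding again.
		for _, target := range e.TargetPaths {
			_ = exec.Command("umount", "-l", target).Run()
			if out, err := exec.Command("mount", "--bind", e.StagingPath, target).CombinedOutput(); err != nil {
				log.Printf("bind mount %s -> %s failed: %v (%s)", e.StagingPath, target, err, out)
			}
		}
	}
	return nil
}

// remountCephFS stands in for the real ceph-fuse / mount.ceph call that would
// use the cached credentials and read-only flag.
func remountCephFS(volID, stagingPath string, readOnly bool) error {
	_, _, _ = volID, stagingPath, readOnly
	return nil
}

func main() {
	// Assume every cached volume still exists; real code would ask the cluster.
	if err := recoverMounts(func(string) bool { return true }); err != nil {
		log.Fatal(err)
	}
}
```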
wilmardo pushed a commit to wilmardo/ceph-csi that referenced this issue Jul 29, 2019
issue ceph#217

Rakshith-R referenced this issue in Rakshith-R/ceph-csi May 26, 2022
sync devel branch with upstream