remount old mount point when csi plugin unexpect exit #282
Conversation
issue #217 #91

Goal

We try to solve the following: when the CSI plugin exits unexpectedly, a pod using a cephfs PV cannot recover automatically, because the mount relation is lost until the pod is killed and rescheduled to another node. This may be a problem the CSI plugin can do more about: by remounting the old path, the old mount path stays usable and the pod can recover automatically when it exits and restarts.

Non-goal

The pod should still exit and restart when the CSI plugin pod exits and the mount point is lost; if the pod does not exit, it will get **transport endpoint is not connected** errors.

Implementation logic

csi-plugin start:
1. load all MountCacheEntry records from the node-local dir
2. check whether volID still exists in the cluster; if not, ignore this entry, if yes, continue
3. check whether stagingPath exists; if yes, mount the path
4. check whether all targetPaths exist; if yes, bind-mount them to the staging path

NodeServer:
1. NodeStageVolume: add a MountCacheEntry to the local dir, including the readonly attr and the ceph secret
2. NodePublishVolume: add the pod bind-mount path to the MountCacheEntry and persist it to the local dir
3. NodeUnpublishVolume: remove the pod bind-mount path from the MountCacheEntry and persist it to the local dir
4. NodeUnstageVolume: remove the MountCacheEntry from the local dir
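A minimal sketch of how that recovery flow could look, assuming one JSON file per volume under a node-local cache dir. MountCacheEntry, volumeExists, mountStagingPath, and bindMount are illustrative names here, not the actual ceph-csi code:

```go
// Sketch of the plugin-start recovery flow described above; all names
// are illustrative, not the real ceph-csi implementation.
package mountcache

import (
	"encoding/json"
	"os"
	"os/exec"
	"path/filepath"
)

// MountCacheEntry persists everything needed to rebuild a mount
// after an unexpected plugin restart.
type MountCacheEntry struct {
	VolID       string   `json:"volID"`
	StagingPath string   `json:"stagingPath"`
	TargetPaths []string `json:"targetPaths"` // per-pod bind-mount paths
	ReadOnly    bool     `json:"readOnly"`
	// The ceph secret would also be stored (or referenced) here.
}

// remountCachedVolumes implements the four plugin-start steps: load every
// cached entry, skip stale volumes, then restore staging and bind mounts.
func remountCachedVolumes(cacheDir string) error {
	files, err := filepath.Glob(filepath.Join(cacheDir, "*.json"))
	if err != nil {
		return err
	}
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			continue
		}
		var e MountCacheEntry
		if err := json.Unmarshal(data, &e); err != nil {
			continue
		}
		if !volumeExists(e.VolID) {
			continue // volume is gone from the cluster: ignore this entry
		}
		if _, err := os.Stat(e.StagingPath); err == nil {
			mountStagingPath(&e) // step 3: remount the staging path
		}
		for _, target := range e.TargetPaths {
			if _, err := os.Stat(target); err == nil {
				bindMount(e.StagingPath, target, e.ReadOnly) // step 4
			}
		}
	}
	return nil
}

func volumeExists(volID string) bool {
	// Real code would query the cluster; always-true keeps the sketch short.
	return true
}

func mountStagingPath(e *MountCacheEntry) error {
	// Real code would re-run ceph-fuse with the cached secret and monitors.
	return exec.Command("ceph-fuse", e.StagingPath).Run()
}

func bindMount(staging, target string, readOnly bool) error {
	if err := exec.Command("mount", "--bind", staging, target).Run(); err != nil {
		return err
	}
	if readOnly {
		// A read-only bind mount needs a second remount step.
		return exec.Command("mount", "-o", "remount,ro,bind", target).Run()
	}
	return nil
}
```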
@huaizong First of all, thanks for the PR. One question I have is on the following:
To make sure I understand correctly: at the time of plugin restart, we are reconstructing the existing mounts with a minimal operation, which is a "remount". Is just retrying it as a "remount" good enough, or do we need to remount with specific options?
Yes, the CSI plugin may restart because of an unexpected exit and then needs to remount the existing mount paths; otherwise pods using the PV lose their connection to the mount path.
The staging path is simply remounted, and each pod targetPath is then bind-mounted to it.
Kubernetes calls unstage when there is no active pod left on the node, and the mount cache entry is cleaned, so on restart the plugin will not remount a staging path that no active pod on the node still needs.
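Continuing the sketch above, the NodeServer bookkeeping that keeps the cache in sync could look roughly like this; persistMountCache, removeTargetPath, and removeMountCache are hypothetical helpers, not the ceph-csi API:

```go
// Continues the mount-cache sketch above; helper names are hypothetical.
package mountcache

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// persistMountCache writes one entry per volume into the node-local dir;
// called from NodeStageVolume (new entry) and NodePublishVolume (after
// appending the pod's targetPath to e.TargetPaths).
func persistMountCache(cacheDir string, e *MountCacheEntry) error {
	data, err := json.Marshal(e)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(cacheDir, e.VolID+".json"), data, 0o600)
}

// removeTargetPath drops a pod's bind-mount target on NodeUnpublishVolume
// and persists the shrunken entry.
func removeTargetPath(cacheDir string, e *MountCacheEntry, target string) error {
	kept := e.TargetPaths[:0]
	for _, t := range e.TargetPaths {
		if t != target {
			kept = append(kept, t)
		}
	}
	e.TargetPaths = kept
	return persistMountCache(cacheDir, e)
}

// removeMountCache deletes the entry on NodeUnstageVolume, which is why a
// restart will not remount a staging path no active pod still needs.
func removeMountCache(cacheDir, volID string) error {
	return os.Remove(filepath.Join(cacheDir, volID+".json"))
}
```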
@huaizong the idea itself is fine, but there are issues with the implementation that need to be fixed before we can get this merged.
Another thing - just putting it out there - can be done after the code reviews... the deployment manifests need to be updated to mount the mount cache dir.
2. support a user-defined cache dir
3. if mountcachedir is not defined, disable the mount cache
Now this PR supports a user-defined dir for saving mount cache info, and the deployment manifests have been updated to mount the cache dir.
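A sketch of how the user-defined cache dir might be wired in, reusing remountCachedVolumes from the earlier sketch; the -mountcachedir flag name follows the list above, everything else is illustrative:

```go
// Illustrative wiring of the user-defined cache dir at plugin start.
package main

import (
	"flag"
	"log"
)

// An empty value disables the mount cache entirely, matching the
// "if mountcachedir is not defined, disable the mount cache" rule above.
var mountCacheDir = flag.String("mountcachedir", "",
	"node-local dir for persisted mount cache entries (empty = disabled)")

func main() {
	flag.Parse()
	if *mountCacheDir == "" {
		log.Print("mount cache disabled: no -mountcachedir given")
		return
	}
	// remountCachedVolumes is the recovery loop from the earlier sketch.
	if err := remountCachedVolumes(*mountCacheDir); err != nil {
		log.Printf("mount cache recovery failed: %v", err)
	}
}
```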
@huaizong my apologies, I'm busy today... will continue with the review tomorrow
@huaizong indeed this is useful. I have a couple of quick comments. PTAL.
LGTM, thanks @huaizong for the work, and thank you guys for the help with the review
Hi all, df -h reports: df: `/mnt': Transport endpoint is not connected
If you are using the cephfs fuse client, this issue is present.
@jianglingxia why have you stopped the daemonset pods? They should be running as long as storage is needed.
I tested the scenario where the node powers down and back up: the DaemonSet and StatefulSet pods restart, so the problem may appear. Another scenario: when the ceph csi driver is upgraded, application pods using cephfs with the cephfs fuse client will also hit the problem. Regarding "using cephfs fuse client this issue is present": can the problem be resolved, or does the community plan to resolve it?
We recommend you use the cephfs kernel client; if you use the cephfs fuse client, you need to drain the nodes to use the PVC again. The upgrade is covered, I think, in doc #770.
Yes, we will fix this issue, but there is no ETA yet.
The daemonset was not upgraded, but the csi-cephfsplugin container was restarted, like this:

ae33cf5ab2d2 ed6f186ec08a "/usr/local/bin/cephc" 21 hours ago Up 21 hours k8s_csi-cephfsplugin_ceph1-csi-cephfsplugin-czl9r_default_3fd90197-32d4-11ea-9d2e-744aa4028226_0
[root@paas-controller-172-20-0-3:/home/ubuntu]$ docker restart ae33cf5ab2d2

The minion mount path is:

The app nginx11-1-dp6lh uses cephfs volume pvc-8bdd24b9-3383-11ea-8500-744aa4028242, but in the pod container of nginx11-1-dp6lh the error is:

ls -ih /
ls: cannot access /test: Transport endpoint is not connected

Why can the app pod's /test mount path no longer be read or written, so that the app container must be restarted? Thanks for your reply!
@Madhu-1
@jianglingxia please open a separate issue with reproducer steps.
PR #282 introduced the mount cache to solve the cephfs fuse mount issue when the cephfs plugin pod restarts. This is not working as intended, so this PR removes the code for maintainability. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>