Document upgrade procedure for CSI nodeplugins #703
Upgrading the CSI driver will cause all the mounts to become stale? This seems like a blocker for upgrades. What's the workaround to keep a mount available during any upgrade? Do you have to fail over all pods on one node, then upgrade its CSI driver?
Yes (when using the referenced backend drivers). We are working on a way to preserve the mounts across nodeplugin restarts and upgrades.
Also, for the CephFS-FUSE driver we do have a feature to preserve mounts on the system post restart.
The stated feature and options do not work for CephFS. The reasoning is as follows.

The pod gets its own bind mount of the volume, and at that point it does not really matter if we lose the stage and publish mount points; what matters is that the pod also depends on the specific instance of cephfs-fuse backing the mount point. The feature to restore the bind mounts for CephFS is limited to the CSI stage and publish paths, hence on a restart, when a new cephfs-fuse instance is mounted to the stage path and subsequently bind mounted to the publish path, these 2 paths become healthy again. The pod's bind mount, however, is never refreshed, as that is out of the control of CSI, and the CO (Kubernetes in this case) has no reason to refresh that path due to nodeplugin restarts (also, runc is already running the container, so even if it were required it would not be possible). As a result of the above, when using CephFS with the fuse mounter, at present even with the stated feature the application pod's mount remains stale after a nodeplugin restart.
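To make that chain concrete, here is a minimal sketch of how these mounts typically look on a node; the PV name, pod UID, and kubelet paths are hypothetical illustrations, not taken from this issue:

```sh
# Hypothetical identifiers for illustration only.
PV=pvc-0000aaaa
POD_UID=1111bbbb

# 1. NodeStageVolume: cephfs-fuse is mounted at the per-volume staging path.
findmnt "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/${PV}/globalmount"

# 2. NodePublishVolume: the staging path is bind mounted into the per-pod
#    publish path.
findmnt "/var/lib/kubelet/pods/${POD_UID}/volumes/kubernetes.io~csi/${PV}/mount"

# 3. The container runtime then bind mounts the publish path into the
#    container's mount namespace. A nodeplugin restart can re-create mounts
#    1 and 2 with a new cephfs-fuse instance, but the bind mount inside the
#    already-running container still refers to the old, now-dead cephfs-fuse.
```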
@batrick @joscollin ^^
It is ironic that the container movement partially started out of a desire to avoid dependency hell with shared libraries and other system files, and yet here we are resolving dependencies between pods/infrastructure. @ShyamsundarR it's not clear to me if this is a result of dysfunction in the CSI "nodeplugin" interface or what. What is an example of a shared file system that is supposed to survive this upgrade process? The bind mount will surely become stale as soon as the file system proxy (FUSE in this case) is restarted?
@batrick The CephFS kernel driver will survive (as will krbd). The issue arises when we start mixing userspace daemons that are backing kernel file systems / block devices. This upgrade issue is only applicable to the ceph-fuse and rbd-nbd userspace tools, since those daemons run within the CSI node plugin pod, so when that pod gets upgraded, those daemons are killed. We are moving the rbd-nbd daemon to its own pod to address this.
Right.
I understand moving the rbd-nbd userspace agent to another pod. Is that so you can avoid upgrades for the running application pods? For ceph-fuse, we would never want to upgrade the client mount while an application pod is using it.
There's no way to recover the mount, no. That is unlikely to ever change.
You can adopt a similar approach. Move ceph-fuse to its own pod, have it run in the foreground (it becomes the container's pid 1) but spawn child processes or threads for each mount. Re-use the Ceph admin-daemon to communicate between this new ceph-fuse pod and the CephFS CSI node plugin (e.g. "ceph --admin-daemon /shared/path/to/the/ceph-fuse-pod.asok mount ...."), and boom goes the dynamite so long as ceph-fuse doesn't crash.
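For reference, the admin socket is already how one issues commands to a running ceph-fuse instance; a small sketch follows. The socket path is the hypothetical shared path from the comment above, and the proposed `mount` command would be a new admin socket command, not an existing one:

```sh
# List the admin socket commands the running ceph-fuse client exposes today;
# the proposed "mount" command would be added alongside these.
ceph --admin-daemon /shared/path/to/the/ceph-fuse-pod.asok help

# Example of an existing command: dump the client's MDS sessions.
ceph --admin-daemon /shared/path/to/the/ceph-fuse-pod.asok mds_sessions
```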
I'm not sure why we wouldn't have a
The alternative here is to drain a node of application pods using PVs backed by fuse-cephfs (and, by extension, rbd-nbd) before upgrading the CSI nodeplugin; IOW, move pods out of the node before an upgrade of the nodeplugin pod.

In pre-CSI cases, the mount proxy ran on the host/node, hence it was not tied to a pod and there was no such pod to upgrade in the first place (this may not be true for all storage providers). If the proxy on the node needed to be upgraded, an upgrade of the node (resulting in an application pod drain anyway) was performed. I am adding @phlogistonjohn and @raghavendra-talur for more commentary on pre-CSI cases where the proxy service, for example the gluster fuse client, needed to be upgraded.
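A rough sketch of that drain-first flow; the node name, namespace, and label selector are assumptions based on typical ceph-csi deployments, and exact flags vary with the Kubernetes version:

```sh
NODE=worker-1

# Stop new pods from landing on the node, then evict the application pods
# (including those using ceph-fuse / rbd-nbd backed PVs).
kubectl cordon "${NODE}"
kubectl drain "${NODE}" --ignore-daemonsets --delete-local-data

# Replace the CSI nodeplugin pod on this node, e.g. by deleting it so the
# DaemonSet recreates it from the new image.
kubectl -n ceph-csi delete pod -l app=csi-cephfsplugin \
  --field-selector "spec.nodeName=${NODE}"

# Let workloads schedule back once the new nodeplugin pod is Ready.
kubectl uncordon "${NODE}"
```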
Interesting. This may mean we should close this issue as a result, where I was toying with the thought of running a single ceph-fuse instance for all subvolumes on that node.
Also, one downside of this discussion is that we need to pull this PR out of the code, as it serves no purpose at present: #282
That is right. Even though we did not have pods for client operation, we had client rpms that needed upgrades. We followed the same rules for client rpms that are recommended for upgrading the kubelet. I was not able to find any docs specifically for the CSI pods.
I linked to the rules in the previous comment, but the summary is that the worker nodes are cordoned off and drained before upgrading the kubelet. Admins prefer to do this when cluster usage is low, and the upgrade of all nodes in the cluster might take days. Hence it is expected that some nodes have a lower-version client and others have a higher version at any given point.
CSI nodeplugins, specifically when using CephFS FUSE or rbd-nbd as the mounters, when upgraded, will cause existing mounts to become stale/not-reachable (usually connection timeout errors). This is due to losing the mount processes running within the CSI nodeplugin pods. This PR updates the DaemonSet update strategy based on an ENV variable to take care of the above issue, with some manual steps.
More info: ceph/ceph-csi#703
Resolves: rook#4248
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
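For context, the Rook change referenced above drives the DaemonSet update strategy through an environment variable; the effect is roughly equivalent to setting an OnDelete strategy by hand, sketched below. The namespace and DaemonSet name are assumptions based on typical ceph-csi deployments:

```sh
# With updateStrategy OnDelete, rolling out a new nodeplugin image does not
# restart the pods automatically; each pod is only replaced when the admin
# deletes it, keeping the upgrade under manual, per-node control.
kubectl -n ceph-csi patch daemonset csi-cephfsplugin --type merge \
  -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
```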
The current strategy (as discussed with @Madhu-1) to address this combines the approaches above: switch the nodeplugin DaemonSet to a manual (OnDelete) update strategy, and drain application pods using the affected storage off a node before replacing the nodeplugin pod on that node.
The above at least ensures that there are no surprises for app pods using the storage on said nodes, and the upgrade can be admin controlled as well. As updated above, @Madhu-1 is working on this in Rook to begin with: rook/rook#4496
Here is another community discussion on the topic that is a useful read.
CSI nodeplugins, specifically when using CephFS FUSE or rbd-nbd as the mounters, when upgraded, will cause existing mounts to become stale/not-reachable (usually connection timeout errors).
This is due to losing the mount processes running within the CSI nodeplugin pods.
We need documented steps to ensure upgrades are smooth, even when upgrading to minor image versions for bug fixes.