How to migrate from csi-driver 0.3 -> csi-driver csi-1.0 #296
Comments
@ShyamsundarR @humblec do we support migration? |
@sbskas Currently the PV naming scheme in 1.0 is undergoing more changes, thus the naming would be different than what was present in v0.3 of the Ceph-CSI implementation. As this is immutable post creation, a direct upgrade to the v1.0 version from v0.3 version is not feasible. We also have to analyze the breaking changes from the perspective of the CSI spec itself to understand what else may break on such upgrades. As of now we do not have any migration steps to move from 0.3 to 1.0. Possibilities are,
|
@ShyamsundarR, naming is not an issue. The PV has all the information it needs to work by itself, even after StorageClass deletion, as long as the finalizer, provisioner, and attacher are able to process it. My main point is indeed that PVs are immutable, and so are PVCs. |
The mentioned repository is now updated to point to this one instead: https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/csi-translation-lib |
Question: Supposing we state the following support statement for volumes (and snapshots) created using existing 0.3 and 1.0 versions of the Ceph-CSI drivers (rbd and CephFS), is it sufficient to address any concerns with plugins already in use? For volumes created using the 0.3 and existing 1.0 versions of the Ceph-CSI plugins, the following actions would be supported by a future version of the plugin,
And, the following would be unsupported:
|
I think there are multiple things here. This issue is talking specifically about 0.3 -> 1.0? The driver rename happened post 1.0; I got hit by that, so I manually burned down all my 0.3 volumes and deployed 1.0 fresh, and then the driver rename happened. Then there's the issue where they want to drop configmaps and push that info into Ceph directly (I support this). That's already potentially two issues post 1.0 where things are breaking or close to breaking. I'm kind of OK not having a migration path from 0.3 to 1.0, as it doesn't affect me anymore; I already took that hit. But post 1.0 it is a problem: we guarantee some stability, so we should provide some kind of migration path or ongoing support. For the configmap -> Ceph migration, I'd be OK with some out-of-band tool that dumped the configmaps and loaded them into Ceph. Then the code doesn't have to stick around in the driver forever. Make it a migration step. |
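As a rough illustration of the out-of-band idea above, the driver's metadata configmaps could be dumped with plain kubectl before a one-off tool loads them into Ceph. This is a minimal sketch only; the namespace and label selector are assumptions, not values taken from the charts.

```bash
# Hedged sketch: back up the driver's metadata configmaps so an out-of-band
# migration tool can later import them into Ceph. Namespace and label selector
# are hypothetical and would need to match the actual deployment.
kubectl -n ceph-csi get configmaps -l app=csi-rbdplugin -o yaml > csi-configmaps-backup.yaml
```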
For me, I think it should be enough. |
PVs can't be edited after they are created; the API server blocks changes. |
I can see 3 possible solutions to that:
Option two might be the most expedient and safe. |
I would not vote for the above just for this plugin; it seems like a lot of work.
Wouldn't the PV delete step move it to a "pending" state if the driver is not up, and hence not really delete the PV till the driver is back up again, at which point the real PV on Ceph gets deleted as well? To prevent this, would an option in the 0.3/1.0 versions of the plugin work that would fake a delete? IOW, respond with success for a delete, but in reality not delete anything in the Ceph cluster? (Just a thought, as I attempt to run these steps on a test cluster.)
|
Yeah. Maybe an extra step is needed: update the PVs so they are in a state where they get deleted. If the driver isn't running, it's probably OK? |
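One minimal sketch of such an extra step, assuming the goal is to remove the Kubernetes objects while keeping the backing data: flipping the reclaim policy to Retain means the eventual PVC/PV deletion never triggers DeleteVolume against the Ceph cluster, whether or not the old driver is still running. The PV name below is the one quoted in the issue description; the PVC name is hypothetical.

```bash
# Sketch: keep the Ceph image while removing the old Kubernetes objects.
# With reclaimPolicy=Retain, deleting the PVC/PV does not call DeleteVolume.
kubectl patch pv pvc-638476e6264211e9 \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl delete pvc myvol                      # hypothetical PVC name
kubectl delete pv pvc-638476e6264211e9        # PV object goes away; rbd image stays
```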
Another migration option would be to swap things around directly on the Ceph back end. Assume both 0.3 and 1.0 can co-exist during the migration. Starting with a 0.3 PVC/PV pair, pvc/myvol and pv/myvol:
The original data should now be visible in the new PVC, pvc/myvol-1.0. If you really must have the same PVC name, either CSI-clone it back into pvc/myvol or do the above steps again. |
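For what it's worth, one way such a back-end swap could look at the rbd level is sketched below. The pool and image names are hypothetical, and this is not the exact procedure described above, just an illustration of exchanging the images behind the old and new PVs.

```bash
# Hedged sketch: exchange the rbd images behind the 0.3 PV and the freshly
# provisioned 1.0 PV, so the new PVC sees the old data. All names are hypothetical.
POOL=replicapool
OLD=kubernetes-dynamic-pvc-1111        # image behind the 0.3 PV
NEW=csi-vol-2222                       # empty image behind the new 1.0 PV

rbd -p "$POOL" rename "$NEW" "$NEW-tmp"    # park the empty 1.0 image
rbd -p "$POOL" rename "$OLD" "$NEW"        # old data now backs the 1.0 PV
rbd -p "$POOL" rename "$NEW-tmp" "$OLD"    # empty image now backs the 0.3 PV
```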
Ran the above procedure on a test cluster, and it works as required (thanks @JohnStrunk). @kfox1111 and @sbskas, this would be the way to migrate without leaving behind older PVCs with older metadata for a longer duration, and without having to copy out the data. Let me know your thoughts, and we can possibly coordinate on scripts that help achieve the same. Here are some more intermediate steps that can help clarify the procedure better:
|
Thanks for coming up with the procedure; it looks like that was a fair amount of work. But it is also a very long procedure with many places where a mistake could be made. I'm going to ask for advice on the k8s sig-storage channel to see what other options are available. If there were a quick way to rename the driver under the hood, that might be preferable? |
Indeed, this is what we did to migrate the PVs. Isn't there a simpler way of migrating things, like a mutating webhook to change the PV? I was pointing to the csi-translation-library since it seems to be able to migrate old-style PVs to the new CSI format without all those steps, and I was wondering how they managed to do it. |
@msau42 any ideas? |
The "simplest" I know of currently is to:
|
Hmm, sounds like that might work... still a good chance of messing something up in the process. How possible would it be to make the CSI driver field of the PV editable? |
That may fix this particular case, but doesn't solve the general issue of how to "upgrade" a PV. |
The problem is, PVCs and PVs are the abstraction between the user (PVC) and admin (PV), allowing scale-out of workloads. If the admin has to deal with PVCs, then it re-intertwines the workload with cluster maintenance. This is especially hard on multi-tenant clusters. Being able to have a procedure that leaves the PVCs alone would be a huge benefit. Solving it completely generally may not be possible. Renaming a CSI driver I expect to be a relatively common issue, as people will either not realize there is a naming convention and switch once they do (like this driver), or the driver will someday change hands and the new owners will want the driver name to go along. So I'm kind of OK trying to find a solution to it, separately from trying to find a more general solution that allows editing of any PV. Is that reasonable? |
The name of the driver should be treated like any other API field. Once something is 1.0, there are strict backwards-compatibility rules that drivers need to follow. Anything pre-1.0, though, does not have such strict guarantees. |
On another note, you should be able to run a 0.3 driver with name X and a 1.0 driver with name Y alongside each other. They should be seen as two different drivers. However, you may need to make sure they're not clobbering each other's sockets. |
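A quick sketch of what that side-by-side setup could look like on a node is below. The socket paths follow the usual kubelet plugin-directory convention but are assumptions rather than values taken from the charts.

```bash
# Hedged sketch: each driver instance registers under its own name and socket path,
# and each needs its own StorageClass whose provisioner matches that name.
ls /var/lib/kubelet/plugins/
#   csi-rbdplugin/csi.sock        <- 0.3 instance, driver name "csi-rbdplugin"
#   rbd.csi.ceph.com/csi.sock     <- 1.0 instance, driver name "rbd.csi.ceph.com"
kubectl get storageclass -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner
```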
so, should the driver be renamed back, since it started life in 1.0 with the other name? |
The 1.0 versioning was a misnomer IMO, as the version of the driver went along with the CSI spec version that it supported (it should have been a version for the driver, not the spec version). The problem is larger than the driver name in this case, we are changing the on-disk rbd image name and also the passed back |
I asked when backwards compatibility would be honoured, and it sounded like the answer was: once the driver hit 1.0. So I thought 1.0 had a meaning there. I too have seen potentially breaking changes, but thus far they could be kept unbroken by telling the chart to use the old name. I was planning on trying to block further breaking changes without migration plans. I'm still looking for a migration plan that allows the driver rename to move forward. For those running 1.0 and above, there is a non-breaking plan; for 0.3 -> 1.0+, there isn't currently a plan. |
I did not fully understand this comment, hence asking for clarifications.
Could you point to the context in which this was answered? I am wondering if I answered this in some way, or if this was before the current breaking changes under review.
I see that using the old name would prevent any breakage till now, and the name is configurable, hence it is entirely feasible to use the older name as a solution to the driver name change in the code. Further, as per my understanding, I thought and still think we are in pre-1.0 space, and hence we were making breaking changes to the code. Although we stopped this one, as it would have required further PV spec edits had it gotten in.
Even for 1.0 -> 1.0+ (or maybe 2.0, depending on how it is versioned), it would be a breaking change once we merge the on-disk changes. The migration plan (barring "PV upgrades" or the ability to edit PV metadata) would be to change up the names as in this comment, which was tested with the 1.0+ breaking changes in place. So we need to sort this out some way to allow for the breaking changes, such that we can support this more elegantly in the future. |
I think the discussion was offline with some folks, so sorry, I don't have a history of it. :( #312 has not merged. That change in particular was the one I had in mind when I mentioned trying to block such changes unless there is a clear, reasonable migration plan. Its being under review isn't necessarily a bad thing, as it allows progress to be made. But merging without a migration plan would be very bad/damaging to users. |
I did my initial experiments with the 1.0 and 1.0+ code, hence I repeated part of it (i.e., running 2 CSI instances) with 0.3 today. I am able to run both instances without much trouble. The changes are,
A patch to the v1.0 helm chart looks like this to make it run as above.
Is the delete not working against the 0.3 version of the CSI Plugin? Request further details to enable testing the same. Thanks.
The translation library transforms the PV parameters, but it only helps when the older PV is from an in-tree provisioner (based on reading the Kubernetes code at present). Further, with the future scheme of VolumeID-based encoding of the RBD image and cluster, such a simple transformation is not feasible (for example, we would need to feed the transformation engine details about the Ceph cluster and pool IDs, etc. for it to work). This is an initial analysis of the csi-translation based method. |
I'm closing the issue since we are no longer using ceph-csi 0.3, and the ceph-csi team decided not to support ceph-csi 0.3 -> ceph-csi 1.x.x migration. |
Since the driver name changed and PVs are tagged with the driver name, how does one migrate already-created PVs from the pre-csi-1.0 version to the new version?
Changing the driver in the persistent volume fails with:
"# persistentvolumes "pvc-638476e6264211e9" was not valid:
* spec.persistentvolumesource: Forbidden: is immutable after creation"
What's the plan, then, to migrate smoothly from csi-rbdplugin to rbd.csi.ceph.com?
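For reference, the rejected edit probably looked roughly like the patch below (a kubectl edit on the same field fails the same way). This is a reconstruction from the quoted error, not the exact command used.

```bash
# Attempting to point the existing PV at the renamed driver is rejected, because
# spec.persistentvolumesource (which contains spec.csi.driver) is immutable.
kubectl patch pv pvc-638476e6264211e9 --type merge \
  -p '{"spec":{"csi":{"driver":"rbd.csi.ceph.com"}}}'
# The API server rejects this with:
#   spec.persistentvolumesource: Forbidden: is immutable after creation
```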