[CephFS] Create a CSI volume from a snapshot #411
Comments
@ajarr can you please share your view on this?
The plan is to have the volumes plugin add a command to recursively copy another subvolume@snapshot to a new subvolume. The interface is still TBD. What do you think?
Could you please explain some more on the volumes plugin? Most storage systems have such an interface, and one example would be ... Is this available in CephFS at the moment?
@humblec, no, not available atm.
Thanks for confirming, we need this functionality too. |
The ceph-mgr "volumes" plugin that @ajarr is working on.
It would probably look like:
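For reference, a hedged sketch of what such a command could look like; the exact syntax was still TBD at this point, so treat the spelling below (which follows the interface that later landed in the ceph-mgr volumes plugin) as an assumption:

```shell
# Assumed syntax, for illustration only: create a new subvolume by copying
# an existing subvolume snapshot into it.
ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name>
```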
Oh. OK, thanks for clarifying.
Can the above command be reused if we want to clone a subvol from an existing one, or is the plan to do it with a separate command?
We'd like to be able to use this directly from the OpenStack Manila driver as well. This would enable both (1) Manila CephFS (and CephFS with NFS) create-share-from-snapshot support, and (2) manila-CSI (recently merged by gman0 in k8s/cloud-provider-openstack) support for creating a new volume from a snapshot.
A clone in CSI is created using the CreateVolume call, with a VolumeContentSource pointing at the snapshot to clone from. If we perform a recursive copy of the data when such a CreateVolume is invoked, it would very likely take long enough that the caller times out on the CreateVolume call and tries again. The ceph-csi behaviour in this case would be, for the subsequent retries, to block on a mutex that the first call still holds, thus delaying a response to the retry as well (based on the time taken to copy). The container orchestrator would possibly retry this call a few times, depending on the time taken to copy the data, and that time is not deterministic. This sounds like a poor choice for implementing this feature. I would hence ask what the urgency is to implement clone for CephFS subvolumes, and whether we should really be waiting for native clone support from CephFS instead.
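For context, this CreateVolume-with-VolumeContentSource flow is typically triggered from the Kubernetes side by a PVC whose dataSource references a VolumeSnapshot. A hedged example follows; the storage class, snapshot, and PVC names are placeholders, not taken from this issue:

```shell
# Illustrative only: ask the CSI driver to create a new volume from an
# existing VolumeSnapshot. The external-provisioner turns this into a
# CreateVolume call carrying a VolumeContentSource for the snapshot.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-clone
spec:
  storageClassName: cephfs-sc
  dataSource:
    name: cephfs-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
```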
Well this is troublesome.
Native clone is complex and simply will not be done anytime soon. The urgency surrounding this feature is to eliminate the feature gap between CephFS and RBD volumes. Is there no way we can block volume creation in CSI somewhere in the chain of operations of provisioning/using a volume? |
Just to be clear, the first call to clone will time out on the caller end (the client end), but the Ceph-CSI plugin (the server end) would still progress and (say) wait on a response to the clone from Ceph, so eventually the first call will complete and the clone will be created. The server, on one of the subsequent retries, would finally detect that the clone is complete and respond in time to the caller. This detection just looks at the CSI RADOS OMaps for the existence of the clone and responds in relatively constant time, but only after the first call has been completed by the server. IOW, there is no mechanism at present to start a clone job and check its status later in CSI CreateVolume; that would have been much better semantics than what happens currently. Maybe we can build such semantics between the plugin and the call to the ceph-mgr volumes plugin?
We can build those semantics in a new call like `ceph subvolume clone ...`. It would just work in the background and return immediately. Then just build another call to check the status, such as `ceph subvolume info ...`, or similar. Would that work?
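For illustration, a minimal sketch of that background-clone-plus-status pattern, assuming command names along the lines of what eventually shipped rather than the placeholder spelling above; the volume, subvolume, snapshot, and clone names are made up:

```shell
# Sketch only: start the clone; the call returns immediately while the copy
# runs in the background.
ceph fs subvolume snapshot clone cephfs subvol0 snap0 clone0

# Poll the clone status until it reports complete (or a failure state).
# The exact output format may differ; the parsing below is illustrative.
while true; do
  state=$(ceph fs clone status cephfs clone0 | grep -o '"state": "[a-z-]*"')
  echo "clone0 ${state}"
  case "${state}" in
    *complete*|*failed*) break ;;
  esac
  sleep 5
done
```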
I completely agree with @batrick here, and the urgency is really high! We have to sort this out asap. Even if the CSI call times out, the subsequent calls are going to pick it up, and that's the design for most of the CSI calls. Secondly, this is not specific to Ceph; other storage systems behave the same way, and most storage operations are heavy.
> ceph subvolume clone ... It would just work in the background and return immediately. Then just build another call to check the status: ceph subvolume info ... or similar. Would that work?

That works, @batrick! Considering this is an important feature to have and is needed to satisfy the requirements of many products, how could we expedite or track the progress?
@nixpanic, this discussion might be interesting for you. |
There is one corner case: when the clone-by-copy takes time and the PVC that started the clone is deleted, the clone may be leaked against Ceph. On experimenting with an induced sleep in a create volume call, such that it would never report success in time, I observed that the moment the PVC is deleted (while it is still Pending), the provisioner stops retrying the CreateVolume call, leaving the in-progress clone behind. It does seem like a bug in the kubernetes provisioner, as it stops attempting to get a success return from a volume creation if the source PVC is deleted before the PV is created; IOW, it does not attempt the create indefinitely. Other cases should be fine, as the kubernetes provisioner documentation on timeouts, and also the timeouts in the spec, state that the call would be retried indefinitely on timeouts. Note on a possible alternative: it seems that the requirement in question that needs the clone feature could rather use rbd clones, as long as rbd RWX volumes are supported (with the required caveats as in PR #261).
@ajarr please summarize this discussion (with the links to the specs provided by @ShyamsundarR) and propose the new APIs in a tracker ticket.
There are ways to approach this so that cleanup is possible. The cloned(-ing) subvolume could be put in an isolated location until the final ...
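A hedged sketch of that isolation idea, assuming the clone is staged in a dedicated subvolume group; the group and subvolume names are made up, and the `--target_group_name`/`--group_name` flags follow the interface that later shipped:

```shell
# Illustrative only: clone into a dedicated staging group so that abandoned
# clones can be found and cleaned up by scanning that group.
ceph fs subvolume snapshot clone cephfs subvol0 snap0 pvc-clone-1 \
    --target_group_name csi-clone-staging

# Check progress, and remove the staged clone if it was abandoned.
ceph fs clone status cephfs pvc-clone-1 --group_name csi-clone-staging
ceph fs subvolume rm cephfs pvc-clone-1 --group_name csi-clone-staging
```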
This is cloning a PVC from a snapshot, not cloning from a PVC.
@joscollin can you confirm whether it is possible to clone a volume from a snapshot now and whether it is available in a CephFS upstream release? If yes, which CephFS release has it?
Nautilus release status: https://ceph.io/releases/v14-2-8-nautilus-released, which has clone-volume-from-snapshot support.
Moving it to release-v3.1.0 |
Closing this as the functionality is already available with #394 |
Describe the feature you'd like to have
We will be using ceph-manager-based volume provisioning from v1.1.0. Afaict, we only have a way to create snapshots from a volume; support for cloning a volume from an existing snapshot looks to be unavailable. This is important functionality which has to be supported in the CSI driver.
This issue tracks this feature support.