CephFS volumes handling via PVCs (dynamic provisioning) #1125
Comments
I think this is something we should do and it aligns well with our block scenario. But one thing about CephFS is that it allows you to provide a "path", so that you can mount a path within the CephFS on a pod. In Kubernetes, mount options are only given at provisioning time, so I am not sure how we could pass the path. But I don't think something like this is supported in K8s. |
@kokhang From what I understand, … Example: an application having … Also, having one CephFS "per claim" would blow up the Ceph PG count (from what I know about Ceph, there are both "not enough PGs on a node" and "too many PGs on a node" warnings). |
What if we can have just one CephFS and we share that? Then every claim would just use a path. |
Is it possible to have one CephFS (the same way that we have one 'rook-block', i.e. 'rook-filesystem') that has multiple PVCs inside of it, which are in turn mounted by a Deployment? That Deployment can then give it a path in its volumeMount for the volume. |
No, looks like it's one CephFS per PV but I feel that the issue with PGs would happen with or without rook. |
Submitted a design proposal to #1152. Please chime in there. |
We also want this feature. I think the |
Would also like to see this. We're running kubernetes inside a docker container (using a docker-in-docker solution) for a clean setup/teardown and rapid on premise deployment. This causes issues with mounting the RBDs to the pods as the libceph used in the kernel expects the context to be the initial network namespace and --net=host is not an option. |
We are using this external CephFS provisioner, which is ideal from our point of view, and something like this included in Rook would be great: https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/cephfs

In response to a PVC for the provisioner's StorageClass, the provisioner creates a Ceph user and a PV which maps to a directory (named after the PV ID/name) in a shared, underlying CephFS. This existing implementation matches what @kokhang describes above, except that per-PVC Ceph users are created for extra separation beyond just the mount path. These PVCs are ReadWriteMany mountable, so they can be shared between Pods. Deleting the PVC causes the provisioner to delete (or optionally retain) the folder in CephFS. Pods only get their PVC's directory mounted, so they can't see or access other directories. Very clean, very self-managing.

So when you have hundreds of small PVC requirements, they become small directories on a decent-sized shared CephFS, rather than hundreds of separate file systems or block devices to manage. You can horizontally scale the provisioner, and if you have multiple Ceph clusters or CephFS instances, you can deploy the provisioner multiple times (with a different election ID and StorageClass).

Before we had these external provisioners we used to directly mount directories (NFS, FlexVolume) in shared filesystems. We had to manage the directory structure and clean up the directories ourselves. It was doable, but a lot more hassle and more error prone. |
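(For reference, a StorageClass for that external provisioner looks roughly like the sketch below, going by that repository's README as I remember it; the monitor address, secret names, and claimRoot path are placeholders and would need to match your cluster.)

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cephfs
# provisioner name registered by the external-storage cephfs provisioner
provisioner: ceph.com/cephfs
parameters:
  # placeholder monitor address and admin credentials for the target Ceph cluster
  monitors: 10.0.0.1:6789
  adminId: admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: kube-system
  # root directory under which per-PVC directories are created
  claimRoot: /pvc-volumes
```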
I believe we'd prefer to move to CSI support (https://github.com/ceph/ceph-csi) rather than the external provisioner? |
@mykaul sure, that's an implementation decision for the project to make, and CSI is certainly a forward-looking option. As a user, what I am looking for is the provisioning experience I describe above, in particular the efficient management of large numbers of small PVC requirements, in a storage-subsystem-agnostic manner. |
FWIW, I was just doing some research on possible shared file system options. I knew that Rook supported dynamic provisioning for Ceph block storage and assumed it had the same support for filesystems, so I was surprised it was not available. |
I'm really interested in this feature because many Helm charts (e.g. Jenkins) ask for an existing PersistentVolumeClaim name and not for a FlexVolume. |
@bordeo - just for my curiosity, why would you use CephFS and not RBD for Jenkins jobs? |
Because we need to share the workspace across job pods that can run on multiple nodes. RBD is ReadWriteOnce, or am I wrong? |
@bordeo - no, you are not wrong, and that's a good use case. I was just wondering because many times Jenkins jobs do not share a workspace, and then performance is quite likely to be better using their own RBD-backed XFS file system. |
So just to clear something up: rook/cephfs works just fine via PVCs, and has done so for approximately ever. Here's a rook/cephfs PVC I use daily (from parallel workers in a complex jenkins job, as it happens):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: oe-scratch
  namespace: jenkins
spec:
  accessModes:
    - ReadWriteMany
  selector:
    matchLabels:
      name: oe-scratch
  # a capacity request is required for a valid PVC; it must not exceed the PV's capacity
  resources:
    requests:
      storage: 200Gi
  storageClassName: rook-cephfs
```

The PVC can also be created from within a StatefulSet (see the volumeClaimTemplates sketch after this comment). What doesn't work (right now) with rook/cephfs is dynamic provisioning, ie: kubernetes can't take the above and magically create the underlying PV from a StorageClass. The PV has to be created explicitly, like this:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    name: oe-scratch
  name: oe-scratch
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 200Gi
  flexVolume:
    driver: ceph.rook.io/rook
    fsType: ceph
    options:
      clusterNamespace: rook-ceph
      fsName: ceph-filesystem
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: rook-cephfs
```

So: rook/cephfs supports PVCs just fine, since PVCs are a volume-type-agnostic feature within kubernetes. What rook/cephfs does not (yet) support is "dynamic provisioning" of PVs, and new rook/cephfs PVs need to be explicitly created. See whereisaaron's comment above for a "3rd party" dynamic provisioning solution that allows carving up a single (statically provisioned) cephfs PV into multiple smaller PVs (using subdirectories). Honestly, that's about as good as it could be within this space unless you need truly separated cephfs volumes (perhaps to support different underlying cephfs options). See https://kubernetes.io/docs/concepts/storage/persistent-volumes/ for a more in-depth discussion. |
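(As a hedged illustration of the StatefulSet point above: a volumeClaimTemplates block like the sketch below would create one PVC per replica against the same storage class. The names oe-worker, scratch, and the busybox image are hypothetical, and without dynamic provisioning each generated PVC still needs a matching, pre-created PV to bind to.)

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: oe-worker
  namespace: jenkins
spec:
  serviceName: oe-worker
  replicas: 1
  selector:
    matchLabels:
      app: oe-worker
  template:
    metadata:
      labels:
        app: oe-worker
    spec:
      containers:
        - name: worker
          image: busybox
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: scratch
              mountPath: /scratch
  # each replica gets its own PVC named scratch-oe-worker-<ordinal>
  volumeClaimTemplates:
    - metadata:
        name: scratch
      spec:
        accessModes:
          - ReadWriteMany
        storageClassName: rook-cephfs
        resources:
          requests:
            storage: 200Gi
```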
Hi all, I actually use this project: https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/cephfs

So currently each PVC created by Kubernetes is a folder on the CephFS, but there is no option to enable quotas on the CephFS folders to limit usage to the size defined in the PVC. Quotas are already available in Ceph. So do you plan to implement enforcement of the PVC size on CephFS using quotas? I know that you are waiting for CSI plugin integration to allow resizing of RBD. |
@konvergence can you share the manifests to build a setup like yours? That would be very helpful since we're running into the same issue. |
One interesting note about the external provisioner: I noticed that the PVs it creates use the in-tree cephfs volume type. On the other hand, the manually created PV given by @anguslees above uses FlexVolume. They both seem to work okay; I'm not sure which one is preferred though. |
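(For contrast with the FlexVolume PV above, an in-tree cephfs PV looks roughly like the sketch below; the monitor address, path, user, and secret name are placeholders rather than anything a provisioner actually emitted.)

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-cephfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  # in-tree cephfs volume source, as opposed to the FlexVolume driver above
  cephfs:
    monitors:
      - 10.0.0.1:6789
    # sub-directory of the shared CephFS backing this PV
    path: /pvc-volumes/example
    user: admin
    secretRef:
      name: ceph-secret-admin
    readOnly: false
  persistentVolumeReclaimPolicy: Retain
```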
What's the progress on this feature? It's been nearly two years since it was picked up, and although there seems to have been initial momentum, it doesn't appear to have had development done on it since. Can we track the progress somewhere? |
FWIW, there's an open PR to more or less bring … There's some discussion over there about whether to finish that PR or wait for the CSI driver, which would have dynamic provisioning. From that thread: …
I'm hoping that this solution works the same as … I'd love to see this added to the roadmap (in a way that specifically includes dynamic provisioning). For example, #2650 is a 1.1 roadmap item, but it doesn't seem to include the actual provisioner. @travisn do you happen to know the status of this? Thanks! |
@thomasjm Thank you for digging that one up. That does seem to imply things are making progress one way or another. Perhaps I'll hold off using it until 1.1 is released. |
Since #3562 merged into rook master, rook master uses Ceph CSI by default to dynamically provision PVs backed by CephFS. Documentation is here. @travisn, should this issue stay open until rook v1.1 is released? |
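(For anyone landing here, a CSI-based CephFS StorageClass looks roughly like the sketch below, based on my reading of the Rook CSI documentation around this time; the filesystem name myfs, its data pool, and the secret names are placeholders, and the exact parameter names may differ between releases.)

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
# CSI provisioner deployed by the Rook operator (prefix matches the operator namespace)
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # namespace of the Rook/Ceph cluster
  clusterID: rook-ceph
  # CephFS filesystem and data pool to carve volumes out of
  fsName: myfs
  pool: myfs-data0
  # secrets created by the Rook operator for the CSI driver
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
```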
Thinking about it more: since Ceph CSI provides dynamic provisioning of CephFS and is available in Rook v1.1, we can close this issue. #3264 might bring FlexVolume support for dynamic provisioning as well in the future. |
Per #1115, it'd be interesting if CephFS volumes could be handled by storage classes and PVCs. I think this would make it a lot easier to deploy these volumes with Helm charts, for instance, where people would just need to set the `storageClass` to something like `rook-filesystem` and the `accessModes` to `ReadWriteMany`, rather than hacking in YAML which is specific to Rook itself.
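(To make the request concrete: a hedged sketch of the claim a chart user would write under this proposal; `rook-filesystem` is the hypothetical StorageClass name from the issue text, and the size is arbitrary.)

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany
  # hypothetical dynamically-provisioning CephFS storage class
  storageClassName: rook-filesystem
  resources:
    requests:
      storage: 10Gi
```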