A shared file system is a collection of resources and services that work together to serve files for multiple users across multiple clients. Rook will automate the configuration of the Ceph resources and services that are necessary to start and maintain a highly available, durable, and performant shared file system.
A Rook storage cluster must be configured and running in Kubernetes. In this example, it is assumed the cluster is in the `rook-ceph` namespace.
When the storage admin is ready to create a shared file system, they specify the desired configuration settings in a yaml file such as the following `filesystem.yaml`. This example is a simple configuration where the metadata is replicated across different hosts and the data is erasure coded across multiple devices in the cluster. One active MDS instance is started, with one more MDS instance started in standby mode.
```yaml
apiVersion: ceph.rook.io/v1
kind: Filesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - erasureCoded:
        dataChunks: 2
        codingChunks: 1
  metadataServer:
    activeCount: 1
    activeStandby: true
```
Now create the file system.

```console
kubectl create -f filesystem.yaml
```
At this point the Rook operator recognizes that a new file system needs to be configured. The operator will create all of the necessary resources.
- The metadata pool is created (`myfs-meta`)
- The data pools are created (only one data pool for the example above: `myfs-data0`)
- The Ceph file system is created with the name `myfs`
  - If multiple data pools were created, they would be added to the file system
- The file system is configured for the desired active count of MDS (`max_mds` is set to the `activeCount`, which is 1 in this example)
- A Kubernetes deployment is created to start the MDS pods with the settings for the file system. Twice the number of instances requested by the active count are started, with half of them in standby.
After the MDS pods start, the file system is ready to be mounted.
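To check on the new file system, both the MDS pods and the Ceph file system itself can be inspected. A minimal sketch follows, assuming Rook labels the MDS pods with `app=rook-ceph-mds` and that a toolbox pod named `rook-ceph-tools` is available for running Ceph commands (both names are assumptions):

```console
# list the MDS pods started by the operator (the label is an assumption based on Rook's naming)
kubectl -n rook-ceph get pod -l app=rook-ceph-mds

# from the toolbox pod (name assumed), confirm the file system and its pools are registered with Ceph
kubectl -n rook-ceph exec rook-ceph-tools -- ceph fs ls
kubectl -n rook-ceph exec rook-ceph-tools -- ceph mds stat
```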
The file system settings are exposed to Rook as a Custom Resource Definition (CRD). The CRD is the Kubernetes-native means by which the Rook operator can watch for new resources. The operator stays in a control loop to watch for a new file system, changes to an existing file system, or requests to delete a file system.
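As a sketch of what that control loop means in practice, day-two changes are made by editing the same manifest and re-applying it, and deletion is requested the same way (this assumes the `filesystem.yaml` shown above):

```console
# after editing filesystem.yaml (e.g. raising activeCount to 2),
# re-apply it and the operator reconciles the running file system
kubectl apply -f filesystem.yaml

# deleting the resource asks the operator to tear down the file system
kubectl delete -f filesystem.yaml
```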
The pools are the backing data store for the file system and are created with specific names to be private to a file system. Pools can be configured with all of the settings that can be specified in the Pool CRD. The underlying schema for pools defined by a pool CRD is the same as the schema under the `metadataPool` element and the `dataPools` elements of the file system CRD.
```yaml
metadataPool:
  replicated:
    size: 3
dataPools:
  - replicated:
      size: 3
  - erasureCoded:
      dataChunks: 2
      codingChunks: 1
```
Multiple data pools can be configured for the file system. Assigning users or files to a specific pool is left as an exercise for the reader, with details in the CephFS documentation.
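As a minimal sketch of the CephFS mechanism for directing files to a pool, a directory layout attribute can point a directory at one of the file system's data pools. The mount point `/mnt/myfs`, the directory name, and the second data pool `myfs-data1` are hypothetical and only for illustration:

```console
# pin files created under this directory to a second data pool
# (paths and pool name are hypothetical; the pool must already belong to the file system)
setfattr -n ceph.dir.layout.pool -v myfs-data1 /mnt/myfs/archive
```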
The metadata server settings correspond to the MDS service.
- `activeCount`: The number of active MDS instances. As load increases, CephFS will automatically partition the file system across the MDS instances. Rook will create double the number of MDS instances as requested by the active count. The extra instances will be in standby mode for failover.
- `activeStandby`: If true, the extra MDS instances will be in active standby mode and will keep a warm cache of the file system metadata for faster failover. The instances will be assigned by CephFS in failover pairs. If false, the extra MDS instances will all be in passive standby mode and will not maintain a warm cache of the metadata.
- `placement`: The mds pods can be given standard Kubernetes placement restrictions with `nodeAffinity`, `tolerations`, `podAffinity`, and `podAntiAffinity`, similar to placement defined for daemons configured by the cluster CRD (see the sketch after the example below).
```yaml
metadataServer:
  activeCount: 1
  activeStandby: true
  placement:
```
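A sketch of what a filled-in `placement` section could look like, assuming nodes intended for MDS are labeled `role=mds-node` and carry a matching taint (both the label and the taint key are hypothetical):

```yaml
metadataServer:
  activeCount: 1
  activeStandby: true
  placement:
    # schedule MDS pods only on nodes labeled role=mds-node (hypothetical label)
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values:
            - mds-node
    # allow MDS pods onto nodes tainted for MDS use (hypothetical taint key)
    tolerations:
    - key: mds-node
      operator: Exists
```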
In Ceph Luminous, multiple file systems are still considered an experimental feature. While Rook seamlessly enables this scenario, be aware of the issues noted in the CephFS docs with snapshots and the security implications.
For a description of the underlying Ceph data model, see the CephFS Terminology.