To run software such as MySQL or Elasticsearch, it is desirable to use fast local storage and to form a cluster that replicates data between servers.
TopoLVM provides a storage driver for such software running on Kubernetes.
- Use LVM for flexible volume capacity management.
- Enhance the scheduler to prefer nodes having a larger storage capacity.
- Support dynamic volume provisioning from PVC.
- Support volume resizing (resizing for CSI became beta in Kubernetes 1.16).
- Prefer nodes with less IO usage.
- Support volume snapshot.
- topolvm-controller: CSI controller service.
- topolvm-scheduler: A scheduler extender for TopoLVM.
- topolvm-node: CSI node service.
- lvmd: gRPC service to manage LVM volumes.
Blue arrows in the diagram indicate communications over unix domain sockets. Red arrows indicate communications over TCP sockets.
TopoLVM is a storage plugin based on CSI. Therefore, the architecture basically follows the one described in https://kubernetes-csi.github.io/docs/ .
To manage LVM, lvmd should be run as a system service of the node OS. It provides gRPC services over a UNIX domain socket to create, update, and delete LVM logical volumes and to watch the status of a volume group.
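
To make the responsibilities of lvmd more concrete, here is a minimal in-memory sketch in Python of the operations it is described as offering: creating, resizing, and removing logical volumes, and reporting the free space of a volume group. The names `VolumeGroup`, `create_lv`, `resize_lv`, `remove_lv`, and `free_bytes` are hypothetical and are not the actual lvmd gRPC API; a real service would invoke LVM instead of updating a dictionary.

```python
class VolumeGroup:
    """Toy stand-in for an LVM volume group (hypothetical, not lvmd's API)."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self.volumes = {}  # logical volume name -> size in bytes

    def free_bytes(self):
        """Return the capacity not yet allocated to any logical volume."""
        return self.size_bytes - sum(self.volumes.values())

    def create_lv(self, name, size_bytes):
        """Allocate a new logical volume if the group has enough room."""
        if name in self.volumes:
            raise ValueError(f"logical volume {name!r} already exists")
        if size_bytes > self.free_bytes():
            raise ValueError("not enough free space in the volume group")
        self.volumes[name] = size_bytes

    def resize_lv(self, name, new_size_bytes):
        """Change the size of an existing logical volume."""
        growth = new_size_bytes - self.volumes[name]
        if growth > self.free_bytes():
            raise ValueError("not enough free space in the volume group")
        self.volumes[name] = new_size_bytes

    def remove_lv(self, name):
        """Release the space held by a logical volume."""
        del self.volumes[name]


GIB = 1 << 30
vg = VolumeGroup(10 * GIB)
vg.create_lv("data", 4 * GIB)
vg.resize_lv("data", 6 * GIB)
print(vg.free_bytes() // GIB)  # prints 4
```

The free-space figure returned by `free_bytes` corresponds to the volume group status that other components watch.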
topolvm-node implements CSI node services as well as miscellaneous control on each Node. It communicates with lvmd to watch changes in the free space of a volume group and exports that information by annotating the Kubernetes Node resource of the node it runs on. It also adds a finalizer to the Node to clean up PersistentVolumeClaims (PVCs) bound on the node.
topolvm-node also works as a custom Kubernetes controller to implement dynamic volume provisioning. Details are described in the following sections.
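
As a rough illustration of how free space could be exported as a Node annotation, the Python sketch below builds a JSON merge patch body that sets `topolvm.cybozu.com/capacity` on a Node. The helper name `capacity_patch` is hypothetical, and encoding the value as a decimal string of bytes is an assumption (Kubernetes annotation values must be strings). Sending the patch to the API server is left out.

```python
import json

# Annotation key named in this document.
CAPACITY_KEY = "topolvm.cybozu.com/capacity"


def capacity_patch(free_bytes):
    """Build a JSON merge patch recording free_bytes on a Node (hypothetical helper)."""
    if free_bytes < 0:
        raise ValueError("free space cannot be negative")
    # Annotation values must be strings; the byte-count encoding is an assumption.
    patch = {"metadata": {"annotations": {CAPACITY_KEY: str(free_bytes)}}}
    return json.dumps(patch)


print(capacity_patch(4 << 30))
```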
topolvm-controller implements CSI controller services. It also works as a custom Kubernetes controller to implement dynamic volume provisioning and resource cleanups.
topolvm-scheduler is a scheduler extender that extends the standard Kubernetes scheduler for TopoLVM.
To extend the standard scheduler, TopoLVM components work together as follows:
- topolvm-node exposes free storage capacity as the topolvm.cybozu.com/capacity annotation of each Node.
- topolvm-controller works as a mutating webhook for new Pods.
  - It adds a topolvm.cybozu.com/capacity resource to the first container of a pod.
  - The value is the sum of the storage capacity requests of all unbound TopoLVM PVCs referenced by the pod.
- topolvm-scheduler filters and scores Nodes for a new pod having a topolvm.cybozu.com/capacity resource request.
  - Nodes having less capacity than requested are filtered.
  - Nodes having larger capacity are scored higher.
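
The cooperation described above can be sketched in Python as three small functions: one sums the requests of unbound PVCs (the webhook's job), one filters out nodes with too little capacity, and one scores the remaining nodes. All function names are hypothetical. The 0 to 10 score range and the linear scaling are assumptions made for illustration; the text only states that larger capacity scores higher.

```python
def requested_capacity(pvc_requests, bound):
    """Sum the storage requests (bytes) of PVCs that are not yet bound."""
    return sum(size for name, size in pvc_requests.items() if name not in bound)


def filter_nodes(node_capacity, requested):
    """Keep only nodes whose free capacity covers the request."""
    return [node for node, free in node_capacity.items() if free >= requested]


def score_nodes(node_capacity, candidates, max_score=10):
    """Score candidates so that larger free capacity yields a higher score.

    Linear scaling against the largest candidate is an assumption.
    """
    largest = max((node_capacity[n] for n in candidates), default=0)
    if largest == 0:
        return {n: 0 for n in candidates}
    return {n: node_capacity[n] * max_score // largest for n in candidates}


GIB = 1 << 30
pvcs = {"data": 3 * GIB, "logs": 1 * GIB, "cache": 2 * GIB}
requested = requested_capacity(pvcs, bound={"cache"})  # 4 GiB in total
nodes = {"node-a": 2 * GIB, "node-b": 5 * GIB, "node-c": 10 * GIB}
candidates = filter_nodes(nodes, requested)
print(candidates)                      # ['node-b', 'node-c']
print(score_nodes(nodes, candidates))  # {'node-b': 5, 'node-c': 10}
```

In this example node-a is filtered out because its 2 GiB of free space cannot hold the 4 GiB request, and node-c is preferred over node-b.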
To support dynamic volume provisioning, the CSI controller service needs to create a logical volume on a remote target node. In general, the CSI controller runs on a different node from the target node of the volume. To allow communication between the CSI controller and the target node, TopoLVM uses a custom resource called LogicalVolume.
Dynamic provisioning depends on the CSI external-provisioner sidecar container.
1. external-provisioner finds a new unbound PersistentVolumeClaim (PVC) for TopoLVM.
2. external-provisioner calls the CSI controller's CreateVolume with the topology key of the target node.
3. topolvm-controller creates a LogicalVolume with the topology key and capacity of the volume.
4. topolvm-node on the target node finds the LogicalVolume.
5. topolvm-node sends a volume create request to lvmd.
6. lvmd creates an LVM logical volume as requested.
7. topolvm-node updates the status of the LogicalVolume.
8. topolvm-controller finds the updated status of the LogicalVolume.
9. topolvm-controller sends the success (or failure) to external-provisioner.
10. external-provisioner creates a PersistentVolume (PV) and binds it to the PVC.
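
The hand-off through the LogicalVolume resource can be sketched in Python with a dictionary standing in for the Kubernetes API server. Everything here is a simplified assumption: the field names (`volume_id`, `error`), the `FakeLvmd` stub, and the three functions are hypothetical and do not reflect the real custom resource schema or the lvmd API. The point is only to show that the controller and the node never talk directly; they communicate by writing and reading the shared object.

```python
class LogicalVolume:
    """Simplified stand-in for the LogicalVolume custom resource (fields are assumptions)."""

    def __init__(self, name, node, size_bytes):
        self.name = name
        self.node = node
        self.size_bytes = size_bytes
        self.volume_id = None  # set by the node on success
        self.error = None      # set by the node on failure


class FakeLvmd:
    """Stub for lvmd that only tracks free space (hypothetical)."""

    def __init__(self, free_bytes):
        self.free_bytes = free_bytes

    def create_lv(self, name, size_bytes):
        if size_bytes > self.free_bytes:
            raise RuntimeError("not enough free space")
        self.free_bytes -= size_bytes
        return f"lv-{name}"


def controller_create(store, name, node, size_bytes):
    """Controller side: record the request as a LogicalVolume object."""
    store[name] = LogicalVolume(name, node, size_bytes)


def node_reconcile(store, node, lvmd):
    """Node side: create pending volumes for this node and record the outcome."""
    for lv in store.values():
        if lv.node != node or lv.volume_id or lv.error:
            continue  # belongs to another node, or already processed
        try:
            lv.volume_id = lvmd.create_lv(lv.name, lv.size_bytes)
        except RuntimeError as exc:
            lv.error = str(exc)


def controller_result(store, name):
    """Controller side: report the outcome; None means still pending."""
    lv = store[name]
    if lv.error:
        raise RuntimeError(lv.error)
    return lv.volume_id


store = {}  # stands in for the Kubernetes API server
controller_create(store, "pvc-1", "node-a", 1 << 30)
node_reconcile(store, "node-a", FakeLvmd(free_bytes=5 << 30))
print(controller_result(store, "pvc-1"))  # prints lv-pvc-1
```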
TopoLVM depends on Kubernetes deeply. Portability to other container orchestrators (CO) is not considered.
lvmd is provided as a single executable. Users need to deploy lvmd manually.
Other components, as well as CSI sidecar containers, are provided in a single Docker container image and are deployed as Kubernetes objects.