assignees
jsafrane
mikedanese
saad-ali
thockin

This document describes the current state of PersistentVolumes in Kubernetes. Familiarity with volumes is suggested.

Introduction

Managing storage is a distinct problem from managing compute. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this we introduce two new API resources: PersistentVolume and PersistentVolumeClaim.

A PersistentVolume (PV) is a piece of networked storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and memory). Claims can request a specific size and specific access modes (e.g., can be mounted once read/write or many times read-only).

While PersistentVolumeClaims allow a user to consume abstract storage resources, it is common that users need PersistentVolumes with varying properties, such as performance, for different problems. Cluster administrators need to be able to offer a variety of PersistentVolumes that differ in more ways than just size and access modes, without exposing users to the details of how those volumes are implemented. For these needs there is the StorageClass resource.

A StorageClass provides a way for administrators to describe the "classes" of storage they offer. Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators. Kubernetes itself is unopinionated about what classes represent. This concept is sometimes called "profiles" in other storage systems.

Please see the detailed walkthrough with working examples.

Lifecycle of a volume and claim

PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:

Provisioning

A cluster administrator will create a number of PVs. They carry the details of the real storage which is available for use by cluster users. They exist in the Kubernetes API and are available for consumption.

Binding

A user creates a PersistentVolumeClaim with a specific amount of storage requested and with certain access modes. A control loop in the master watches for new PVCs, finds a matching PV (if possible), and binds them together. The user will always get at least what they asked for, but the volume may be in excess of what was requested. Once bound, PersistentVolumeClaim binds are exclusive, regardless of the mode used to bind them.

Claims will remain unbound indefinitely if a matching volume does not exist. Claims will be bound as matching volumes become available. For example, a cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.
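The size-matching rule can be sketched with a pair of manifests. This is only an illustration: the object names, the NFS server address, and the export path are assumptions, not values from a real cluster. Under the rule described above, the claim would stay unbound until a volume of at least 100Gi is added.

```yaml
# Hypothetical 50Gi volume; too small to satisfy the claim below.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-small              # illustrative name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /exports/small      # assumed export path
    server: 172.17.0.2        # assumed server address
---
# Claim requesting 100Gi; it cannot bind to the 50Gi volume above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: big-claim             # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
```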

Using

Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a pod. For volumes that support multiple access modes, the user specifies which mode is desired when using their claim as a volume in a pod.

Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long as they need it. Users schedule Pods and access their claimed PVs by including a persistentVolumeClaim in their Pod's volumes block. See below for syntax details.

Releasing

When a user is done with their volume, they can delete the PVC object from the API, which allows reclamation of the resource. The volume is considered "released" when the claim is deleted, but it is not yet available for another claim. The previous claimant's data remains on the volume and must be handled according to policy.

Reclaiming

The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released from its claim. Currently, volumes can be Retained, Recycled, or Deleted. Retention allows for manual reclamation of the resource. For volume plugins that support it, deletion removes the PersistentVolume object from Kubernetes and also deletes the associated storage asset in external infrastructure, such as an AWS EBS, GCE PD, or Cinder volume. If supported by the volume plugin, recycling performs a basic scrub (rm -rf /thevolume/*) on the volume and makes it available again for a new claim.

Types of Persistent Volumes

PersistentVolume types are implemented as plugins. Kubernetes currently supports the following plugins:

  • GCEPersistentDisk
  • AWSElasticBlockStore
  • AzureFile
  • FC (Fibre Channel)
  • NFS
  • iSCSI
  • RBD (Ceph Block Device)
  • CephFS
  • Cinder (OpenStack block storage)
  • Glusterfs
  • VsphereVolume
  • HostPath (single node testing only -- local storage is not supported in any way and WILL NOT WORK in a multi-node cluster)

Persistent Volumes

Each PV contains a spec and a status, which are the specification and status of the volume.

  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: pv0003
  spec:
    capacity:
      storage: 5Gi
    accessModes:
      - ReadWriteOnce
    persistentVolumeReclaimPolicy: Recycle
    nfs:
      path: /tmp
      server: 172.17.0.2

Capacity

Generally, a PV will have a specific storage capacity. This is set using the PV's capacity attribute. See the Kubernetes Resource Model to understand the units expected by capacity.

Currently, storage size is the only resource that can be set or requested. Future attributes may include IOPS, throughput, etc.

Access Modes

A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.

The access modes are:

  • ReadWriteOnce -- the volume can be mounted as read-write by a single node
  • ReadOnlyMany -- the volume can be mounted read-only by many nodes
  • ReadWriteMany -- the volume can be mounted as read-write by many nodes

In the CLI, the access modes are abbreviated to:

  • RWO - ReadWriteOnce
  • ROX - ReadOnlyMany
  • RWX - ReadWriteMany

Important! A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.

Volume Plugin         ReadWriteOnce  ReadOnlyMany  ReadWriteMany
AWSElasticBlockStore  x              -             -
AzureFile             x              x             x
CephFS                x              x             x
Cinder                x              -             -
FC                    x              x             -
FlexVolume            x              x             -
GCEPersistentDisk     x              x             -
Glusterfs             x              x             x
HostPath              x              -             -
iSCSI                 x              x             -
NFS                   x              x             x
RBD                   x              x             -
VsphereVolume         x              -             -
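As a sketch of how a volume advertises more than one mode, the manifest below declares both modes that the table lists for GCEPersistentDisk. The volume name and disk name are placeholders chosen for illustration. Listing both modes only states what the volume is capable of; as noted above, it can still be mounted using just one mode at a time.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-gce                # illustrative name
spec:
  capacity:
    storage: 10Gi
  # Both modes are supported, but only one can be in use at any time.
  accessModes:
    - ReadWriteOnce
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: my-data-disk      # placeholder for an existing GCE disk
    fsType: ext4
```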

Reclaim Policy

Current reclaim policies are:

  • Retain -- manual reclamation
  • Recycle -- basic scrub ("rm -rf /thevolume/*")
  • Delete -- associated storage asset such as AWS EBS, GCE PD or OpenStack Cinder volume is deleted

Currently, only NFS and HostPath support recycling. AWS EBS, GCE PD and Cinder volumes support deletion.
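The policy is set on the volume through the persistentVolumeReclaimPolicy field. The following is a minimal sketch of an EBS-backed volume that is deleted together with its storage asset once released; the volume name and the EBS volume ID are placeholders, not real resources.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ebs-delete                 # illustrative name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  # After the claim is deleted, the EBS volume itself is removed as well.
  persistentVolumeReclaimPolicy: Delete
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0   # placeholder volume ID
    fsType: ext4
```

Setting the field to Retain instead would leave both the PV object and the EBS volume in place for manual cleanup.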

Phase

A volume will be in one of the following phases:

  • Available -- a free resource that is not yet bound to a claim
  • Bound -- the volume is bound to a claim
  • Released -- the claim has been deleted, but the resource is not yet reclaimed by the cluster
  • Failed -- the volume has failed its automatic reclamation

The CLI will show the name of the PVC bound to the PV.

PersistentVolumeClaims

Each PVC contains a spec and a status, which are the specification and status of the claim.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  selector:
    matchLabels:
      release: "stable"
    matchExpressions:
      - {key: environment, operator: In, values: [dev]}

Access Modes

Claims use the same conventions as volumes when requesting storage with specific access modes.

Resources

Claims, like pods, can request specific quantities of a resource. In this case, the request is for storage. The same resource model applies to both volumes and claims.

Selector

Claims can specify a label selector to further filter the set of volumes. Only the volumes whose labels match the selector can be bound to the claim. The selector can consist of two fields:

  • matchLabels - the volume must have a label with this value
  • matchExpressions - a list of requirements made by specifying key, list of values, and operator that relates the key and values. Valid operators include In, NotIn, Exists, and DoesNotExist.

All of the requirements, from both matchLabels and matchExpressions, are ANDed together; they must all be satisfied in order to match.
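For illustration, a volume that would satisfy the selector of the myclaim example above needs both labels. The sketch below assumes an NFS-backed volume; its name, server address, and path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-labeled              # illustrative name
  labels:
    release: "stable"           # satisfies matchLabels
    environment: dev            # satisfies the "In [dev]" expression
spec:
  capacity:
    storage: 10Gi               # at least the 8Gi the claim requests
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /exports/labeled      # assumed export path
    server: 172.17.0.2          # assumed server address
```

A volume carrying only one of the two labels would not match, because every requirement must hold.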

Claims As Volumes

Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the pod using the claim. The cluster finds the claim in the pod's namespace and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the host and into the pod.

kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: dockerfile/nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim

A Note on Namespaces

PersistentVolume binds are exclusive, and since PersistentVolumeClaims are namespaced objects, mounting claims with "Many" modes (ROX, RWX) is only possible within one namespace.

StorageClasses

Each StorageClass contains the fields provisioner and parameters, which are used when a PersistentVolume belonging to a class needs to be dynamically provisioned.

The name of a StorageClass object is significant, and is how users can request a particular class. Administrators set the name and other parameters of a class when first creating StorageClass objects, and the objects cannot be updated once they are created. All of these values are opaque to users.

Users don't necessarily have to request a StorageClass in their PVC. The cluster treats PVCs that don't request a StorageClass differently depending on whether the SimpleDefaultStorageClassForPVC admission controller is turned on.

  • If the admission controller is turned on, the administrator may specify a default StorageClass. All PVCs that don't request a StorageClass will be bound only to PVs of that default. Specifying a default StorageClass is done by setting the annotation storageclass.beta.kubernetes.io/is-default-class equal to "true" in a StorageClass object. If the administrator does not specify a default, the cluster will respond to PVC creation as if the admission controller were turned off. If more than one default is specified, the admission controller will forbid the creation of all PVCs.
  • If the admission controller is turned off, the claim may be bound to any volume that does not belong to a StorageClass or that has a blank "" value for its volume.beta.kubernetes.io/storage-class annotation.

kind: StorageClass
apiVersion: extensions/v1beta1
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
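Marking a class as the default, as described in the first bullet above, could look like the following sketch. It reuses the standard class and adds only the annotation named in the text; the apiVersion mirrors the other examples in this document.

```yaml
kind: StorageClass
apiVersion: extensions/v1beta1
metadata:
  name: standard
  annotations:
    # Marks this class as the cluster default for PVCs that request no class.
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```

Only one class should carry this annotation, since more than one default causes the admission controller to forbid the creation of all PVCs.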

Provisioner

Storage classes have a provisioner that determines what volume plugin is used for provisioning PVs. This field must be specified. During beta, the available provisioner types are kubernetes.io/aws-ebs and kubernetes.io/gce-pd.

Parameters

Storage classes have parameters that describe volumes belonging to the storage class. Different parameters may be accepted depending on the provisioner. For example, the value io1, for the parameter type, and the parameter iopsPerGB are specific to EBS. When a parameter is omitted, some default is used.

AWS

kind: StorageClass
apiVersion: extensions/v1beta1
metadata:
  name: slow
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  zone: us-east-1d
  iopsPerGB: "10"

  • type: io1, gp2, sc1, st1. See AWS docs for details. Default: gp2.
  • zone: AWS zone. If not specified, a random zone in the same region as controller-manager will be chosen.
  • iopsPerGB: only for io1 volumes. I/O operations per second per GiB. The AWS volume plugin multiplies this value by the size of the requested volume to compute the IOPS of the volume, and caps the result at 20 000 IOPS (the maximum supported by AWS, see AWS docs).
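As a worked example of the iopsPerGB arithmetic, the hypothetical claim below requests 100Gi from the slow class defined above, using the storage-class annotation described under Requesting a StorageClass. The claim name and size are assumptions chosen to keep the numbers simple.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: iops-claim              # illustrative name
  annotations:
    volume.beta.kubernetes.io/storage-class: "slow"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # 100 GiB x 10 IOPS per GiB = 1000 IOPS, below the 20 000 cap.
      # A 4000Gi request would compute 40 000 IOPS and be capped at 20 000.
      storage: 100Gi
```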

GCE

kind: StorageClass
apiVersion: extensions/v1beta1
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a

  • type: pd-standard or pd-ssd. Default: pd-ssd
  • zone: GCE zone. If not specified, a random zone in the same region as controller-manager will be chosen.

Requesting a StorageClass

Users request a particular StorageClass, i.e. a PersistentVolume that belongs to and has the characteristics described by the StorageClass, by specifying the name of the class in their PersistentVolumeClaim using the annotation volume.beta.kubernetes.io/storage-class. Only a PV of the requested class, one with the same annotation as the PVC, can then be bound to the PVC.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
  annotations:
    "volume.beta.kubernetes.io/storage-class": "slow"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

If a PV of the requested class is not found, a new one will be provisioned using the StorageClass.

When a PVC specifies a LabelSelector in addition to requesting a StorageClass, the requirements are ANDed together: only a PV of the requested class and with the requested labels may be bound to the PVC.
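A claim combining both requirements might look like the sketch below, which merges the class annotation and the label selector shown earlier in this document. The claim name is illustrative.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-slow-stable     # illustrative name
  annotations:
    volume.beta.kubernetes.io/storage-class: "slow"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # Only PVs of class "slow" that also carry this label can be bound.
  selector:
    matchLabels:
      release: "stable"
```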