Kubernetes Stateful Containers using StatefulSets and Persistent Volumes

In this section, we will review how to launch and manage applications using StatefulSets and Persistent Volumes.

We will deploy a MySQL database using a StatefulSet and EBS volumes. The example is a MySQL single-master topology with multiple slaves running asynchronous replication.

The example consists of a ConfigMap, two MySQL services, and a StatefulSet. We will deploy the MySQL database, send some traffic to verify connectivity, go through a few failure modes, and review the resiliency that is built into the StatefulSet. Lastly, we'll demonstrate how to scale the StatefulSet.

Prerequisites

This chapter uses a cluster with 3 master nodes and 5 worker nodes as described here: multi-master, multi-node gossip-based cluster.

All configuration files for this chapter are in the statefulsets directory. Make sure you change to that directory before giving any commands in this chapter.

Create ConfigMap

Using a ConfigMap, you can manage the MySQL configuration independently of the application. The ConfigMap looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  labels:
    app: mysql
data:
  master.cnf: |
    # Apply this config only on the master.
    [mysqld]
    log-bin
  slave.cnf: |
    # Apply this config only on slaves.
    [mysqld]
    super-read-only

In this case, the master serves replication logs to the slaves (log-bin), and the slaves are read-only (super-read-only). Create the ConfigMap using the command shown:

$ kubectl create -f templates/mysql-configmap.yaml
configmap "mysql-config" created

Create Services

Create two services using the following configuration (the first one is headless):

# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql
---
# Client service for connecting to any MySQL instance for reads.
# For writes, you must instead connect to the master: mysql-0.mysql.
apiVersion: v1
kind: Service
metadata:
  name: mysql-read
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  selector:
    app: mysql

The headless mysql service provides stable DNS entries, so that pods created by the StatefulSet controller can be resolved as <pod-name>.mysql. The mysql-read service is a regular ClusterIP service that load-balances read connections across all MySQL pods (master and slaves).

$ kubectl create -f templates/mysql-services.yaml
service "mysql" created
service "mysql-read" created

Only read queries can use the load-balanced mysql-read service. Because there is only one MySQL master, clients should connect directly to the MySQL master Pod, identified by mysql-0.mysql, to execute writes.
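
As an optional sanity check of the DNS entries, you can resolve both names from a throwaway pod. The busybox image and pod name here are illustrative, and the per-pod names resolve only once the StatefulSet below has created its pods:

# mysql-read resolves to the service's ClusterIP right away
kubectl run dns-test --image=busybox:1.28 -it --rm --restart=Never -- \
  nslookup mysql-read
# Per-pod entries such as mysql-0.mysql resolve only after the StatefulSet
# below has created its pods
kubectl run dns-test --image=busybox:1.28 -it --rm --restart=Never -- \
  nslookup mysql-0.mysql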

Create StatefulSet

Finally, create the StatefulSet from the configuration in templates/mysql-statefulset.yaml using the command shown:

$ kubectl create -f templates/mysql-statefulset.yaml
statefulset "mysql" created
$ kubectl get -w statefulset
NAME      DESIRED   CURRENT   AGE
mysql     3         1         8s
mysql     3         2         59s
mysql     3         3         2m
mysql     3         3         3m

In a different terminal window, you can watch the progress of pod creation using the following command:

$ kubectl get pods -l app=mysql --watch
NAME      READY     STATUS     RESTARTS   AGE
mysql-0   0/2       Init:0/2   0          30s
mysql-0   0/2       Init:1/2   0         35s
mysql-0   0/2       PodInitializing   0         47s
mysql-0   1/2       Running   0         48s
mysql-0   2/2       Running   0         59s
mysql-1   0/2       Pending   0         0s
mysql-1   0/2       Pending   0         0s
mysql-1   0/2       Pending   0         0s
mysql-1   0/2       Init:0/2   0         0s
mysql-1   0/2       Init:1/2   0         35s
mysql-1   0/2       Init:1/2   0         45s
mysql-1   0/2       PodInitializing   0         54s
mysql-1   1/2       Running   0         55s
mysql-1   2/2       Running   0         1m
mysql-2   0/2       Pending   0         <invalid>
mysql-2   0/2       Pending   0         <invalid>
mysql-2   0/2       Pending   0         0s
mysql-2   0/2       Init:0/2   0         0s
mysql-2   0/2       Init:1/2   0         32s
mysql-2   0/2       Init:1/2   0         43s
mysql-2   0/2       PodInitializing   0         50s
mysql-2   1/2       Running   0         52s
mysql-2   2/2       Running   0         56s

Press Ctrl+C to stop watching. Notice that the pods are initialized in order during startup. This is because the StatefulSet controller assigns each pod a unique, stable name (mysql-0, mysql-1, mysql-2) and creates them one at a time, with mysql-0 being the master and the others slaves. The configuration uses Percona XtraBackup, an open-source tool, to clone the data from the source MySQL server to its slaves.
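
Each replica also gets its own PersistentVolumeClaim (data-mysql-0, data-mysql-1, data-mysql-2), backed by an EBS volume, via the StatefulSet's volumeClaimTemplates. A minimal sketch of what that section of templates/mysql-statefulset.yaml typically looks like; the storage size here is illustrative, so check the file in the repo for the actual values:

  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: mysql
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi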

Test MySQL setup

You can use a mysql client to send some data to the master (mysql-0.mysql):

kubectl run mysql-client --image=mysql:5.7 -i --rm --restart=Never --\
  mysql -h mysql-0.mysql <<EOF
CREATE DATABASE test;
CREATE TABLE test.messages (message VARCHAR(250));
INSERT INTO test.messages VALUES ('hello, from mysql-client');
EOF

You can run the following to test whether the slaves (via mysql-read) received the data:

$ kubectl run mysql-client --image=mysql:5.7 -it --rm --restart=Never --\
  mysql -h mysql-read -e "SELECT * FROM test.messages"

This should display an output like this:

+--------------------------+
| message                  |
+--------------------------+
| hello, from mysql-client |
+--------------------------+

To test load balancing across slaves, you can run the following command:

kubectl run mysql-client-loop --image=mysql:5.7 -i -t --rm --restart=Never --\
   bash -ic "while sleep 1; do mysql -h mysql-read -e 'SELECT @@server_id,NOW()'; done"

Each MySQL instance is assigned a unique identifier, which can be retrieved using @@server_id. In this example the server IDs follow the pod ordinals (mysql-0 reports 100, mysql-1 reports 101, and so on). The command prints the server ID that served each request, along with a timestamp, in an infinite loop.

It shows output like this:

+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:01:11 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:01:12 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         102 | 2017-10-24 03:01:13 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:01:14 |
+-------------+---------------------+

You can leave this open in a separate window while you run failure modes in the next section.

Alternatively, you can use Ctrl+C to terminate the loop.

Testing failure modes

We will see how the StatefulSet behaves in different failure modes. The following modes will be tested:

  1. Unhealthy container

  2. Failed pod

  3. Failed node

Unhealthy container

The MySQL container uses a readiness probe that runs mysql -h 127.0.0.1 -e 'SELECT 1' against the server to make sure MySQL can still serve queries.
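
In the pod spec, this probe is defined roughly as follows; the timing values are a sketch and may differ from those in templates/mysql-statefulset.yaml:

readinessProbe:
  exec:
    command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
  initialDelaySeconds: 5
  periodSeconds: 2
  timeoutSeconds: 1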

Run this command to simulate MySQL being unresponsive:

kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql /usr/bin/mysql.off

This command renames the /usr/bin/mysql binary so that the readiness probe cannot find it. Within a few seconds, at the next health check, the Pod should report that one of its containers is not ready. This can be verified using the command:

kubectl get pod mysql-2
NAME      READY     STATUS    RESTARTS   AGE
mysql-2   1/2       Running   0          12m

The mysql-read service detects failures like this and stops sending traffic to the failed container. You can check this if you have the loop running in a separate window; it now only shows the remaining healthy servers:

+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:17:09 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:17:10 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:17:11 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:17:12 |
+-------------+---------------------+
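
You can also inspect the Endpoints object behind mysql-read; while the probe is failing, the unready pod's IP is moved out of the ready address list:

# The unready pod's IP appears under NotReadyAddresses instead of Addresses
kubectl describe endpoints mysql-read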

Revert the container to its initial state:

kubectl exec mysql-2 -c mysql -- mv /usr/bin/mysql.off /usr/bin/mysql

Check the status again to see that both containers in the pod are running and healthy:

$ kubectl get pod -w mysql-2
NAME      READY     STATUS    RESTARTS   AGE
mysql-2   2/2       Running   0          5h

The loop now shows all three servers again.

Failed pod

To simulate a failed pod, you can delete a pod as shown:

kubectl delete pod mysql-2
pod "mysql-2" deleted

The StatefulSet controller recognizes the missing pod and creates a new one with the same name, attached to the same PersistentVolumeClaim.

$ kubectl get pod -w mysql-2
NAME      READY     STATUS     RESTARTS   AGE
mysql-2   0/2       Init:0/2   0          28s
mysql-2   0/2       Init:1/2   0         31s
mysql-2   0/2       PodInitializing   0         32s
mysql-2   1/2       Running   0         33s
mysql-2   2/2       Running   0         37s
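
You can confirm that the replacement pod reattached to the existing storage by listing the claims (ages and volume names will vary in your cluster):

# data-mysql-0, data-mysql-1 and data-mysql-2 should all still be Bound
kubectl get pvc -l app=mysql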

Failed node

Kubernetes allows a node to be marked unschedulable using the kubectl drain command. This prevents any new pods from being scheduled on the node. If the API server supports eviction, the pods on the node are evicted; otherwise they are deleted. Eviction or deletion applies to all pods except mirror pods, which cannot be deleted through the API server. Read more about drain at https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/.
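
For reference, these are the related commands; drain is the only one we use to simulate the failure, and <node-name> is a placeholder:

# Mark the node unschedulable but leave its pods running
kubectl cordon <node-name>
# Mark the node unschedulable and evict (or delete) the pods running on it
kubectl drain <node-name> --ignore-daemonsets
# Return the node to service
kubectl uncordon <node-name>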

You can simulate node downtime by draining the node that is running mysql-2. To determine which node that is, run this command:

$ kubectl get pod mysql-2 -o wide
NAME      READY     STATUS    RESTARTS   AGE       IP            NODE
mysql-2   2/2       Running   0          11m       100.96.6.12   ip-172-20-64-152.ec2.internal

Drain the node using the command:

$ kubectl drain ip-172-20-64-152.ec2.internal --force --delete-local-data --ignore-daemonsets
node "ip-172-20-64-152.ec2.internal" cordoned
WARNING: Deleting pods with local storage: mysql-2; Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: kube-proxy-ip-172-20-64-152.ec2.internal
pod "kube-dns-479524115-76s6j" evicted
pod "mysql-2" evicted
node "ip-172-20-64-152.ec2.internal" drained

You can look at the list of nodes:

$ kubectl get nodes
NAME                             STATUS                     ROLES     AGE       VERSION
ip-172-20-107-81.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-122-243.ec2.internal   Ready                      master    10h       v1.7.4
ip-172-20-125-181.ec2.internal   Ready                      node      10h       v1.7.4
ip-172-20-37-239.ec2.internal    Ready                      master    10h       v1.7.4
ip-172-20-52-200.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-57-5.ec2.internal      Ready                      node      10h       v1.7.4
ip-172-20-64-152.ec2.internal    Ready,SchedulingDisabled   node      10h       v1.7.4
ip-172-20-76-117.ec2.internal    Ready                      master    10h       v1.7.4

Notice how scheduling is disabled on one node.

Now you can watch the pod being rescheduled:

kubectl get pod mysql-2 -o wide --watch

The output stays at:

NAME      READY     STATUS    RESTARTS   AGE       IP        NODE
mysql-2   0/2       Pending   0          33s       <none>    <none>

This might look like a bug in the StatefulSet, but the pod is failing to reschedule because there was no other node running in the Availability Zone (AZ) where the original node was located. The EBS volume backing data-mysql-2 cannot attach to nodes in a different AZ.
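
You can confirm which AZ the volume is pinned to by inspecting the labels on the PersistentVolume bound to the claim; the exact zone label name depends on your Kubernetes version:

PV=$(kubectl get pvc data-mysql-2 -o jsonpath='{.spec.volumeName}')
kubectl get pv $PV --show-labels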

To mitigate this issue, manually scale the instance group to 6 nodes so that an additional node becomes available in that AZ. Your scenario could be different and may not need this step.

Edit the number of nodes to 6 if you run into the Pending issue:

kops edit ig nodes

Change the specification to:

spec:
  image: kope.io/k8s-1.7-debian-jessie-amd64-hvm-ebs-2017-07-28
  machineType: t2.medium
  maxSize: 6
  minSize: 6
  role: Node
  subnets:
  - us-east-1a
  - us-east-1b
  - us-east-1c

Review and commit changes:

kops update cluster --yes

It takes a few minutes for a new node to be provisioned. This can be verified using the command shown:

$ kubectl get nodes
NAME                             STATUS                     ROLES     AGE       VERSION
ip-172-20-107-81.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-122-243.ec2.internal   Ready                      master    10h       v1.7.4
ip-172-20-125-181.ec2.internal   Ready                      node      10h       v1.7.4
ip-172-20-37-239.ec2.internal    Ready                      master    10h       v1.7.4
ip-172-20-52-200.ec2.internal    Ready                      node      10h       v1.7.4
ip-172-20-57-5.ec2.internal      Ready                      node      10h       v1.7.4
ip-172-20-64-152.ec2.internal    Ready,SchedulingDisabled   node      10h       v1.7.4
ip-172-20-73-181.ec2.internal    Ready                      node      1m        v1.7.4
ip-172-20-76-117.ec2.internal    Ready                      master    10h       v1.7.4

Now you can watch the status of the pod:

$ kubectl get pod mysql-2 -o wide
NAME      READY     STATUS    RESTARTS   AGE       IP           NODE
mysql-2   2/2       Running   0          11m       100.96.8.2   ip-172-20-73-181.ec2.internal

Let’s put the previously drained node back into its normal state:

$ kubectl uncordon ip-172-20-64-152.ec2.internal
node "ip-10-10-71-96.ec2.internal" uncordoned

The list of nodes is now shown as:

$ kubectl get nodes
NAME                             STATUS    ROLES     AGE       VERSION
ip-172-20-107-81.ec2.internal    Ready     node      10h       v1.7.4
ip-172-20-122-243.ec2.internal   Ready     master    10h       v1.7.4
ip-172-20-125-181.ec2.internal   Ready     node      10h       v1.7.4
ip-172-20-37-239.ec2.internal    Ready     master    10h       v1.7.4
ip-172-20-52-200.ec2.internal    Ready     node      10h       v1.7.4
ip-172-20-57-5.ec2.internal      Ready     node      10h       v1.7.4
ip-172-20-64-152.ec2.internal    Ready     node      10h       v1.7.4
ip-172-20-73-181.ec2.internal    Ready     node      3m        v1.7.4
ip-172-20-76-117.ec2.internal    Ready     master    10h       v1.7.4

Scaling slaves

More slaves can be added to the MySQL cluster to increase the read query capacity. This can be done using the command shown:

$ kubectl scale statefulset mysql --replicas=5
statefulset "mysql" scaled

You can watch the progress of scaling:

kubectl get pods -l app=mysql -w

It shows the output:

$ kubectl get pods -l app=mysql -w
NAME      READY     STATUS     RESTARTS   AGE
mysql-0   2/2       Running    0          6h
mysql-1   2/2       Running    0          6h
mysql-2   2/2       Running    0          16m
mysql-3   0/2       Init:0/2   0          1s
mysql-3   0/2       Init:1/2   0         18s
mysql-3   0/2       Init:1/2   0         28s
mysql-3   0/2       PodInitializing   0         36s
mysql-3   1/2       Running   0         37s
mysql-3   2/2       Running   0         43s
mysql-4   0/2       Pending   0         <invalid>
mysql-4   0/2       Pending   0         <invalid>
mysql-4   0/2       Pending   0         0s
mysql-4   0/2       Init:0/2   0         0s
mysql-4   0/2       Init:1/2   0         31s
mysql-4   0/2       Init:1/2   0         41s
mysql-4   0/2       PodInitializing   0         52s
mysql-4   1/2       Running   0         53s
mysql-4   2/2       Running   0         58s

If the loop is still running, it will now also include the new server IDs (103 and 104) in its output:

+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         101 | 2017-10-24 03:53:53 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         100 | 2017-10-24 03:53:54 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         102 | 2017-10-24 03:53:55 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         103 | 2017-10-24 03:53:57 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         103 | 2017-10-24 03:53:58 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW()               |
+-------------+---------------------+
|         104 | 2017-10-24 03:53:59 |
+-------------+---------------------+

You can also verify that the new slaves have the same data set:

kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never --\
mysql -h mysql-3.mysql -e "SELECT * FROM test.messages"

It still shows the same result:

+--------------------------+
| message                  |
+--------------------------+
| hello, from mysql-client |
+--------------------------+

You can scale down by using the command shown:

kubectl scale statefulset mysql --replicas=3
statefulset "mysql" scaled

Note that scaling in doesn’t delete the data or the PVCs attached to the pods. You have to delete them manually:

kubectl delete pvc data-mysql-3
kubectl delete pvc data-mysql-4

It shows the output:

persistentvolumeclaim "data-mysql-3" deleted
persistentvolumeclaim "data-mysql-4" deleted

Cleaning up

First delete the StatefulSet. This also terminates the pods:

$ kubectl delete statefulset mysql
statefulset "mysql" deleted

Verify there are no more pods running:

kubectl get pods -l app=mysql

It shows the output:

No resources found.

Delete the ConfigMap, Services, and PVCs using the command:

$ kubectl delete configmap,service,pvc -l app=mysql
configmap "mysql-config" deleted
service "mysql" deleted
service "mysql-read" deleted
persistentvolumeclaim "data-mysql-0" deleted
persistentvolumeclaim "data-mysql-1" deleted
persistentvolumeclaim "data-mysql-2" deleted