NOTE: This script is meant for pre-CSI Kubernetes. If you use a Trident CSI, move on - nothing to see here!
It aims to provide a simple method to failover a Kubernetes cluster from one SolidFire cluster to another by redirecting iSCSI to a backup SolidFire cluster.
To accomplish that, it does the following:
- Go through all the PVs in a Kubernetes cluster
- Find only iSCSI vols
- Find the volumes with PVs name on SolidFire cluster
- Remove all volume pairs (sources of inbound replication) for these volumes
- Change volume access mode from
failoverTarget
toreadWrite
for all these volumes - Update IQN and Target Portal for these PV and replaces PVs in Kubernetes
NOTE: this script has not been tested and it may not really work (in fact I am sure I can't make it work as-is). I would not recommend testing it out outside of a lab environment. See Limitations and the issues before you give it a try.
- All primary SolidFire Volumes (corresponiding to PVs in K8S) have a replicated volume on the secondary SolidFire with exactly the same name
- The secondary SolidFire cluster already has VAG defined and all volumes are already added to the corresponding VAG
- After the script has completed, all pods must be restarted for new iSCSI connections to establish.
-
Clone repository
git clone https://github.com/kapilarora/sf-failover-k8s.git
-
Enter cloned repository
cd sf-failover-k8s
-
Setup your profile with Kubernetes and SolidFire details. Edit file named profile (remember to set
NO_EXECUTE=true
for a dry run) -
Create and activate (on subsequent only activate) a virtualenv:
virtualenv venv; source venv/bin/activate
-
Install dependencies
pip install -r requirements.txt
-
Setup Env variables using a profile file
source profile
-
Execute failover script:
python3 failover.py
-
Restart pods
-
To failback, replicate SolidFire volumes to the original cluster and once they're in sync, stop pods, load the profile (just set new
SF_
variables) and run this script again
KUBECONFIG="/home/sean/.kube/config"
SF_IP="192.168.1.34"
SF_USERNAME="admin"
SF_PASSWORD="admin"
SF_TARGET_PORTAL="192.168.103.34"
NO_EXECUTE="true"
WAIT_TIME="60"
LOG_LEVEL="debug"
- Logging does not mask SolidFire cluster credentials (see Issues)
- The script is limited to exactly what it says it does in a simple "one Kubernetes, two SolidFire clusters" environment
- This script not designed to work in complex environments (for example, Kubernetes cluster connected to three SolidFire arrays)
- This script may requires the use of VAGs which latest Trident no longer recommends
- Trident's SolidFire backend should be configured to use VAGs
- If that is the case user must maintain VAGs if Trident's SolidFire backend does not use CHAP (please check the Trident documentation for VAG-related limitations)
- After switch-over, Trident backend configuration for SolidFire may need updating (in the case the new cluster has different cluster admin username or password, for example)
- I'm not sure how Trident is supposed to create new PVs after failover - this should be checked
To make this script usable, we'd have to update it and test it with a Trident release v21.01 or better. Please submit results of your testing in Issues.