This was done during the Distributed Cloud Native Hackathon on 7th Sept 2019. Meetup event info
It is advised to read this document fully before doing any cluster setups or app deployments.
Primary:
- try to recreate the chaos-toolkit demo scenario of an app on k8s with node pool replacement (the original ran on GKE, so we wanted to try it on Azure)
Secondary:
- get to know managed Kubernetes on Microsoft Azure
- prior experience almost 0
- use chaos-toolkit to play around, without prior experience with the tool
- this is a sample, not for production use; use at your own risk
- this was tested only on Azure, on a simple k8s cluster with the smallest instances (1 core), 4 nodes
- we used node draining to simulate node loss; after discussing it in the team, and due to time constraints, we ruled out calling the Azure API to actually delete nodes
- remember to delete the cluster afterwards
- original primary objective was not achieved, due to time constraints and limited technical knowledge of Azure in general
- primary objective was adjusted: we changed it from "replace node" to "drain selected node", and this was achieved successfully
- secondary objective achieved: we know Azure managed k8s and chaos-toolkit better
- Microsoft Azure provides managed Kubernetes, but using autoscaling node pools requires additional subscription features which are in Preview mode
- we were unable to activate them, so we switched to a simple node draining scenario
- Due to how DNS (sic!) was set up in the web-spawned console in Azure (console icon in the top right corner), we were unable to reach the k8s cluster with kubectl. We also had no time to adjust the environment the console was using (DNS settings, the walled-garden approach of that VM/container/whatever), due to time constraints and lack of experience
- Due to the above we went full YOLO mode ¯\_(ツ)_/¯
- we decided to compress ~/.kube/ on the VM/container and fetch it to the laptops; this was a short-lived cluster and it was wiped after 12h
- Using storage class azurefile was much faster than normal disks in Azure
- total cloud cost: 4.45 EUR for 12h of compute/storage with k8s clusters; it could have been a bit less, but we had never touched Azure before
- Helm chart for stable/wordpress has very slow readiness probes
- REALLY READ chaos-toolkit tutorials before using it (derp mode on)
- chaos-toolkit looks pretty interesting for automating tests, especially for nightly builds
- Currently chaos-toolkit provides a lot of testing extensions, especially for AWS, while GCP is severely lacking (actually just add/remove node pools in GKE)
- Currently chaos-toolkit looks mature in its core but needs more extensibility
Create a Kubernetes cluster with 4 nodes; in Azure it should take about 30 min.
Export the KUBECONFIG variable so that you manage only this specific cluster.
export KUBECONFIG="$(pwd)/.kube/config"
If you are using Azure and working in the cloud shell console, just fetch the cluster credentials.
If you are using Azure k8s, you need a read-write-many StorageClass so that multiple pods can write to the same storage:
kubectl apply -f provision/kubernetes/azure-pvc-roles.yaml
kubectl apply -f provision/kubernetes/azure-file-sc.yaml
For other storage/cloud providers you may need to adjust it accordingly.
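For orientation, a read-write-many setup on AKS is usually backed by Azure Files. The sketch below builds such manifests in Python and prints them as JSON, which kubectl apply -f also accepts. It is not the content of provision/kubernetes/azure-file-sc.yaml: the provisioner, SKU, claim name and size are assumptions, and only the storage class name azurefile comes from this document.

```python
import json


def azure_file_storage_class(name="azurefile", sku="Standard_LRS"):
    """Build a StorageClass manifest backed by Azure Files (assumed values)."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumption: the in-tree Azure Files provisioner, which supports RWX.
        "provisioner": "kubernetes.io/azure-file",
        "parameters": {"skuName": sku},
    }


def rwx_claim(name, storage_class="azurefile", size="5Gi"):
    """Build a PersistentVolumeClaim that several pods can mount read-write."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # "example-claim" is a hypothetical name used only for illustration.
    print(json.dumps(azure_file_storage_class(), indent=2))
    print(json.dumps(rwx_claim("example-claim"), indent=2))
```

Compare the output with the YAML files in provision/kubernetes before relying on it.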
Install helm v2.14.3 on your control host.
Then apply RBAC on the cluster and deploy Helm (Tiller):
kubectl apply -f provision/kubernetes/helm-tiller.rbac.yaml
helm init --service-account tiller --wait
From now on your Kubernetes cluster should be ready for app deployment.
We will use the official Helm chart for WordPress with our minor customizations from the wordpress-azurefile.yaml file:
helm install --name wp-02 stable/wordpress -f wordpress-azurefile.yaml
It takes a few minutes due to the way storage is attached to k8s nodes.
kubectl port-forward service/wp-02-wordpress 80:80 8081:80
kubectl get endpoints
kubectl get svc
Note that Azure managed Kubernetes may expose the app publicly to the Internet.
We will use the label drain-me as a selector in one of the tests later.
We only want to manage specific nodes in this scenario and leave the node with the database intact.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-agentpool-32137755-0 Ready agent 33m v1.13.10
aks-agentpool-32137755-1 Ready agent 33m v1.13.10
aks-agentpool-32137755-2 Ready agent 33m v1.13.10
aks-agentpool-32137755-3 Ready agent 33m v1.13.10
Check where the database was deployed (kubectl describe will show it); in our case it was the node with suffix 2. So we add labels accordingly:
kubectl label nodes aks-agentpool-32137755-2 app=mysql --overwrite
kubectl label nodes aks-agentpool-32137755-0 app=drain-me --overwrite
kubectl label nodes aks-agentpool-32137755-1 app=drain-me --overwrite
kubectl label nodes aks-agentpool-32137755-3 app=drain-me --overwrite
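With more nodes, typing these commands by hand gets error-prone. A minimal Python sketch that generates them from a node list, assuming you already know which node hosts the database; the function name label_commands is hypothetical, and the node names and labels are the ones used above.

```python
def label_commands(nodes, db_node, db_label="app=mysql", drain_label="app=drain-me"):
    """Return one kubectl label command per node, protecting the database node."""
    if db_node not in nodes:
        raise ValueError(f"{db_node} is not in the node list")
    commands = []
    for node in nodes:
        # The database node gets its own label so the drain selector skips it.
        label = db_label if node == db_node else drain_label
        commands.append(f"kubectl label nodes {node} {label} --overwrite")
    return commands


if __name__ == "__main__":
    nodes = [f"aks-agentpool-32137755-{i}" for i in range(4)]
    for cmd in label_commands(nodes, db_node="aks-agentpool-32137755-2"):
        print(cmd)
```

The script only prints the commands, so you can review them before pasting them into the shell.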
We will use pyenv with pyenv-virtualenv to install the desired Python version and then create a virtualenv for the project:
pyenv virtualenv chaos
pyenv activate chaos
pip install -r requirements.txt
While the cluster is getting ready, it is a good time to read the chaos-toolkit tutorials.
This way the next sections will be less cryptic ;-)
Remember to export KUBECONFIG.
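Forgetting the export is easy, so a small pre-flight check can help. This is a sketch only: the helper name kubeconfig_problem is hypothetical, and it checks nothing beyond the variable being set and the file(s) existing.

```python
import os
from pathlib import Path


def kubeconfig_problem(environ=None):
    """Describe what is wrong with KUBECONFIG, or return None if it looks fine."""
    environ = os.environ if environ is None else environ
    value = environ.get("KUBECONFIG", "")
    if not value:
        return "KUBECONFIG is not set"
    # KUBECONFIG may hold several paths separated by os.pathsep.
    missing = [p for p in value.split(os.pathsep) if p and not Path(p).is_file()]
    if missing:
        return "KUBECONFIG points to missing file(s): " + ", ".join(missing)
    return None


if __name__ == "__main__":
    print(kubeconfig_problem() or "KUBECONFIG looks fine")
```

Run it in the same shell where you are about to call chaos, since the variable is per-session.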
First, run chaos discovery to fetch available options for kubernetes:
chaos discover chaostoolkit-kubernetes --no-install
See discovery.json.
Look into experiment.json. We defined there:
- hypothesis that specific microservice is up and healthy, where healthy means it has proper number of desired replicas
- testing methods to validate above:
- kill 2 pods and wait 90s
- drain 1 random node and wait 90s
We know that in Azure, on the given cluster, the WordPress pods reach the Ready state in about 70s, so we will use 90s as the base time span for tests.
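To make the structure concrete, here is a minimal sketch of such an experiment, written as a Python script that emits the JSON. It is not a copy of experiment.json from this repository: the deployment name, namespace and pod label selector are assumptions, and the chaosk8s module, function and argument names are given from memory of chaostoolkit-kubernetes, so check them against discovery.json before use.

```python
import json

PAUSE_AFTER_S = 90  # pods were Ready in about 70s, so wait a bit longer


def build_experiment(deployment="wp-02-wordpress", ns="default"):
    """Build a Chaos Toolkit experiment as a plain dict (illustrative values)."""
    selector = f"app={deployment}"  # assumed pod label; check your chart

    healthy_probe = {
        "type": "probe",
        "name": "wordpress-is-available-and-healthy",
        "tolerance": True,
        "provider": {
            "type": "python",
            "module": "chaosk8s.probes",  # assumed; verify in discovery.json
            "func": "microservice_available_and_healthy",
            "arguments": {"name": deployment, "ns": ns, "label_selector": selector},
        },
    }
    kill_pods = {
        "type": "action",
        "name": "kill-two-wordpress-pods",
        "provider": {
            "type": "python",
            "module": "chaosk8s.pod.actions",  # assumed; verify in discovery.json
            "func": "terminate_pods",
            "arguments": {"label_selector": selector, "ns": ns, "rand": True, "qty": 2},
        },
        "pauses": {"after": PAUSE_AFTER_S},
    }
    drain_node = {
        "type": "action",
        "name": "drain-one-labelled-node",
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.actions",  # assumed; verify in discovery.json
            "func": "drain_nodes",
            "arguments": {"label_selector": "app=drain-me", "count": 1},
        },
        "pauses": {"after": PAUSE_AFTER_S},
    }
    return {
        "version": "1.0.0",
        "title": "WordPress survives pod and node loss",
        "description": "Kill 2 pods, then drain 1 node, waiting 90s after each.",
        "steady-state-hypothesis": {
            "title": "WordPress is up and healthy",
            "probes": [healthy_probe],
        },
        "method": [kill_pods, drain_node],
        "rollbacks": [],
    }


if __name__ == "__main__":
    print(json.dumps(build_experiment(), indent=2))
```

Generating the file from code makes it easy to change the pause or replica-related values in one place while playing around.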
Run chaos experiment which kills 2 pods (out of 3) and waits 90s:
chaos run --journal-path journal.json experiment.json
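The journal is plain JSON, so it can be summarized without generating a full report. A minimal sketch, assuming the journal carries the keys status, deviated, steady_states and run; these key names are assumptions, so compare them with your own journal.json. The function name summarize_journal is hypothetical.

```python
import json
from pathlib import Path


def summarize_journal(journal):
    """Return a short text summary of a Chaos Toolkit journal (a parsed dict)."""
    states = journal.get("steady_states") or {}
    before = (states.get("before") or {}).get("steady_state_met")
    after = (states.get("after") or {}).get("steady_state_met")
    lines = [
        f"status: {journal.get('status')}",
        f"deviated: {journal.get('deviated')}",
        f"steady state met before/after: {before}/{after}",
    ]
    # One line per executed activity (probe or action) from the method.
    for item in journal.get("run") or []:
        name = (item.get("activity") or {}).get("name")
        lines.append(f"- {name}: {item.get('status')}")
    return "\n".join(lines)


if __name__ == "__main__":
    path = Path("journal.json")
    if path.is_file():
        print(summarize_journal(json.loads(path.read_text())))
    else:
        print("journal.json not found; run the experiment first")
```

Missing keys show up as None rather than raising, which is convenient when a run was aborted early.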
Play around: after each change, run chaos run experiment.json and see what happens:
- change replicas to 2
- in experiment.json change the pauses after value to 10.
Remember to uncordon nodes after running the experiment.
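Drained nodes stay cordoned and show SchedulingDisabled in the STATUS column of kubectl get nodes. A minimal Python sketch that turns such a listing into kubectl uncordon commands; the function name uncordon_commands is hypothetical and the sample listing is made up from the node names above.

```python
def uncordon_commands(get_nodes_output):
    """Return a kubectl uncordon command for every cordoned node in the listing."""
    commands = []
    for line in get_nodes_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # STATUS is the second column, e.g. "Ready,SchedulingDisabled".
        if len(fields) >= 2 and "SchedulingDisabled" in fields[1].split(","):
            commands.append(f"kubectl uncordon {fields[0]}")
    return commands


if __name__ == "__main__":
    sample = """\
NAME                       STATUS                     ROLES   AGE   VERSION
aks-agentpool-32137755-0   Ready,SchedulingDisabled   agent   33m   v1.13.10
aks-agentpool-32137755-1   Ready                      agent   33m   v1.13.10
"""
    for cmd in uncordon_commands(sample):
        print(cmd)
```

In practice you would feed it the saved output of kubectl get nodes and review the printed commands before running them.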
Add your own experiments based on chaosk8s.
Generate example report:
chaos report --export-format=html journal.json report.html
Pro tip - there are more formats, such as: asciidoc, beamer, commonmark, context, docbook, docx, dokuwiki, dzslides, epub, epub3, fb2, haddock, html, html5, icml, json, latex, man, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, native, odt, opendocument, opml, org, pdf, plain, revealjs, rst, rtf, s5, slideous, slidy, texinfo, textile