Personal "iterative" kubernetes sandbox used while playing with kubernetes at home on top of raspberry pi 3B machines.
We'll first manually set up a single-master kubernetes cluster on top of 4x Raspberry Pi 3B boards (the ethernet controller of our fifth raspberry pi died recently).
Because a single raspberry pi 3B can barely handle the load generated by 4 worker nodes' calls to the apiserver (no comment), we'll later switch to a multi-master cluster, composed of :
- 3 master nodes
- 1 worker node (later on we'll add amd64 nodes to the cluster, re-entering the painful world of docker and multi-arch images :p)
- 2x HAproxy + VRRP to provide HA for control plane nodes
references :
-
step 1 : basic manual cluster setup, in single-master mode, with an nginx ingress controller + MetalLB and a default nginx service in spec.type: LoadBalancer mode as a proof of concept.
-
step 2 : using ansible to create the same cluster automatically.
- installing docker-ce on arm+amd64 machines,
- configuring networking/VLANs on cluster members
- installing a local docker registry on ARM machines, reconfiguring docker engines to use this registry as a registry-mirror, with self-signed ssl/tls certs. This registry will not run on top of the cluster (chicken and egg problem) but on a single ARM device instead.
-
step 3 : setting up custom namespaces, network policies, roles (rbac) and a better test service using homemade docker images
-
step 4 : migrating our manually-created test service to our local gitlab-ce installation, with gitlab-ci integration (or better, switching to jenkins-x).
In the first iteration/step, we'll use the following raspberry pi hosts :
- pi01.p13.p.s18m2.com 10.13.1.21
- pi02.p13.p.s18m2.com 10.13.1.22
- pi03.p13.p.s18m2.com 10.13.1.23
- pi04.p13.p.s18m2.com 10.13.1.24 (master node)
One single kubernetes master node for now.
Because I want to bring both ARM and amd64 machines into the mix, I use ansible to provision 4 extra amd64 virtual machines using :
- https://github.com/alemansec/ansible-role-qemu-host ; to setup and configure qemu on chosen hosts
- https://github.com/alemansec/ansible-role-qemu-provision-machine ; to provision actual amd64 VM(s).
this adds the following hosts to the mix :
- vk8s01.p13.p.s18m2.com
- vk8s02.p13.p.s18m2.com
- vk8s03.p13.p.s18m2.com
- vk8s04.p13.p.s18m2.com
Note : I won't add this ansible part to the repository, for multiple reasons (those initial playbooks and roles are quite ugly and still contain many hardcoded values, plus ansible will probably throw dozens of deprecation warnings, then errors, in a few weeks, as usual...)
We set up two local nameservers to provide host name resolution for our sandbox zone "p13.p.s18m2.com" ; extra vlan configuration is also applied by this playbook, so our raspberry pi hosts can sit on the various networks we like :
ansible-playbook -i inventory/bootstrap/ _bootstrap.yml
# limit to bind :
ansible-playbook -i inventory/bootstrap/ _bootstrap.yml --tags bind
# optional : clean up a node reused from previous experiments (prune docker, reset firewall rules) :
docker system prune --volumes
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
ip6tables -P INPUT ACCEPT
ip6tables -P FORWARD ACCEPT
ip6tables -P OUTPUT ACCEPT
ip6tables -t nat -F
ip6tables -t mangle -F
ip6tables -F
ip6tables -X
Basic docker-ce setup on amd64+armhf (armv7 only for now, not using aarch64 here yet), plus the ssl/tls certificates setup related to our local docker registry (also running on arm devices) :
cd ansible/
ansible-playbook -i inventory/sandbox/ book_docker_engine.yml
# disable swap on every raspberry pi (kubelet refuses to start with swap enabled by default) :
dphys-swapfile swapoff
dphys-swapfile uninstall
update-rc.d dphys-swapfile remove
# append to /boot/cmdline.txt :
cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1
# install kubeadm on every node :
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# yes, this shows "xenial" although we're running raspbian on our raspberry pi machines:
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update && apt-get -y install kubeadm
we'll use a separate partition (hopefully faster than the internal mmc card) for the kubelet data.
we'll use /opt/hosting/infra/k8s ; /opt/hosting/infra being already mounted locally
# on all nodes :
mkdir /opt/hosting/infra/k8s
sed -i 's#KUBELET_EXTRA_ARGS=#KUBELET_EXTRA_ARGS="--root-dir=/opt/hosting/infra/k8s"#g' /etc/default/kubelet
(our docker engine daemons also use a separate partition, on a device other than the internal mmc card).
on the future master node (pi04.p13.p.s18m2.com in our case), as root :
# pre-pull required docker images to run a k8s master node :
kubeadm config images pull
We will use 'flannel' networking, so we have to run (10.244.0.0/16 being the default flannel cidr) :
# as root, on pi04 (future master node) :
# 10.13.1.24 = pi04, our future master node (where we run those commands) :
kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.13.1.24
or better : use the following script to run kubeadm init in the first place (chicken-and-egg problem on slower machines : kubeadm init creates the manifests containing... the timeout values we need to increase) :
#!/usr/bin/env python
#
# replace some startup timeouts when manifests files are created by kubeadm init
# see https://github.com/kubernetes/kubeadm/issues/413
# run me using 'sudo python foobar.py'
#
import os
import time
import threading
filepath = '/etc/kubernetes/manifests/kube-apiserver.yaml'
def replace_defaults():
    print('Thread start looking for the file')
    while not os.path.isfile(filepath):
        time.sleep(1)  # wait one second
    print('\033[94m -----------> FILE FOUND: replacing defaults \033[0m')
    os.system("""sed -i 's/failureThreshold: [0-9]/failureThreshold: 18/g' /etc/kubernetes/manifests/kube-apiserver.yaml""")
    os.system("""sed -i 's/timeoutSeconds: [0-9][0-9]/timeoutSeconds: 20/g' /etc/kubernetes/manifests/kube-apiserver.yaml""")
    os.system("""sed -i 's/initialDelaySeconds: [0-9][0-9]/initialDelaySeconds: 240/g' /etc/kubernetes/manifests/kube-apiserver.yaml""")
t = threading.Thread(target=replace_defaults)
t.start()
os.system("kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.13.1.24")
as a regular user on the same node (pi04, the future master node), as instructed by kubeadm init :
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
write the join command down somewhere ; here's a sample :
kubeadm join 10.13.1.24:6443 --token ymduph.d5tuo85q088e1k72 --discovery-token-ca-cert-hash sha256:e568747339f7deaf29b6777d9bffd52766bf5fee22e0b94394bf4f49b9dcdb03
on master node (pi04), as the regular user with .kube/config (or on your desktop machine if you copied .kube/config under ~/.kube/) :
# upstream manifest (amd64 images) :
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# for arm, download the manifest locally and switch the image arch first :
# sed -i 's/amd64/arm/g' ./kube-flannel.yml
kubectl apply -f ./kube-flannel.yml
kubectl get pods --namespace=kube-system
on each node, including master node :
#sudo sysctl net.bridge.bridge-nf-call-iptables=1
# (a plain 'sudo echo ... > file' would fail : the redirection runs as the unprivileged user)
echo "net.bridge.bridge-nf-call-iptables = 1" | sudo tee /etc/sysctl.d/k8s.conf
sudo sysctl -p /etc/sysctl.d/k8s.conf
as root, on every other node :
# run "kubeadm token create --print-join-command" as root on master node to get a new token if lost :
kubeadm join 10.13.1.24:6443 --token ymduph.d5tuo85q088e1k72 --discovery-token-ca-cert-hash sha256:e568747339f7deaf29b6777d9bffd52766bf5fee22e0b94394bf4f49b9dcdb03
exposing services on an external ip address :
To route traffic into the cluster, we install an ingress controller (nginx in our case).
We also set up MetalLB, so services using spec.type: LoadBalancer get dynamically allocated 'external' ip addresses from a configured pool.
https://kubernetes.github.io/ingress-nginx/deploy/
The MetalLB controller needs to be installed so that our exposed services (with type LoadBalancer) get external IP addresses:
# https://metallb.universe.tf/installation/
#kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.7.3/manifests/metallb.yaml
kubectl apply -f ./ingress/metallb/metallb.yaml
# MetalLB’s components will still start, but will remain idle until you define and deploy a configmap :
kubectl apply -f ./ingress/metallb/layer2-config.yaml
https://metallb.universe.tf/usage/#requesting-specific-ips
"MetalLB respects the spec.loadBalancerIP parameter, so if you want your service to be set up with a specific address, you can request it by setting that parameter."
MetalLB also supports requesting a specific address pool, if you want a certain kind of address but don’t care which one exactly. To request assignment from a specific pool, add the metallb.universe.tf/address-pool annotation to your service, with the name of the address pool as the annotation value. For example:
apiVersion: v1
kind: Service
metadata:
  name: nginx
  annotations:
    metallb.universe.tf/address-pool: production-public-ips
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
(the 'production-public-ips' pool being defined in ingress/metallb/layer2-config.yaml)
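For reference, a minimal sketch of what a layer2 config such as ingress/metallb/layer2-config.yaml could contain (pool names and address ranges below are illustrative, not necessarily the repo's actual values) :

# MetalLB v0.7.x reads its address pools from a ConfigMap named 'config' in the metallb-system namespace
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.100.240-192.168.100.250
    - name: production-public-ips
      protocol: layer2
      addresses:
      - 192.168.100.224/28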
Because ingress-nginx does not work on the ARM arch at the moment (see below), I'll switch to traefik.
# RBAC ClusterRoleBinding:
kubectl apply -f ingress/traefik/traefik-rbac.yaml
# deployment :
kubectl apply -f ingress/traefik/traefik-deployment.yaml
# http basic auth for traefik dashboard access:
# (the secret has to be created in the same namespace as the ingress object)
htpasswd -c ./traefik-admin-auth someusername
kubectl create secret generic traefik-dashboard-basic-auth --from-file ./traefik-admin-auth --namespace kube-system
# service and ingress to expose traefik web ui :
# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout traefik.p13.p.s18m2.com.key -out traefik.p13.p.s18m2.com.crt -subj "/CN=traefik.p13.p.s18m2.com"
# kubectl -n kube-system create secret tls traefik-ui-tls-cert --key=traefik.p13.p.s18m2.com.key --cert=traefik.p13.p.s18m2.com.crt
kubectl -n kube-system create secret tls traefik-ui-tls-cert --key=./certificates/certs/traefik.p13.p.s18m2.com.key --cert=./certificates/certs/traefik.p13.p.s18m2.com.crt
kubectl apply -f ingress/traefik/ui.yaml
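A sketch of what ingress/traefik/ui.yaml could contain, loosely based on the upstream traefik 1.x example (the service selector is assumed to match the labels used in traefik-deployment.yaml) :

apiVersion: v1
kind: Service
metadata:
  name: traefik-web-ui
  namespace: kube-system
spec:
  selector:
    k8s-app: traefik-ingress-lb    # assumed label, must match traefik-deployment.yaml
  ports:
  - name: web
    port: 80
    targetPort: 8080               # traefik dashboard port
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefik-web-ui
  namespace: kube-system
  annotations:
    kubernetes.io/ingress.class: traefik
    # http basic auth, using the secret created above :
    ingress.kubernetes.io/auth-type: basic
    ingress.kubernetes.io/auth-secret: traefik-dashboard-basic-auth
spec:
  tls:
  - hosts:
    - traefik.p13.p.s18m2.com
    secretName: traefik-ui-tls-cert
  rules:
  - host: traefik.p13.p.s18m2.com
    http:
      paths:
      - path: /
        backend:
          serviceName: traefik-web-ui
          servicePort: web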
https://github.com/nginxinc/kubernetes-ingress/blob/master/docs/installation.md
Note that we had to use another image because of our ARM arch, hence the use of the local ./ingress/nginx/mandatory.yaml
# amd64 only: kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml
# arm:
kubectl apply -f ./ingress/nginx/mandatory.yaml
# to see progress :
kubectl describe pods -n ingress-nginx
expose our nginx ingress controller via a spec.type: LoadBalancer service so it gets an external ip :
kubectl create -f ingress/nginx/service-loadbalancer.yaml
The nginx ingress controller gets assigned an IP address provided by MetalLB.
Later on, services (which no longer need spec.type: LoadBalancer themselves) are automatically made available externally by the nginx ingress controller via Kind: Ingress rules.
We can still expose services directly, without Ingress rules, by using spec.type: LoadBalancer for those services.
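As an illustration, a minimal Ingress rule routing a hostname to a cluster service could look like the sketch below (hostname and service name are placeholders, not taken from the repo) :

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hello-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: hello.p13.p.s18m2.com      # placeholder hostname
    http:
      paths:
      - path: /
        backend:
          serviceName: my-nginx      # any ClusterIP service in the same namespace
          servicePort: 80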
Running into this issue : kubernetes/ingress-nginx#3545 (it crashes after one or two hits)
I0205 07:58:13.441048 7 controller.go:195] Backend successfully reloaded.
I0205 07:58:13.451748 7 controller.go:212] Dynamic reconfiguration succeeded.
192.168.100.4 - [192.168.100.4] - - [05/Feb/2019:07:58:18 +0000] "GET / HTTP/1.1" 302 31 "-" "curl/7.62.0" 83 0.007 [default-gogs-ui-3000] 10.244.3.24:3000 31 0.010 302 8597a5ef8706a4416d94bb36e59f035f
192.168.100.4 - [192.168.100.4] - - [05/Feb/2019:07:58:19 +0000] "GET / HTTP/1.1" 302 31 "-" "curl/7.62.0" 83 0.015 [default-gogs-ui-3000] 10.244.3.24:3000 31 0.010 302 9fe59e2fc2e523e5fcfaea8326552cac
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11a70]
goroutine 256 [running]:
runtime/internal/atomic.goXadd64(0x28e44bc, 0x2, 0x0, 0x3126e979, 0x3f7cac08)
/usr/local/go/src/runtime/internal/atomic/atomic_arm.go:96 +0x1c
k8s.io/ingress-nginx/vendor/github.com/prometheus/client_golang/prometheus.(*histogram).Observe(0x28e4460, 0x3126e979, 0x3f7cac08)
/go/src/k8s.io/ingress-nginx/vendor/github.com/prometheus/client_golang/prometheus/histogram.go:272 +0x68
k8s.io/ingress-nginx/internal/ingress/metric/collectors.(*SocketCollector).handleMessage(0x2be1780, 0x2fe8000, 0x14a, 0x600)
/go/src/k8s.io/ingress-nginx/internal/ingress/metric/collectors/socket.go:269 +0xb8c
nginx hello world with spec.type: LoadBalancer (dynamically getting an external ip address from MetalLB)
- create nginx deployment :
this creates an nginx pod, only accessible from within the cluster (using the 'nginx:stable' image, which works properly on the arm arch).
kubectl create -f ./use/deployments/helloworld-nginx.yml
kubectl get pods -o wide
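./use/deployments/helloworld-nginx.yml is not reproduced here ; it presumably boils down to something like this sketch (field values assumed, not the exact repo file) :

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      run: my-nginx
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx:stable        # multi-arch image, works on arm
        ports:
        - containerPort: 80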
- exposing the service to the whole cluster :
we create a service, exposing the nginx application to the entire cluster.
either :
kubectl expose deployment/my-nginx
or
kubectl create -f ./use/services/helloworld-svc-nginx.yml
kubectl get svc -o wide my-nginx
kubectl describe svc my-nginx
kubectl scale --current-replicas=2 --replicas=1 deployment/my-nginx
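Likewise, ./use/services/helloworld-svc-nginx.yml presumably looks roughly like the sketch below ; the spec.type: LoadBalancer is what makes MetalLB assign an external IP :

apiVersion: v1
kind: Service
metadata:
  name: my-nginx
spec:
  type: LoadBalancer      # MetalLB assigns the external IP
  selector:
    run: my-nginx         # matches the deployment's pod labels
  ports:
  - port: 80
    targetPort: 80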
See if metallb did what it was supposed to do :
$ kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6h7m <none>
my-nginx LoadBalancer 10.101.142.20 192.168.100.240 80:30020/TCP 16m run=my-nginx
We got an 'EXTERNAL-IP' assigned, in the range configured within metallb config :p
me@machine_external_to_cluster$ curl -v http://192.168.100.240/
* Trying 192.168.100.240...
* TCP_NODELAY set
* Connected to 192.168.100.240 (192.168.100.240) port 80 (#0)
> GET / HTTP/1.1
> Host: 192.168.100.240
> User-Agent: curl/7.62.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.14.2
< Date: Thu, 31 Jan 2019 10:56:08 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
< Connection: keep-alive
< ETag: "5c0692e1-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
(I can't use/validate ingress-nginx right now because it crashes on arm, in its 0.20 version, the latest available atm.)
references/doc :
- https://github.com/kubernetes/dashboard/wiki/Access-control#admin-privileges
- https://github.com/kubernetes/dashboard/releases
- arm image to use : k8s.gcr.io/kubernetes-dashboard-arm:v1.10.1
- the upstream manifest has to be edited to use that arm image (I guess), hence the local copy :
# upstream (amd64) manifest :
# kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
# local copy, edited to use the arm image :
kubectl apply -f ./dashboard/kubernetes-dashboard.yaml
# then, to access the dashboard :
# (for example, I'm executing ```kubectl proxy``` on my desktop machine, previously configured to access the cluster using kubectl)
kubectl proxy
# create ServiceAccount :
kubectl create -f ./dashboard/serviceaccount.yml
# clusterrole binding:
kubectl apply -f ./dashboard/clusterrole_binding.yml
# retrieve access token using :
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
# visit http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
# paste the 'token' value of previous command to login
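./dashboard/serviceaccount.yml and ./dashboard/clusterrole_binding.yml are presumably modeled on the dashboard wiki's admin-user example, roughly :

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin     # full admin : fine for a sandbox, not for production
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system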
Metrics :
ref: https://github.com/kubernetes/dashboard/wiki/Integrations
- https://github.com/coreos/prometheus-operator (limited)
- https://github.com/coreos/prometheus-operator/blob/master/contrib/kube-prometheus (complete solution)
@TODO use this with letsencrypt - similar to traefik's letsencrypt support in terms of features (should also be possible to "mix" both letsencrypt and self-signed certs, like traefik does)
--> no official ARM support yet :
optional : if you want to allow the kubernetes master node to run regular workloads
# remove the 'master' NoSchedule taint from master nodes, so they also accept regular pods:
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get pods --all-namespaces
kubectl get pods --namespace=kube-system
kubectl get nodes
kubectl logs --tail=50 kube-flannel-ds-arm-rdk7f -n kube-system
# list services in kube-system namespace :
kubectl get svc -n kube-system
kubectl apply -f <file.yml | url_to_yml>
kubectl delete -f <file.yml | url_to_yml>
# basic health report :
kubectl get cs
# basic :
kubectl run appname --image=myimage:tag
# useful when pods are stuck in 'ContainerCreating' status (while no logs are available yet) :
kubectl describe pods -n ingress-nginx
# tearing the cluster down : drain and delete every node (from a machine with kubectl access), then reset each node :
kubectl drain pi03.p13.p.s18m2.com --delete-local-data --force --ignore-daemonsets ; kubectl delete node pi03.p13.p.s18m2.com
kubectl drain pi04.p13.p.s18m2.com --delete-local-data --force --ignore-daemonsets ; kubectl delete node pi04.p13.p.s18m2.com
kubectl drain pi01.p13.p.s18m2.com --delete-local-data --force --ignore-daemonsets ; kubectl delete node pi01.p13.p.s18m2.com
kubectl drain pi02.p13.p.s18m2.com --delete-local-data --force --ignore-daemonsets ; kubectl delete node pi02.p13.p.s18m2.com
# on each node
kubeadm reset
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm -C
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#tear-down
Tear down
To undo what kubeadm did, you should first drain the node and make sure that the node is empty before shutting it down.
Talking to the master with the appropriate credentials, run:
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
kubectl delete node <node name>
Then, on the node being removed, reset all kubeadm installed state:
kubeadm reset
The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If you want to reset the IPVS tables, you must run the following command:
ipvsadm -C
If you wish to start over simply run kubeadm init or kubeadm join with the appropriate arguments.
More options and information about the kubeadm reset command
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# yes, this line shows "xenial" although we're running debian stretch on our desktop machine :
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update && apt-get -y install kubectl
me@desktop$ mkdir ~/.kube
me@desktop$ scp pi04:/home/lonelyone/.kube/config $HOME/.kube/
me@desktop$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default my-nginx-7fd64b656c-j78sc 1/1 Running 0 42m
default my-nginx-7fd64b656c-pjjmb 1/1 Running 0 42m
ingress-nginx nginx-ingress-controller-594f658645-nhxg2 1/1 Running 0 53m
kube-system coredns-86c58d9df4-kjgl7 1/1 Running 0 6h30m
kube-system coredns-86c58d9df4-znd5j 1/1 Running 0 6h30m
kube-system etcd-pi04.p13.p.s18m2.com 1/1 Running 0 6h30m
kube-system kube-apiserver-pi04.p13.p.s18m2.com 1/1 Running 1 6h31m
kube-system kube-controller-manager-pi04.p13.p.s18m2.com 1/1 Running 0 6h30m
kube-system kube-flannel-ds-arm-46bxm 1/1 Running 1 6h26m
kube-system kube-flannel-ds-arm-lg69b 1/1 Running 1 6h26m
kube-system kube-flannel-ds-arm-lrj2n 1/1 Running 0 6h28m
kube-system kube-flannel-ds-arm-wtcxj 1/1 Running 1 6h27m
kube-system kube-proxy-c5cjn 1/1 Running 0 6h26m
kube-system kube-proxy-cts58 1/1 Running 0 6h27m
kube-system kube-proxy-k758k 1/1 Running 0 6h26m
kube-system kube-proxy-lncbd 1/1 Running 0 6h30m
kube-system kube-scheduler-pi04.p13.p.s18m2.com 1/1 Running 0 6h31m
metallb-system controller-7cc9c87cfb-b6m55 1/1 Running 0 74m
metallb-system speaker-2gmfk 1/1 Running 0 74m
metallb-system speaker-h67f8 1/1 Running 0 74m
metallb-system speaker-wh5ww 1/1 Running 0 74m
refs:
- https://kubernetes.io/docs/setup/independent/high-availability/
- https://kubernetes.io/docs/setup/independent/ha-topology/
- 1 kubernetes namespace per environment+project (dev, staging, production) : see https://kubernetes.io/docs/tasks/administer-cluster/namespaces/
- Role-Based Access Control --> per-user access to namespaces : see https://kubernetes.io/docs/reference/access-authn-authz/rbac/
- network policies : restrict the traffic applications can send/receive (see the sketch below) ; see https://kubernetes.io/docs/concepts/services-networking/network-policies/
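As an example of the kind of network policy planned for step 3, the sketch below only allows ingress traffic coming from pods of the same namespace (the namespace name is illustrative ; note that plain flannel does not enforce NetworkPolicy objects, a policy-aware CNI such as calico or canal would be needed) :

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only
  namespace: staging          # illustrative namespace
spec:
  podSelector: {}             # applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}         # only pods from this same namespace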
When you create a Service, it creates a corresponding DNS entry. This entry is of the form <service-name>.<namespace-name>.svc.cluster.local
("cluster.local" being the default cluster domain created by kubeadm init), which means that if a container just uses <service-name>,
it will resolve to the service which is local to its namespace.
Containers grouped together in the same pod may use localhost if tight coupling is needed.
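For instance, a pod with a tightly-coupled sidecar reaching the main container over localhost could look like this (names and images are illustrative) :

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar      # illustrative name
spec:
  containers:
  - name: web
    image: nginx:stable
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox
    # same network namespace as 'web', so localhost:80 reaches nginx
    command: ["sh", "-c", "while true; do wget -q -O /dev/null http://localhost:80/; sleep 60; done"]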
- prometheus
- https://github.com/weaveworks/kubediff - tool for Kubernetes to show differences between running state and version controlled configuration
- https://github.com/weaveworks/grafanalib - Python library for building Grafana dashboards
This sample sandbox violates quite a few production rules ; here is a talk presenting a few kubernetes best practices (it starts slowly, but eventually gets into interesting/useful points) :
- "Kubernetes Best Practices with Sandeep Dinesh (Google)"
- video : https://www.youtube.com/watch?v=BznjDNxp4Hs
- presentation slides : https://speakerdeck.com/thesandlord/kubernetes-best-practices
see gogs/README.md
WARNING : this is a first attempt ; it does not (yet) use secrets and relies on emptyDir storage (meaning you'll lose everything on teardown).. but at least it works with separate containers/pods and is quite lightweight compared to gitlab-ce.
I'll configure persistent volumes using my local glusterfs installation (also running on Raspberry pi machines).
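A rough, untested sketch of what that could look like with the in-tree glusterfs volume plugin (the endpoints IP, volume name and size are placeholders) :

apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
- addresses:
  - ip: 10.13.1.31          # placeholder : one of the glusterfs raspberry pi nodes
  ports:
  - port: 1                 # required field, the value itself is not used by glusterfs
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gogs-data
spec:
  capacity:
    storage: 5Gi            # placeholder size
  accessModes:
  - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster
    path: gogs-volume       # placeholder gluster volume name
    readOnly: false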