Horizontal Pod Autoscaling in Kubernetes (minikube)
Ideally, HPA should be run on EKS/AKS together with the Cluster Autoscaler (CA); that will be covered later.
- In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
- Kubernetes uses the horizontal pod autoscaler (HPA) to monitor the resource demand and automatically scale the number of replicas.
- This method is also referred to as scaling out. It lets a DevOps engineer, SRE, or cluster admin have Kubernetes increase or decrease the number of pods automatically based on the application's resource usage. The diagrams below explain the process.
- Enable the metrics-server addon for minikube by running the command below (the confirmation will look like the image below). The metrics-server collects resource metrics from the kubelets and exposes them through the Metrics API in the Kubernetes API server, for use by the HPA and VPA. By default, the HPA checks the Metrics API every 15 seconds for any required change in replica count, but metrics-server only scrapes the kubelets at its own interval (60 seconds in the setup described here; the default varies by metrics-server version). Effectively, the HPA sees fresh data about every 60 seconds. When a change is required, the number of replicas is increased or decreased accordingly.
$ minikube addons enable metrics-server
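After the addon is enabled, it can take a minute or so before metrics are actually served. The sketch below is one way to confirm that the Metrics API is answering before moving on. The function name check_metrics_api is made up for this example, and the commands assume kubectl is pointed at the running minikube cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or minikube): checks that the
# Metrics API is registered and returning data.
check_metrics_api() {
    # metrics-server registers this APIService with the API server.
    kubectl get apiservice v1beta1.metrics.k8s.io || return 1
    # Raw query against the Metrics API; fails until metrics-server is ready.
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" >/dev/null || return 1
    # Human-readable CPU and memory usage per node.
    kubectl top nodes
}

# Usage (needs a running cluster):
#   check_metrics_api
```

If the raw query fails at first, waiting a short while and retrying is usually enough.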
- To see scaling happen sooner, it helps to start minikube with shorter HPA delays (1 minute) and a shorter sync period (10 seconds). The upscale-delay and downscale-delay flags are only accepted by older Kubernetes releases; if the controller manager fails to start, drop them and keep sync-period and downscale-stabilization.
$ minikube start --extra-config=controller-manager.horizontal-pod-autoscaler-upscale-delay=1m --extra-config=controller-manager.horizontal-pod-autoscaler-downscale-delay=1m --extra-config=controller-manager.horizontal-pod-autoscaler-sync-period=10s --extra-config=controller-manager.horizontal-pod-autoscaler-downscale-stabilization=1m
- Download and deploy the manifest HPA.yaml.
$ kubectl apply -f HPA.yaml
- Check the pod and service that were created. Wait for the pod to reach the Running state.
$ kubectl get pod,svc
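The contents of HPA.yaml are not reproduced in this walkthrough. As a rough sketch, assuming it follows the php-apache example from the upstream Kubernetes HPA walkthrough, it would hold a Deployment with a CPU request plus a Service in front of it. The image name and resource values below are assumptions taken from that upstream example, not from HPA.yaml itself, and print_php_apache_manifest is a made-up helper name. The CPU request matters because the HPA expresses utilization as a percentage of the requested CPU.

```shell
#!/bin/sh
# Hypothetical helper: prints an assumed php-apache Deployment and Service.
print_php_apache_manifest() {
cat <<'EOF'
# Assumed contents; the real HPA.yaml may differ.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m   # HPA utilization is measured against this request
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
EOF
}

# Usage (needs a running cluster):
#   print_php_apache_manifest | kubectl apply -f -
```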
- Create the HorizontalPodAutoscaler.
$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=5
- Check the current status of the newly created HorizontalPodAutoscaler by running the command below. The output should be similar to the image.
$ kubectl get hpa
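The kubectl autoscale command creates the HPA imperatively. A roughly equivalent declarative form, assuming the cluster serves the autoscaling/v2 API (Kubernetes 1.23 or newer), might look like the sketch below. The helper name print_hpa_manifest is made up for this example; the target, bounds, and CPU percentage mirror the flags used in the command.

```shell
#!/bin/sh
# Hypothetical helper: prints an HPA manifest matching
# --cpu-percent=50 --min=1 --max=5 for the php-apache Deployment.
print_hpa_manifest() {
cat <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF
}

# Usage (needs a running cluster):
#   print_hpa_manifest | kubectl apply -f -
```

Keeping the HPA in a manifest makes it easier to version alongside the Deployment than a one-off command.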
- Check how the autoscaler reacts to increased load. To do this, start a different Pod to act as a client. The container within the client Pod runs in an infinite loop, sending queries to the php-apache service. Open a new window and run the below command.
$ kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
- In the main window, run the command below (press Ctrl+C to end the watch). Within a minute or so, the terminal should show the higher CPU load, like the image below.
$ kubectl get hpa php-apache --watch
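The replica counts that appear in the watch output follow from the HPA's core rule: desired replicas = ceil(current replicas × current utilization / target utilization), bounded by the configured minimum and maximum. Below is a minimal sketch of that arithmetic in shell. It ignores the controller's tolerance band and stabilization windows, and desired_replicas is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: approximates the HPA replica calculation.
# Arguments: current replicas, current CPU utilization (%), target (%), min, max.
desired_replicas() {
    current=$1
    util=$2
    target=$3
    min=$4
    max=$5
    # Integer ceiling of current * util / target.
    want=$(( (current * util + target - 1) / target ))
    # Clamp the result to the configured bounds.
    if [ "$want" -lt "$min" ]; then want=$min; fi
    if [ "$want" -gt "$max" ]; then want=$max; fi
    echo "$want"
}

desired_replicas 1 250 50 1 5   # prints 5
```

With one pod running at 250% of its CPU request against the 50% target, the rule asks for 5 replicas, which is also the maximum configured here.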
- As the load increases, the deployment size increases as well, which can be seen by running the command below.
$ kubectl get deployment php-apache
- After a few moments, checking the pods should show all 5 running (the replica count matches the figure from the HorizontalPodAutoscaler).
$ kubectl get pods
- Stopping the load: in the terminal where the busybox load-generator Pod is running, terminate the load generation by pressing Ctrl+C.
- After a few minutes, checking the deployment and its replicas again should show only 1 pod with 1 replica.