In this dojo, you will learn to use custom metrics to scale a service deployed in a Kubernetes cluster. You will even scale the service down to zero instances when possible, to avoid wasting resources.
For this exercise, you will need:
Make sure Docker is running on your computer. Otherwise, none of this will work.
Create a local Kubernetes cluster with the exercise ready:
./scripts/bootstrap.sh
You now have a Kubernetes cluster running on your machine. Have a look at what is inside:
kubectl get pods -A
An application is deployed to the `default` namespace. It has a simple event-driven architecture:
- Users send requests to the producer.
- The producer pushes items to a queue stored in a Redis instance.
- A consumer pops items from the queue.
- The consumer does some work for each item.
Try it out! First, tell the producer to publish a message:
curl -X POST producer.vcap.me/publish
# or
http POST producer.vcap.me/publish
Now check the consumer's logs:
kubectl logs -l app.kubernetes.io/name=consumer
Notice that the consumer takes 1 second to process an item from the queue. Now, publish 10 messages at once:
curl -X POST producer.vcap.me/publish/10
# or
http POST producer.vcap.me/publish/10
The consumer's logs show that it only processes one item at a time. By adding more instances of the consumer, your application can process items faster. Scale the consumer to 3 instances:
kubectl scale deployment consumer --replicas=3
Now publish 10 messages again. By watching all of the consumers' logs at once, you should see that each instance processes an item at the same time. Your application now has 3 times the throughput!
You might have guessed that you are going to scale the number of consumer instances based on the number of items waiting to be processed. If you did, you are right!
Let's walk through how you are going to do this.
First, you need to deploy a monitoring system to measure the length of your Redis queue. You are going to do this with Prometheus and the Redis exporter.
Second, you need to expose this metric inside the Kubernetes API. The native Kubernetes pod autoscaler does not know how to query Prometheus. You are going to use the Prometheus adapter to serve as a bridge between the Kubernetes API server and Prometheus.
Finally, you are going to deploy a horizontal pod autoscaler that will set the number of replicas of your consumer based on the metric found in the Kubernetes API.
This is what the target architecture looks like:
The simplest way to deploy Prometheus to your cluster is with the prometheus-community/kube-prometheus-stack Helm chart.
Here is what you need to do:
- Set Prometheus's scrape interval to 5s.
- Make Prometheus watch all Rules and ServiceMonitors in the cluster.
- Create an Ingress for Prometheus with the `prometheus.vcap.me` hostname.
- Deploy Prometheus to the `prometheus` namespace.
- Go to the Prometheus UI: http://prometheus.vcap.me.
Ready? Set. Go!
Hint n°1
Have a look at the chart's `values.yaml` file. Everything you need is in there.
Hint n°2
You can configure Prometheus with the `prometheus.prometheusSpec` field.
Hint n°3
Setting `matchLabels` to an empty object makes it match everything.
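Assembled, those hints might look like the chart values below. This is a minimal sketch assuming the kube-prometheus-stack value names; check the chart's `values.yaml` for the authoritative fields.

```yaml
# Sketch of values for kube-prometheus-stack (field names assumed).
prometheus:
  prometheusSpec:
    scrapeInterval: 5s        # scrape targets every 5 seconds
    ruleSelector:
      matchLabels: {}         # empty selector: watch all Rules in the cluster
    serviceMonitorSelector:
      matchLabels: {}         # empty selector: watch all ServiceMonitors
  ingress:
    enabled: true
    hosts:
      - prometheus.vcap.me
```

You would then deploy with something like `helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --namespace prometheus --create-namespace --values values.yaml`.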
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
The easiest way to deploy the Redis exporter is with the prometheus-community/prometheus-redis-exporter chart.
Here is what you need to do:
- Configure the chart to create a ServiceMonitor for the exporter.
- Configure the exporter to connect to the `redis-master` Service.
- Configure the exporter to watch a single key: `padok`.
- Deploy the exporter to the `default` namespace.
- See your application activity in the Prometheus UI with this query:
rate(redis_commands_processed_total{service="prometheus-redis-exporter",namespace="default"}[20s])
You can do it!
Hint n°1
Have a look at the chart's `values.yaml` file. Everything you need is in there.
Hint n°2
Have a look at the exporter's GitHub repository.
Hint n°3
Did you notice the `REDIS_EXPORTER_CHECK_SINGLE_KEYS` variable?
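Put together, the exporter's chart values might look like this. It is a sketch that assumes the prometheus-redis-exporter chart exposes `serviceMonitor.enabled`, `redisAddress`, and an `env` list; verify against its `values.yaml`.

```yaml
# Sketch of values for prometheus-redis-exporter (field names assumed).
serviceMonitor:
  enabled: true                         # let Prometheus discover the exporter
redisAddress: redis://redis-master:6379 # the redis-master Service in default
env:
  - name: REDIS_EXPORTER_CHECK_SINGLE_KEYS
    value: padok                        # export redis_key_size for this key only
```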
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
The Redis exporter has a feature that is going to be a problem for you: when your Redis queue is empty, the exporter does not expose a metric for it. See for yourself:
- Run this query in the Prometheus UI:
redis_key_size{service="prometheus-redis-exporter",namespace="default",key="padok"}
- Publish a small number of items to your queue.
- See how the metric exists when there are items in the queue, but not when the queue is empty.
Here is what you need to do to work around this:
- Add a PrometheusRule resource to the `consumer` chart.
- In the rule, define a new metric called `redis_items_in_queue`.
- In the rule, write a PromQL query that makes it so that:
  - When `redis_key_size{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}` exists, `redis_items_in_queue` has the same value and labels.
  - When `redis_key_size` is null, `redis_items_in_queue` has a value of 0 with the following labels: `{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}`
- Update the `consumer` release (there's a script for that).
- Check that this query returns a value whether the queue is empty or not:
redis_items_in_queue{service="prometheus-redis-exporter",namespace="default",key="padok"}
What are you waiting for?
Hint n°1
This great blog article has an example of a PrometheusRule.
Hint n°2
Have a look at the `absent` and `clamp_max` Prometheus functions, and the `or` keyword.
Hint n°3
This is the PromQL query to use in your PrometheusRule:
redis_key_size{service="prometheus-redis-exporter",namespace="default",key="padok"}
or
clamp_max(absent(redis_key_size{service="prometheus-redis-exporter",namespace="default",key="padok"}), 0)
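Wrapped in a chart template, the rule might look like the sketch below. The metadata is illustrative, and the namespace is templated with `{{ .Release.Namespace }}` as the step requires; adjust both to fit your chart.

```yaml
# Sketch of a PrometheusRule template for the consumer chart.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: consumer
spec:
  groups:
    - name: consumer
      rules:
        # absent() returns 1 (with labels taken from the matchers) when the
        # series is missing; clamp_max(..., 0) turns that 1 into a 0, so the
        # recorded metric exists whether the queue is empty or not.
        - record: redis_items_in_queue
          expr: |-
            redis_key_size{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}
            or
            clamp_max(absent(redis_key_size{service="prometheus-redis-exporter",namespace="{{ .Release.Namespace }}",key="padok"}), 0)
```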
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
The easiest way to deploy the Prometheus Adapter is with the prometheus-community/prometheus-adapter chart.
Here is what you need to do:
- Configure the adapter to query the existing Prometheus service.
- Configure the adapter to expose all metrics starting with `redis_`.
- Deploy the adapter to the `prometheus` namespace.
- Read the number of items in your queue from the Kubernetes API:
kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/prometheus-redis-exporter/redis_items_in_queue' | jq
Get to it!
Hint n°1
Have a look at the chart's `values.yaml` file. Everything you need is in there.
Hint n°2
Have a look at this documentation. The introduction is very helpful. The Discovery and Querying sections might help you too.
Hint n°3
You need to define a custom rule. You only need to set the `seriesQuery` and `metricsQuery` fields. Forget `name` and `resources`.
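Those hints might combine into adapter chart values like the sketch below. The Prometheus Service name depends on how you deployed it (`prometheus-operated` is the headless Service the operator creates), so treat the URL as an assumption.

```yaml
# Sketch of values for prometheus-adapter (field names assumed).
prometheus:
  url: http://prometheus-operated.prometheus.svc  # adjust to your Prometheus Service
  port: 9090
rules:
  custom:
    # Expose every timeseries whose name starts with redis_.
    - seriesQuery: '{__name__=~"^redis_.*"}'
      metricsQuery: max(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
```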
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
This step is pretty self-explanatory. Here is what you need to do:
- Add a v2beta2 HorizontalPodAutoscaler resource to the `consumer` chart.
- Configure the HPA to scale the consumer's Deployment.
- Set `minReplicas` to `1` and `maxReplicas` to `20`.
- Scale the Deployment based on a metric of type `Object`.
- Scale based on the `redis_items_in_queue` timeseries with the `key=padok` label.
- Set a target average of 20 items per consumer replica.
- Update the `consumer` release (there's a script for that).
- Check that the number of consumers adapts to the number of items in the queue.
Are you ready? Go!
Hint n°1
This great blog article has an example of a v2beta2 HorizontalPodAutoscaler.
Hint n°2
The Prometheus adapter mapped the metrics to Kubernetes resources. This mapping is based on the timeseries' labels.
Hint n°3
The metrics are mapped to the `prometheus-redis-exporter` Service.
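Assembled, the HPA might look like this sketch. The names assume the `consumer` Deployment and the `prometheus-redis-exporter` Service from the previous steps.

```yaml
# Sketch of a v2beta2 HorizontalPodAutoscaler for the consumer chart.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: consumer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Object
      object:
        describedObject:          # the resource the metric is mapped to
          apiVersion: v1
          kind: Service
          name: prometheus-redis-exporter
        metric:
          name: redis_items_in_queue
          selector:
            matchLabels:
              key: padok
        target:
          type: AverageValue      # divide the metric by the current replica count
          averageValue: "20"      # aim for 20 items per consumer replica
```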
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
You can find this step's solution here:
Your local cluster has a special feature activated: your HPA can scale to 0 instances if you let it. Here is what you need to do:
- Set the consumer's HPA's minimum to 0.
- Update the `consumer` release.
- Check that the HPA scales to 0 instances when the Redis queue is empty.
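In the HPA manifest, this is a one-line change. The special feature mentioned above is the HPAScaleToZero feature gate, which normally has to be enabled for `minReplicas: 0` to be accepted:

```yaml
spec:
  minReplicas: 0   # requires the HPAScaleToZero feature gate
```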
Easy, wasn't it?
If you haven't already, have a look at your HPA:
kubectl get hpa consumer
You should notice that there is something wrong with what is displayed.
-9223372036854775808 is a strange number. You might know that it is equal to -2^63. But why is your HPA displaying this nonsensical value? Is this a bug in `kubectl`?
Check your HPA's status directly, without `kubectl`'s user-friendly format:
kubectl get hpa consumer -o json | jq .status
The value is not there. There is a simple reason why: by default, `kubectl` fetches your HPA from the `autoscaling/v1` API, instead of the `autoscaling/v2beta2` API that we want to use. See for yourself:
kubectl get hpa consumer -o json | jq .apiVersion
Kubernetes translates between versions for you, so your HPA exists in both versions. You can access specific API versions like this:
kubectl get hpa.v2beta2.autoscaling consumer -o json | jq .apiVersion
Check your HPA's status in the `autoscaling/v2beta2` API:
kubectl get hpa.v2beta2.autoscaling consumer -o json | jq .status
You should see the value there. The only way for this value to be in your HPA's raw status is if the Kubernetes controller manager put it there. This seems like a bug. Here is what you need to do:
- Go to the kubernetes/kubernetes repository.
- Find the line that causes the bug.
Debugging Kubernetes. This should be fun!
Hint n°1
The controller manager's code is in the pkg/controller directory.
Hint n°2
The HPA computes its status in the podautoscaler/replica_calculator.go file.
Hint n°3
In Go, dividing the floating-point value 0 by 0 gives a special number: math.NaN(). Casting this value to a 64-bit integer results in -9223372036854775808, as seen here.
Compare your work to the solution before moving on. Are there differences? Is your approach better or worse? Why?
The bug is right here. The Kubernetes controller divides resource utilisation by the number of replicas to compute an average. When the number of replicas is 0, the Kubernetes controller divides by 0! 😱
Once you are done with this exercise, you can destroy your local environment:
./scripts/teardown.sh
I hope you had fun and learned something!