
terraform: autoscale k8s deployments #536

Closed
wants to merge 1 commit

Conversation

tgeoghegan
Contributor

To spare ourselves the hassle of manually resizing deployments, we
configure Kubernetes to automatically resize the `intake-batch` and
`aggregate` worker pools. This requires deploying the Stackdriver custom
metrics adapter to make PubSub metrics visible to a Kubernetes
Horizontal Pod Autoscaler, then configuring an HPA for each deployment
that consults the PubSub `num_undelivered_messages` metric. See
`terraform/README.md` for more details and discussion of config
parameter choices.
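
For reference, a minimal sketch of what such an HPA can look like with Terraform's kubernetes provider, assuming the Stackdriver adapter is already running in the cluster. The resource names, replica bounds, subscription label, and target value below are illustrative assumptions, not necessarily what this PR uses:

```hcl
# Sketch of an HPA scaling a deployment on PubSub backlog via the
# Stackdriver custom metrics adapter. All names and numbers below are
# illustrative assumptions, not the values from this PR.
resource "kubernetes_horizontal_pod_autoscaler" "intake_batch" {
  metadata {
    name = "intake-batch"
  }

  spec {
    min_replicas = 1
    max_replicas = 10

    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "intake-batch"
    }

    metric {
      type = "External"
      external {
        metric {
          # The adapter exposes Cloud Monitoring metric names with "/"
          # replaced by "|".
          name = "pubsub.googleapis.com|subscription|num_undelivered_messages"
          selector {
            match_labels = {
              "resource.labels.subscription_id" = "intake-batch"
            }
          }
        }
        target {
          # AverageValue divides the backlog by the current replica count;
          # a "Value" target would compare the total backlog instead.
          type          = "AverageValue"
          average_value = "150"
        }
      }
    }
  }
}
```

Whether the target should be per-replica (`AverageValue`) or total (`Value`) is one of the config parameter choices discussed in `terraform/README.md`.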

This commit also modifies the `integration-tester` deployment so that it
emits more batches, forcing dev and staging clusters to exercise the
autoscaling feature. We also amend the alert for `intake-batch` task
queue size: we no longer expect that queue to periodically empty, since
we have configured Kubernetes to keep it at a steady state of ~150
messages.
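
The amended alert accordingly has to fire on backlog depth staying well above the autoscaler's steady state rather than on the queue failing to drain. A hypothetical sketch with the google provider; the filter is the real Cloud Monitoring metric, but the threshold, duration, and names are invented for illustration:

```hcl
# Hypothetical threshold alert on subscription backlog; threshold,
# duration, and display names are illustrative, not this PR's values.
resource "google_monitoring_alert_policy" "intake_batch_queue_depth" {
  display_name = "intake-batch queue depth"
  combiner     = "OR"

  conditions {
    display_name = "backlog well above autoscaler steady state"

    condition_threshold {
      filter          = "resource.type = \"pubsub_subscription\" AND metric.type = \"pubsub.googleapis.com/subscription/num_undelivered_messages\""
      comparison      = "COMPARISON_GT"
      threshold_value = 1000
      duration        = "600s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }
}
```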

This is essentially a revert of the revert of #507, but corrects a couple of mistakes in that PR:

  • Run fewer replicas of the sample generator, and run them less often, to reduce thrash in deployment size and the risk of resource exhaustion in our relatively small dev and staging envs
  • Remove the TF vars files for my devenv

Resolves #484

tgeoghegan requested a review from aaomidi on April 2, 2021
@ezekiel
Contributor

ezekiel commented Aug 24, 2021

Leaving this here for reference: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

I'm now somewhat more confident that we can revise the metric target here to improve autoscaling behavior.
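
For anyone following along, the core scaling rule from that page is:

```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```

So for a fixed backlog, lowering the metric target raises the desired replica count proportionally, subject to the min/max bounds and the stabilization window.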

@tgeoghegan
Contributor Author

Obsoleted by #1042

tgeoghegan closed this on Oct 20, 2021
tgeoghegan deleted the timg/autoscale-again branch on October 12, 2022