
terraform: autoscale k8s deployments #536

Closed
wants to merge 1 commit

Conversation

tgeoghegan
Contributor

To spare ourselves the hassle of manually resizing deployments, we
configure Kubernetes to automatically resize the `intake-batch` and
`aggregate` worker pools. This requires deploying the Stackdriver custom
metrics adapter to make PubSub metrics visible to a Kubernetes
Horizontal Pod Autoscaler, then configuring an HPA for each deployment
that consults the PubSub `num_undelivered_messages` metric. See
`terraform/README.md` for more details and discussion of config
parameter choices.
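
For reference, a minimal sketch of what such an HPA can look like with Terraform's kubernetes provider, assuming the Stackdriver adapter is already running in the cluster. The resource names, replica bounds, subscription label, and target value below are illustrative assumptions, not necessarily what this PR uses:

```hcl
# Sketch of an HPA scaling a deployment on PubSub backlog via the
# Stackdriver custom metrics adapter. All names and numbers below are
# illustrative assumptions, not the values from this PR.
resource "kubernetes_horizontal_pod_autoscaler" "intake_batch" {
  metadata {
    name = "intake-batch"
  }

  spec {
    min_replicas = 1
    max_replicas = 10

    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "intake-batch"
    }

    metric {
      type = "External"
      external {
        metric {
          # The adapter exposes Cloud Monitoring metric names with "/"
          # replaced by "|".
          name = "pubsub.googleapis.com|subscription|num_undelivered_messages"
          selector {
            match_labels = {
              "resource.labels.subscription_id" = "intake-batch"
            }
          }
        }
        target {
          # AverageValue divides the backlog by the current replica count;
          # a "Value" target would compare the total backlog instead.
          type          = "AverageValue"
          average_value = "150"
        }
      }
    }
  }
}
```

Whether the target should be per-replica (`AverageValue`) or total (`Value`) is one of the config parameter choices discussed in `terraform/README.md`.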

This commit also modifies the `integration-tester` deployment so that it
emits more batches, forcing dev and staging clusters to exercise the
autoscaling feature. We also amend the alert for `intake-batch` task
queue size: we no longer expect that queue to periodically empty, since
we have configured Kubernetes to keep it at a steady state of ~150
messages.
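
The amended alert accordingly has to fire on backlog depth staying well above the autoscaler's steady state rather than on the queue failing to drain. A hypothetical sketch with the google provider; the filter is the real Cloud Monitoring metric, but the threshold, duration, and names are invented for illustration:

```hcl
# Hypothetical threshold alert on subscription backlog; threshold,
# duration, and display names are illustrative, not this PR's values.
resource "google_monitoring_alert_policy" "intake_batch_queue_depth" {
  display_name = "intake-batch queue depth"
  combiner     = "OR"

  conditions {
    display_name = "backlog well above autoscaler steady state"

    condition_threshold {
      filter          = "resource.type = \"pubsub_subscription\" AND metric.type = \"pubsub.googleapis.com/subscription/num_undelivered_messages\""
      comparison      = "COMPARISON_GT"
      threshold_value = 1000
      duration        = "600s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_MEAN"
      }
    }
  }
}
```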

This is essentially a revert of the revert of #507, but corrects a couple of mistakes in that PR:

  • Run fewer replicas of the sample generator, and run them less often, to reduce thrash in deployment size and the risk of resource exhaustion in our relatively small dev and staging envs
  • Remove the TF vars files for my devenv

Resolves #484

tgeoghegan requested a review from aaomidi on April 2, 2021
@ezekiel
Contributor

ezekiel commented Aug 24, 2021

Leaving this here for reference: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

I'm now somewhat more confident that we can revise the metric target here to improve autoscaling behavior.
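
For anyone following along, the core scaling rule from that page is:

```
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
```

So for a fixed backlog, lowering the metric target raises the desired replica count proportionally, subject to the min/max bounds and the stabilization window.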

@tgeoghegan
Contributor Author

Obsoleted by #1042

tgeoghegan closed this on Oct 20, 2021
tgeoghegan deleted the timg/autoscale-again branch on October 12, 2022