Add autoscaling for intake/aggregate tasks. #1042
Conversation
Autoscaling is based on the queue depth: if there is a queue, we'll scale up. The current policy is to stabilize over 5 minutes, then add/remove one replica per minute.
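For reference, here's a sketch of how that policy can be expressed with the `kubernetes_horizontal_pod_autoscaler_v2` resource from the hashicorp/kubernetes provider. The deployment name, replica bounds, metric name, and target value below are illustrative, not the PR's actual values:

```hcl
resource "kubernetes_horizontal_pod_autoscaler_v2" "intake" {
  metadata {
    name = "intake" # illustrative name
  }

  spec {
    min_replicas = 1
    max_replicas = 10

    scale_target_ref {
      api_version = "apps/v1"
      kind        = "Deployment"
      name        = "intake"
    }

    # Scale on an external queue-depth metric (name is illustrative).
    metric {
      type = "External"
      external {
        metric {
          name = "intake_queue_depth"
        }
        target {
          type          = "AverageValue"
          average_value = "10"
        }
      }
    }

    # Stabilize over 5 minutes, then add/remove at most one replica per minute.
    behavior {
      scale_up {
        stabilization_window_seconds = 300
        policy {
          type           = "Pods"
          value          = 1
          period_seconds = 60
        }
      }
      scale_down {
        stabilization_window_seconds = 300
        policy {
          type           = "Pods"
          value          = 1
          period_seconds = 60
        }
      }
    }
  }
}
```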
Codecov Report
```
@@           Coverage Diff           @@
##             main    #1042   +/-   ##
=======================================
  Coverage   59.33%   59.33%
=======================================
  Files          35       35
  Lines        6596     6596
=======================================
  Hits         3914     3914
  Misses       2632     2632
  Partials       50       50
```
Flags with carried forward coverage won't be shown.
This is required to use the "kubernetes_manifest" resource.
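Presumably a provider version bump along these lines; `kubernetes_manifest` only exists in v2 of the hashicorp/kubernetes provider, so the exact constraint below is an assumption about what the change looks like:

```hcl
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.4.0" # assumed minimum; kubernetes_manifest requires provider v2
    }
  }
}
```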
A few nits, but I'm really happy with how simple the target rule is. I'll be curious to see how this contends with a production locality like us-ca-apple.
```hcl
## AWS-specific resources
##
resource "kubernetes_cluster_role" "external_metrics_reader" {
  count = var.use_aws ? 1 : 0 # This cluster role already exists in GKE deployments.
```
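The diff only shows the guard; for context, a plausible sketch of the full resource, assuming the standard `external-metrics-reader` rules that GKE ships out of the box:

```hcl
resource "kubernetes_cluster_role" "external_metrics_reader" {
  count = var.use_aws ? 1 : 0 # This cluster role already exists in GKE deployments.

  metadata {
    name = "external-metrics-reader"
  }

  # Let consumers (e.g., the HPA controller) read metrics served under
  # the external metrics API.
  rule {
    api_groups = ["external.metrics.k8s.io"]
    resources  = ["*"]
    verbs      = ["get", "list", "watch"]
  }
}
```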
Oof, gotta love these little impedance mismatches between EKS and GKE.
Yup. TBH I'm surprised the setup for extracting metrics from CloudWatch/Stackdriver is so ... manual, considering how (presumably) useful it would be to so many k8s deployments.
```hcl
    }
    selector = { app = "custom-metrics-adapter" }
  }
}
```
nit: missing newline
Done.
```hcl
    port {
      name        = "http"
      port        = 80
      target_port = 8080
    }
```
I think it doesn't matter because we have no public internet access to any of this, but I'm curious: why does this metrics adapter have to listen on both HTTP and HTTPS? I forget how exactly this works, but I'm guessing the service exists at all so that the k8s API server can scrape metrics from the adapter, though I would have guessed it would use HTTPS exclusively or HTTP exclusively.
I believe this page gives a description of the two ports. tl;dr: the secure port (443) is used for "normal" serving, uses TLS, has all the needed authn/authz functionality, etc.; the insecure port (80) is used for turnup, bypasses a lot of security, but only binds to localhost (i.e. loopback) by default.
Given that last bit, I suspect that we don't actually need to expose port 80, because the metrics server won't be listening over the network on that port anyway. I tried removing it from my dev-env deployment and things seem to work OK (at least, the metric adapter logs look fine, and the horizontal-pod-autoscalers can still get their metrics). So I dropped port 80 entirely.
[and to close the loop on why it was there in the first place: I copied it blindly from Amazon's suggested YAML config]
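With port 80 dropped, the adapter's service presumably reduces to just the secure port. A sketch using the names from the snippets above (the adapter's secure container port is assumed to be 6443 here):

```hcl
resource "kubernetes_service" "custom_metrics_adapter" {
  metadata {
    name      = "custom-metrics-adapter"
    namespace = "custom-metrics" # illustrative namespace
  }

  spec {
    selector = { app = "custom-metrics-adapter" }

    # Only the secure port remains; the insecure port binds to loopback
    # inside the pod anyway, so exposing it through the service did nothing.
    port {
      name        = "https"
      port        = 443
      target_port = 6443 # assumed secure serving port of the adapter
    }
  }
}
```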
```
@@ -558,6 +657,96 @@ resource "kubernetes_deployment" "aggregate" {
    }
  }
}

resource "kubernetes_manifest" "aggregate_queue_depth_metric" {
```
There's no GCP equivalent to this. Does the GCP metrics adapter automagically create the manifest?
It does: specifically, it creates metrics called something like `pubsub.googleapis.com|subscription|num_undelivered_messages`, and the specific pubsub subscription to examine is decided by using a selector, e.g. matching on `resource.labels.subscription_id` in this case. You can see an example in the config for the autoscalers. I guess GCP's method requires a little less plumbing to set up, while AWS' method potentially saves on scraping unused metrics?
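So on the GCP side the autoscaler just references the adapter-generated metric directly. A sketch of the `metric` block inside the HPA spec, with an illustrative subscription ID and target:

```hcl
metric {
  type = "External"
  external {
    metric {
      # The adapter flattens the Stackdriver metric path into a single name.
      name = "pubsub.googleapis.com|subscription|num_undelivered_messages"
      selector {
        match_labels = {
          "resource.labels.subscription_id" = "aggregate-subscription" # illustrative
        }
      }
    }
    target {
      type          = "AverageValue"
      average_value = "10" # illustrative target
    }
  }
}
```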
It's sort of unfortunate that the setup is so different between the two different cloud providers, but I think this difference is baked into the respective metric scrapers that we're now running. I thought about creating a module for this to abstract away the details, but right now we have exactly one use case (implemented on two different deployments), so the proper general abstraction isn't clear to me. If we somehow end up needing to wire this up in many more places, we'd probably want an abstraction.
This reminds me of network configuration, which mostly Just Works in GKE but requires jumping through a ton of hoops in EKS to construct subnets and security groups and so on. Anyway, I agree with your approach here.
* Stop exposing port 80 for the AWS metrics server. (It's not required.)
* Don't specify a replica count for HPA-controlled deployments. (It's not required.)
* Switch from scaling on the total number of available messages to scaling on the number of available messages per replica.
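The per-replica change in that last bullet comes down to the `target` type on the external metric: `Value` compares the raw queue depth against the target, while `AverageValue` first divides the metric by the current replica count. A sketch of the before/after, with an illustrative target:

```hcl
# Before: scale on total queue depth.
target {
  type  = "Value"
  value = "10"
}

# After: scale on queue depth per replica; the HPA divides the external
# metric by the current replica count before comparing to the target.
target {
  type          = "AverageValue"
  average_value = "10"
}
```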
BTW, I wanted to address the use of `insecure_skip_tls_verify`. This initially got pulled in because it was specified in both Google's & Apple's recommended configurations for their custom metric adapters.

Given the relatively mild downside of leaving this flag in place, and the relatively high complexity of removing it, I decided leaving it in place was acceptable for an initial revision. But I do think it's worth raising for posterity & in case folks disagree.
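For context on where the flag lives: it's set on the APIService registration that wires the adapter into the API aggregation layer. A sketch assuming the names from earlier snippets:

```hcl
resource "kubernetes_api_service" "external_metrics" {
  metadata {
    name = "v1beta1.external.metrics.k8s.io"
  }

  spec {
    group   = "external.metrics.k8s.io"
    version = "v1beta1"

    service {
      name      = "custom-metrics-adapter"
      namespace = "custom-metrics"
    }

    # The flag under discussion: the API server skips verifying the
    # adapter's serving certificate when proxying metric requests to it.
    insecure_skip_tls_verify = true

    group_priority_minimum = 100
    version_priority       = 100
  }
}
```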
I appreciate the thorough research here! I agree with you on the question of `insecure_skip_tls_verify`. This seems safe to me since our application's deployment is on a purely private network.
Fixes #484.