User configurable rate limiting for event recording #236
Comments
Can you share how many events per hour you were seeing from Karpenter?
5k - 20k
We should only be firing that many events when there is a large number of pods that can't be scheduled on any provisioner. Do you mind sharing how many pods you had that couldn't schedule? Also, how did this compare to other cluster components? My assumption is that if Karpenter is reacting to a pod, the kube-scheduler is also emitting events about that pod failing to schedule.
Without getting into specifics, there was a large number of pods waiting to be scheduled. Our events per hour during the incident ranged from 100k - 150k, so Karpenter FailedToSchedule events accounted for 5%-10% of events during the incident.
If we were to make this user-configurable, what would you want to rate-limit it to? Would you want rate-limiting across all events or across certain types of events?
Labeled for closure due to inactivity in 10 days.
@jonathan-innis I'm open to one or both. I think at the very least, rate-limiting across all events.
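As a concrete illustration of what "rate-limiting across all events" could look like, here is a minimal sketch built on client-go's `flowcontrol` token bucket. This is not Karpenter's actual API; `RateLimitedRecorder`, its constructor, and the qps/burst knobs are hypothetical placeholders for whatever configuration surface the project might expose.

```go
package events

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
	"k8s.io/client-go/util/flowcontrol"
)

// RateLimitedRecorder wraps an EventRecorder and drops events once a
// user-configured token bucket is exhausted. Hypothetical sketch only.
type RateLimitedRecorder struct {
	record.EventRecorder
	limiter flowcontrol.RateLimiter
}

// NewRateLimitedRecorder caps event emission at qps with the given burst.
func NewRateLimitedRecorder(rec record.EventRecorder, qps float32, burst int) *RateLimitedRecorder {
	return &RateLimitedRecorder{
		EventRecorder: rec,
		limiter:       flowcontrol.NewTokenBucketRateLimiter(qps, burst),
	}
}

// Event forwards the event only if a token is available; otherwise it is dropped.
func (r *RateLimitedRecorder) Event(object runtime.Object, eventtype, reason, message string) {
	if r.limiter.TryAccept() {
		r.EventRecorder.Event(object, eventtype, reason, message)
	}
}

// Eventf behaves like Event, with printf-style formatting.
func (r *RateLimitedRecorder) Eventf(object runtime.Object, eventtype, reason, messageFmt string, args ...interface{}) {
	if r.limiter.TryAccept() {
		r.EventRecorder.Eventf(object, eventtype, reason, messageFmt, args...)
	}
}
```

A per-event-type variant would just hold one limiter per reason string instead of a single shared bucket.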
In general, we think our current event recording is fine, considering we see ourselves as a critical cluster component. Adding the …
Labeled for closure due to inactivity in 10 days.
FWIW, my context is ML research, where we regularly have more pods to schedule than compute available. This leads to Karpenter routinely generating a large number of events, which adds further pressure on the control plane.
Consider using API Priority and Fairness to limit event QPS and protect control plane performance. Cc @rschalo
Hi all, I'm on the same team as @hwangmoretime and wanted to provide more details about our use case. We currently use Karpenter for autoscaling/provisioning in clusters that run a mix of CPU-only and GPU workloads. For GPU instances, we manage that capacity ourselves and don't want Karpenter to autoscale it. We also encourage our users to launch workloads even though they may not immediately schedule, because GPU capacity is freed up throughout the day. So we really only use Karpenter for autoscaling CPU-only instances, but it still tries to find a provisioner for pending GPU workload pods, and this generates a metric boatload of events, which puts a lot of strain on the control plane.

Which brings us to this issue: we'd like some way to configure Karpenter to reduce the number of events it emits. Rate limiting is one way to do it, but we'd also be happy if we could configure Karpenter to ignore certain workloads and avoid generating those FailedToSchedule events altogether.
@anthropic-eli have you attempted to use API Priority and Fairness to limit events?
I'll be looking into this in the coming days
Hi @anthropic-eli and @hwangmoretime, I'm on EKS Scalability and I'm looking into limiting the impact of events on control plane performance. Out of curiosity, is there a controller you manage that relies upon events? What would the impact be, if any, to your workloads if events were rate-limited to 1 qps? Alternatively, I've looked into creating a FlowSchema that catches all events and sends them to a PriorityLevelConfiguration limited to one concurrency share, and I can share some of my work there.
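For anyone who wants to experiment with that approach, here is a rough sketch of creating such a FlowSchema programmatically. The names `throttle-event-writes` and `event-writes` are placeholders, and it assumes a cluster and client-go version that serve the `flowcontrol.apiserver.k8s.io/v1` API; the referenced PriorityLevelConfiguration would be created separately with a low concurrency share (e.g. 1).

```go
package main

import (
	"context"
	"fmt"

	flowcontrolv1 "k8s.io/api/flowcontrol/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig the usual way; error handling trimmed for brevity.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// FlowSchema that matches create/update/patch requests against Events from
	// any authenticated client and routes them to the "event-writes" priority
	// level (assumed to exist with a low nominal concurrency share).
	fs := &flowcontrolv1.FlowSchema{
		ObjectMeta: metav1.ObjectMeta{Name: "throttle-event-writes"},
		Spec: flowcontrolv1.FlowSchemaSpec{
			PriorityLevelConfiguration: flowcontrolv1.PriorityLevelConfigurationReference{
				Name: "event-writes",
			},
			MatchingPrecedence: 1000,
			DistinguisherMethod: &flowcontrolv1.FlowDistinguisherMethod{
				Type: flowcontrolv1.FlowDistinguisherMethodByUserType,
			},
			Rules: []flowcontrolv1.PolicyRulesWithSubjects{{
				Subjects: []flowcontrolv1.Subject{{
					Kind:  flowcontrolv1.SubjectKindGroup,
					Group: &flowcontrolv1.GroupSubject{Name: "system:authenticated"},
				}},
				ResourceRules: []flowcontrolv1.ResourcePolicyRule{{
					Verbs:      []string{"create", "update", "patch"},
					APIGroups:  []string{"", "events.k8s.io"},
					Resources:  []string{"events"},
					Namespaces: []string{flowcontrolv1.NamespaceEvery},
				}},
			}},
		},
	}

	created, err := clientset.FlowcontrolV1().FlowSchemas().Create(context.TODO(), fs, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("created FlowSchema %s\n", created.Name)
}
```

The same two objects can of course be applied as plain YAML manifests with kubectl instead of going through client-go.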
@anthropic-eli @hwangmoretime After doing some investigation: Karpenter publishes an event when a pod cannot be scheduled with any of the provisioners. While a pod is pending and can't be scheduled, Karpenter emits 3 events per minute for it; the provisioning reconciliation for pending pods runs every 10 seconds. Karpenter emits these events for every pod that can't be scheduled, so the total grows linearly with the number of pending pods (at 3 events per minute per pod, 1,000 perpetually pending pods works out to roughly 180,000 events per hour). Since the customer intends for pods to stay in a pending state, this is expected behavior. In contrast, …
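One client-side lever worth noting, separate from anything Karpenter exposes today: client-go's event broadcaster already applies a per-source, per-object spam filter whose QPS and burst can be set via `CorrelatorOptions` when the broadcaster is constructed. The sketch below is illustrative only; the helper name and the specific numbers are made up.

```go
package recording

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
	"k8s.io/client-go/tools/record"
)

// newThrottledRecorder builds an event recorder whose spam filter is tightened
// below client-go's defaults, so repeated "cannot schedule" events for the same
// object are collapsed before they reach the API server.
func newThrottledRecorder(clientset kubernetes.Interface, component string) record.EventRecorder {
	broadcaster := record.NewBroadcasterWithCorrelatorOptions(record.CorrelatorOptions{
		QPS:       1.0 / 600, // refill one token per source+object every 10 minutes
		BurstSize: 5,         // allow a short initial burst per source+object
	})
	broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
		Interface: clientset.CoreV1().Events(""),
	})
	return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: component})
}
```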
@anthropic-eli @hwangmoretime The team did a deep dive on the issue. There was a bug in the produced events: Karpenter was firing off more events than intended. Here is the PR with the fix: #372
Tell us about your request
The request is to make the rate at which Karpenter records PodFailedToSchedule events user configurable.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Earlier versions of Karpenter refer to the problem that I'm facing:
https://github.com/aws/karpenter/blob/ce235744438601bd78fc89d23cfd402f6e38cb1c/pkg/events/loadshedding.go#L35
We see Karpenter hammering the control plane with events, which has impacted the uptime of our control plane.
Are you currently working around this issue?
No good workarounds at the moment.
Additional Context
No response
Attachments
No response