User configurable rate limiting for event recording #236

Closed
hwangmoretime opened this issue Mar 10, 2023 · 17 comments
Labels
kind/bug

Comments

@hwangmoretime

Tell us about your request

The request is to

  • add rate limiting to PodFailedToSchedule
  • add the ability for users to configure the constants involved in rate limiting event production

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Earlier versions of Karpenter refer to the problem that I'm facing:

https://github.com/aws/karpenter/blob/ce235744438601bd78fc89d23cfd402f6e38cb1c/pkg/events/loadshedding.go#L35

This prevents us from hammering the API server with events that likely aren't useful...

We see Karpenter hammering the control plane with events, which has impacted its uptime.
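For illustration only, a minimal sketch of that kind of load shedding, built on client-go's token-bucket rate limiter, might look like the following. The wrapper type, function names, and the qps/burst parameters are hypothetical stand-ins for the user-configurable constants this issue asks for, not Karpenter's actual code:

```go
// Illustrative sketch only (not Karpenter's actual implementation): wrap the
// event recorder in a token-bucket rate limiter so a burst of unschedulable
// pods can't flood the API server with events.
package events

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/tools/record"
	"k8s.io/client-go/util/flowcontrol"
)

type rateLimitedRecorder struct {
	rec     record.EventRecorder
	limiter flowcontrol.RateLimiter
}

// newRateLimitedRecorder wraps an EventRecorder; qps and burst stand in for
// the user-configurable constants requested in this issue.
func newRateLimitedRecorder(rec record.EventRecorder, qps float32, burst int) *rateLimitedRecorder {
	return &rateLimitedRecorder{
		rec:     rec,
		limiter: flowcontrol.NewTokenBucketRateLimiter(qps, burst),
	}
}

// Event forwards to the underlying recorder only when a token is available;
// otherwise the event is shed rather than sent to the API server.
func (r *rateLimitedRecorder) Event(object runtime.Object, eventtype, reason, message string) {
	if !r.limiter.TryAccept() {
		return
	}
	r.rec.Event(object, eventtype, reason, message)
}
```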

Are you currently working around this issue?

No good workarounds currently.

Additional Context

No response

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@jonathan-innis
Member

Can you share how many FailedToSchedule events you are seeing across all your pods?

@jonathan-innis jonathan-innis added the kind/bug label Mar 10, 2023
@jonathan-innis jonathan-innis added kind/support and removed kind/bug labels Mar 15, 2023
@hwangmoretime
Author

5k - 20k FailedToSchedule events per hour during our recent incidents

@jonathan-innis
Member

We should only be firing that many events when there is a large number of pods that can't be scheduled on any provisioner. Do you mind sharing the number of pods you had that couldn't schedule? Also, how did this compare to other cluster components? My assumption is that if Karpenter is reacting to the pod, there are also events coming from the kube-scheduler about not being able to schedule it.

@hwangmoretime
Author

Without getting into specifics, there were a large number of pods waiting to be scheduled.

Our events per hour during the incident ranged from 100k - 150k, so Karpenter FailedToSchedule events accounted for 5%-10% of events during the incident.

@jonathan-innis
Member

If we were to make this user-configurable, what would you want to rate-limit it to? Would you want rate-limiting across all events or across certain types of events?

@github-actions

github-actions bot commented Apr 6, 2023

Labeled for closure due to inactivity in 10 days.

@github-actions github-actions bot added the lifecycle/stale label Apr 6, 2023
@hwangmoretime
Author

@jonathan-innis I'm open to one or both. I think at the very least, rate-limiting across all events.

@jonathan-innis jonathan-innis added the triage/unresolved label Apr 6, 2023
@jonathan-innis
Member

In general, we think our current event recording is fine, considering we see ourselves as a critical cluster component. Adding the wontfix label for now since I don't think we are planning to take this one up.

@github-actions github-actions bot removed the lifecycle/stale label Apr 7, 2023
@github-actions

Labeled for closure due to inactivity in 10 days.

@github-actions github-actions bot added the lifecycle/stale label Apr 28, 2023
@github-actions github-actions bot closed this as not planned May 10, 2023
@hwangmoretime
Author

> In general, we think our current event recording is fine, considering we see ourselves as a critical cluster component. Adding the wontfix label for now since I don't think we are planning to take this one up.

FWIW, my context is ML research, where we regularly have more pods to schedule than compute available. This leads to Karpenter routinely emitting a large number of events, which puts additional pressure on the control plane.

@ellistarn
Contributor

Consider using API Priority and Fairness to limit event QPS to ensure control plane performance. Cc @rschalo

@anthropic-eli

Hi all, I'm on the same team as @hwangmoretime and wanted to provide more details about our use case. We currently use karpenter for autoscaling/provisioning in clusters where we have a mix of CPU-only and GPU workloads. For GPU instances, we manage that capacity ourselves and don't want karpenter to autoscale it. We also encourage our users to launch workloads even though they may not immediately schedule because GPU capacity is freed up throughout the day. Thus we really only use karpenter for autoscaling CPU-only instances, but it still tries to find a provisioner for pending GPU workload pods, and this generates a metric boatload of events--which puts a lot of strain on the control plane.

Which brings us to this issue: we'd like some way to configure karpenter to reduce the number of events it emits. Rate limiting is one way to do it, but we'd also be happy if we could configure karpenter to ignore certain workloads and avoid generating those FailedToSchedule events altogether.

@engedaam
Contributor

@anthropic-eli have you attempted to use API Priority and Fairness to limit events?

@engedaam
Contributor

engedaam commented Jun 1, 2023

I'll be looking into this in the coming days

@engedaam engedaam reopened this Jun 1, 2023
@rschalo
Contributor

rschalo commented Jun 2, 2023

Hi @anthropic-eli and @hwangmoretime, I'm on EKS Scalability and I'm looking into limiting the impact of events on control plane performance. Out of curiosity, is there a controller you manage that relies upon events? What would the impact be, if any, to your workloads if events were rate-limited to 1 qps?

Alternatively, I've looked into creating a FlowSchema that catches all events and sends them to a PriorityLevelConfiguration limited to one concurrency share; I can share some of my work there.
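As a purely illustrative sketch of that approach (not the work mentioned above), a FlowSchema/PriorityLevelConfiguration pair along these lines could route event writes into a single low-concurrency priority level. The object names, API version, and values below are assumptions and would need tuning for a given cluster:

```yaml
# Illustrative sketch only; names, API version, and values are assumptions.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: low-priority-events
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 1     # a single concurrency share for event traffic
    limitResponse:
      type: Queue
      queuing:
        queues: 16
        handSize: 4
        queueLengthLimit: 50
---
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: catch-all-events
spec:
  priorityLevelConfiguration:
    name: low-priority-events
  matchingPrecedence: 1000
  distinguisherMethod:
    type: ByUser
  rules:
    - subjects:
        # Catches event writes from any authenticated client; this could be
        # narrowed to just Karpenter's service account instead.
        - kind: Group
          group:
            name: system:authenticated
      resourceRules:
        - apiGroups: ["", "events.k8s.io"]
          resources: ["events"]
          verbs: ["create", "update", "patch"]
          namespaces: ["*"]
```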

@github-actions github-actions bot removed the lifecycle/closed and lifecycle/stale labels Jun 2, 2023
@engedaam
Contributor

@anthropic-eli @hwangmoretime After doing some investigation: Karpenter publishes an event when a pod cannot be scheduled with any of the provisioners. While a pod is pending and can't be scheduled, Karpenter emits 3 events per minute; the provisioning reconciliation for pending pods runs every 10 seconds. Karpenter emits an event for every pod that can't be scheduled, so the number of events grows linearly with the number of pending pods. Since in your case pods are intended to stay pending, this is expected behavior. In contrast, kube-scheduler only emits an event for a pod that can't be scheduled approximately every 5 minutes.

@engedaam engedaam added kind/feature and kind/bug and removed kind/support, triage/unresolved, and kind/feature labels Jun 13, 2023
@engedaam
Contributor

engedaam commented Jun 16, 2023

@anthropic-eli @hwangmoretime The team did a deep dive on the issue. There was a bug in the produced events: Karpenter was firing off more events than intended. Here is the PR for the fix: #372
One change introduced along with the bug fix is that Karpenter now mirrors kube-scheduler by producing a FailedToSchedule event every 5 minutes.
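To make that post-fix behavior concrete, here is a small, purely illustrative sketch (not the code from #372) of emitting a given event reason for a given object at most once per interval, mirroring kube-scheduler's roughly 5-minute cadence:

```go
// Illustrative sketch only (not the code from #372): allow a given
// (object, reason) pair through at most once per interval, e.g. a
// FailedToSchedule event at most every 5 minutes per pod.
package events

import (
	"sync"
	"time"
)

type onceEvery struct {
	interval time.Duration

	mu   sync.Mutex
	last map[string]time.Time // key: "<namespace>/<name>:<reason>"
}

func newOnceEvery(interval time.Duration) *onceEvery {
	return &onceEvery{interval: interval, last: map[string]time.Time{}}
}

// ShouldEmit reports whether the event keyed by key should be recorded now,
// and if so remembers the emission time so later calls within the interval
// are suppressed.
func (o *onceEvery) ShouldEmit(key string, now time.Time) bool {
	o.mu.Lock()
	defer o.mu.Unlock()
	if t, ok := o.last[key]; ok && now.Sub(t) < o.interval {
		return false
	}
	o.last[key] = now
	return true
}
```

A recorder would then check ShouldEmit with a key like "default/my-pod:FailedToSchedule" before handing the event to the underlying EventRecorder.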
