ResourceQuota plugin should account for resource limits #5130

Open
mszacillo opened this issue Jul 3, 2024 · 0 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.


mszacillo commented Jul 3, 2024

What would you like to be added:

The ResourceQuota plugin should account for ResourceQuota limits (for example limits.cpu and limits.memory), not only requests, when estimating the maxReplicas count.

Why is this needed:

Limits are currently ignored during estimation because pb.ReplicaRequirements only supports setting resource requests. This can cause problems when the destination namespace runs burstable workloads, so that its quota usage for limits is higher than for requests. I recently hit an edge case where the ResourceQuota on my destination namespace had enough headroom for requests, but not enough headroom for limits, to schedule an additional pod. This caused the FlinkDeployment to fail:

mszacillo@control-plane-1:~$ k get resourcequota -n workspace
NAME                        AGE   REQUEST                                                   LIMIT
pods-free                   57d   requests.cpu: 28800m/30, requests.memory: 86354Mi/180Gi   limits.cpu: 29100m/30, limits.memory: 86682Mi/180Gi

Scheduling an additional TaskManager pod requires 1 CPU and 4096m of memory. This fits within the ResourceQuota's requests (1200m of requests.cpu remaining), but not within its limits (only 900m of limits.cpu remaining), causing a scheduling error:

Could not create pod mszacillo-karmada-taskmanager-2-4, exception: java.util.concurrent.CompletionException: org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.96.0.1/api/v1/namespaces/workspace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "mszacillo-karmada-taskmanager-2-4" is forbidden: exceeded quota: pods-free, requested: limits.cpu=1, used: limits.cpu=29100m, limited: limits.cpu=30.
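The arithmetic behind this failure can be reproduced with a small Go sketch. It uses only the CPU numbers from the quota output above, expressed in millicores; the `quotaEntry` type and `fits` method are hypothetical helpers for illustration and are not part of Karmada.

```go
package main

import "fmt"

// quotaEntry is a hypothetical helper holding one ResourceQuota
// dimension in millicores. It is not a Karmada or Kubernetes type.
type quotaEntry struct {
	name      string
	usedMilli int64
	hardMilli int64
}

// remaining returns the unused part of the quota.
func (q quotaEntry) remaining() int64 {
	return q.hardMilli - q.usedMilli
}

// fits reports whether needMilli still fits into the remaining quota.
func (q quotaEntry) fits(needMilli int64) bool {
	return q.remaining() >= needMilli
}

func main() {
	// Values taken from the `k get resourcequota` output in this issue.
	requestsCPU := quotaEntry{"requests.cpu", 28800, 30000}
	limitsCPU := quotaEntry{"limits.cpu", 29100, 30000}

	// The error message reports `requested: limits.cpu=1`, i.e. 1000m.
	const podCPUMilli = 1000

	for _, q := range []quotaEntry{requestsCPU, limitsCPU} {
		fmt.Printf("%s: remaining=%dm fits=%t\n", q.name, q.remaining(), q.fits(podCPUMilli))
	}
}
```

An estimator that only looks at requests.cpu sees room for one more pod, while the API server rejects the pod on limits.cpu.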

Potential Options:

One option is to extend the pb.ReplicaRequirements API to include resource limits, so that the ResourceQuota plugin can also check limits during estimation. Another option is to check the resource's requests against the available limits on the relevant ResourceQuota when calculating maxReplicas.
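As a rough illustration of the first option, the sketch below computes a replica count as the minimum over every quota key that the replica consumes, so adding a limits.cpu entry tightens the estimate. This is not the plugin's actual code: `resourceList` and `maxReplicas` are hypothetical names, quantities are plain millicore integers rather than Kubernetes quantities, and the return value of -1 for "unconstrained" is an assumption made for the example.

```go
package main

import "fmt"

// resourceList maps a quota key such as "requests.cpu" or "limits.cpu"
// to a quantity in milli-units. Hypothetical type for this sketch.
type resourceList map[string]int64

// maxReplicas returns how many replicas fit into the free quota, taking
// the minimum across all keys present in perReplica. Keys the quota does
// not track are ignored. It returns -1 if no key constrains the result.
func maxReplicas(free, perReplica resourceList) int64 {
	result := int64(-1)
	for key, need := range perReplica {
		if need <= 0 {
			continue // a zero requirement never constrains the count
		}
		avail, tracked := free[key]
		if !tracked {
			continue // the quota places no bound on this key
		}
		n := avail / need
		if n < 0 {
			n = 0 // quota already exceeded
		}
		if result < 0 || n < result {
			result = n
		}
	}
	return result
}

func main() {
	// Free quota derived from the ResourceQuota shown in this issue.
	free := resourceList{"requests.cpu": 1200, "limits.cpu": 900}

	// Today: only requests are known to the estimator.
	requestsOnly := resourceList{"requests.cpu": 1000}
	// Option 1: the replica requirements also carry limits.
	withLimits := resourceList{"requests.cpu": 1000, "limits.cpu": 1000}

	fmt.Println("requests only:", maxReplicas(free, requestsOnly))
	fmt.Println("with limits:  ", maxReplicas(free, withLimits))
}
```

With the numbers from this issue, the requests-only estimate is 1 replica, while the estimate that includes limits is 0, which matches what the API server enforced.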

mszacillo added the kind/feature label Jul 3, 2024