Identify quota changes needed for gpu jobs, create pool of gpu projects #1095
Comments
/assign |
... none of these are GPU related; we probably need to look for something zone/region-specific |
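A minimal sketch of how one might narrow a region's quota list down to GPU entries. It assumes the quotas have already been fetched and parsed into a list of dicts with `metric`, `limit`, and `usage` keys (the shape that `gcloud compute regions describe REGION --format=json` reports under `quotas`). The sample data, the metric spellings, and the helper names are all hypothetical and are not taken from this thread.

```python
# Hypothetical sample shaped like a region's `quotas` list.
# The values and metric spellings are made up for illustration only.
sample_region_quotas = [
    {"metric": "CPUS", "limit": 24.0, "usage": 0.0},
    {"metric": "IN_USE_ADDRESSES", "limit": 8.0, "usage": 0.0},
    {"metric": "NVIDIA_K80_GPUS", "limit": 1.0, "usage": 0.0},
    {"metric": "COMMITTED_NVIDIA_K80_GPUS", "limit": 0.0, "usage": 0.0},
]


def gpu_quotas(quotas):
    """Return only the quota entries whose metric name mentions GPUs."""
    return [q for q in quotas if "GPU" in q["metric"]]


def zero_limit_metrics(quotas):
    """Return the sorted names of GPU metrics whose limit is currently zero."""
    return sorted(q["metric"] for q in gpu_quotas(quotas) if q["limit"] == 0)


if __name__ == "__main__":
    for name in zero_limit_metrics(sample_region_quotas):
        print(f"{name}: limit is 0, may need a quota request")
```

Running the same filter against each candidate region would show which GPU quotas start at zero and so might need a request.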
|
Jobs that use this project type:
|
Will use the canary job in kubernetes/test-infra#18664 to verify whether the quota works.
I'm also noticing this fun preset appears to be involved:
Not sure how it's used, but that may present other complications. |
Based on the preset I'm going to assume this is the quota we should pay attention to. I'm less clear about the rest
|
I went ahead and submitted a request for Committed NVIDIA K80 GPUs, us-west1: 0->2 |
That was enough to get https://testgrid.k8s.io/sig-testing-canaries#gce-device-plugin-gpu to pass.
The existing gpu-project pool is 15 projects and peaks at 5 projects in use. |
Ah! Fun fact: pull-kubernetes-e2e-gce-device-plugin-gpu is pinned to a single project k8s-jkns-pr-gce-gpus. So, 15 projects may not quite be enough. |
kubernetes/test-infra#18728 - demoted pull-kubernetes-e2e-gce-device-plugin-gpu from merge-blocking, now it's manually triggered with max_concurrency 5 |
Now filling out quota requests for 10 projects... |
... which were small enough to be automatically approved |
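One possible reading of the numbers in this thread, as a rough sketch: the existing pool peaks at 5 projects in use, and the PR job is now capped at max_concurrency 5, so the two together would need about 10 projects. The thread does not state that this is how the figure of 10 was chosen; the function name and the additive model are assumptions.

```python
def estimated_pool_size(ci_peak_in_use: int, pr_max_concurrency: int) -> int:
    """Estimate how many projects a shared pool needs.

    Assumes (hypothetically) that CI jobs and the PR job can hit their
    peaks at the same time, so the two numbers simply add.
    """
    if ci_peak_in_use < 0 or pr_max_concurrency < 0:
        raise ValueError("counts must be non-negative")
    return ci_peak_in_use + pr_max_concurrency


if __name__ == "__main__":
    # Figures quoted in the thread: observed peak of 5, max_concurrency of 5.
    print(estimated_pool_size(ci_peak_in_use=5, pr_max_concurrency=5))
```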
Opened #1125 to add the projects to k8s-infra-prow-build's boskos as a new gpu-project pool |
kubernetes/test-infra#18740 will add the gpu-project pool to https://monitoring.prow.k8s.io/d/wSrfvNxWz/boskos-resource-usage?orgId=1 |
/close |
@spiffxp: Closing this issue. In response to this:
This is like #851 but for gpu
The default quotas for an e2e project (eg: k8s-infra-e2e-gce-project) MAY be insufficient to run ci-kubernetes-e2e-gce-device-plugin-gpu
Currently this job runs in the google.com k8s-prow-builds cluster, using a project from that boskos' gpu-project pool