Migrate 100 node scalability release-blocking job to k8s-infra-prow-build #17725
/assign @mm4tt @jprzychodzen @mborsz - FYI as current oncall |
It seems that this change will affect access to these projects, which limits the ability of sig-scalability members to debug. Could you explain what the expected IAM policy is for projects in this pool? |
@jprzychodzen what IAM policy do you need? I can create an @kubernetes.io Google group for whomever scalability needs and assign it the project viewer role for these projects. WDYT? |
Right now we have Owner on those projects, so it's hard to provide a complete list of the permissions we require. Viewer seems like a good starting point for most use cases, but it would be great to have elevated privileges for a small subset of people, such as leadership and the sig-scalability oncall. That way we have a quick way to react during emergencies (e.g. broken tests). |
I've opened kubernetes/k8s.io#919 which uses owner for leads and oncall |
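A minimal sketch of the access split discussed above, written as GCP IAM policy bindings in Python. The group addresses are hypothetical placeholders, and the viewer binding is an assumption based on the earlier suggestion in this thread; the actual contents of kubernetes/k8s.io#919 are not reproduced here.

```python
def build_bindings(viewer_group: str, owner_group: str) -> list[dict]:
    """Return IAM-style bindings: broad read access plus a small owner group."""
    return [
        # Read-only access for the wider sig-scalability membership (assumed).
        {"role": "roles/viewer", "members": [f"group:{viewer_group}"]},
        # Elevated access for leads and oncall, as described for #919.
        {"role": "roles/owner", "members": [f"group:{owner_group}"]},
    ]


# Hypothetical group names, for illustration only.
bindings = build_bindings(
    "example-scalability-viewers@kubernetes.io",
    "example-scalability-oncall@kubernetes.io",
)
```

Keeping the owner group small and separate from the viewer group matches the request for elevated privileges only for leadership and oncall. |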
kubernetes/k8s.io#919 has merged. I'd like to see this go in during working hours today so I can roll back if there are any issues, and leave it to soak over the weekend if not. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: BenTheElder, spiffxp |
Thanks! It looks good, it seems that there are appropriate permissions in place. |
Keeping an eye on runs. Earlier runs hover around ~50min overall. So a 40min git fetch is probably part of the problem, but doesn't explain how it got over 120min. |
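The reasoning above can be checked with a small worked calculation. This is only a sketch: it assumes the worst case, where the whole 40min git fetch is added on top of a typical ~50min run, and it treats 120min as a lower bound on the observed duration.

```python
TYPICAL_RUN_MIN = 50     # earlier runs hover around ~50min
SLOW_GIT_FETCH_MIN = 40  # the slow git fetch seen in this run
OBSERVED_MIN = 120       # the run exceeded this duration


def unexplained_minutes(typical: int, extra: int, observed: int) -> int:
    """Minutes of the observed duration not covered by typical + extra time."""
    return max(0, observed - (typical + extra))


gap = unexplained_minutes(TYPICAL_RUN_MIN, SLOW_GIT_FETCH_MIN, OBSERVED_MIN)
print(f"At least {gap} minutes remain unexplained")
```

Even in that worst case, 50 + 40 = 90 minutes, which leaves at least 30 minutes unaccounted for. |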
I don't know the details, but sig-scalability test jobs have not been migrated to pod-utilities; they use a service account to access the bucket directly and store logs. At a quick glance this could be related. Of course it's not related to the long git fetch, but it should also be checked. |
The last run passed, and logexporter placed the artifacts in the expected location. |
FWIW this cluster uses a different service account, but it's also allowed write access into … Looks like the timeout was a blip. Will let this run over the weekend to see how it fares before switching over the release-branch variants of this job. |
Checking back in: https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-scalability-100&width=5&graph-metrics=test-duration-minutes There were some sporadic failures or timeouts over the weekend which I'm attributing to the build cluster being overloaded.
All seems calm since then. |
Demonstrate use of the scalability-project pool that was added to k8s-infra's boskos instance via kubernetes/k8s.io#898