Migrate 100 node scalability release-blocking job to k8s-infra-prow-build #17725

Merged


spiffxp
Member

@spiffxp spiffxp commented May 27, 2020

Demonstrate use of the scalability-project pool, which kubernetes/k8s.io#898 added to k8s-infra's boskos instance.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/config Issues or PRs related to code in /config area/jobs sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 27, 2020
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2020
@wojtek-t
Member

/assign @mm4tt @jprzychodzen

@mborsz - FYI as current oncall

@jprzychodzen
Contributor

It seems that this change will affect access to the projects, and with it the ability of sig-scalability members to debug.

Could you explain what the expected IAM policy is for projects in this pool?

@spiffxp
Member Author

spiffxp commented May 27, 2020

@jprzychodzen what IAM policy do you need? I can create an @kubernetes.io Google group for whoever scalability needs and assign it the project viewer role for these projects. WDYT?

@jprzychodzen
Contributor

Right now we have Owner on those projects, so it's hard to provide a complete list of the permissions we require.

Viewer seems like a good starting point for most use cases. However, it would be great to have elevated privileges for a small subset of people, like leadership and the sig-scalability oncall. That would give us a quick way to react during emergencies (e.g. broken tests).

@spiffxp
Member Author

spiffxp commented May 28, 2020

I've opened kubernetes/k8s.io#919, which grants the owner role to leads and oncall.

@spiffxp
Member Author

spiffxp commented May 29, 2020

kubernetes/k8s.io#919 has merged. I'd like to get this in during working hours today so I can roll back if there are any issues, and leave it to soak over the weekend if not.

@BenTheElder
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder, spiffxp


@k8s-ci-robot k8s-ci-robot merged commit f498a4a into kubernetes:master May 29, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone May 29, 2020
@k8s-ci-robot
Contributor

@spiffxp: Updated the job-config configmap in namespace default at cluster default using the following files:

  • key sig-scalability-release-blocking-jobs.yaml using file config/jobs/kubernetes/sig-scalability/sig-scalability-release-blocking-jobs.yaml


@jprzychodzen
Contributor

Thanks! It looks good; the appropriate permissions seem to be in place.

@spiffxp spiffxp deleted the migrate-release-blocking-scale-job branch May 29, 2020 21:38
@spiffxp
Member Author

spiffxp commented May 29, 2020

Keeping an eye on runs

https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability/1266440343058911233 - timed out

Earlier runs hover around 50min overall.

So a 40min git fetch is probably part of the problem, but it doesn't explain how the run got over 120min:

I0529 18:49:00.764] Call:  git fetch --quiet --tags https://github.com/kubernetes/kubernetes master
I0529 19:28:40.363] process 66 exited with code 0 after 39.7m

@jprzychodzen
Contributor

jprzychodzen commented May 29, 2020

I don't know the details, but sig-scalability test jobs have not been migrated to pod-utilities; they use a service account to access the bucket directly and store logs. From a quick glance, this could be related.

Of course, it's not related to the long git fetch, but it should also be checked.

@jprzychodzen
Contributor

The last run passed, and logexporter uploaded its artifacts to the expected location.

@spiffxp
Member Author

spiffxp commented May 29, 2020

FWIW this cluster uses a different service account, but it also has write access to gs://kubernetes-jenkins.

Looks like the timeout was a blip. I'll let this run over the weekend to see how it fares before switching over the release-branch variants of this job.

@spiffxp
Member Author

spiffxp commented Jun 1, 2020

Checking back in: https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-scalability-100&width=5&graph-metrics=test-duration-minutes

There were some sporadic failures or timeouts over the weekend, which I'm attributing to the build cluster being overloaded.

All seems calm since then.

@spiffxp
Member Author

spiffxp commented Jun 9, 2020

ref: kubernetes/k8s.io#841
