wg-k8s-infra: canary prowjobs for sig-scalability #22430

Merged
2 commits merged into kubernetes:master on Jun 18, 2021

Conversation

ameukam
Member

@ameukam ameukam commented Jun 5, 2021

Add canary jobs running on `k8s-infra-prow-build` for some sig-scalability periodic prowjobs.
Switch from `--gcp-project=foo` to `--gcp-project=k8s-infra-e2e-scale-5k-project`.

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes, area/config, area/jobs, area/testgrid, sig/testing, and wg/k8s-infra labels Jun 5, 2021
@k8s-ci-robot k8s-ci-robot requested review from dims and spiffxp June 5, 2021 01:30
@k8s-ci-robot k8s-ci-robot added the size/L label Jun 5, 2021
@dims
Member

dims commented Jun 5, 2021

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ameukam, dims

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jun 5, 2021
@ameukam
Member Author

ameukam commented Jun 5, 2021

/assign @BenTheElder @spiffxp

@ameukam
Member Author

ameukam commented Jun 7, 2021

xref: kubernetes/k8s.io#1469.
Failures are expected, so there is no need for active babysitting.

Member

@spiffxp spiffxp left a comment

/lgtm
/hold
If you expect these to fail and just want to see how, then remove the hold. But I'm not ok with the 5k node jobs as-is.

I would much rather see provisioning of a special 5k node project for the 5k node jobs. Like modify infra/gcp/prow/ensure-e2e-projects.sh to add a k8s-infra-e2e-scale-5k-project to E2E_MANUAL_PROJECTS and then pin the jobs to that.
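
The pinning suggested above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from this PR or from test-infra: `pin_gcp_project` is a hypothetical helper, and it assumes a job's kubetest arguments are available as a plain list of strings.

```python
def pin_gcp_project(args, project):
    """Return a copy of args with exactly one --gcp-project flag set to project.

    Hypothetical helper for illustration only. The trailing "=" in the prefix
    keeps other flags that merely share the "--gcp-project" prefix untouched.
    """
    prefix = "--gcp-project="
    pinned = prefix + project
    out = []
    replaced = False
    for arg in args:
        if arg.startswith(prefix):
            if not replaced:
                # Replace the first occurrence in place, drop any later duplicates.
                out.append(pinned)
                replaced = True
        else:
            out.append(arg)
    if not replaced:
        # The job had no explicit project, so add one.
        out.append(pinned)
    return out


args = ["--gcp-node-size=e2-small", "--gcp-nodes=5000", "--gcp-project=foo"]
print(pin_gcp_project(args, "k8s-infra-e2e-scale-5k-project"))
```

The input list is left unmodified, so the same helper could be applied to several job definitions without side effects.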

- --gcp-master-image=gci
- --gcp-node-image=gci
- --gcp-node-size=e2-small
- --gcp-nodes=5000
Member

We'll need a quota request for this. I'd rather avoid giving all scalability projects this kind of quota; we should pin these jobs to a specific project.

Member Author

@spiffxp I'll make the quota requests in k8s-infra-e2e-scale-5k-project. See kubernetes/k8s.io#2225
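
One way to guard the "pin large jobs to a specific project" rule discussed in this thread is a small check over the job arguments. The sketch below is an assumption-laden illustration, not part of test-infra: `find_unpinned_large_jobs` and the job names are hypothetical, and jobs are modelled as a mapping from job name to a list of `--flag=value` strings.

```python
def find_unpinned_large_jobs(jobs, min_nodes, project):
    """Return sorted names of jobs that request at least min_nodes nodes
    but are not pinned to the given GCP project.

    Hypothetical helper: jobs maps a job name to its list of --flag=value args.
    """
    flagged = []
    for name, args in jobs.items():
        # Split each "--flag=value" at the first "=" only, so values that
        # themselves contain "=" (such as --env=KEY=VALUE) stay intact.
        flags = dict(a.lstrip("-").split("=", 1) for a in args if "=" in a)
        nodes = int(flags.get("gcp-nodes", "0"))
        if nodes >= min_nodes and flags.get("gcp-project") != project:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical job names, used only to exercise the check.
jobs = {
    "canary-5k-pinned": ["--gcp-nodes=5000", "--gcp-project=k8s-infra-e2e-scale-5k-project"],
    "canary-5k-unpinned": ["--gcp-nodes=5000", "--env=KUBE_DNS_MEMORY_LIMIT=300Mi"],
    "canary-small": ["--gcp-nodes=84"],
}
print(find_unpinned_large_jobs(jobs, 5000, "k8s-infra-e2e-scale-5k-project"))
```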

- --env=KUBE_DNS_MEMORY_LIMIT=300Mi
- --extract=ci/latest-fast
- --extract-ci-bucket=k8s-release-dev
- --gcp-nodes=5000
Member

Same. This should use the same project as correctness. What's the node type used here?

Member Author

Took a look at a successful run of ci-kubernetes-e2e-gce-scale-performance: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1401585162642264064/build-log.txt.
The node type appears to be e2-standard-32.

- --extract=ci/latest
- --gcp-node-image=gci
- --gcp-node-size=e2-standard-8
- --gcp-nodes=84
Member

Honestly not sure if this will need quota increase or not

Member Author

No need to increase the quota for this job. The smallest quota in us-east1 is 1250 (for all the projects with type scalability-project).
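
The arithmetic behind that answer can be checked with a short sketch. It assumes the 1250 figure is a regional CPU quota (the comment does not name the metric) and it counts only the 84 worker nodes, ignoring the control plane and any other instances in the project. The 8 vCPUs per node follows from the `e2-standard-8` machine type.

```python
VCPUS_PER_NODE = 8          # e2-standard-8 provides 8 vCPUs
NODES = 84                  # from --gcp-nodes=84
SMALLEST_CPU_QUOTA = 1250   # assumed to be a CPU quota, as quoted above


def node_vcpus(nodes, vcpus_per_node):
    """Total vCPUs requested by the worker nodes alone."""
    return nodes * vcpus_per_node


required = node_vcpus(NODES, VCPUS_PER_NODE)
headroom = SMALLEST_CPU_QUOTA - required
print(f"required={required} vCPUs, headroom={headroom} vCPUs")
```

Under those assumptions the worker nodes need 672 vCPUs, which leaves 578 vCPUs of headroom for the control plane and anything else running in the same region.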

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold and lgtm labels Jun 7, 2021
@jkaniuk
Contributor

jkaniuk commented Jun 8, 2021

There are various GCP quotas, apart from the CPU one, that need to be lifted to run 5k node jobs (to be copied from the legacy project).

Are you tracking this anywhere?

@ameukam
Member Author

ameukam commented Jun 9, 2021

> There are various GCP quotas, apart from the CPU one, that need to be lifted to run 5k node jobs (to be copied from the legacy project).
>
> Are you tracking this anywhere?

@jkaniuk For the moment nothing is tracked. Do you have an existing document with the list of quotas that need to be raised? I can create a new document if not.

@jkaniuk
Contributor

jkaniuk commented Jun 10, 2021

> @jkaniuk For the moment nothing is tracked. Do you have an existing document with the list of quotas that need to be raised? I can create a new document if not.

Please do, I do not believe we have anything yet.

@wojtek-t, @jprzychodzen

ameukam added a commit to ameukam/k8s.io that referenced this pull request Jun 10, 2021
Add e2e projects for sig-scalability prowjobs that need 5K nodes.
Initial suggestion : kubernetes/test-infra#22430 (review)

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
ameukam added a commit to ameukam/k8s.io that referenced this pull request Jun 11, 2021
Add e2e projects for sig-scalability prowjobs that need 5K nodes.
Initial suggestion : kubernetes/test-infra#22430 (review)

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
@ameukam
Member Author

ameukam commented Jun 11, 2021

> > @jkaniuk For the moment nothing is tracked. Do you have an existing document with the list of quotas that need to be raised? I can create a new document if not.
>
> Please do, I do not believe we have anything yet.
>
> @wojtek-t, @jprzychodzen

@jkaniuk I created https://docs.google.com/spreadsheets/d/1v9ynsUx3pcMJKVaHViuVqxVtRP1OmGIadvw2uro7D7Y/edit#gid=0. Feel free to adjust at your own convenience.

ameukam added a commit to ameukam/k8s.io that referenced this pull request Jun 14, 2021
Add e2e project for sig-scalability prowjobs that need 5K nodes.
Initial suggestion : kubernetes/test-infra#22430 (review)

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
ameukam added a commit to ameukam/k8s.io that referenced this pull request Jun 15, 2021
Add e2e project for sig-scalability prowjobs that need 5K nodes.
Initial suggestion : kubernetes/test-infra#22430 (review)

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
Ref: kubernetes/k8s.io#1469

Add canaries running on `k8s-infra-prow-build` for some sig-scalability periodics prowjobs.
Switch from `--gcp-project=foo` to
`--gcp-project=k8s-infra-e2e-scale-5k-project`

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm label Jun 16, 2021
@dims
Member

dims commented Jun 17, 2021

/lgtm
/hold

Please feel free to remove the hold when you are ready.

@k8s-ci-robot k8s-ci-robot added the lgtm label Jun 17, 2021
@ameukam
Member Author

ameukam commented Jun 18, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Jun 18, 2021
@k8s-ci-robot k8s-ci-robot merged commit 44b0738 into kubernetes:master Jun 18, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 18, 2021
@k8s-ci-robot
Contributor

@ameukam: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key wg-k8s-infra-canaries.yaml using file config/jobs/kubernetes/wg-k8s-infra/wg-k8s-infra-canaries.yaml

In response to this:

Add canary jobs running on k8s-infra-prow-build for some sig-scalability periodics prowjobs.
Switch from --gcp-project=foo to --gcp-project=k8s-infra-e2e-scale-5k-project

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved, area/config, area/jobs, area/testgrid, cncf-cla: yes, lgtm, sig/testing, size/L

7 participants