Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jobs for CoreDNS scalability tests #8078

Closed
wants to merge 0 commits into from

Conversation

chrisohaver
Copy link
Contributor

Adds Cluster DNS tests to scalability jobs (see proposed tests in kubernetes/kubernetes#63820)
Adds CoreDNS versions of the existing scalability jobs.

This is done to satisfy related graduation criteria of the KEP kubernetes/community#1956. In that KEP, CoreDNS becomes a GA alternative to kube-dns, and therefore it should be subject to the same level of scalability testing as kube-dns.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 18, 2018
@chrisohaver
Copy link
Contributor Author

chrisohaver commented May 18, 2018

FYI, @bowei, @johnbelamaric, @fturib, @rajansandeep

@rajansandeep
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 18, 2018
@BenTheElder
Copy link
Member

/cc @shyamjvs @wojtek-t

@BenTheElder
Copy link
Member

/cc @MrHohn

@k8s-ci-robot k8s-ci-robot requested a review from MrHohn May 18, 2018 17:36
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 18, 2018

@chrisohaver: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-test-infra-gubernator 73907d5 link /test pull-test-infra-gubernator

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@shyamjvs
Copy link
Member

@chrisohaver I'm not sure if duplicating all our scalability jobs is the right thing to do, for following reasons:

  • if we double our jobs every time we want to test a new feature, this is going to explode the no. of jobs (which isn't scalable across verticals)
  • also the jobs you added would run on 2000/5000-node clusters (which is a huge amount of quota) and they'd fail due to conflict with our regular 5k-node jobs (which are release-blocking)

If possible, I'd instead suggest replacing kube-dns with core-dns in our existing jobs temporarily for testing (that'll make this much easier and viable).
Does that SGTY?

@BenTheElder
Copy link
Member

/hold
Quotas for scalability testing (which is costly[!]) come from @shyamjvs's team, deferring there.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2018
@chrisohaver
Copy link
Contributor Author

If possible, I'd instead suggest replacing kube-dns with core-dns in our existing jobs temporarily for testing (that'll make this much easier and viable). Does that SGTY?

OK, the decision during KEP approval was to support both coredns and kube-dns for at least one release... So if we are only allowed to test one, then we'll leave it at testing kube-dns, then switch it if/when coredns becomes default.

@fturib
Copy link
Contributor

fturib commented May 18, 2018

@shyamjvs : As CoreDNS is GA for this release, I guess it would be good to have one way to verify that it does not impact somehow the performance test, let say when we suspect that something goes wrong in the DNS server.

I understand that, due to the cost, we would not like to run those tests all time. But maybe we could have a prepared job that we can run "manually" when needed ? (using a /test command)
Would that be ok ? and how we would do that - having "pre-submit" that are not run by default ?

@shyamjvs
Copy link
Member

Sure, that sounds reasonable. Or alternatively we can do some one-off testing (by switching it on in our job for 1 or 2 runs) - so that we can find any issues early on.

@shyamjvs
Copy link
Member

@fturib Just missed your comment before my reply, sorry. Yes, that sounds reasonable (see suggestion I gave above).
Another easy option is to modify this job (which currently lies around as a manual job) to use for your purpose (instead of creating new one).

@fturib
Copy link
Contributor

fturib commented May 18, 2018

@shyamjvs : I am not very familiar with all usages of the infra-test jobs.
if we go with your proposition to use the "ci-kubernetes-e2e-gce-large-manual-up", what should we do, how does it work to run that job ?

I mean:

  • do you expect we modify this job permanently to include the CoreDns deployment ? or is it only temporary when we need to make a test ? In that case, that means we need to have the temporary change merged before using that job against a kubernetes/kubternetes PR - right ?

About usage of this specific "manual-up" test:

  • What permission or level of privilege is needed to run that job ?
  • As I understand that "manual" job, it keeps the cluster running until you run the counterpart job ci-kubernetes-e2e-gce-large-manual-down . How do you get the credentials to access the cluster to evaluate its status ?
  • Is there a way to run that job against a specific commit of kubernetes/kubernetes ? In order to be able to reproduce an issue in the field ?

@fturib
Copy link
Contributor

fturib commented May 21, 2018

@shyamjvs : could you clarify the how to "modify the job" ? (see my questions in the comment above). Thanks!

@fturib
Copy link
Contributor

fturib commented May 22, 2018

In order to go ahead on this PerformanceDNS, I would propose to:

  • re-submit this PR by only extending the current jobs with "PerformanceDNS" -> that will make run this test for kube-dns (as it is the default)
  • create another PR with one job that allow to run that "scalability" test in manual mode for CoreDNS when needed. I would think to duplicate the definition of "pull-kubernetes-e2e-gce-large-performance-coredns".

@chrisohaver , @shyamjvs : are you ok with that proposition ?

@chrisohaver
Copy link
Contributor Author

Of course thats OK with me. But I think more important that it's OK with @shyamjvs et al.

@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 24, 2018
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label May 24, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrisohaver

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2018
@chrisohaver
Copy link
Contributor Author

FYI, I did not close this issue. I backed out the rejected changes, and pushed the branch.

I guess Github automatically closes a PR if it ever see's a commit with no deltas from master.

@shyamjvs
Copy link
Member

@chrisohaver @fturib - Would you be able to attend today's sig-scalability meeting at 9:30 AM pacific time? It might be easier to reach a consensus there than on the github thread.

@fturib
Copy link
Contributor

fturib commented May 24, 2018

@shyamjvs : sure will attend.

@fturib
Copy link
Contributor

fturib commented May 24, 2018

@shyamjvs : we missed the link you pasted into the chat zoom. Could you paste them here ? Thanks!

@shyamjvs
Copy link
Member

https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf - This is where we have our scheduler integration tests for benchmarking performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants