
🏃Add Etcd e2e tests #2785

Merged — 6 commits merged on Mar 27, 2020

Conversation

wfernandes
Contributor

What this PR does / why we need it:
This PR adds an e2e test for testing etcd upgrades.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Ref #2753

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 25, 2020
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 26, 2020

// WaitForPodListCondition waits for the specified condition to be true for all
// pods returned from the list filter.
func WaitForPodListCondition(ctx context.Context, input WaitForPodListConditionInput, intervals ...interface{}) {
Contributor Author

We can also use this to verify that the CoreDNS pods have the correct image tags after a CoreDNS upgrade.

Contributor Author

@gab-satchi If you'd like, you can also just do the image check on the CoreDNS deployment.spec.containers[0].Image, similar to how @sedefsavas did it for kube-proxy.
I had to list pods because etcd is deployed as static pods.
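The shape of the check is easy to sketch without the Kubernetes client machinery. The following is a minimal illustration only — `Pod` here is a hypothetical simplified stand-in for `corev1.Pod`, not the real API type, and the actual test runs this kind of condition through `WaitForPodListCondition`:

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a hypothetical, simplified stand-in for corev1.Pod, holding only
// what the image check needs: the images of the pod's containers.
type Pod struct {
	Name   string
	Images []string
}

// AllPodsHaveImage reports whether every pod in the list runs at least one
// container whose image ends with the expected name:tag suffix. This mirrors
// the condition checked after an etcd (static pod) or CoreDNS upgrade.
func AllPodsHaveImage(pods []Pod, expected string) bool {
	for _, p := range pods {
		found := false
		for _, img := range p.Images {
			if strings.HasSuffix(img, expected) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

func main() {
	etcdPods := []Pod{
		{Name: "etcd-cp-0", Images: []string{"k8s.gcr.io/etcd:3.4.3-0"}},
		{Name: "etcd-cp-1", Images: []string{"k8s.gcr.io/etcd:3.4.3-0"}},
	}
	fmt.Println(AllPodsHaveImage(etcdPods, "etcd:3.4.3-0"))
}
```

For a Deployment-backed component like CoreDNS the same idea collapses to checking a single field on the Deployment spec, which is why listing pods is only needed for static pods such as etcd.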

Warren Fernandes added 5 commits March 26, 2020 09:43
Since we are creating the workload clusters within the tests, let's
delete them in the AfterEach instead of making the "mgmtClient" and
"cluster" vars global.
Also pass in prefix to cluster generator and improve test output
@wfernandes wfernandes force-pushed the etcd-e2e-tests branch 2 times, most recently from 213ffd7 to e55bad0 Compare March 26, 2020 15:56
@wfernandes wfernandes changed the title WIP: 🏃Add Etcd e2e tests 🏃Add Etcd e2e tests Mar 26, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 26, 2020
@wfernandes
Contributor Author

wfernandes commented Mar 26, 2020

@gab-satchi @sedefsavas @vincepri If y'all have some time, a review or even a local test run would be appreciated.
I don't mind doing a code walkthrough since the changes "look" substantial.
The tests are quite solid and I'm confident in their stability 🙂

SetDefaultEventuallyTimeout(10 * time.Minute)
SetDefaultEventuallyPollingInterval(10 * time.Second)

BeforeEach(func() {


Since this part is mostly the same as the Basic test, instead of separating them into multiple files, why not keep them as is so the BeforeEach and AfterEach code can be reused?

Contributor Author

I made it separate because the original test had failure domains configured and I didn't want to couple the failure domain assertions with the other upgrade tests.
TBH I'm currently unfamiliar with failure domains and as I was trying to write these tests up I wanted to do the upgrade against a cluster that was "as vanilla as I could get".


Having failure domains should not cause issues, and checking that logic is important too. I think this is fine for this PR, but we should consider combining these files so we don't maintain before/after suites in two places; also, AFAIK, to run the tests in parallel we will need to have them together.

Contributor Author

We have a single BeforeSuite/AfterSuite that creates the management cluster.
The BeforeEach/AfterEach is responsible for creating (and deleting) workload clusters with different identifiers, and because of that isolation I believe they can run in parallel.

I also think that keeping these tests separate makes them more readable and gives a better understanding of what the expectations are.
Initially, coming to these tests without context, I didn't know whether the upgrades somehow relied on the failure domain configuration, because the original full upgrade test ran against a cluster with failure domains configured.

That being said, if others feel that it would be better to couple these tests together I don't mind doing that in a follow up PR.

Member

+1 to separating them but mitigating the speed hit with running them in parallel.

Contributor Author

I can create a separate issue to track the work of making these tests run in parallel.

Member

If we're duplicating code, I'd suggest moving the shared code into a separate function and calling it from multiple places.

Contributor Author

Issue for running CAPD e2e in parallel #2795

Member

@gab-satchi gab-satchi left a comment


I'm still running it locally but it's failing to bring up 3 control plane nodes.

@@ -92,7 +92,7 @@ test-e2e: ## Run the end-to-end tests
E2E_CONF_FILE ?= e2e/local-e2e.conf
SKIP_RESOURCE_CLEANUP ?= false
run-e2e:
-	go test ./e2e -v -ginkgo.v -ginkgo.trace -count=1 -timeout=20m -tags=e2e -e2e.config="$(abspath $(E2E_CONF_FILE))" -skip-resource-cleanup=$(SKIP_RESOURCE_CLEANUP)
+	go test ./e2e -v -ginkgo.v -ginkgo.trace -count=1 -timeout=35m -tags=e2e -e2e.config="$(abspath $(E2E_CONF_FILE))" -skip-resource-cleanup=$(SKIP_RESOURCE_CLEANUP)
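One way to avoid editing the recipe every time the suite grows would be to make the timeout an overridable variable, like the existing `E2E_CONF_FILE` and `SKIP_RESOURCE_CLEANUP` variables. This is only a sketch against the Makefile above; `E2E_TIMEOUT` is hypothetical and not something this PR adds:

```makefile
# Hypothetical refinement: E2E_TIMEOUT is not part of this PR.
E2E_CONF_FILE ?= e2e/local-e2e.conf
SKIP_RESOURCE_CLEANUP ?= false
E2E_TIMEOUT ?= 35m

run-e2e:
	go test ./e2e -v -ginkgo.v -ginkgo.trace -count=1 -timeout=$(E2E_TIMEOUT) -tags=e2e \
		-e2e.config="$(abspath $(E2E_CONF_FILE))" -skip-resource-cleanup=$(SKIP_RESOURCE_CLEANUP)
```

A local run could then stretch the budget without touching the Makefile, e.g. `make run-e2e E2E_TIMEOUT=60m`.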
Member

Sidenote: we keep increasing the timeout value 😬

Member

heh, I noticed the same. But we're also adding more assertions that wait on Pods running and on their attributes.

Contributor Author

Yeah, I know. 😐 Sigh.
But that's why I suggested opening an issue to run the tests in parallel! 🙂

I can't think of any other way to reduce running time as we add more tests, other than compounding all the behaviors into a single test; but at that point we'd be writing an e2e test that says "test everything". I don't know the parameters of our CI, but maybe we can tweak Docker params to make things run faster if we start seeing failures due to timeouts.

Contributor Author

Issue for running CAPD e2e in parallel #2795

@gab-satchi
Member

I've got the CoreDNS upgrade part ready. I'll wait for this to get merged first.

@vincepri
Member

Are we good to merge this in?

@vincepri
Member

/milestone v0.3.3

@k8s-ci-robot k8s-ci-robot added this to the v0.3.3 milestone Mar 27, 2020
@sedefsavas

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 27, 2020
@vincepri
Member

/approve

@vincepri
Member

/milestone v0.3.3

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri, wfernandes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 27, 2020
@k8s-ci-robot k8s-ci-robot merged commit 2cc8570 into kubernetes-sigs:master Mar 27, 2020
@wfernandes
Contributor Author

Yikes. I should've squashed those commits. My bad.
