🏃Add Etcd e2e tests #2785
Conversation
```go
// WaitForPodListCondition waits for the specified condition to be true for all
// pods returned from the list filter.
func WaitForPodListCondition(ctx context.Context, input WaitForPodListConditionInput, intervals ...interface{}) {
```
We can also use this to test that the CoreDNS images have the correct image tags after a CoreDNS upgrade.
@gab-satchi If you'd like, you can also just do the image check on the coredns deployment.spec.containers[0].Image, similar to how @sedefsavas did for kube-proxy.
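For reference, a minimal sketch of that Deployment-style check, not the PR's actual code; the kube-system/coredns names and the expected-tag parameter are assumptions for illustration:

```go
package e2e_test

import (
	"context"

	. "github.com/onsi/gomega"
	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// assertCoreDNSImageTag reads the coredns Deployment in kube-system and
// asserts that its first container image carries the expected tag.
func assertCoreDNSImageTag(ctx context.Context, workloadClient client.Client, expectedTag string) {
	deployment := &appsv1.Deployment{}
	key := client.ObjectKey{Namespace: "kube-system", Name: "coredns"} // assumed names
	Expect(workloadClient.Get(ctx, key, deployment)).To(Succeed())
	Expect(deployment.Spec.Template.Spec.Containers[0].Image).To(HaveSuffix(":" + expectedTag))
}
```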
I had to do a pod list because etcd is deployed as static pods.
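And a self-contained sketch of the pod-list style check for etcd; the kubeadm `component=etcd` label and the tag-suffix comparison are assumptions rather than the PR's exact helper:

```go
package e2e_test

import (
	"context"
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// etcdPodsHaveImageTag lists the etcd static pods in kube-system (kubeadm
// labels them component=etcd) and reports whether every container image
// carries the expected tag.
func etcdPodsHaveImageTag(ctx context.Context, workloadClient client.Client, expectedTag string) error {
	pods := &corev1.PodList{}
	if err := workloadClient.List(ctx, pods,
		client.InNamespace("kube-system"),
		client.MatchingLabels{"component": "etcd"}); err != nil {
		return err
	}
	for _, pod := range pods.Items {
		for _, c := range pod.Spec.Containers {
			if !strings.HasSuffix(c.Image, ":"+expectedTag) {
				return fmt.Errorf("pod %s container %s runs %s, want tag %s",
					pod.Name, c.Name, c.Image, expectedTag)
			}
		}
	}
	return nil
}
```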
Since we are creating the workload clusters within the tests, let's delete them in the AfterEach instead of making the "mgmtClient" and "cluster" vars global.
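A rough sketch of that shape; `getManagementClusterClient` and `createWorkloadCluster` are hypothetical stand-ins for the suite's actual helpers:

```go
package e2e_test

import (
	"context"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha3"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

var _ = Describe("KCP upgrade", func() {
	var (
		ctx        context.Context
		mgmtClient client.Client
		cluster    *clusterv1.Cluster
	)

	BeforeEach(func() {
		ctx = context.Background()
		// Hypothetical helpers: each spec gets its own client and workload
		// cluster instead of sharing package-level globals.
		mgmtClient = getManagementClusterClient()
		cluster = createWorkloadCluster(ctx, mgmtClient)
	})

	AfterEach(func() {
		// Delete the workload cluster this spec created so nothing leaks
		// between specs.
		Expect(mgmtClient.Delete(ctx, cluster)).To(Succeed())
	})
})
```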
Also pass in prefix to cluster generator and improve test output
@gab-satchi @sedefsavas @vincepri If y'all have some time, please give this a review or even a local test run.
```go
SetDefaultEventuallyTimeout(10 * time.Minute)
SetDefaultEventuallyPollingInterval(10 * time.Second)

BeforeEach(func() {
```
Since this part is mostly the same as the Basic test, instead of separating them into multiple files, why not keep them as-is so that the BeforeEach and AfterEach code can be reused?
I made it separate because the original test had failure domains configured and I didn't want to couple the failure domain assertions with the other upgrade tests.
TBH I'm currently unfamiliar with failure domains, and as I was writing these tests I wanted to run the upgrade against a cluster that was "as vanilla as I could get".
Having failure domains should not cause issues, and checking that logic is important too. I think this is fine for this PR, but we should consider combining these files so we don't maintain before/after suites in two places; also, AFAIK, to run them in parallel we will need to have them together.
We have a single BeforeSuite/AfterSuite to create the management cluster.
The BeforeEach/AfterEach is responsible for creating the workload clusters (with different identifiers), and because of that isolation I believe they can run in parallel.
I also think that keeping these tests separate makes them more readable and provides a better understanding of what the expectations are.
Initially, coming to these tests without context, I didn't know whether the upgrades somehow relied on the failure domain configuration, because the original full upgrade test ran against a cluster with failure domains configured.
That being said, if others feel it would be better to couple these tests together, I don't mind doing that in a follow-up PR.
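To make that layering concrete, a rough structural sketch; the setup and teardown helpers are hypothetical:

```go
// Suite-level: the management cluster is created once per run.
var _ = BeforeSuite(func() {
	mgmt = setUpManagementCluster() // hypothetical helper
})

var _ = AfterSuite(func() {
	tearDownManagementCluster(mgmt) // hypothetical helper
})

// Spec-level (in each test file): BeforeEach/AfterEach create and delete a
// uniquely named workload cluster against that shared management cluster,
// keeping specs isolated from one another.
```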
+1 to separating them but mitigating the speed hit by running them in parallel.
I can create a separate issue to track the work of making these tests run in parallel.
If we're duplicating code, I'd suggest moving the shared code into a separate function and calling it from multiple places.
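For example, a sketch of pulling the shared setup into one helper that both suites could call; `generateCluster` is a stand-in for the PR's cluster generator:

```go
// createSpecCluster generates a uniquely named workload cluster for a spec
// and creates it on the management cluster, so both test files can share
// the same BeforeEach body.
func createSpecCluster(ctx context.Context, mgmtClient client.Client, prefix string) *clusterv1.Cluster {
	cluster := generateCluster(prefix) // hypothetical: the PR's generator, fed a prefix
	Expect(mgmtClient.Create(ctx, cluster)).To(Succeed())
	return cluster
}
```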
Issue for running CAPD e2e in parallel #2795
I'm still running it locally but it's failing to bring up 3 control plane nodes.
```diff
@@ -92,7 +92,7 @@ test-e2e: ## Run the end-to-end tests
 E2E_CONF_FILE ?= e2e/local-e2e.conf
 SKIP_RESOURCE_CLEANUP ?= false
 run-e2e:
-	go test ./e2e -v -ginkgo.v -ginkgo.trace -count=1 -timeout=20m -tags=e2e -e2e.config="$(abspath $(E2E_CONF_FILE))" -skip-resource-cleanup=$(SKIP_RESOURCE_CLEANUP)
+	go test ./e2e -v -ginkgo.v -ginkgo.trace -count=1 -timeout=35m -tags=e2e -e2e.config="$(abspath $(E2E_CONF_FILE))" -skip-resource-cleanup=$(SKIP_RESOURCE_CLEANUP)
```
Sidenote: we keep increasing the timeout value 😬
heh, I noticed the same. But we're also adding more assertions that wait on Pods running and on their attributes.
Yeah, I know. 😐 Sigh.
But that's why I suggested opening an issue to run the tests in parallel! 🙂
I can't think of any other way to reduce running time as we add more tests, short of compounding all the behaviors under a single test; but at that point we'd be writing an e2e test that says "test everything". I don't know the parameters of our CI, but maybe we can tweak the Docker params to make it run faster if we start to see failures due to timeouts.
Issue for running CAPD e2e in parallel #2795
I've got the CoreDNS upgrade part ready. I'll wait for this to get merged in.
Are we good to merge this in?
/milestone v0.3.3
/lgtm
/approve
/milestone v0.3.3
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: vincepri, wfernandes. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Yikes. I should've squashed those commits. My bad.
What this PR does / why we need it:
This PR adds an e2e test for testing etcd upgrades.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Ref #2753