
Consider dropping MachinePools from ClusterClass self-hosted tests #9656

Closed
killianmuldoon opened this issue Nov 1, 2023 · 9 comments
Labels
triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Contributor

killianmuldoon commented Nov 1, 2023

We should consider dropping MachinePools from the self-hosted ClusterClass tests in order to stabilize the signal ahead of the release of v1.6.0 on November 28th.

As described in #9522 the self-hosted ClusterClass tests are very flaky on the main branch of Cluster API. These tests are much more stable on release-1.4 and release-1.5.

At least some of the flakiness in the self-hosted tests is down to the addition of MachinePools to a number of test templates in #9393. There's a bug in the DockerMachinePool implementation - #9655 - that causes CAPI components in a self-hosted cluster using MachinePools to become unstable.

I don't think this new bug was introduced in the 1.6 release cycle. Before MachinePools were introduced in ClusterClass, there was only one e2e test covering MachinePools, which covered creation, scaling, and deletion. There was no coverage for self-hosted Clusters using MachinePools.

In my view, if these tests aren't stable on main by the publication of CAPI 1.6.0-rc.0, due on November 14th, we should drop this test case from the release branch. We should keep the tests on main with a view to fixing them and, if possible, backporting the fixes.

k8s-ci-robot added the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) Nov 1, 2023
@killianmuldoon
Contributor Author

/triage accepted

k8s-ci-robot added the triage/accepted label (Indicates an issue or PR is ready to be actively worked on.) and removed the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) Nov 1, 2023
@CecileRobertMichon
Contributor

I don't think this new bug was introduced in the 1.6 release cycle. Before introducing MachinePools in ClusterClass there were no e2e tests covering MachinePools - e.g. the e2e tests running on release-1.5 have no MachinePools in the artifacts AFAICT.

I don't think this is true; there is a MachinePools test in the framework: capi-e2e.[It] When testing MachinePools Should successfully create a cluster with machine pool machines (https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main)

@killianmuldoon
Contributor Author

killianmuldoon commented Nov 1, 2023

I don't think this is true, there is a MachinePools test in the framework capi-e2e.[It] When testing MachinePools Should successfully create a cluster with machine pool machines testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main

Thanks for catching that; bad searching on my part. I've updated the issue to be more accurate.

@killianmuldoon
Contributor Author

/assign @nawazkh

/assign @adilGhaffarDev

@k8s-ci-robot
Contributor

@killianmuldoon: GitHub didn't allow me to assign the following users: adilGhaffarDev.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @nawazkh

/assign @adilGhaffarDev

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@CecileRobertMichon
Contributor

CecileRobertMichon commented Nov 15, 2023

We shouldn't need to do this now that #8842 has merged. Let's keep monitoring the signal in the next few days and close this issue if we don't see the flake anymore.

@nawazkh
Member

nawazkh commented Nov 21, 2023

We should close this issue now, since we have decided not to drop the ClusterClass self-hosted tests and #8842 has merged.
Moreover, the PR associated with this issue, #9672, has been closed.

@nawazkh
Member

nawazkh commented Nov 21, 2023

/close

@k8s-ci-robot
Contributor

@nawazkh: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

4 participants