Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 test/e2e check for machines being ready after provisioning on Runtime SDK test #8625

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions test/e2e/cluster_upgrade_runtimesdk.go
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,30 @@ func clusterUpgradeWithRuntimeSDKSpec(ctx context.Context, inputGetter func() cl
clusterRef,
input.E2EConfig.GetIntervals(specName, "wait-cluster"))
},
PostMachinesProvisioned: func() {
Eventually(func() error {
// Before running the BeforeClusterUpgrade hook, the topology controller
// checks if the ControlPlane `IsScaling()` and for MachineDeployments if
// `IsAnyRollingOut()`.
// This PostMachineProvisioned function ensures that the clusters machines
// are healthy by checking the MachineNodeHealthyCondition, so the upgrade
// below does not get delayed or runs into timeouts before even reaching
// the BeforeClusterUpgrade hook.
machineList := &clusterv1.MachineList{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal here to check that Nodes of all Machines are ready (CP + MD)?

If it's just about MDs wouldn't it be better to check the MD status replica fields?

Copy link
Member Author

@chrischdi chrischdi May 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about all nodes of the cluster.

Edit: more correct:

  • KCP should not be IsScaling

cpScaling, err := contract.ControlPlane().IsScaling(s.Current.ControlPlane.Object)

  • No MD should be in RollingOut

if s.Current.MachineDeployments.IsAnyRollingOut() {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically we use node readiness as a proxy for the things the topology controller is actually checking.

Let's please add a godoc comment that we are checking for node healthiness to ensure the cluster is ~ stable so that the upgrade we run below is not delayed by too much through the initial cluster creation.

I think the check is not really the same as MD IsAnyRollingOut as that one is also checking for available replicas which can have some delay depending on MinReadySeconds.

But overall I guess it's good enough, let's just make sure we document clearly why we do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and squashed. PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thank you!

if err := input.BootstrapClusterProxy.GetClient().List(ctx, machineList, client.InNamespace(namespace.Name)); err != nil {
return errors.Wrap(err, "list machines")
}

for i := range machineList.Items {
machine := &machineList.Items[i]
if !conditions.IsTrue(machine, clusterv1.MachineNodeHealthyCondition) {
chrischdi marked this conversation as resolved.
Show resolved Hide resolved
return errors.Errorf("machine %q does not have %q condition set to true", machine.GetName(), clusterv1.MachineNodeHealthyCondition)
}
}

return nil
}, 5*time.Minute, 15*time.Second).Should(Succeed(), "Waiting for rollouts to finish")
},
WaitForClusterIntervals: input.E2EConfig.GetIntervals(specName, "wait-cluster"),
WaitForControlPlaneIntervals: input.E2EConfig.GetIntervals(specName, "wait-control-plane"),
WaitForMachineDeployments: input.E2EConfig.GetIntervals(specName, "wait-worker-nodes"),
Expand Down