Reflect the VM statuses to the KubeVirtCluster status #272
Conversation
/ok-to-test
Force-pushed from 506cf7f to a6cea36
@@ -275,6 +276,7 @@ func (r *KubevirtMachineReconciler) reconcileNormal(ctx *context.MachineContext)
	// Mark VMProvisionedCondition to indicate that the VM has successfully started
	conditions.MarkTrue(ctx.KubevirtMachine, infrav1.VMProvisionedCondition)
} else {
	conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition, "notScheduled", clusterv1.ConditionSeverityInfo, "unknown reason")
I think what we need here is a series of checks into why the VM isn't ready.
So, consider a function here that sets the Ready condition to false, something like the logic below. The idea is to start by looking for detailed reasons the VM might not be ready, and then work down to more generic reasons.
func mapVMStatusToMachineCondition(...) {
    if <vm has condition reason: Unschedulable status: "False" type: PodScheduled> {
        conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition, "notScheduled", clusterv1.ConditionSeverityInfo, <reason given from vm condition>)
    } else {
        conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition, "vmNotReady", clusterv1.ConditionSeverityInfo, <reason vm condition is set to not ready>)
    }
}
In the future that function will keep getting expanded. At a high level, the logic will look something like this:
if vm is not schedulable {
} else if dv storage profile not found {
} else if dv pending storage provisioning {
} else if dv failure to import image {
} else {
    < generic VMProvisioned = false reason >
}
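A more concrete Go sketch of that first version of the helper, assuming the scheduling failure is surfaced on the VM's status conditions as a PodScheduled condition with reason Unschedulable (whether the VM or only the VMI carries that condition is an assumption here, as is the package wiring; this is not the PR's actual implementation):

```go
package kubevirt

import (
	corev1 "k8s.io/api/core/v1"
	kubevirtv1 "kubevirt.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"

	infrav1 "sigs.k8s.io/cluster-api-provider-kubevirt/api/v1alpha1"
	"sigs.k8s.io/cluster-api-provider-kubevirt/pkg/context"
)

// mapVMStatusToMachineCondition marks VMProvisioned=False with the most specific
// reason found in the VM's conditions, falling back to a generic reason.
func mapVMStatusToMachineCondition(ctx *context.MachineContext, vm *kubevirtv1.VirtualMachine) {
	if vm != nil {
		for _, c := range vm.Status.Conditions {
			// An unschedulable virt-launcher pod surfaces as PodScheduled=False / Unschedulable.
			if string(c.Type) == string(corev1.PodScheduled) &&
				c.Status == corev1.ConditionFalse && c.Reason == "Unschedulable" {
				conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition,
					"notScheduled", clusterv1.ConditionSeverityInfo, "%s", c.Message)
				return
			}
		}
	}
	// Generic fallback when no more detailed reason was found.
	conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition,
		"vmNotReady", clusterv1.ConditionSeverityInfo, "VM is not ready")
}
```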
Force-pushed from a6cea36 to 82bfd09
Pull Request Test Coverage Report for Build 7285359056 (Coveralls)
/hold need to complete unit tests
Force-pushed from a22e2d1 to 341359d
/unhold
Force-pushed from 341359d to 7ccd90f
One high-level comment: I want to make sure the capi Machine object gets the condition messages from the KubeVirtMachine object.
From what I understand, right now the KubeVirtMachine's ready condition will show something like the output below when a DV can't be imported because of an insecure registry.
{
"conditions": [
{
"lastTransitionTime": "2023-12-17T15:37:31Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-17T15:37:31Z",
"message": "DataVolume capi-quickstart-md-0-rjxxm-capi-quickstart-boot-volume is still provisioning",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "VMProvisioned"
}
]
}
What do the corresponding Machine object and MachineDeployment have in their status for these KubeVirtMachine conditions?
The goal is for the KubeVirtMachine object to present these conditions in a way that the capi Machine APIs bubble up information about why the VM isn't progressing as expected.
- clusterToKubevirtMachines, err := util.ClusterToObjectsMapper(mgr.GetClient(), &infrav1.KubevirtMachineList{}, mgr.GetScheme())
+ clusterToKubevirtMachines, err := util.ClusterToTypedObjectsMapper(mgr.GetClient(), &infrav1.KubevirtMachineList{}, mgr.GetScheme())
I'm curious what this change does.
The ClusterToObjectsMapper function is deprecated. Its documentation says to use ClusterToTypedObjectsMapper instead:
// Deprecated: This function is deprecated and will be removed in a future release, use ClusterToTypedObjectsMapper instead.
// The problem with this function is that it uses UnstructuredList to retrieve objects, with the default client configuration
// this will lead to uncached List calls, which is a major performance issue.
pkg/kubevirt/machine.go
Outdated
case cdiv1.Failed:
	return "DVFailed", fmt.Sprintf("DataVolume %s failed", m.dataVolume.Name), true
default:
	msg := "DataVolume is not ready"
Maybe add the phase to this default message?
msg := fmt.Sprintf("DataVolume %s is not ready. Phase = %s", m.dataVolume.Name, m.dataVolume.Status.Phase)
For this KubeVirtMachine status:
{
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:28:28Z",
"message": "DataVolume capi-quickstart-md-0-d7xjr-capi-quickstart-boot-volume import is not running: DataVolume too small to contain image",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "VMProvisioned"
}
],
"ready": false
}
We get this Machine status:
{
"bootstrapReady": true,
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "1 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:26:18Z",
"status": "True",
"type": "BootstrapReady"
},
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "InfrastructureReady"
},
{
"lastTransitionTime": "2023-12-19T07:24:36Z",
"reason": "WaitingForNodeRef",
"severity": "Info",
"status": "False",
"type": "NodeHealthy"
}
],
"lastUpdated": "2023-12-19T07:26:18Z",
"observedGeneration": 2,
"phase": "Provisioning"
}
And the MachineDeployment status:
{
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:24:36Z",
"message": "Minimum availability requires 1 replicas, current 0 available",
"reason": "WaitingForAvailableMachines",
"severity": "Warning",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:24:36Z",
"message": "Minimum availability requires 1 replicas, current 0 available",
"reason": "WaitingForAvailableMachines",
"severity": "Warning",
"status": "False",
"type": "Available"
}
],
"observedGeneration": 1,
"phase": "ScalingUp",
"replicas": 1,
"selector": "cluster.x-k8s.io/cluster-name=capi-quickstart,cluster.x-k8s.io/deployment-name=capi-quickstart-md-0",
"unavailableReplicas": 1,
"updatedReplicas": 1
}
However, the capi Cluster status:
{
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:26:11Z",
"status": "True",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:26:11Z",
"status": "True",
"type": "ControlPlaneInitialized"
},
{
"lastTransitionTime": "2023-12-19T07:26:11Z",
"status": "True",
"type": "ControlPlaneReady"
},
{
"lastTransitionTime": "2023-12-19T07:24:41Z",
"status": "True",
"type": "InfrastructureReady"
}
],
"infrastructureReady": true,
"observedGeneration": 2,
"phase": "Provisioned"
}
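Picking up the default-message suggestion above, here is a hedged sketch of a full phase-to-condition mapping with the phase included in the default message; the helper name and the exact set of phases handled are assumptions, not the PR's code:

```go
package kubevirt

import (
	"fmt"

	cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
)

// dvPhaseToCondition maps a DataVolume phase to a condition reason and message, and
// reports whether the DataVolume should be treated as not ready.
func dvPhaseToCondition(dv *cdiv1.DataVolume) (reason, message string, notReady bool) {
	switch dv.Status.Phase {
	case cdiv1.Succeeded:
		return "", "", false
	case cdiv1.Failed:
		return "DVFailed", fmt.Sprintf("DataVolume %s failed", dv.Name), true
	default:
		// Include the current phase so users can see where provisioning is stuck.
		return "DVNotReady",
			fmt.Sprintf("DataVolume %s is not ready. Phase = %s", dv.Name, dv.Status.Phase),
			true
	}
}
```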
Force-pushed from 7ccd90f to c146ead
Comparing the capi Machine conditions with the current version (main branch):
Main:
{
"lastTransitionTime": "2023-12-19T08:13:21Z",
"message": "0 of 2 completed",
"reason": "WaitingForBootstrapData",
"severity": "Info",
"status": "False",
"type": "InfrastructureReady"
},
PR:
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "InfrastructureReady"
},
No change in
pkg/kubevirt/machine.go
Outdated
@@ -90,6 +94,23 @@ func NewMachine(ctx *context.MachineContext, client client.Client, namespace str
	machine.vmInstance = vm
}

if vm != nil && vm.Spec.Template != nil {
	for _, vol := range vm.Spec.Template.Spec.Volumes {
Sorry, I didn't catch this earlier.
Technically we need to look for the presence of vm.Spec.DataVolumeTemplates here instead of vm.Spec.Template.Spec.Volumes.
This is because the Volumes could attach the DataVolume as a PVC instead of a DataVolume; these are interchangeable.
The source of truth in our case for whether a VM has a data import flow associated with it is the DataVolumeTemplates section.
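A minimal sketch of keying the lookup off DataVolumeTemplates; the helper name, signature, and not-found handling are assumptions for illustration rather than the PR's actual code:

```go
package kubevirt

import (
	"context"

	kubevirtv1 "kubevirt.io/api/core/v1"
	cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listDataVolumes fetches the DataVolume for every DataVolumeTemplate declared on the
// VM, so it still works when the volume list mounts the resulting PVC directly.
func listDataVolumes(ctx context.Context, c client.Client, vm *kubevirtv1.VirtualMachine) ([]*cdiv1.DataVolume, error) {
	var dvs []*cdiv1.DataVolume
	if vm == nil {
		return dvs, nil
	}
	for _, dvt := range vm.Spec.DataVolumeTemplates {
		dv := &cdiv1.DataVolume{}
		key := client.ObjectKey{Namespace: vm.Namespace, Name: dvt.ObjectMeta.Name}
		if err := c.Get(ctx, key, dv); err != nil {
			// The DataVolume may not exist yet; skip it and let a later reconcile pick it up.
			if client.IgnoreNotFound(err) == nil {
				continue
			}
			return nil, err
		}
		dvs = append(dvs, dv)
	}
	return dvs, nil
}
```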
pkg/kubevirt/machine.go
Outdated
@@ -63,6 +66,7 @@ func NewMachine(ctx *context.MachineContext, client client.Client, namespace str
	vmiInstance: nil,
	vmInstance: nil,
	sshKeys: sshKeys,
	dataVolume: nil,
I think this needs to be a slice instead of a single variable. It's possible for a VM to have multiple DataVolumeTemplates associated with it. In the use case that you and I are most familiar with (hypershift) this will not be the case right now, but it's possible in the generic capk use case.
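A small sketch of the field as a slice (the struct and field names are assumptions based on the diff above; cdiv1 refers to the CDI v1beta1 API as in the earlier sketch):

```go
// Machine wraps the KubeVirt objects backing a KubevirtMachine; existing fields such
// as vmiInstance, vmInstance, and sshKeys are omitted here.
type Machine struct {
	// dataVolumes holds one entry per DataVolumeTemplate declared on the VM,
	// since a VM may have more than one.
	dataVolumes []*cdiv1.DataVolume
}
```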
Force-pushed from c146ead to afad5d5
- apiGroups:
  - cdi.kubevirt.io
  resources:
  - datavolumes
  verbs:
  - get
  - list
  - watch
Oh, I guess this is needed here too, in clusterkubevirtadm/cmd/credentials/credentials.go.
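For illustration, the matching rule expressed in Go as it might be added in clusterkubevirtadm; the package name, variable name, and exact placement in credentials.go are assumptions:

```go
package credentials

import rbacv1 "k8s.io/api/rbac/v1"

// Read-only access to DataVolumes so their phases can be watched.
var dataVolumeRule = rbacv1.PolicyRule{
	APIGroups: []string{"cdi.kubevirt.io"},
	Resources: []string{"datavolumes"},
	Verbs:     []string{"get", "list", "watch"},
}
```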
@davidvossel - done
Force-pushed from afad5d5 to b409944
If not all the cluster VMs are ready, the CAPK operator now sets the new `AllMachinesAreReady` condition in the KubeVirtCluster CR to `False`.
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
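A minimal sketch of how the cluster controller might set that condition; only the AllMachinesAreReady condition name comes from the commit message, while the package, helper name, reason string, and aggregation of not-ready machine names are assumptions:

```go
package controllers

import (
	"strings"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"

	infrav1 "sigs.k8s.io/cluster-api-provider-kubevirt/api/v1alpha1"
)

// setAllMachinesAreReady reflects the aggregated machine readiness on the KubevirtCluster.
func setAllMachinesAreReady(kvCluster *infrav1.KubevirtCluster, notReady []string) {
	if len(notReady) == 0 {
		conditions.MarkTrue(kvCluster, "AllMachinesAreReady")
		return
	}
	conditions.MarkFalse(kvCluster, "AllMachinesAreReady", "MachinesNotReady",
		clusterv1.ConditionSeverityInfo, "machines not ready: %s", strings.Join(notReady, ", "))
}
```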
Force-pushed from b409944 to dc8c7bd
/approve
/lgtm
Looks like a good start. I'm sure we'll expand on this in the future.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: davidvossel, nunnatsa.
What this PR does / why we need it
Be more verbose when the VM is not scheduled: add meaningful reasons and messages to the KubeVirtMachine conditions, to be reflected in the Cluster resources.
Also, make the `KubeVirtMachine.status.ready` field be printed even when it is `false`, and add it as a new printed column named "Ready".
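For reference, a hedged sketch of how such a printed column and an always-serialized ready field are commonly declared with kubebuilder markers; this is not necessarily the exact marker or field layout used in the PR:

```go
// On the KubevirtMachine root type, a printer column for the ready field:
// +kubebuilder:printcolumn:name="Ready",type=boolean,JSONPath=".status.ready"

// In the status type, dropping omitempty makes a false value serialize as well:
type KubevirtMachineStatus struct {
	// Ready denotes that the machine infrastructure is fully provisioned.
	Ready bool `json:"ready"`
}
```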