Reflect the VM statuses to the KubeVirtCluster status #272
Conversation
/ok-to-test
Force-pushed from 506cf7f to a6cea36
@@ -275,6 +276,7 @@ func (r *KubevirtMachineReconciler) reconcileNormal(ctx *context.MachineContext)
	// Mark VMProvisionedCondition to indicate that the VM has successfully started
	conditions.MarkTrue(ctx.KubevirtMachine, infrav1.VMProvisionedCondition)
} else {
	conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition, "notScheduled", clusterv1.ConditionSeverityInfo, "unknown reason")
I think what we need here is a series of checks into why the VM isn't ready.
So, consider a function here that sets the Ready condition to false, something like the logic below. The idea is to start by looking for detailed reasons the VM might not be ready, and then work down to more generic reasons.
func mapVMStatusToMachineCondition(...) {
    if <vm has condition reason: Unschedulable status: "False" type: PodScheduled> {
        conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition, "notScheduled", clusterv1.ConditionSeverityInfo, <reason given from vm condition>)
    } else {
        conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition, "vmNotReady", clusterv1.ConditionSeverityInfo, <reason vm condition is set to not ready>)
    }
}
In the future that function will keep getting expanded. At a high level, the logic will look something like this:
if vm is not schedulable {
} else if dv storage profile not found {
} else if dv pending storage provisioning {
} else if dv failure to import image {
} else {
    < generic VMProvisioned = false reason >
}
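A more concrete Go sketch of that first version of the helper, assuming the scheduling failure is surfaced on the VM's status conditions as a PodScheduled condition with reason Unschedulable (whether the VM or only the VMI carries that condition is an assumption here, as is the package wiring; this is not the PR's actual implementation):

```go
package kubevirt

import (
	corev1 "k8s.io/api/core/v1"
	kubevirtv1 "kubevirt.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"

	infrav1 "sigs.k8s.io/cluster-api-provider-kubevirt/api/v1alpha1"
	"sigs.k8s.io/cluster-api-provider-kubevirt/pkg/context"
)

// mapVMStatusToMachineCondition marks VMProvisioned=False with the most specific
// reason found in the VM's conditions, falling back to a generic reason.
func mapVMStatusToMachineCondition(ctx *context.MachineContext, vm *kubevirtv1.VirtualMachine) {
	if vm != nil {
		for _, c := range vm.Status.Conditions {
			// An unschedulable virt-launcher pod surfaces as PodScheduled=False / Unschedulable.
			if string(c.Type) == string(corev1.PodScheduled) &&
				c.Status == corev1.ConditionFalse && c.Reason == "Unschedulable" {
				conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition,
					"notScheduled", clusterv1.ConditionSeverityInfo, "%s", c.Message)
				return
			}
		}
	}
	// Generic fallback when no more detailed reason was found.
	conditions.MarkFalse(ctx.KubevirtMachine, infrav1.VMProvisionedCondition,
		"vmNotReady", clusterv1.ConditionSeverityInfo, "VM is not ready")
}
```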
Force-pushed from a6cea36 to 82bfd09
Pull Request Test Coverage Report for Build 7285359056 (Coveralls)
/hold need to complete unit tests
Force-pushed from a22e2d1 to 341359d
/unhold
Force-pushed from 341359d to 7ccd90f
One high-level comment: I want to make sure the capi Machine object gets the condition messages from the KubeVirtMachine object.
From what I understand, right now the KubeVirtMachine's ready condition will show something like the output below when a DV can't be imported because of an insecure registry.
{
"conditions": [
{
"lastTransitionTime": "2023-12-17T15:37:31Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-17T15:37:31Z",
"message": "DataVolume capi-quickstart-md-0-rjxxm-capi-quickstart-boot-volume is still provisioning",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "VMProvisioned"
}
]
}
What do the corresponding Machine object and MachineDeployment have in their status for these KubeVirtMachine conditions?
The goal is for the KubeVirtMachine object to present these conditions in a way that the capi Machine APIs bubble up information about why the VM isn't progressing as expected.
- clusterToKubevirtMachines, err := util.ClusterToObjectsMapper(mgr.GetClient(), &infrav1.KubevirtMachineList{}, mgr.GetScheme())
+ clusterToKubevirtMachines, err := util.ClusterToTypedObjectsMapper(mgr.GetClient(), &infrav1.KubevirtMachineList{}, mgr.GetScheme())
I'm curious what this change does.
The ClusterToObjectsMapper function is deprecated. Its documentation says to use ClusterToTypedObjectsMapper instead:
// Deprecated: This function is deprecated and will be removed in a future release, use ClusterToTypedObjectsMapper instead.
// The problem with this function is that it uses UnstructuredList to retrieve objects, with the default client configuration
// this will lead to uncached List calls, which is a major performance issue.
pkg/kubevirt/machine.go
Outdated
case cdiv1.Failed:
	return "DVFailed", fmt.Sprintf("DataVolume %s failed", m.dataVolume.Name), true
default:
	msg := "DataVolume is not ready"
Maybe add the phase to this default message?
msg := fmt.Sprintf("DataVolume %s is not ready. Phase = %s", m.dataVolume.Name, m.dataVolume.Status.Phase)
For this KubeVirtMachine status:
{
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:28:28Z",
"message": "DataVolume capi-quickstart-md-0-d7xjr-capi-quickstart-boot-volume import is not running: DataVolume too small to contain image",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "VMProvisioned"
}
],
"ready": false
}
We get this Machine status:
{
"bootstrapReady": true,
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "1 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:26:18Z",
"status": "True",
"type": "BootstrapReady"
},
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "InfrastructureReady"
},
{
"lastTransitionTime": "2023-12-19T07:24:36Z",
"reason": "WaitingForNodeRef",
"severity": "Info",
"status": "False",
"type": "NodeHealthy"
}
],
"lastUpdated": "2023-12-19T07:26:18Z",
"observedGeneration": 2,
"phase": "Provisioning"
}
And the MachineDeployment status:
{
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:24:36Z",
"message": "Minimum availability requires 1 replicas, current 0 available",
"reason": "WaitingForAvailableMachines",
"severity": "Warning",
"status": "False",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:24:36Z",
"message": "Minimum availability requires 1 replicas, current 0 available",
"reason": "WaitingForAvailableMachines",
"severity": "Warning",
"status": "False",
"type": "Available"
}
],
"observedGeneration": 1,
"phase": "ScalingUp",
"replicas": 1,
"selector": "cluster.x-k8s.io/cluster-name=capi-quickstart,cluster.x-k8s.io/deployment-name=capi-quickstart-md-0",
"unavailableReplicas": 1,
"updatedReplicas": 1
}
However, the capi Cluster status:
{
"conditions": [
{
"lastTransitionTime": "2023-12-19T07:26:11Z",
"status": "True",
"type": "Ready"
},
{
"lastTransitionTime": "2023-12-19T07:26:11Z",
"status": "True",
"type": "ControlPlaneInitialized"
},
{
"lastTransitionTime": "2023-12-19T07:26:11Z",
"status": "True",
"type": "ControlPlaneReady"
},
{
"lastTransitionTime": "2023-12-19T07:24:41Z",
"status": "True",
"type": "InfrastructureReady"
}
],
"infrastructureReady": true,
"observedGeneration": 2,
"phase": "Provisioned"
}
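Picking up the default-message suggestion above, here is a hedged sketch of a full phase-to-condition mapping with the phase included in the default message; the helper name and the exact set of phases handled are assumptions, not the PR's code:

```go
package kubevirt

import (
	"fmt"

	cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
)

// dvPhaseToCondition maps a DataVolume phase to a condition reason and message, and
// reports whether the DataVolume should be treated as not ready.
func dvPhaseToCondition(dv *cdiv1.DataVolume) (reason, message string, notReady bool) {
	switch dv.Status.Phase {
	case cdiv1.Succeeded:
		return "", "", false
	case cdiv1.Failed:
		return "DVFailed", fmt.Sprintf("DataVolume %s failed", dv.Name), true
	default:
		// Include the current phase so users can see where provisioning is stuck.
		return "DVNotReady",
			fmt.Sprintf("DataVolume %s is not ready. Phase = %s", dv.Name, dv.Status.Phase),
			true
	}
}
```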
Force-pushed from 7ccd90f to c146ead
Comparing the capi Machine conditions with the current version (main branch):
Main:
{
"lastTransitionTime": "2023-12-19T08:13:21Z",
"message": "0 of 2 completed",
"reason": "WaitingForBootstrapData",
"severity": "Info",
"status": "False",
"type": "InfrastructureReady"
},
PR:
{
"lastTransitionTime": "2023-12-19T07:26:28Z",
"message": "0 of 2 completed",
"reason": "DVNotReady",
"severity": "Info",
"status": "False",
"type": "InfrastructureReady"
},
No change in
pkg/kubevirt/machine.go
Outdated
@@ -90,6 +94,23 @@ func NewMachine(ctx *context.MachineContext, client client.Client, namespace str
	machine.vmInstance = vm
}

if vm != nil && vm.Spec.Template != nil {
	for _, vol := range vm.Spec.Template.Spec.Volumes {
Sorry, I didn't catch this earlier.
Technically we need to look for the presence of vm.Spec.DataVolumeTemplates here instead of vm.Spec.Template.Spec.Volumes.
This is because the Volumes could attach the DataVolume as a PVC instead of a DataVolume; these are interchangeable.
The source of truth in our case for whether a VM has a data import flow associated with it is the DataVolumeTemplates section.
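A minimal sketch of keying the lookup off DataVolumeTemplates; the helper name, signature, and not-found handling are assumptions for illustration rather than the PR's actual code:

```go
package kubevirt

import (
	"context"

	kubevirtv1 "kubevirt.io/api/core/v1"
	cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listDataVolumes fetches the DataVolume for every DataVolumeTemplate declared on the
// VM, so it still works when the volume list mounts the resulting PVC directly.
func listDataVolumes(ctx context.Context, c client.Client, vm *kubevirtv1.VirtualMachine) ([]*cdiv1.DataVolume, error) {
	var dvs []*cdiv1.DataVolume
	if vm == nil {
		return dvs, nil
	}
	for _, dvt := range vm.Spec.DataVolumeTemplates {
		dv := &cdiv1.DataVolume{}
		key := client.ObjectKey{Namespace: vm.Namespace, Name: dvt.ObjectMeta.Name}
		if err := c.Get(ctx, key, dv); err != nil {
			// The DataVolume may not exist yet; skip it and let a later reconcile pick it up.
			if client.IgnoreNotFound(err) == nil {
				continue
			}
			return nil, err
		}
		dvs = append(dvs, dv)
	}
	return dvs, nil
}
```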
pkg/kubevirt/machine.go
Outdated
@@ -63,6 +66,7 @@ func NewMachine(ctx *context.MachineContext, client client.Client, namespace str
	vmiInstance: nil,
	vmInstance: nil,
	sshKeys: sshKeys,
	dataVolume: nil,
I think this needs to be a slice instead of a single variable. It's possible for a VM to have multiple DataVolumeTemplates associated with it. In the use case that you and I are most familiar with (hypershift) this will not be the case right now, but it's possible in the generic capk use case.
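A small sketch of the field as a slice (the struct and field names are assumptions based on the diff above; cdiv1 refers to the CDI v1beta1 API as in the earlier sketch):

```go
// Machine wraps the KubeVirt objects backing a KubevirtMachine; existing fields such
// as vmiInstance, vmInstance, and sshKeys are omitted here.
type Machine struct {
	// dataVolumes holds one entry per DataVolumeTemplate declared on the VM,
	// since a VM may have more than one.
	dataVolumes []*cdiv1.DataVolume
}
```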
Force-pushed from c146ead to afad5d5
- apiGroups:
  - cdi.kubevirt.io
  resources:
  - datavolumes
  verbs:
  - get
  - list
  - watch
Oh, I guess this is needed here too, in clusterkubevirtadm/cmd/credentials/credentials.go.
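For illustration, the matching rule expressed in Go as it might be added in clusterkubevirtadm; the package name, variable name, and exact placement in credentials.go are assumptions:

```go
package credentials

import rbacv1 "k8s.io/api/rbac/v1"

// Read-only access to DataVolumes so their phases can be watched.
var dataVolumeRule = rbacv1.PolicyRule{
	APIGroups: []string{"cdi.kubevirt.io"},
	Resources: []string{"datavolumes"},
	Verbs:     []string{"get", "list", "watch"},
}
```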
@davidvossel - done
Force-pushed from afad5d5 to b409944
If not all the cluster VMs are ready, the CAPK operator now sets the new `AllMachinesAreReady` condition in the KubeVirtCluster CR to `False`.
Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
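A minimal sketch of how the cluster controller might set that condition; only the AllMachinesAreReady condition name comes from the commit message, while the package, helper name, reason string, and aggregation of not-ready machine names are assumptions:

```go
package controllers

import (
	"strings"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"

	infrav1 "sigs.k8s.io/cluster-api-provider-kubevirt/api/v1alpha1"
)

// setAllMachinesAreReady reflects the aggregated machine readiness on the KubevirtCluster.
func setAllMachinesAreReady(kvCluster *infrav1.KubevirtCluster, notReady []string) {
	if len(notReady) == 0 {
		conditions.MarkTrue(kvCluster, "AllMachinesAreReady")
		return
	}
	conditions.MarkFalse(kvCluster, "AllMachinesAreReady", "MachinesNotReady",
		clusterv1.ConditionSeverityInfo, "machines not ready: %s", strings.Join(notReady, ", "))
}
```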
Force-pushed from b409944 to dc8c7bd
/approve
/lgtm
Looks like a good start. I'm sure we'll expand on this in the future.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: davidvossel, nunnatsa.
What this PR does / why we need it
Be more verbose when the VM is not scheduled: add meaningful reasons and messages to the KubeVirtMachine conditions, to be reflected in the Cluster resources.
Also, make the `KubeVirtMachine.status.ready` field be printed even when it is `false`, and add it as a new printed column named "Ready".
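For reference, a hedged sketch of how such a printed column and an always-serialized ready field are commonly declared with kubebuilder markers; this is not necessarily the exact marker or field layout used in the PR:

```go
// On the KubevirtMachine root type, a printer column for the ready field:
// +kubebuilder:printcolumn:name="Ready",type=boolean,JSONPath=".status.ready"

// In the status type, dropping omitempty makes a false value serialize as well:
type KubevirtMachineStatus struct {
	// Ready denotes that the machine infrastructure is fully provisioned.
	Ready bool `json:"ready"`
}
```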