
Filter OpenShift VMs by features #770

Merged · 1 commit · Nov 30, 2023

Conversation

rszwajko
Contributor

@rszwajko rszwajko commented Oct 31, 2023

Supported features:

  1. NUMA
  2. GPUs/Host Devices
  3. Persistent TPM/EFI
  4. Dedicated CPU

Screenshot from 2023-11-30 12-12-25
Screenshot from 2023-11-30 12-12-49
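For reference, each of these features can be detected from the KubeVirt VirtualMachine spec. Below is a minimal TypeScript sketch of such predicates; the field paths follow the KubeVirt API, but the interface and helper names are illustrative, not the code in this PR:

```typescript
// Minimal shape of the KubeVirt VirtualMachine fields used below.
// Field paths follow the KubeVirt API; any field may be absent.
interface VmSpec {
  template?: {
    spec?: {
      domain?: {
        cpu?: { dedicatedCpuPlacement?: boolean; numa?: object };
        devices?: {
          gpus?: object[];
          hostDevices?: object[];
          tpm?: { persistent?: boolean };
        };
        firmware?: { bootloader?: { efi?: { persistent?: boolean } } };
      };
    };
  };
}

const domain = (vm: VmSpec) => vm.template?.spec?.domain;

// Illustrative predicates matching the four filter categories in this PR.
const hasNuma = (vm: VmSpec): boolean => !!domain(vm)?.cpu?.numa;
const hasDedicatedCpu = (vm: VmSpec): boolean =>
  !!domain(vm)?.cpu?.dedicatedCpuPlacement;
const hasGpusOrHostDevices = (vm: VmSpec): boolean =>
  (domain(vm)?.devices?.gpus?.length ?? 0) > 0 ||
  (domain(vm)?.devices?.hostDevices?.length ?? 0) > 0;
const hasPersistentTpmOrEfi = (vm: VmSpec): boolean =>
  !!domain(vm)?.devices?.tpm?.persistent ||
  !!domain(vm)?.firmware?.bootloader?.efi?.persistent;
```

A filter toolbar can then map each checkbox to one of these predicates and keep only the VMs for which it returns true.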

@yaacov
Member

yaacov commented Nov 5, 2023

@rszwajko looks nice, but it seems too detailed to me: a user needs to make lots of clicks and decisions to select all VMs with a given feature.

@ahadas can we group features that have things in common (efi+tpm, cpu+gpu, ...)?
Or should we offer a single "VM has special features" option and remove all granularity?

@ahadas
Member

ahadas commented Nov 6, 2023

@ahadas can we group features that have things in common (efi+tpm, cpu+gpu, ...)? Or should we offer a single "VM has special features" option and remove all granularity?

yeah, grouping features sounds like a good idea
the question is how - let's say we have an 'efi+tpm' group; does it mean we'll filter VMs that have either NVRAM or TPM?

thinking out loud:
as I see it, this view is useful for assessing the ability to migrate the virtual machines before creating plans - get to this view and see which VMs have warnings / errors. That's more important than showing the features themselves, since when we think about the features we can divide them into two groups:

  1. features that will be 'lost' during the migration, e.g., host devices. I don't see much value in filtering them here, since users are unlikely to go back, e.g., to RHV and detach the host devices when they can just select those VMs in MTV and continue with their migration, knowing the host devices will not be attached to the migrated VMs.
  2. features that block the migration. There aren't many features like that, and we need to think about those that exist today (e.g., SR-IOV blocks migrations from vSphere but not from RHV; the same goes for host/passthrough devices). It can be useful to filter by such features so users can make changes in the source environment in order to be able to migrate them, but the question is whether it makes sense at all to require users to do that, or to treat all features like those in the first group, which are dropped if they're not supported by kubevirt (with a warning).

so maybe that's actually a question to ask - what value should filtering by features bring to users?

@yaacov
Member

yaacov commented Nov 6, 2023

@ahadas thanks,

@rszwajko can we do finer filtering using the warnings / errors? IIRC we only filter by the existence of a warning or an error; we don't differentiate between specific errors or warnings. @ahadas does it make sense to allow filtering by different error / warning types?

@ahadas does the scenario of making different plans for different VM groups make sense?
"as an admin I want to create a different plan to migrate all the VMs that require a GPU, and a different migration plan for the VMs that require special vNICs"

@rszwajko
Contributor Author

rszwajko commented Nov 6, 2023

@ahadas

this view is useful for assessing the ability to migrate the virtual machines before creating plans

The future plan is to replace the Plan Wizard with this screen, i.e. you filter the VMs and click the "Migrate" button.

features that will be 'lost' during the migration, e.g., host devices.

For OpenShift providers we have the same capabilities on the source and the target, so no feature needs to be lost.
The target only needs to have the correct configuration in place.
For other providers this is more complex - that's why it's not supported yet (this PR is only about OpenShift).

what value should filtering by features bring to users?

The primary use case is a successful migration of VMs with features that are not covered by our mappings. Such features require manual configuration on the target (e.g. in a dedicated namespace). Most likely other VMs will need a different setup, so it's better to keep both groups separated.

The flow could look like the following:

  1. filter VMs that use some special feature (e.g. GPU)
  2. select the VMs and start the migration wizard
  3. pick a namespace on the target that has the given special feature pre-configured
  4. customize other migration options
  5. trigger the migration

@rszwajko
Contributor Author

rszwajko commented Nov 6, 2023

@yaacov

can we do finer filtering using the warnings / errors? IIRC we only filter by the existence of a warning or an error; we don't differentiate between specific errors or warnings

It's possible to do free-text filtering based on labels. The structure looks like the following:

{
  category: 'Warning',
  label: 'Shareable disk detected',
  assessment:
    'Shared disks are only supported by certain OpenShift Virtualization storage configurations. Ensure that the correct storage is selected for the disk.',
}
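A free-text match over these labels can then drive the filter. A minimal sketch, where the `Concern` shape mirrors the object above but `filterByConcernLabel` and the `concerns` field name are illustrative, not the plugin's actual API:

```typescript
interface Concern {
  category: 'Critical' | 'Warning' | 'Information';
  label: string;
  assessment: string;
}

// Keep only the VMs that have at least one concern whose label matches
// the free-text query (case-insensitive substring match).
const filterByConcernLabel = <T extends { concerns?: Concern[] }>(
  vms: T[],
  query: string,
): T[] => {
  const q = query.trim().toLowerCase();
  return vms.filter((vm) =>
    (vm.concerns ?? []).some((c) => c.label.toLowerCase().includes(q)),
  );
};
```

Matching on `label` rather than `category` is what allows differentiating between specific warnings, as discussed above.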

@ahadas
Member

ahadas commented Nov 6, 2023

@rszwajko can we do finer filtering using the warnings / errors? IIRC we only filter by the existence of a warning or an error; we don't differentiate between specific errors or warnings. @ahadas does it make sense to allow filtering by different error / warning types?

yeah, that makes sense to me
if you do this by the label of a validation, like this, then we'll need to avoid making changes to the labels (we already rephrased/changed a few assessments)

@ahadas does the scenario of making different plans for different VM groups make sense? "as an admin I want to create a different plan to migrate all the VMs that require a GPU, and a different migration plan for the VMs that require special vNICs"

it certainly makes sense to make different plans for different VM groups, since we need a plan per target namespace; however, I don't think it would be based on the VMs' features/capabilities, but rather on their ownership/permissions

@ahadas
Member

ahadas commented Nov 6, 2023

@ahadas

this view is useful for assessing the ability to migrate the virtual machines before creating plans

The future plan is to replace the Plan Wizard with this screen, i.e. you filter the VMs and click the "Migrate" button.

ah ok, I didn't realize that's the context, thanks

features that will be 'lost' during the migration, e.g., host devices.

For OpenShift providers we have the same capabilities on the source and the target, so no feature needs to be lost. The target only needs to have the correct configuration in place. For other providers this is more complex - that's why it's not supported yet (this PR is only about OpenShift).

currently you're right, as we assume we migrate between OCP clusters that have the same OCP version.
I thought you had written "OpenShift VMs" by mistake and meant RHV/vSphere/OpenStack VMs, ok

what value should filtering by features bring to users?

The primary use case is a successful migration of VMs with features that are not covered by our mappings. Such features require manual configuration on the target (e.g. in a dedicated namespace). Most likely other VMs will need a different setup, so it's better to keep both groups separated.

The flow could look like the following:

  1. filter VMs that use some special feature (e.g. GPU)
  2. select the VMs and start the migration wizard
  3. pick a namespace on the target that has the given special feature pre-configured
  4. customize other migration options
  5. trigger the migration

I see, but then looking at the features above:

  1. Dedicated CPU - not namespace-scoped
  2. GPUs - I'm not sure if that's namespace-scoped
  3. Host devices - unsupported in kubevirt
  4. NUMA - not namespace-scoped
  5. Persistent EFI - not namespace-scoped
  6. Persistent TPM - not namespace-scoped

there is SR-IOV, which is namespace-scoped, but that's supposed to be covered by the network mapping (i.e., if we get a VM from the source that is mapped to a network-attachment-definition with SR-IOV, then we need to make sure that network is mapped to a network-attachment-definition with SR-IOV on the target)

@rszwajko
Contributor Author

rszwajko commented Nov 6, 2023

@ahadas
I've browsed the official Kubevirt user guide and picked features that looked promising. The list requires more attention for sure.

As for your comments - note that most of those features require configuration at the node level. Nodes in turn can be assigned to namespaces (see below). However, even if we assume this is a cluster-wide setting, it only means that the decision is made at a different level, i.e.

  1. select the VMs
  2. pick a target cluster with the given capability
  3. trigger the migration

Dedicated CPU - not namespace-scoped

The feature requires nodes with a running CPU manager.

GPUs - I'm not sure if that's namespace-scoped

Host devices - unsupported in kubevirt

See https://kubevirt.io/user-guide/virtual_machines/host-devices/
Note that the user guide lists NVMe passthrough as one of the use cases and recommends a nodeSelector to mitigate issues with assignment.

NUMA - not namespace-scoped

Requires dedicated CPU resources and hugepages which are node-level.

Persistent EFI - not namespace-scoped
Persistent TPM - not namespace-scoped

I've over-interpreted the docs. The setting seems to be per cluster.

@yaacov
Member

yaacov commented Nov 7, 2023

Note:
filtering VMs is not just about namespaces; it can be for other reasons too, for example migrating some VMs in phase 1 and other VMs in phases 2 and 3. IMHO, filters that help users select different VMs for different migration phases based on resources can be helpful

@ahadas
Member

ahadas commented Nov 7, 2023

@rszwajko ack

... Nodes in turn can be assigned to namespaces (see below). However, even if we assume this is a cluster-wide setting, it only means that the decision is made at a different level, i.e.

  1. select the VMs
  2. pick a target cluster with the given capability
  3. trigger the migration

can you point me to where we see that nodes can be assigned to namespaces? I'm not familiar with that configuration

about the steps above: in our case it's not like oVirt, where we have a data center that contains multiple clusters, pick the VMs in the first step, and follow with picking a target cluster; rather, we pick the target cluster (cluster == OCP instance) and then pick the VMs to migrate to a namespace in that cluster

I doubt users would pick different OCP instances based on features, even though it's possible. I think it's more likely that people would want to test things, e.g., test the migration of a VM with a vGPU before initiating the full migration, as @yaacov wrote above. So ok, I see the value in this now - it is worth mentioning somewhere though

Dedicated CPU - not namespace-scoped

The feature requires nodes with a running CPU manager.

right, and CPU manager is enabled per-cluster

GPUs - I'm not sure if that's namespace-scoped
Host devices - unsupported in kubevirt

See https://kubevirt.io/user-guide/virtual_machines/host-devices/ Note the user guide lists NVMe passthrough as one of the use cases and recommends nodeSelector to mitigate issues with assignment.

ack, it didn't exist back then, so we also need to adjust our validations - we currently still warn users that host devices won't be mapped

NUMA - not namespace-scoped

Requires dedicated CPU resources and hugepages which are node-level.

ok, but the problem with node-level is that we don't know, in MTV, the node-level specification in kubevirt

Persistent EFI - not namespace-scoped
Persistent TPM - not namespace-scoped

I've over-interpreted the docs. The setting seems per cluster.

here we have a bigger issue - we cannot get the persistent data from RHV, and the data from vSphere won't help us even if we could get it, so regardless of the support for these features in kubevirt, we should warn about VMs with these features.

we thought at some point, and I proposed it on the kubevirt users mailing list, to try to identify which features are supported or enabled in kubevirt - this could allow us to produce different messages, e.g., when TPM is not supported in kubevirt vs. when it is supported. But it didn't get much attention from the kubevirt side, so it means we'll have to store, e.g., "TPM is supported as of version X"; however, this is not that easy, and we'll need to have that mapping also for k8s..

@rszwajko
Contributor Author

rszwajko commented Nov 7, 2023

@ahadas

can you point me to where we see that nodes can be assigned to namespaces? I'm not familiar with that configuration

check out PodNodeSelector

Dedicated CPU - not namespace-scoped
The feature requires nodes with a running CPU manager.
right, and CPU manager is enabled per-cluster

My understanding is that a cluster can have multiple nodes, and not all nodes may have a CPU manager. Generally, running the CPU manager and labeling the nodes seems to be a manual process.
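For what it's worth, the KubeVirt user guide documents that nodes running the CPU manager are labeled `cpumanager=true`, so finding eligible nodes is a label lookup. A sketch under that assumption (the `Node` shape is simplified and the helper name is made up):

```typescript
// Simplified shape of a Kubernetes Node object.
interface Node {
  metadata: { name: string; labels?: Record<string, string> };
}

// Nodes able to host VMs with dedicatedCpuPlacement: per the KubeVirt
// user guide, such nodes carry the label cpumanager=true.
const cpuManagerNodes = (nodes: Node[]): string[] =>
  nodes
    .filter((n) => n.metadata.labels?.['cpumanager'] === 'true')
    .map((n) => n.metadata.name);
```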

NUMA - not namespace-scoped
Requires dedicated CPU resources and hugepages which are node-level.
ok, but the problem with node-level is that we don't know, in MTV, the node-level specification in kubevirt

Yes, and it's unlikely that we will cover such advanced features with mappings. Preparing such configuration will rather remain a manual process. We can, however, try to support users in that process, i.e. by filtering the right VMs and choosing a pre-configured target provider/namespace.

@ahadas
Member

ahadas commented Nov 7, 2023

@ahadas

can you point me to where we see that nodes can be assigned to namespaces? I'm not familiar with that configuration

check out PodNodeSelector

ah ok, so they're not really "assigned" but rather "selected" - like we can affect pod scheduling with taints and tolerations, but here it's done at the namespace level, nice

Dedicated CPU - not namespace-scoped
The feature requires nodes with a running CPU manager.
right, and CPU manager is enabled per-cluster

My understanding is that a cluster can have multiple nodes, and not all nodes may have a CPU manager. Generally, running the CPU manager and labeling the nodes seems to be a manual process.

yes, the CPU manager runs in the kubelet, i.e., on the nodes
I meant that the feature can be enabled at the cluster level in kubevirt

NUMA - not namespace-scoped
Requires dedicated CPU resources and hugepages which are node-level.
ok, but the problem with node-level is that we don't know, in MTV, the node-level specification in kubevirt

Yes, and it's unlikely that we will cover such advanced features with mappings. Preparing such configuration will rather remain a manual process. We can, however, try to support users in that process, i.e. by filtering the right VMs and choosing a pre-configured target provider/namespace.

we are definitely not going to prepare such configuration. btw, we also don't prepare network configuration but count on users to define the network attachment definitions and only map to them, so that makes sense

about choosing a pre-configured target provider/namespace automatically - that's very complicated.
Even taking a simpler case: an OCP instance that has OpenShift Virtualization deployed on it and has a namespace that selects hosts, none of which have virtualization enabled - we (in MTV) can't tell that this namespace cannot be used for VMs, let alone handle more complicated scenarios of nodes with host devices..

also take into account that in k8s everything works with "desired state" - you may want to migrate the VM to a new namespace, or to an existing namespace that doesn't yet have the resources the VMs need, but the VMs should still ask for them, as you're going to add them later on

I think it comes down to this: in which situations do we need a feature without a concern? Because if we have a concern (warning / error) for each feature, we'd better let users filter by concerns rather than by features

@yaacov
Member

yaacov commented Nov 7, 2023

I think it comes down to this: in which situations do we need a feature without a concern? Because if we have a concern (warning / error) for each feature, we'd better let users filter by concerns rather than by features

examples:
a. migrating in batches by features: the first batch is small VMs with one disk, the next batch is VMs that require a GPU, and lastly, if all is ok, the VMs with special NICs
b. migrating only a subset of the VMs
c. creating a re-migration of VMs that failed the last migration because they had some issue with EFI, so we need to re-select only those ones

@ahadas
Member

ahadas commented Nov 7, 2023

I think it comes down to this: in which situations do we need a feature without a concern? Because if we have a concern (warning / error) for each feature, we'd better let users filter by concerns rather than by features

ok, let's check this concretely for the features above:
dedicated CPUs: not all the features in oVirt exist in kubevirt (see those validations), so for those that don't exist we need a concern, and for those that do exist we can have a feature (?)
GPUs: we currently don't handle vGPUs in forklift, so it would be better to have a concern that warns about this
Host devices: forklift doesn't handle that either (I just saw that USB passthrough was added in kubevirt 1.1.0, which was released today)
NUMA: same as GPUs and host devices
and for persistent EFI/TPM we need concerns

@ahadas
Member

ahadas commented Nov 7, 2023

I think it comes down to this: in which situations do we need a feature without a concern? Because if we have a concern (warning / error) for each feature, we'd better let users filter by concerns rather than by features

examples: a. migrating in batches by features: the first batch is small VMs with one disk, the next batch is VMs that require a GPU, and lastly, if all is ok, the VMs with special NICs b. migrating only a subset of the VMs c. creating a re-migration of VMs that failed the last migration because they had some issue with EFI, so we need to re-select only those ones

@yaacov ok, that's something different - here you describe cases where filtering by either concerns or features may be useful

for the first case, do we agree that it's more likely that such filtering would happen with features that we have concerns for? The way you put it is too broad, since for that we could add features like, e.g., 'two disks', 'nic with special properties', etc.

also think about our internal migration from RHV to OpenShift - this would mean dividing the plans that we created for each target namespace into multiple plans that are per target namespace and per feature. We have a task in the backlog about prioritization of migrations - not to create plans with fewer VMs, but to say "here is my plan, I want to start with certain VMs, then another group of VMs, and so on"

for the third case, if users need to create new plans for failed migrations, we did something incorrect. Users are supposed to be able to restart a plan, and then only the failed migrations would be retriggered

@yaacov
Member

yaacov commented Nov 7, 2023

for the first case, do we agree that it's more likely that such filtering would happen with features that we have concerns for? The way you put it is too broad, since for that we could add features like, e.g., 'two disks', 'nic with special properties', etc.

hmm, from my side this is a discussion about how to choose VMs for a migration plan, not about this specific feature list.
The question I want an answer to is how to best choose VMs for migration.
IMHO, using only warnings, e.g. "migrate all VMs without warnings", is less helpful than "migrate all VMs that match these criteria".
For me, this discussion is about what the best criteria are; I care less about the list in this specific PR.

this discussion is about how we can make plan creation more useful.
The question we ask is not whether this specific list of features is useful; the question is what list of features will be useful.
Should we filter by number of disks?
Should we filter by labels? Which providers support something that resembles labels?
Also note that the feature list is provider-specific. This discussion is not specific to oVirt or OpenShift; the question is how we best help users choose VMs when creating a migration plan, when they need to choose 100s of VMs out of the 1000s a provider includes.

@yaacov
Member

yaacov commented Nov 7, 2023

@ahadas , I think I understand the problem ... this PR is a sub-task of an issue:
#764

@rszwajko @ahadas let's continue in the issue, as it is a better place for discussions

Supported features:
1. NUMA
2. GPUs/Host Devices
3. Persistent TPM/EFI
4. Dedicated CPU

Signed-off-by: Radoslaw Szwajkowski <rszwajko@redhat.com>

sonarcloud bot commented Nov 30, 2023

Kudos, SonarCloud Quality Gate passed!

Bug: A (0 Bugs)
Vulnerability: A (0 Vulnerabilities)
Security Hotspot: A (0 Security Hotspots)
Code Smell: A (0 Code Smells)

No Coverage information
0.3% Duplication

@rszwajko
Contributor Author

Change summary in the force push above:

  1. merge similar features to limit the number of options in the filter
  2. display labels in the feature column
  3. rebase on top of the current main branch

@rszwajko rszwajko marked this pull request as ready for review November 30, 2023 11:23
@rszwajko
Contributor Author

@yaacov please re-review.
Note that the screenshots (far) above have been updated.

@@ -64,3 +64,5 @@ autoattach
virt
multiqueue
filesystems
bootloader
typeahead
Member

NICE !

@yaacov yaacov merged commit ab39a45 into kubev2v:main Nov 30, 2023
8 checks passed
3 participants