
Reset NUMA configuration with resize and pin VM #504

Merged
1 commit merged into oVirt:master on Jul 21, 2022

Conversation

liranr23
Member

When running a Resize and Pin NUMA VM, we create the virtual NUMA
nodes, including the pinning to the physical NUMA nodes of the host.
This is a static configuration that is set on the VM.
Until now, this configuration broke many flows that update the VM
configuration, and a manual workaround was needed to overcome it. Now
the NUMA configuration is deleted once the VM shuts down, so when
updating the VM, the update sees the current configuration as having no
NUMA nodes for that VM.

Change-Id: I7d6e8200e7830dc6903a76ac219225fa359d2c53
Bug-Url: https://bugzilla.redhat.com/2074525
Signed-off-by: Liran Rotenberg lrotenbe@redhat.com
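
For illustration, a minimal Java sketch of the behavior the commit message describes follows; the names used here (CpuPinningPolicy, VmStub, onVmDown) are placeholders for this sketch, not the actual oVirt engine classes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the auto-generated vNUMA topology of a
// resize-and-pin VM is cleared when the VM goes down, so a later update
// sees a VM with no NUMA configuration. These types are placeholders,
// not the real oVirt engine classes.
enum CpuPinningPolicy { NONE, MANUAL, RESIZE_AND_PIN_NUMA }

class VmStub {
    CpuPinningPolicy cpuPinningPolicy = CpuPinningPolicy.RESIZE_AND_PIN_NUMA;
    // Nodes created automatically when the VM started (virtual -> physical pinning).
    List<String> vNumaNodes = new ArrayList<>(List.of("vnode0->pnode0", "vnode1->pnode1"));
}

public class NumaResetOnShutdown {
    // Called when the VM reaches the Down state.
    static void onVmDown(VmStub vm) {
        // Only resize-and-pin VMs get their NUMA nodes generated at run time,
        // so only they need the reset.
        if (vm.cpuPinningPolicy == CpuPinningPolicy.RESIZE_AND_PIN_NUMA) {
            vm.vNumaNodes.clear();
        }
    }

    public static void main(String[] args) {
        VmStub vm = new VmStub();
        onVmDown(vm);
        System.out.println("vNUMA nodes after shutdown: " + vm.vNumaNodes); // prints []
    }
}
```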

@liranr23 liranr23 added the virt label Jun 28, 2022
Contributor

@ljelinkova ljelinkova left a comment


The patch looks good and will make some flows easier. However, it might still be missing some parts if we want to keep the invariant (resize and pin -> no NUMA nodes defined by the user).

For example, we should also set an empty list when creating a new VM with resize and pin NUMA (for the REST API) and on import, and we could drop some validations in VmHandler.validateCpuPinningPolicy().

@liranr23
Member Author

> The patch looks good and will make some flows easier. However, it might still be missing some parts if we want to keep the invariant (resize and pin -> no NUMA nodes defined by the user).
>
> For example, we should also set an empty list when creating a new VM with resize and pin NUMA (for the REST API) and on import, and we could drop some validations in VmHandler.validateCpuPinningPolicy().

AFAIK, for the REST API you need to execute other actions, not AddVm.
I thought of doing it for AddVm as well, but it doesn't make sense: in the API it requires additional API calls for the NUMA nodes, and in the UI it's blocked since we are in resize and pin. That was my thought.

@ljelinkova
Contributor

Could you please update to the latest master before we continue the review? Changes introduced in 2d8e98f will affect your PR.

@liranr23
Member Author

> Could you please update to the latest master before we continue the review? Changes introduced in 2d8e98f will affect your PR.

Is there a specific flow I should test with it? The annoying part now is that, for a running VM, changing something related to NUMA (selecting a host to run on, changing to high performance) indicates that the NUMA configuration changed, when it actually hasn't.

@liranr23
Member Author

/ost

@liranr23
Member Author

/ost

@ahadas
Member

ahadas commented Jul 20, 2022

> ... when updating the VM, the update sees the current configuration as having no NUMA nodes for that VM.

Do we still need this part, or is everything sorted out with the last change that skipped that validation?

@ahadas ahadas self-requested a review July 20, 2022 12:44
@liranr23
Member Author

> ... when updating the VM, the update sees the current configuration as having no NUMA nodes for that VM.
>
> Do we still need this part, or is everything sorted out with the last change that skipped that validation?

Well, now we reset the vNUMA count to 0 on update/shutdown, so it still looks correct to me.

@liranr23
Member Author

/ost

@ahadas
Member

ahadas commented Jul 20, 2022

> ... when updating the VM, the update sees the current configuration as having no NUMA nodes for that VM.
>
> Do we still need this part, or is everything sorted out with the last change that skipped that validation?
>
> Well, now we reset the vNUMA count to 0 on update/shutdown, so it still looks correct to me.

I meant whether we need it in the code itself: if we clear the topology when the VM reaches the Down state, and in update-vm we skip the problematic validation, is there another justification for setting the nodes to an empty list in update-vm?

@ljelinkova
Contributor

> ... when updating the VM, the update sees the current configuration as having no NUMA nodes for that VM.
>
> Do we still need this part, or is everything sorted out with the last change that skipped that validation?
>
> Well, now we reset the vNUMA count to 0 on update/shutdown, so it still looks correct to me.
>
> I meant whether we need it in the code itself: if we clear the topology when the VM reaches the Down state, and in update-vm we skip the problematic validation, is there another justification for setting the nodes to an empty list in update-vm?

I think that it is a reasonable default and it is more understandable to the user, compared to having some nodes while the VM is down, different ones during runtime, and then, when the VM is stopped, back to the static nodes... It is confusing.

@liranr23
Member Author

> ... when updating the VM, the update sees the current configuration as having no NUMA nodes for that VM.
>
> Do we still need this part, or is everything sorted out with the last change that skipped that validation?
>
> Well, now we reset the vNUMA count to 0 on update/shutdown, so it still looks correct to me.
>
> I meant whether we need it in the code itself: if we clear the topology when the VM reaches the Down state, and in update-vm we skip the problematic validation, is there another justification for setting the nodes to an empty list in update-vm?
>
> I think that it is a reasonable default and it is more understandable to the user, compared to having some nodes while the VM is down, different ones during runtime, and then, when the VM is stopped, back to the static nodes... It is confusing.

I agree, although I think another change is required in the UI (with resize and pin, don't show the popup field regarding the vNUMA change). But without this part in the backend it would be even more confusing; it would look as if there are next-run changes.

@liranr23
Member Author

/ost

@ahadas
Member

ahadas commented Jul 20, 2022

> I think that it is a reasonable default and it is more understandable to the user, compared to having some nodes while the VM is down, different ones during runtime, and then, when the VM is stopped, back to the static nodes... It is confusing.

I'm not sure we're talking about the same thing then, because I'm talking about something that should not be visible to users, only to developers who read the code:

- If we clear the NUMA settings when the VM goes down (which I believe we already do), then in the common scenario of updating a VM that is down, we don't need to do anything in update-vm (assuming the client got an empty NUMA node list).
- If the user updates the VM while it was running and provided us the automatically created NUMA settings, we should not persist them to the next-run snapshot, as we know they are transient for VMs with resize-and-pin.
- If the VM is down and the client provides us with NUMA settings, either because the user specified them or the client loaded the previously automatically created NUMA nodes, the backend should reject that rather than reset it to an empty list.

From the user's perspective it would be the same: the VM won't have NUMA nodes as long as the VM is down.
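
To make those three cases concrete, here is a hedged, self-contained sketch of the decision described above; the class, enum, and parameter names are invented for illustration and are not the actual engine code:

```java
import java.util.Collections;
import java.util.List;

// Rough sketch of how update-vm could treat client-supplied NUMA nodes for a
// resize-and-pin VM, per the three cases above. Not the real engine code.
public class NumaUpdatePolicy {
    enum Decision { NOTHING_TO_DO, DO_NOT_PERSIST, REJECT }

    static Decision decide(boolean resizeAndPin, boolean vmIsDown, List<?> requestedNumaNodes) {
        if (!resizeAndPin) {
            return Decision.NOTHING_TO_DO;  // regular NUMA handling applies
        }
        if (requestedNumaNodes == null || requestedNumaNodes.isEmpty()) {
            return Decision.NOTHING_TO_DO;  // common case: VM is down, topology already cleared
        }
        if (!vmIsDown) {
            return Decision.DO_NOT_PERSIST; // auto-created, transient nodes: keep them out of next-run
        }
        return Decision.REJECT;             // user-supplied nodes conflict with resize-and-pin
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, Collections.emptyList()));    // NOTHING_TO_DO
        System.out.println(decide(true, false, List.of("vnode0->pnode0"))); // DO_NOT_PERSIST
        System.out.println(decide(true, true, List.of("vnode0->pnode0")));  // REJECT
    }
}
```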

@liranr23
Member Author

> I think that it is a reasonable default and it is more understandable to the user, compared to having some nodes while the VM is down, different ones during runtime, and then, when the VM is stopped, back to the static nodes... It is confusing.
>
> I'm not sure we're talking about the same thing then, because I'm talking about something that should not be visible to users, only to developers who read the code:
>
> - If we clear the NUMA settings when the VM goes down (which I believe we already do), then in the common scenario of updating a VM that is down, we don't need to do anything in update-vm (assuming the client got an empty NUMA node list).
> - If the user updates the VM while it was running and provided us the automatically created NUMA settings, we should not persist them to the next-run snapshot, as we know they are transient for VMs with resize-and-pin.
> - If the VM is down and the client provides us with NUMA settings, either because the user specified them or the client loaded the previously automatically created NUMA nodes, the backend should reject that rather than reset it to an empty list.
>
> From the user's perspective it would be the same: the VM won't have NUMA nodes as long as the VM is down.

The current state hopefully answers this.

@liranr23
Member Author

/ost

Member

@ahadas ahadas left a comment


Looks better now, minor comment inside.
I'm OK with dropping the validation of NUMA nodes and ignoring it instead, if that simplifies the client side.

@ahadas
Member

ahadas commented Jul 21, 2022

/ost

@ahadas ahadas merged commit 18a2eb6 into oVirt:master Jul 21, 2022
@liranr23 liranr23 deleted the resize_reset_numa branch July 24, 2022 06:46