Describe the bug
I have some VMs that were created on the Proxmox cluster and added as HA resources. I am trying to change the datastore for some of the disks on these VMs.
When I do that, the plugin is thrown off because sending a shutdown command to an HA-enabled VM returns immediately, before the VM has actually shut down. As a result:
the plugin moves on to trying to move the disk;
the Proxmox cluster's HA manager tries to shut down the VM, which is now locked by the disk move operation;
after a few attempts, the Proxmox cluster sets the VM's HA state to error.
On some of the VMs the disk was not fully moved (a copy was created on the new datastore, but the VM's disk hardware was left unchanged); on others there is a residual Unused disk for which no RADOS image actually exists.
To Reproduce
Steps to reproduce the behavior:
Create a Terraform resource for a few VMs with one disk on some datastore.
Manually add the VMs as HA resources.
Update the Terraform resource to change the disks' datastore.
Run Terraform.
Watch the fireworks happening on the Proxmox cluster.
Expected behavior
The VMs should shut down, then the disks should be moved, and then they should restart.
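To make the expected ordering concrete, here is a minimal Python sketch of the sequence, assuming the VM state can be polled and reads `stopped` once the VM is down. It is not the plugin's actual code. The helper names (`wait_until_stopped`, `move_disk_safely`) and the four callables are hypothetical placeholders for whatever client calls an implementation would use. The point is only that the disk move must not start until the VM is confirmed stopped, because the shutdown request for an HA-managed VM can return before the shutdown has happened.

```python
import time
from typing import Callable


def wait_until_stopped(
    get_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_status() until it returns "stopped"; raise TimeoutError otherwise.

    Assumption: get_status is a hypothetical callable returning the VM's
    current state as a string, with "stopped" meaning the VM is down.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "stopped":
            return
        if clock() >= deadline:
            raise TimeoutError("VM did not reach the 'stopped' state in time")
        sleep(interval)


def move_disk_safely(
    request_shutdown: Callable[[], None],
    get_status: Callable[[], str],
    move_disk: Callable[[], None],
    start_vm: Callable[[], None],
    **wait_kwargs,
) -> None:
    """Shut down, confirm the VM is stopped, move the disk, then restart.

    All four callables are hypothetical stand-ins for real API calls.
    """
    # For an HA-managed VM this request may return before the VM is down.
    request_shutdown()
    # Do not trust the return of the shutdown request; check the real state.
    wait_until_stopped(get_status, **wait_kwargs)
    # Only now is it safe to take the lock needed by the disk move.
    move_disk()
    start_vm()
```

If the wait times out, the disk move is never attempted, so the VM is not locked while the HA manager is still trying to stop it.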
Screenshot
The third line here is unrelated, but the other lines clearly show the problem.
Additional context
The issue does not occur every time. Running the change on multiple VMs in parallel increases the chances of triggering it.
Some of the additional mess in the disk configuration may be my fault, as I tried to set the HA state to disabled while the operation was still running.
I am using the latest version plus my own HA support branch, but I also tested with a manually managed HA resource.
My datastores are all Ceph-based, although I don't believe this has any influence.
I can confirm that the disk misconfiguration that followed was caused by my attempts to disable the HA resource. If the HA resource is left untouched, the plugin exits with an error when it tries to start the VM, because the HA resource's error state prevents the start.