-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition causes empty ovf field in vm list #797
Comments
The thread that removes the devices is the following btw:
|
Closes: oVirt#797 During the creation of the ovf config export, oVirt does modifications to the VmDevices. Now another thread cleans those entries again, and if you are unlucky it happens during the ovf generation itself. DEBUG [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-60) [] VM 'xxxx' unmanaged device was marked for remove : {1} And this causes the export to be skipped. So we add a device lock during the ovf export to resolve this. Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
@ahadas :
This was for example a VM that did not contain its ovf config in the api output. And then the VmDevicesMonitoring directly removes them again, before the ovf export ended. |
@dupondje and what happened before 2023-01-12 09:40:56 that led to this update? |
The updates are caused by the ovf export afaik. (https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/GetVmOvfByVmIdQuery.java#L39). |
@dupondje thanks for the additional logs - I'm afraid we can't tell what caused the VM to update from the portion of the log above because unless I'm missing something there're only calls to |
@ahadas: Thanks for checking. I tought the regular DumpXmls calls were normal, but your comment made me aware its not :) So I checked further, to find out why oVirt kept running the DumpXmls calls. It seems to run the DumpXml because of the following: In the attached logs you can clearly see the thread adds and removes some devices on the vm. Which will bump the generation and cause then the ovf export to fail. |
And if I check the db, I see indeed that the devices hash seems to change quite often.
For some reason the VM is stuck in RebootInProgress in VDSM, which seems to cause weird things? After restarting vdsm, the changes in devices hash went away, but still the timeouts to the guest agent for some reason... |
good, so now you don't see so many calls to DumpXmls, right? if so, that should practically resolve the issue with the missing OVFs - not entirely, it can still happen when the generation changes which in my opinion is something that should be handled on the client side, but the frequency should be reduced significantly
I don't think it's related to this issue because it shouldn't affect the computed hash but maybe you face the issue that was reported in https://bugzilla.redhat.com/show_bug.cgi?id=2120381? |
@ahadas : Finally found the real cause of the continous dumpxml's :) Fix in oVirt/vdsm#369 |
When taking backups of our VM's we also backup the OVF which is returned by the vm's call '/ovirt-engine/api/vms?all_content=true'
Now we started to notice that sometimes the OVF was missing from the output, if you refresh, random VM's end up without OVF data.
After some debugging this looked like a race condition between the export and some monitoring thread?
What we observed:
During the export it does:
But some moments later:
Some other thread removes the vm devices.
And afaik this bumps the generation, which causes the ovf helper not to export the ova/ovf:
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/ovfstore/OvfHelper.java#L184
As I don't know the code completely, it's hard to know what the correct fix would be here :) Guess we need some locking on the exporting code so nothing else modifies the vm while exporting?
The text was updated successfully, but these errors were encountered: