Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Do not restart a crashed VM within the handleModify function.
The `handleModify` function is called when `domainmanager` gets a notification from another agent that DomainConfig for a VM has been modified (see the `handleDomainModify` handler). Among other things, the function handles the situation when the VM was already activated before but is inactive at the moment. The code for this is executed regardless of the content of the real difference seen by the other agent (for example, it could be a disconnected VIF). So, any other agent may trigger this function. In this case, the function reboots the VM. In fact, the VM should not be started again from this function in several cases: if there is an internal error in EVE OS; and if the VM was crashed from inside of the VM itself. For the first case, there is already a condition in the handling code (`status.State != types.BROKEN`). And for the second case, there is the handling code in the other function: `maybeRetryBoot`. The `maybeRetryBoot` function is called with 10-30 seconds intervals for all the VMs. It checks the state of once-started VMs and restarts the crashed ones. But in contrast to the code in the `handleModify` function, it takes into account the value in the `config.timer.reboot` property. This behavior is considered the right one, as a user may want to wait for some time before a crashed VM restarts. So, it is necessary to eliminate the restart of the crashed VM from the `handleModify` function, which does not take the timer into account. Inside the function, we can detect that the VM has been crashed by checking the values in `status.State` and `status.Error`. It's known that a crashed VM has `status.State` set to `HALTED`. The `Info` implementation for containerd sets this state (see the "stopped" -> `HALTED` mapping there). And it's guaranteed that the container has the state set to "stopped" because of the handler for the `SHUTDOWN` event in `qmp.go:qmpEventHandler()` that terminates the container. And it's also known that a crashed VM has `status.Error` set. The `verifyStatus` function sets it when it schedules a crashed VM for a reboot. Worth noting that `status.Error` can be set only in the case of 2 values in `status.State`: `type.BROKEN` and `type.HALTED`. It is enough, bearing in mind all the facts described above, to add a check for `status.HasError()` in the condition in the `handleModify` function to eliminate the restart of crashed VM earlier than in `config.timer.reboot` seconds. Signed-off-by: Nikolay Martyanov <ohmspectator@gmail.com>
- Loading branch information