Bunch of (possible) problems with qemu-guest-agent #669
That is correct, and intentional. The provider waits for the guest agent to provide the VM IP addresses, which are exported as resource attributes (ipv4_addresses / ipv6_addresses). By setting agent.enabled = true you are telling the provider that the agent is installed in the guest and will eventually report those addresses.
Also correct. Other resources can depend on those IP addresses. Terraform reads the resource state to determine whether its attributes have changed (and whether it is necessary to modify the configuration of dependent resources).
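As an illustration (a minimal sketch, not taken from this thread - it assumes a resource named proxmox_virtual_environment_vm.example and uses the ipv4_addresses attribute documented by the provider, where index 0 is typically the loopback interface):

output "example_vm_ipv4" {
  # First address reported by the guest agent for the second interface.
  value = proxmox_virtual_environment_vm.example.ipv4_addresses[1][0]
}

Without a running agent this list stays empty, which is exactly why the provider keeps waiting.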
It looks like the provider does only 1 API call to get the list of all hosts, and then 1 API call for each Proxmox host to get the list of all its VMs, details here. Maybe some VM is locked by another action (a Shutdown/Reboot waiting to time out...), and Proxmox takes a while to respond with the list of all VMs?
Proxmox behavior depends on the VM's agent option: with the agent enabled, Proxmox issues shutdown and reboot requests through the guest agent instead of via ACPI. If it is not possible to log into the VM to shut it down cleanly (e.g. errors in cloud-init configuration), then there is the option to use the 'Monitor' tab of the VM details and run the command 'quit', which will forcibly stop the selected VM (with all the data safety of pulling the power cord from a computer). There are additional advantages in running the qemu-guest-agent - see Proxmox docs1, docs2 - but the main purpose of using the guest agent with the terraform provider is access to the IP addresses assigned to the VM, which can then be used by other terraform resources. If the agent is not installed inside the guest, then a VM which has agent.enabled = true will hang on shutdown/reboot, and the provider will keep waiting for IP addresses that never arrive.
There just is not a way to distinguish a not-installed agent from a not-yet-started one. The user just has to tell the provider the truth or suffer the consequences.
The rather large default timeout is there because it is possible for a VM with the agent installed to take a rather long time to boot (and start the agent). Consider a VM which performs a long disk check on boot (because of an unclean shutdown); with spinning HDDs it could take 10 minutes before the disk check finishes and the guest agent finally starts.
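To make that concrete, the agent block inside the proxmox_virtual_environment_vm resource boils down to one of two honest configurations (a sketch only; the timeout value here is an arbitrary example):

# Agent really present in the guest image: let the provider wait for it,
# with a timeout generous enough to survive slow boots / disk checks.
agent {
  enabled = true
  timeout = "15m"
}

# No agent in the guest image: say so, and the provider will neither wait
# for IP addresses nor expect agent-assisted shutdown/reboot.
agent {
  enabled = false
}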
"Is there any compatibility issue with 'reboot = true' when the guest agent is not installed?"

If the VM has the agent enabled, Proxmox issues the reboot through the guest agent, so a missing or not-yet-running agent makes the reboot hang until it times out. If the VM has the agent disabled, Proxmox falls back to ACPI, and the reboot should work as long as the guest OS responds to ACPI events.

Using "Stop" instead of "Shutdown" (or "Reset" instead of "Reboot") is not a reasonable substitute. The guest operating system would not have the option to shut down cleanly, and data loss would be almost certain.
@otopetrik
Yeah, I completely forgot about the possibility to retrieve IP addresses of the VMs, even though I planned to use that functionality myself! It just didn't make it into my module yet, and that's probably why I missed the point here.
I thought so. That's basically how I deploy the qemu-guest-agent - by using cloud-init. Only one of my cloud images has the agent preinstalled. I was testing out some people's requests that sometimes they would not want cloud-init to be employed.
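For reference, deploying the agent that way boils down to a cloud-config similar to this (a rough sketch, not my exact file - the datastore and node names are placeholders):

resource "proxmox_virtual_environment_file" "agent_user_data" {
  content_type = "snippets"
  datastore_id = "vzdata"      # placeholder; any snippet-enabled datastore
  node_name    = "prox-srv1"   # placeholder

  source_raw {
    file_name = "agent-user-data.yaml"
    data      = <<-EOF
      #cloud-config
      packages:
        - qemu-guest-agent
      runcmd:
        - systemctl enable --now qemu-guest-agent
    EOF
  }
}

The VM then points at it via initialization.user_data_file_id, the same way cloud_config_userdata_raw_file is referenced in my config below.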
There's no issue when it's running and enabled. I still have issues when the agent is not running inside a guest, I disable it in the resource's arguments, and employ "reboot = true". I just tested it right now to be sure. If I understood correctly what you said, in such a configuration (agent not installed and disabled) reboot should still be possible.
I use that reboot option to make sure that all cloud-init provisioning is done completely and correctly, and that the deployed guest VM won't have any issues after a reboot. Better to know sooner than later. Also, Ubuntu for example, even with relatively fresh images, installs plenty of updates and usually requests a reboot, which then stays pending in the system. @bpg I'm sorry for the inconvenience if my comment reopens the issue and I'm unsuccessful in closing it again.
@otopetrik Here's the resource configuration I'm using:

resource "proxmox_virtual_environment_vm" "px_vm" {
for_each = var.config_px_vm
name = each.key
description = each.value.description
tags = sort(concat(each.value.tags, ["terraform"]))
pool_id = each.value.pool_id
node_name = each.value.node_name
vm_id = each.value.vm_id
migrate = each.value.migrate
on_boot = each.value.on_boot
started = each.value.started
reboot = each.value.reboot
scsi_hardware = each.value.scsi_hardware
boot_order = each.value.boot_order
agent {
enabled = each.value.agent_enabled
trim = true
timeout = "30s"
}
clone {
node_name = each.value.clone_node_name
retries = 3
vm_id = proxmox_virtual_environment_vm.px_template[each.value.clone_vm_id].id
}
cpu {
architecture = each.value.cpu_arch
type = each.value.cpu_type
cores = each.value.cores
sockets = each.value.sockets
numa = each.value.numa
}
memory {
dedicated = each.value.memory
floating = each.value.memory
}
disk {
datastore_id = "vzdata"
discard = "on"
file_format = "qcow2"
interface = "scsi0"
iothread = true
size = 50
ssd = true
}
initialization {
datastore_id = "vzdata"
interface = "ide2"
ip_config {
ipv4 {
address = each.value.ipv4_address
gateway = each.value.ipv4_address == "dhcp" ? null : each.value.ipv4_gateway
}
}
user_data_file_id = each.value.cicustom_userdata == false ? null : proxmox_virtual_environment_file.cloud_config_userdata_raw_file[each.value.clone_node_name].id
}
network_device {
bridge = "vmbr0"
}
operating_system {
type = each.value.os_type
}
serial_device {
device = "socket"
}
# lifecycle {
# ignore_changes = [
# initialization[0].user_account,
# initialization[0].user_data_file_id
# ]
# }
}

And here's a map of values I'm iterating over:

locals {
config_px_vm = {
"astra-test-vm" = {
description = "Managed by Terraform"
tags = ["astra-1.7.4-base"]
node_name = "prox-srv1"
migrate = true
on_boot = true
started = true
reboot = true
cicustom_userdata = false
agent_enabled = false
scsi_hardware = "virtio-scsi-single"
boot_order = ["scsi0"]
clone_node_name = "prox-srv1"
clone_vm_id = "astra-1.7.4-base"
cpu_arch = "x86_64"
cpu_type = "host"
cores = 1
sockets = 2
numa = true
memory = 4096
ipv4_address = "10.177.144.224/24"
ipv4_gateway = "10.177.144.254"
os_type = "l26"
}
"ubuntu-test-vm" = {
description = "Managed by Terraform"
tags = ["ubuntu-22.04"]
node_name = "prox-srv1"
migrate = true
on_boot = true
started = true
reboot = true
cicustom_userdata = false
agent_enabled = false
scsi_hardware = "virtio-scsi-single"
boot_order = ["scsi0"]
clone_node_name = "prox-srv1"
clone_vm_id = "ubuntu-22.04"
cpu_arch = "x86_64"
cpu_type = "host"
cores = 1
sockets = 2
numa = true
memory = 4096
ipv4_address = "dhcp"
os_type = "l26"
}
"debian-test-vm" = {
description = "Managed by Terraform"
tags = ["debian-12"]
node_name = "prox-srv1"
migrate = true
on_boot = true
started = true
reboot = true
cicustom_userdata = false
agent_enabled = false
scsi_hardware = "virtio-scsi-single"
boot_order = ["scsi0"]
clone_node_name = "prox-srv1"
clone_vm_id = "debian-12"
cpu_arch = "x86_64"
cpu_type = "host"
cores = 1
sockets = 2
numa = true
memory = 4096
ipv4_address = "dhcp"
os_type = "l26"
}
}
}

Resulting in:

module.proxmox.proxmox_virtual_environment_vm.px_vm["ubuntu-test-vm"]: Still creating... [3m40s elapsed]
module.proxmox.proxmox_virtual_environment_vm.px_vm["astra-test-vm"]: Still creating... [3m40s elapsed]
module.proxmox.proxmox_virtual_environment_vm.px_vm["debian-test-vm"]: Still creating... [3m40s elapsed] |
Thanks for exploring the reboot behavior.

If the VM is configured with agent.enabled = false, the reboot is attempted via ACPI, and that request is sent very early, while the guest may still be booting and running cloud-init.

In any case, I see no reason to attempt to reboot the VM this early in the boot process, and have no real idea about the intended use case. To reboot the VM after cloud-init, consider using package_reboot_if_required or power_state. A reboot from inside the VM can ensure that all of the configuration is actually applied.

It does not seem like the provider's reboot option was designed with this use case in mind. Making it (somewhat) work with the agent disabled, or making it work with an agent that is enabled but not yet running, would likely require changes in the provider. Without knowing the original purpose of the reboot option, it is hard to say how it should behave here.

If you think that the current behavior should be changed, it is probably best to open a separate feature request for it.
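For completeness, the cloud-init side of that suggestion could look roughly like this (a sketch, not tested here; both options are standard cloud-init keys and only one of them is normally needed):

locals {
  # Option 1: reboot only if a package update requires it.
  # Option 2 (commented out): always reboot once cloud-init finishes.
  reboot_user_data = <<-EOF
    #cloud-config
    package_update: true
    package_upgrade: true
    package_reboot_if_required: true
    # power_state:
    #   mode: reboot
  EOF
}

This keeps the reboot decision inside the guest, where cloud-init knows whether provisioning has actually finished.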
Describe the bug
Here I created a VM ("debian-test-vm") with "agent.enabled = true", "agent.timeout = 30s", "reboot = false" and no actual agent installed inside the guest.
Then I just ran plan generation, which hung for that timeout value (plus some overhead, I guess):
Here's a new plan output, now with the qemu guest agent disabled ("agent.enabled = false") and the same other settings:
Here's an example where I tried to create 3 VMs from scratch (not by tainting already existing resources); they had "agent.enabled = true", "agent.timeout = 30s" and "reboot = true" (according to Proxmox's GUI the process got stuck at the Reboot task):
Expected behavior