No ip addresses are provided with enabled agent #776

BaldFabi · 2023-12-06T21:47:19Z

Describe the bug
I'm trying to clone a VM from a template with the agent enabled. I found issue #100, which looks related to my problem.
It looks like the provider doesn't wait long enough (or something like that), because the IP is displayed in the Proxmox GUI on the VM summary page. The IP is not available to Proxmox instantly, only a couple of seconds after the VM has started.

To Reproduce
Steps to reproduce the behavior:

1. Create a template that already has the QEMU guest agent installed.
2. Create a config that clones that template with the agent block enabled.
3. The clone fails and no IP is saved in the state file, even though the IP is displayed next to the VM (as shown in the screenshot).

resource "proxmox_virtual_environment_vm" "machinexyz" {
  name      = "machinexyz"
  node_name = "server01"

  operating_system {
    type = "l26"
  }

  on_boot = true

  clone {
    vm_id = 912
  }

  agent {
    enabled = true
  }

  memory {
    dedicated = 4096
  }

  cpu {
    cores = 4
    type  = "x86-64-v2-AES"
  }

  disk {
    datastore_id = "pool1"
    size         = 20
    interface    = "scsi0"
  }

  connection {
    type     = "ssh"
    user     = "root"
    password = local.root_password
    host     = self.ipv4_addresses[0]
  }
}

Expected behavior
The provider should wait for the defined (or default) value of the timeout option.

Screenshots
IP in the Proxmox GUI

The error

╷
│ Error: Attempt to index null value
│
│   on machine.tf line 45, in resource "proxmox_virtual_environment_vm" "machine":
│   45:     host     = self.ipv4_addresses[0]
│     ├────────────────
│     │ self.ipv4_addresses is null
│
│ This value is null, so it does not have any indices.

Additional context

It's a single instance Proxmox server
Provider version 0.39.0
Terraform version v1.5.7
OS: MacOS

BaldFabi · 2023-12-10T01:03:03Z

I just tried some things and found out that a previous warning I also had is the reason for this. My template had the iothread option set on the hard disk.
After removing it, the ipv4_addresses attribute wasn't null anymore.
It's a little bit weird that the warning causes this.

Edit: And at the moment I don't have a clue how ipv4_addresses is structured.

bpg · 2023-12-12T00:36:18Z

> I just tried some things and found out that a previous warning I also had is the reason for this. My template had the iothread option set on the hard disk.

That could be related to #360; changing disk attributes while cloning might not always work as expected.
But I'm also curious what the "previous warning" was that you also saw. Do you have it captured somewhere, by any chance?

Also, you may want to skip the disk block in the clone if you just want to use the disk from the template.

> Edit: And at the moment I don't have a clue how ipv4_addresses is structured.

You can check your local terraform.tfstate, for my test VM it is

            "ipv4_addresses": [
              [
                "127.0.0.1"
              ],
              [
                "192.168.3.205"
              ]
            ],
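The same nested structure can also be inspected without opening the state file. A minimal sketch, assuming the resource name machinexyz from the config at the top of this issue; the output names are made up for illustration:

```hcl
# Raw value as reported by the QEMU guest agent:
# one inner list of addresses per guest interface.
output "vm_ipv4_addresses" {
  value = proxmox_virtual_environment_vm.machinexyz.ipv4_addresses
}

# Guest interface names, in the same order as the address lists above.
output "vm_network_interface_names" {
  value = proxmox_virtual_environment_vm.machinexyz.network_interface_names
}
```

After an apply, `terraform output vm_ipv4_addresses` prints the list of lists, which makes it easier to see which index belongs to which interface.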

vrcdx64 · 2023-12-12T09:03:10Z

Hello,

Proxmox single node 8.0.3
Provider version 0.40.0
Terraform 1.6.5

I'm learning Terraform and I have exactly the same problem as described by the author. My template, with qemu-agent enabled, doesn't use iothread (the default value is false).

If I check the values of ipv4_addresses, ipv6_addresses, or network_interface_names in Terraform after the error, the lists are empty. But in the Proxmox web UI the values are there. I've tried adjusting the agent timeout value, but the default is 15m, which is enough.

I'll see if I can dig into the problem. Don't hesitate to ask me if I can help troubleshoot.

BaldFabi · 2023-12-12T10:26:00Z

I just reran Terraform with the iothread attribute set on the template to trigger the warning again:

╷
│ Warning: the VM startup task finished with a warning, task log:
│
│       | WARN: iothread is only valid with virtio disk or virtio-scsi-single controller, ignoring
│       | TASK WARNINGS: 1
│
│   with proxmox_virtual_environment_vm.machine,
│   on machine.tf line 1, in resource "proxmox_virtual_environment_vm" "machine":
│    1: resource "proxmox_virtual_environment_vm" "machine" {
│
╵
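The warning text itself points at one way to keep iothread without triggering it: pairing it with the virtio-scsi-single controller. The following is only a sketch, assuming the provider's scsi_hardware argument and the disk block's iothread argument; it was not tried in this thread, where the fix was simply removing iothread from the template.

```hcl
resource "proxmox_virtual_environment_vm" "machinexyz" {
  name      = "machinexyz"
  node_name = "server01"

  # Assumption: use the controller type named in the warning,
  # so that iothread on a SCSI disk is accepted.
  scsi_hardware = "virtio-scsi-single"

  clone {
    vm_id = 912
  }

  agent {
    enabled = true
  }

  disk {
    datastore_id = "pool1"
    size         = 20
    interface    = "scsi0"
    iothread     = true
  }
}
```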

> Also, you may want to skip the disk block in the clone if you just want to use the disk from the template.

But if I skip the disk block, the disk wouldn't be cloned, right?
Otherwise I would have to recreate the template each time I want to provision a new VM, or skip this step and provision and install it via ISO.

> You can check your local terraform.tfstate, for my test VM it is
>
>             "ipv4_addresses": [
>               [
>                 "127.0.0.1"
>               ],
>               [
>                 "192.168.3.205"
>               ]
>             ],

That's a good hint. I did a rerun without the iothread attribute to avoid the warning.
The ipv4_addresses in my state file now look like yours. Wouldn't it make sense to purge 127.0.0.1 and simplify the slice to be one dimensional?

otopetrik · 2023-12-12T11:31:46Z

> Wouldn't it make sense to purge 127.0.0.1 and simplify the slice to be one dimensional?

Probably not. There are use cases for VMs with multiple interfaces (router, internal cluster networks, etc...), and even some use cases for one interface to have multiple addresses (high-availability using virtual ip).

The provider waits for one "reasonable" IP address (i.e. better than link-local). This fixes the original issue, where a link-local IPv6 address was obtained faster than an IPv4 address from the DHCP server.

In cases where waiting for multiple interfaces/addresses is required, it should be possible to delay starting the qemu-guest-agent inside the VM until all addresses are obtained (by modifying the guest agent's systemd unit dependencies).

ipv4_addresses data is taken directly from qemu-guest-agent, which reports all interfaces (including loopback) and uses the names used by the system inside the VM (i.e. not "net0" but "eth0", "eno1", "enp5s0", and likely even language-specific names in the case of Windows VMs).

Using fixed index like self.ipv4_addresses[0] does not really work.

Using element(element(self.ipv4_addresses, index(self.network_interface_names, "eth0")), 0) should work - assuming that the interface inside the VM is "eth0", and not enp5s0 or similar.

It might be useful to add something like ipv4_addresses_by_device[], which would use mac addresses of configured network devices to find matching ip addresses from qemu-guest-agent output, then ipv4_addresses_by_device[0] would really mean ipv4 address of 'net0' network device of the VM.

(Changing behavior of existing ipv4_addresses is probably not a good idea. It would break existing configurations and it can be useful to have access to IP addresses assigned to non-hardware interfaces - e.g. VPN, PPPoE,...)
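The lookup by interface name described above can be written so that it does not fail hard while the agent has not reported anything yet. This is a sketch under assumptions: the resource name machinexyz comes from the original config, "eth0" is a placeholder for whatever name the guest reports, and the local names are invented for illustration.

```hcl
locals {
  vm = proxmox_virtual_environment_vm.machinexyz

  # Hypothetical guest interface name; adjust to what the guest reports.
  guest_interface = "eth0"

  # Map each guest interface name to its list of IPv4 addresses.
  # try() falls back to an empty map if the lists are null or do not line up.
  ipv4_by_interface = try(
    zipmap(local.vm.network_interface_names, local.vm.ipv4_addresses),
    {}
  )

  # First address of the chosen interface, or null if it is not known yet.
  guest_ipv4 = try(local.ipv4_by_interface[local.guest_interface][0], null)
}

output "guest_ipv4" {
  value = local.guest_ipv4
}
```

This only turns the "index null value" or "cannot search an empty list" failure into a null result; it does not make the provider wait any longer for the agent.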

Sorixelle · 2024-02-10T13:47:24Z

I ran into this issue today, where the state refresh was timing out, and ipv4_addresses etc. were empty after an apply. The issue turned out to be that I had not granted the Proxmox user I configured the provider with the VM.Monitor privilege, which seems to be required to be able to retrieve this information. Just dropping this one in here, in case anyone runs into the same problem.

I wonder, could this be handled better? The API route being called was returning a 403 response in this case, so it would be possible for the provider to catch this case and show an error message to the user. If that's desired, I can open a separate issue to track that.
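For anyone managing Proxmox roles from Terraform as well, the privilege mentioned above could be granted roughly as follows. This is a sketch assuming the provider's proxmox_virtual_environment_role resource; the role name is hypothetical, and a real role would also need every other privilege the provider uses.

```hcl
# Hypothetical role; only the privilege named in this comment is listed.
resource "proxmox_virtual_environment_role" "terraform_agent" {
  role_id = "terraform-agent"

  privileges = [
    "VM.Monitor",
  ]
}
```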

bpg · 2024-02-14T02:15:01Z

Hi @Sorixelle! 👋🏼

Thanks for sharing your use case. That's a good suggestion; the provider can definitely handle this type of error better.
Please go ahead and open a separate issue for this enhancement, much appreciated!

bpg-autobot · 2024-08-13T00:03:45Z

Marking this issue as stale due to inactivity in the past 180 days. This helps us focus on the active issues. If this issue is reproducible with the latest version of the provider, please comment. If this issue receives no comments in the next 30 days it will automatically be closed. If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

Moortu · 2024-10-07T03:15:48Z

I was experiencing this issue.

I followed the advice above:
my Proxmox user had the VM.Monitor permission
I disabled iothread

When I ran the Terraform script a second time, after it had created the VMs but failed to get the IP addresses, it did find the ipv4_addresses.

For me, the problem was that I had enabled UEFI and TPM but didn't have an efi_disk.
This gave warnings, and it used efivars, but apparently it was not a blocker.

After I added the efi_disk there were no warnings, and it got the ipv4_addresses and everything else.

repo: https://github.com/Moortu/terraform-proxmox-talos-k8s
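A UEFI setup with an explicit EFI disk, as described in this comment, might look like the sketch below. The block and argument names (bios, efi_disk, tpm_state) are assumed from the provider's VM resource; the resource name, datastore, and values are placeholders, not taken from the linked repo.

```hcl
resource "proxmox_virtual_environment_vm" "uefi_example" {
  name      = "uefi-example"
  node_name = "server01"

  # UEFI firmware.
  bios = "ovmf"

  # Per the comment above, leaving this block out produced startup warnings.
  efi_disk {
    datastore_id = "pool1"
    type         = "4m"
  }

  tpm_state {
    datastore_id = "pool1"
    version      = "v2.0"
  }

  agent {
    enabled = true
  }
}
```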

neutralalice · 2024-10-22T21:43:26Z

> Using element(element(self.ipv4_addresses, index(self.network_interface_names, "eth0")), 0) should work - assuming that the interface inside the VM is "eth0", and not enp5s0 or similar.

I gave this a shot, and found that on first apply, I'd get Call to function "index" failed: cannot search an empty list. While every apply after is fine.

Moortu · 2024-10-22T21:47:45Z

> > Using element(element(self.ipv4_addresses, index(self.network_interface_names, "eth0")), 0) should work - assuming that the interface inside the VM is "eth0", and not enp5s0 or similar.
>
> I gave this a shot, and found that on first apply, I'd get Call to function "index" failed: cannot search an empty list. While every apply after is fine.

I had this as well.
This seems to indicate a wrong configuration in my experience.
Do you use secure boot?

Or are you getting any warnings in your log?

neutralalice · 2024-10-22T22:26:52Z

> > > Using element(element(self.ipv4_addresses, index(self.network_interface_names, "eth0")), 0) should work - assuming that the interface inside the VM is "eth0", and not enp5s0 or similar.
> >
> > I gave this a shot, and found that on first apply, I'd get Call to function "index" failed: cannot search an empty list. While every apply after is fine.
>
> I had this as well. This seems to indicate a wrong configuration in my experience. Do you use secure boot?
>
> Or are you getting any warnings in your log?

Your setup is actually similar to mine: downloading Talos with the qemu-guest-agent extension from Image Factory.

No UEFI/secure boot. No warnings that appear related to me, other than the error for the above function call.

I see that in your setup you actually have a 5-second wait for outputs, so I'll give that a try.

Edit: That didn't work.

Moortu · 2024-10-23T07:39:32Z

> > > > Using element(element(self.ipv4_addresses, index(self.network_interface_names, "eth0")), 0) should work - assuming that the interface inside the VM is "eth0", and not enp5s0 or similar.
> > >
> > > I gave this a shot, and found that on first apply, I'd get Call to function "index" failed: cannot search an empty list. While every apply after is fine.
> >
> > I had this as well. This seems to indicate a wrong configuration in my experience. Do you use secure boot?
> > Or are you getting any warnings in your log?
>
> Your setup is actually similar to mine: downloading Talos with the qemu-guest-agent extension from Image Factory.
>
> No UEFI/secure boot. No warnings that appear related to me, other than the error for the above function call.
>
> I see that in your setup you actually have a 5-second wait for outputs, so I'll give that a try.
>
> Edit: That didn't work.

Any warning from the bpg provider or Talos will most likely cause the problem, even if it seems unrelated.

Can you share your repo?

neutralalice · 2024-10-23T22:46:13Z

https://github.com/neutralalice/talos-on-proxmox

I went ahead and tried all the other things listed and still get the issue on first apply.

I'm not yet at the point where nodes are joining together; this is just populating them on proxmox.

With TF_LOG=WARN, the only error I see other than the element output function call error is:

2024-10-23T23:39:14.737+0100 [WARN] Provider "provider[\"registry.opentofu.org/bpg/proxmox\"]" produced an unexpected new value for module.control_planes.proxmox_virtual_environment_vm.node[2], but we are tolerating it because it is using the legacy plugin SDK.

Edit: @Moortu I had another chance to look at this, and it ended up being strictly because the QEMU guest agent had not yet started reporting the interfaces. By upping the output time delay to 10 seconds, I would usually get the IPv4 address but not the IPv6 address. By increasing it to 15-20 seconds, I get all of the IPv4 and most (but not always all!) of the IPv6 addresses. 25 seconds seems to be enough time for me to always get the IP addresses on first apply.

Moortu · 2024-10-26T21:04:30Z

> https://github.com/neutralalice/talos-on-proxmox
>
> I went ahead and tried all the other things listed and still get the issue on first apply.
>
> I'm not yet at the point where nodes are joining together; this is just populating them on proxmox.
>
> With TF_LOG=WARN, the only error I see other than the element output function call error is:
>
> 2024-10-23T23:39:14.737+0100 [WARN] Provider "provider[\"registry.opentofu.org/bpg/proxmox\"]" produced an unexpected new value for module.control_planes.proxmox_virtual_environment_vm.node[2], but we are tolerating it because it is using the legacy plugin SDK.
>
> Edit: @Moortu I had another chance to look at this, and it ended up being strictly because the QEMU guest agent had not yet started reporting the interfaces. By upping the output time delay to 10 seconds, I would usually get the IPv4 address but not the IPv6 address. By increasing it to 15-20 seconds, I get all of the IPv4 and most (but not always all!) of the IPv6 addresses. 25 seconds seems to be enough time for me to always get the IP addresses on first apply.

That agent timeout behaved strangely for me.
I would keep it at a few minutes at least.
The default is 15m.

neutralalice · 2024-10-26T22:57:12Z

> That agent timeout behaved strangely for me. I would keep it at a few minutes at least. The default is 15m.

Yeah, this does seem to be it, with no need for a sleep timer. I had the timeout really low because I used to load the guest agent in via cloud-init, before Image Factory came along, so I had a really low timeout to compensate. It looks like the sleep timer resource gave just enough buffer to get a QEMU response.

Moving on from that, I wonder if there is something in this function that could make waiting for the interface IP addresses more robust. As mentioned, without a sleep timer after resource creation, I only seem to be getting an IPv4 address and a link-local IPv6 address, but I'm actually only interested in the global IPv6 address, since the rest of my network runs IPv6 (mostly non-dual-stack).
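The two points from this exchange, keeping the agent timeout generous and picking out non-link-local IPv6 addresses, can be sketched as follows. The resource name, node, and template ID are placeholders, the agent block's timeout argument is assumed from the provider, and startswith assumes a Terraform or OpenTofu version that ships it (1.3 or newer). The filter only selects from what the agent has already reported; it does not make the provider wait for a global address.

```hcl
resource "proxmox_virtual_environment_vm" "node" {
  name      = "node"
  node_name = "server01"

  clone {
    vm_id = 912
  }

  agent {
    enabled = true
    # The default mentioned in this thread; a very low value was the
    # cause of the empty address lists on first apply.
    timeout = "15m"
  }
}

locals {
  # Flatten the per-interface lists, then drop loopback (::1) and
  # link-local addresses (which in practice start with "fe80:").
  # try() yields an empty list if the attribute is null.
  global_ipv6 = [
    for ip in try(flatten(proxmox_virtual_environment_vm.node.ipv6_addresses), []) :
    ip if ip != "::1" && !startswith(lower(ip), "fe80:")
  ]
}

output "global_ipv6" {
  value = local.global_ipv6
}
```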

bpg added the 🐛 bug Something isn't working label Dec 6, 2023

bpg-autobot bot added the stale label Aug 13, 2024

bpg added acknowledged and removed stale labels Aug 13, 2024
