no cloud agents: qemu #74

Open
bgilbert opened this issue Oct 31, 2018 · 18 comments
Labels
cloud* related to public/private clouds

Comments

@bgilbert
Contributor

In #12 we decided that we'd like to try to not ship cloud agents. This ticket will document investigation and strategy for shipping without a cloud agent on the qemu virtualization platform.

See also #41 for a discussion of how to ship cloud-specific bits using Ignition.

@dustymabe
Member

😄

I figured this one didn't need any explanation. We don't need anything special for qemu, do we?

@bgilbert
Contributor Author

bgilbert commented Nov 1, 2018

QEMU developers would like Ignition to read configs from SMBIOS OEM strings rather than the -fw_cfg mechanism; see coreos/ignition#656. This seems likely to require changes in upstream QEMU and libvirt as well as in Ignition. The switch from CL to FCOS would be a good time to document and recommend the new mechanism if it's available.
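
For readers unfamiliar with the current mechanism, here is a minimal sketch of how a config is handed to Ignition through `-fw_cfg` today. The helper name `ignition_fw_cfg_arg` and the config path are placeholders for illustration; the key `opt/com.coreos/config` is the one documented for Ignition's QEMU platform. The function only prints the argument and does not launch QEMU.

```shell
#!/bin/sh
# Print the QEMU argument that exposes an Ignition config through fw_cfg.
# The helper name and the path passed to it are illustrative placeholders.
ignition_fw_cfg_arg() {
    config_path=$1
    printf '%s\n' "-fw_cfg name=opt/com.coreos/config,file=${config_path}"
}

# Example: append the printed argument to an existing qemu command line.
ignition_fw_cfg_arg /path/to/config.ign
```

An SMBIOS-based mechanism would replace this argument with something else entirely, so this sketch covers only the existing approach.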

@mrguitar

mrguitar commented Dec 3, 2018

@bgilbert we're being asked to include the qemu-guest-agent.

If we don't include any agents, how do we justify the reduced feature set and degraded experience? It looks like the qemu agent assists w/ guest actions like shutdown/reboot as well as file system quiesce actions.

I'm probably more concerned about this approach for the open-vm-tools agent which has many more capabilities. Thoughts?

@bgilbert
Contributor Author

bgilbert commented Dec 3, 2018

If we don't include any agents, how do we justify the reduced feature set and degraded experience?

I'd argue the other way around: if we're going to include an agent, we'd need to be convinced that doing so is a net improvement. Looking at the QGA command schema, I'm seeing:

  • Commands redundant with ACPI (reboot, shutdown, suspend)
  • Commands supporting suspend (time sync, FS freeze)
  • Resource scaling commands (FS trim, scaling VCPUs, memory ballooning)
  • Security bypass commands (arbitrary file I/O, run arbitrary commands, list active users, set passwords)

The resource scaling commands are useful, as well as potentially the suspend support if that's a use case we're interested in. OTOH, the security bypass commands are exactly the sort of functionality that makes agents problematic.
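
As a rough illustration of how the useful commands might be kept while the security bypass ones are disabled, the agent accepts a comma-separated list of RPCs to turn off via `--blacklist`. The sketch below only builds that argument. The helper name `qga_blacklist_arg` is made up for this example, and the command list is an assumption about which RPCs fall into the "security bypass" bucket, not an exhaustive or vetted set.

```shell
#!/bin/sh
# Join QGA command names into the comma-separated form --blacklist takes.
# The helper name is hypothetical; it prints the argument and runs nothing.
qga_blacklist_arg() {
    list=
    for rpc in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+$list,}$rpc"
    done
    printf '%s\n' "--blacklist=$list"
}

# Assumed "security bypass" RPCs; adjust after checking the QGA schema.
qga_blacklist_arg guest-exec guest-file-open guest-file-read \
    guest-file-write guest-get-users guest-set-user-password
```

Whether disabling RPCs this way is enough to address the concern is a separate question; the sketch only shows the mechanics.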

@dustymabe
Member

@mrguitar - We don't currently include qemu-guest-agent in Fedora Atomic Host. I'm not necessarily opposed to including it in Fedora CoreOS for a good reason. We decided to open these tickets for every platform so we could deliberate, decide, and document the outcomes. Thanks for joining the discussion :)

@mrguitar

mrguitar commented Dec 3, 2018

I'm going to pull that team into this discussion. I think that's probably the best next step. Thanks guys.

@dustymabe added the cloud label (* related to public/private clouds) on Dec 13, 2018
@ilyesAj

ilyesAj commented Jul 3, 2020

Any news here? Is there a solution for installing ovirt-guest-agent on Fedora CoreOS?

@titou10titou10

If the qemu guest agent is not included in FCOS, is there a way to install it?
In addition to what has been said before, it is useful when performing live backups: for example, the Proxmox backup command issues commands such as fs-freeze and fs-thaw at the beginning of a backup.
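
To make the backup case concrete, here is a small sketch of the JSON requests a hypervisor would send to the agent around a snapshot. The `guest-fsfreeze-*` command names come from the QGA protocol; the helper `qga_request` is a name invented for this example, and the script only prints the messages rather than talking to a real agent.

```shell
#!/bin/sh
# Build the JSON request for a QGA command that takes no arguments.
# The helper name is hypothetical; nothing is sent to an agent here.
qga_request() {
    printf '{"execute":"%s"}\n' "$1"
}

# Order a backup tool would roughly follow: freeze, check, (snapshot), thaw.
for cmd in guest-fsfreeze-freeze guest-fsfreeze-status guest-fsfreeze-thaw; do
    qga_request "$cmd"
done
```

On a libvirt host, a message like these can be passed to `virsh qemu-agent-command`; Proxmox wraps the same calls behind `qm agent`.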

@5aji

5aji commented Aug 9, 2020

There seems to be a solution to this but it is behind a Red Hat subscription paywall. Solution Page. It seems like this was determined to not be important to CoreOS in non-Red Hat distributions. I would vouch for this being included either as a container or built in to the qcow image.

EDIT: an alternative is running a container and passing the device through. linuxkit/qemu-ga seems to be updated.

[Unit]
Description=QEMU Guest Agent
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
Restart=always
ExecStartPre=-/usr/bin/docker stop %n
ExecStartPre=-/usr/bin/docker rm %n
ExecStartPre=/usr/bin/docker pull linuxkit/qemu-ga:v0.8
ExecStart=/usr/bin/docker run --rm --device=/dev/virtio-ports/org.qemu.guest_agent.0 --name %n linuxkit/qemu-ga:v0.8 /usr/bin/qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0

[Install]
WantedBy=multi-user.target

@titou10titou10

Here is the "solution" from RH:

Issue
qemu-guest-agent is not included in Red Hat Enterprise Linux CoreOS for OpenShift 4
We need to install 'qemu-guest-agent' and on RHCOS nodes

Resolution
The qemu-guest-agent is not currently available nor supported on RHEL CoreOS (RHCOS) nodes in OpenShift Container Platform 4.x.

@mat1010

mat1010 commented Aug 21, 2020

What about installing it through rpm-ostree install qemu-guest-agent? Seems to work as expected, running on Fedora CoreOS 31.20200407.3.0:

# rpm-ostree install qemu-guest-agent
Checking out tree 89e17cc... done
Enabled rpm-md repositories: updates fedora
Updating metadata for 'updates'... done
rpm-md repo 'updates'; generated: 2020-08-20T00:55:24Z
Updating metadata for 'fedora'... done
rpm-md repo 'fedora'; generated: 2019-10-23T22:52:47Z
Importing rpm-md... done
Resolving dependencies... done
Will download: 2 packages (403.6 kB)
Downloading from 'updates'... done
Downloading from 'fedora'... done
Importing packages... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Added:
  pixman-0.38.4-1.fc31.x86_64
  qemu-guest-agent-2:4.1.1-1.fc31.x86_64
Run "systemctl reboot" to start a reboot
# systemctl status qemu-guest-agent.service
● qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/usr/lib/systemd/system/qemu-guest-agent.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-08-21 11:15:37 UTC; 8min ago
 Main PID: 772 (qemu-ga)
    Tasks: 1 (limit: 4625)
   Memory: 1.8M
   CGroup: /system.slice/qemu-guest-agent.service
           └─772 /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist= -F/etc/qemu-ga/fsfreeze-hook

Aug 21 11:15:37 ********* systemd[1]: Started QEMU Guest Agent.

We are using it on oVirt, and the information is properly reported and populated in the UI.

@dustymabe
Member

What about installing it through rpm-ostree install qemu-guest-agent? Seems to work as expected, running on Fedora CoreOS 31.20200407.3.0:

Yep. You can do it with package layering, but do note #400 - we're working on a solution to make the layering more reliable, but currently you might hit an issue, so keep that in mind.

@Nick2253

I'm working on getting qemu-guest-agent running using the container method suggested by @kschamplin in #74 (comment), but I'm struggling to get the reboot/shutdown agent commands working. I'm using Proxmox 6.2 as my hypervisor, if it matters.

I'm using the following exec statement:

docker run --rm --device=/dev/virtio-ports/org.qemu.guest_agent.0 --net=host --ipc=host --pid=host --name qemu-guest-agent linuxkit/qemu-ga:v0.8 /usr/bin/qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0

When I issue a shutdown or reboot command through the Proxmox GUI or via qm agent <vid> shutdown (for example), I get the following error:

**
ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Bail out! ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)

If I add --privileged, I get the following error upon executing the container:

1603744671.54205: critical: error opening channel: No such file or directory
1603744671.54234: critical: error opening channel
1603744671.54241: critical: failed to create guest agent channel
1603744671.54246: critical: failed to initialize guest agent channel

Any suggestions for what I could do to make this work?

@ssams

ssams commented Nov 23, 2020

I've also tried to get the containerized version running as it seems to be the cleanest approach (as there are no modifications of the base system), however there are multiple problems with this solution at the moment. The errors as reported by @Nick2253 in #74 (comment) are caused by multiple issues:

When I issue a shutdown or reboot command through the Proxmox GUI or via qm agent <vid> shutdown (for example), I get the following error:

**
ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)
Bail out! ERROR:/home/buildozer/aports/community/qemu/src/qemu-4.2.0/qga/main.c:532:send_response: assertion failed: (rsp && s->channel)

This seems to be caused by a bug in the guest agent itself - the docker image linuxkit/qemu-ga:v0.8 contains qemu-ga version 4.2.0 -> https://bugzilla.redhat.com/show_bug.cgi?id=1884531 (bug apparently introduced in 4.0.0 and fixed in 5.1.0). The agent does not crash when using linuxkit/qemu-ga:v0.7, which contains qemu-ga version 3.1.0.

If I add --privileged, I get the following error upon executing the container:

1603744671.54205: critical: error opening channel: No such file or directory
1603744671.54234: critical: error opening channel
1603744671.54241: critical: failed to create guest agent channel
1603744671.54246: critical: failed to initialize guest agent channel

I couldn't figure out the exact root of the problem, but this is related to the access of the virtio device. In Fedora CoreOS, /dev/virtio-ports/org.qemu.guest_agent.0 is actually visible as symlink:

$ ls -l /dev/virtio-ports/org.qemu.guest_agent.0 
lrwxrwxrwx. 1 root root 11 23. Nov 12:10 /dev/virtio-ports/org.qemu.guest_agent.0 -> ../vport2p1

When using --privileged and any other device than the original path (/dev/vport2p1 in my case), the agent failed. Thus the following two variants worked on my system:

  1. using the path of the "original" device in the container:
    podman run  --privileged --rm --pid=host --ipc=host --net=host --device=/dev/virtio-ports/org.qemu.guest_agent.0  linuxkit/qemu-ga:v0.7 /usr/bin/qemu-ga -m virtio-serial  -p /dev/vport2p1
    
    Note that I'm using podman, which automatically resolves the symlink and only makes the target available within the container. Manually setting the device path (by appending :/dev/other to the device parameter) and using that one with the agent however also did not work.
  2. Make the whole /dev path available within the container:
    podman run --privileged --rm --pid=host --ipc=host --net=host -v /dev:/dev   -it linuxkit/qemu-ga:v0.7 /usr/bin/qemu-ga -m virtio-serial
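
Since the target of the symlink (`/dev/vport2p1` here) may differ between machines, variant 1 can be sketched with the path resolved at run time instead of hard-coded. This is an untested sketch: the function names are made up for illustration, and it only prints the podman command line rather than running it.

```shell
#!/bin/sh
# Default location of the agent's udev symlink on the guest.
AGENT_LINK=/dev/virtio-ports/org.qemu.guest_agent.0

# Resolve the symlink to the underlying device node (e.g. /dev/vport2p1).
resolve_agent_device() {
    readlink -f "${1:-$AGENT_LINK}"
}

# Print (not run) variant 1 with the resolved path passed to qemu-ga.
agent_podman_cmd() {
    link=${1:-$AGENT_LINK}
    dev=$(resolve_agent_device "$link") || return 1
    printf 'podman run --privileged --rm --pid=host --ipc=host --net=host --device=%s linuxkit/qemu-ga:v0.7 /usr/bin/qemu-ga -m virtio-serial -p %s\n' "$link" "$dev"
}

# Usage on the guest: eval "$(agent_podman_cmd)"
```

The `--device` argument keeps the symlink path as in variant 1, relying on podman to resolve it; only the `-p` argument uses the resolved node.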
    

Any suggestions for what I could do to make this work?

None of the listed fixes is sufficient to provide a complete solution, as shutdown is still not possible from within the container: the agent no longer crashes or complains, but it hangs when a shutdown is requested. The reason is probably the way the feature is implemented in the agent itself, as it tries to call /sbin/shutdown (https://github.com/qemu/qemu/blob/v3.1.0/qga/commands-posix.c#L110), which is not available in a standard container. Shutting down the host from within a container is an interesting task; although possible, it usually requires some extra tricks (e.g. having systemd available inside the container and mounting the appropriate sockets from the host, or using SysRq; see also https://stackoverflow.com/a/24759427 for some hints).

So I suppose this would need addition of several extra scripts or similar modifications to the guest agent container to make it work, deviating significantly from the premise of a simple setup via the container. Hence I don't think the container approach is worth the effort when requiring shutdown capabilities. I'll also try to install it via rpm-ostree for now, but I believe the best solution would be to get the agent integrated properly into the base image.

Note: if shutdown functionality is not needed, blacklisting the guest-shutdown command avoids the hang in case the hypervisor issues the command anyway (but keep in mind that the command will be discarded, which may have unintended side effects in the hypervisor logic).

@Nick2253

I am now knee deep in the process of trying to get qemu-guest-agent working through an Alpine base, but I'm running into problems with the shutdown command, and I'm assuming it has something to do with a lack of understanding on how Linux works this magic.

I'm building the container using:

FROM alpine:3.15.2

RUN apk add --update --no-cache qemu-guest-agent

ENTRYPOINT [ "/usr/bin/qemu-ga" ]
CMD ["-m", "virtio-serial", "-p", "/dev/virtio-ports/org.qemu.guest_agent.0"]

I then build and run the container as follows:

sudo docker build -f Dockerfile -t qemu-guest-agent:dev1 .
sudo docker run --rm --name qemu-ga --privileged -v /dev:/dev --ipc=host --net=host qemu-guest-agent:dev1

This gets me to a point where I have running guest agents, and I'm able to view IP addresses through the Proxmox interface, but as before, shutdown/reboot/etc commands don't work. However, unlike before, I don't get any errors; if I run the Shutdown command, the container kills the guest agents, though it otherwise keeps ticking.

Doing some research, it looks like Alpine's version of qemu-guest-agent is patched to execute a second "fallback" shutdown command:

     if (!has_mode || strcmp(mode, "powerdown") == 0) {
         shutdown_flag = "-P";
+        fallback_cmd = "/sbin/poweroff";
     } else if (strcmp(mode, "halt") == 0) {
         shutdown_flag = "-H";
+        fallback_cmd = "/sbin/halt";
     } else if (strcmp(mode, "reboot") == 0) {
         shutdown_flag = "-r";
+        fallback_cmd = "/sbin/reboot";
     } else {
         error_setg(errp,
                    "mode is invalid (valid values are: halt|powerdown|reboot");
@@ -111,6 +115,7 @@ void qmp_guest_shutdown(bool has_mode, c
 
         execle("/sbin/shutdown", "shutdown", "-h", shutdown_flag, "+0",
                "hypervisor initiated shutdown", (char *)NULL, environ);
+        execle(fallback_cmd, fallback_cmd, (char*)NULL, environ);

From poking around in Alpine, these commands are links to busybox, which must do some of the same kind of magic as systemd as far as handling symbolic links. However, this is just black magic to me, and I don't fully understand how this works.

Speaking of systemd, based on some ideas that I've seen elsewhere, I tried to force systemd into the container, but that didn't work:

Added the following to the Dockerfile:

RUN ln -sf /bin/systemctl /sbin/halt; \
    ln -sf /bin/systemctl /sbin/poweroff; \
    ln -sf /bin/systemctl /sbin/reboot; \
    ln -sf /bin/systemctl /sbin/runlevel; \
    ln -sf /bin/systemctl /sbin/shutdown; \
    ln -sf /bin/systemctl /sbin/telinit

Ran the new image with the following commands:

sudo docker run --rm --name qemu-ga --privileged -v /dev:/dev -v /bin/systemctl:/bin/systemctl -v /run/systemd/system:/run/systemd/system -v /var/run/dbus/system_bus_socket:/var/run/dbus/system_bus_socket -v /sys/fs/cgroup:/sys/fs/cgroup --ipc=host --net=host qemu-guest-agent:dev1

However, the Alpine container seems unable to even run these commands. When I shell into the container and try to directly execute /sbin/shutdown, I get an error that: sh: /sbin/shutdown not found. Ditto when I try to run systemd: sh: /bin/systemctl: not found. I don't fully understand why I'm getting this error.

My next approach is to replace all the relevant poweroff/reboot/shutdown/etc commands with scripts that make a call into host system through some socket. However, I'm at a loss here on how to do that.

@cgwalters
Member

Just to xref in https://bugzilla.redhat.com/show_bug.cgi?id=1900759 we ended up adding this to RHEL CoreOS and I think didn't try to go through the outstanding concerns here (partly because it originated as a PR to the MCO?), so it's another confusing difference between the two today.

@qinqon

qinqon commented May 5, 2023

Hi, I see the qemu-guest-agent is present here but is not present at https://quay.io/repository/fedora/fedora-coreos-kubevirt

Do you know why that is?

@dustymabe
Member

Hi, I see the qemu-guest-agent is present here but is not present at https://quay.io/repository/fedora/fedora-coreos-kubevirt

For the record (since it was being discussed in a second issue) this was established to not be accurate in #1126 (comment).
