K8S kubevirt 'allocatable: devices.kubevirt.io/vhost-net: "0"' with Flatcar 3850.0.0+ #1336
Comments
My golang repro code:
package main

import (
	"fmt"
	"os"
)

func main() {
	// Opening the device node is what should trigger the module autoload.
	devnode, err := os.Open("/dev/vhost-net")
	if err == nil {
		fmt.Println("/dev/vhost-net opened")
		devnode.Close()
	} else {
		fmt.Println("/dev/vhost-net failed to open")
	}
}
When I run it on the Flatcar env without the vhost-net module preloaded, I get |
@ader1990 are you running this from a systemd unit early in boot or something?
$ ssh core
Warning: Permanently added '[localhost]:2222' (ED25519) to the list of known hosts.
Last login: Wed Jan 31 08:46:28 UTC 2024 on tty1
Flatcar Container Linux by Kinvolk alpha 3850.0.0 for QEMU
core@localhost ~ $ lsmod | grep vhost
core@localhost ~ $ ls -la /dev/vhost-net
crw-rw-rw-. 1 root kvm 10, 238 Jan 31 08:46 /dev/vhost-net
core@localhost ~ $ ./main
/dev/vhost-net opened
core@localhost ~ $ lsmod | grep vhost
vhost_net 36864 0
tun 69632 1 vhost_net
vhost 65536 1 vhost_net
vhost_iotlb 16384 1 vhost
tap 28672 1 vhost_net
core@localhost ~ $ |
Hello @jepio,
I am running it manually after the normal boot process, on a baremetal ARM64 server. I have also tried on an x64 Hyper-V VM, and I get the same issue. When the VM runs on QEMU-KVM, I think the module gets loaded automatically because of the underlying virtualization.
I have used the https://alpha.release.flatcar-linux.net/arm64-usr/current/flatcar_production_image.bin.bz2 image.
From my testing, only a
Before opening an issue in the kubevirt repo, I will try the upstream master of kubevirt, just to make sure the issue reproduces reliably. The kubevirt implementation relies on an open/close of the device file to trigger a module load, which does not seem to work: https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-handler/device-manager/generic_device.go#L117
Thank you, |
What I also observed is that /dev/vhost-net can be present even when the vhost_net module is not loaded, because of the QEMU implementation. Can you confirm this scenario in your environment as well? |
The way module autoloading works is:
The files involved are:
$ systemctl cat kmod-static-nodes
# /usr/lib/systemd/system/kmod-static-nodes.service
# SPDX-License-Identifier: LGPL-2.1-or-later
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Create List of Static Device Nodes
DefaultDependencies=no
Before=sysinit.target systemd-tmpfiles-setup-dev.service
ConditionCapability=CAP_SYS_MODULE
ConditionFileNotEmpty=/lib/modules/%v/modules.devname
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/kmod static-nodes --format=tmpfiles --output=/run/tmpfiles.d/static-nodes.conf
$ cat /lib/modules/$(uname -r)/modules.devname
# Device nodes to trigger on-demand module loading.
fuse fuse c10:229
cuse cuse c10:203
btrfs btrfs-control c10:234
nvram nvram c10:144
loop loop-control c10:237
tun net/tun c10:200
ppp_generic ppp c108:0
dm_mod mapper/control c10:236
vfio vfio/vfio c10:196
vhost_net vhost-net c10:238
vhost_vsock vhost-vsock c10:241
$ cat /run/tmpfiles.d/static-nodes.conf
c! /dev/fuse 0600 - - - 10:229
c! /dev/cuse 0600 - - - 10:203
c! /dev/btrfs-control 0600 - - - 10:234
c! /dev/nvram 0600 - - - 10:144
c! /dev/loop-control 0600 - - - 10:237
d /dev/net 0755 - - -
c! /dev/net/tun 0600 - - - 10:200
c! /dev/ppp 0600 - - - 108:0
d /dev/mapper 0755 - - -
c! /dev/mapper/control 0600 - - - 10:236
d /dev/vfio 0755 - - -
c! /dev/vfio/vfio 0600 - - - 10:196
c! /dev/vhost-net 0600 - - - 10:238
c! /dev/vhost-vsock 0600 - - - 10:241
$ systemctl cat systemd-tmpfiles-setup-dev
# /usr/lib/systemd/system/systemd-tmpfiles-setup-dev.service
# SPDX-License-Identifier: LGPL-2.1-or-later
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Create Static Device Nodes in /dev
Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8)
DefaultDependencies=no
After=systemd-sysusers.service
Before=sysinit.target local-fs-pre.target systemd-udevd.service
Conflicts=shutdown.target initrd-switch-root.target
Before=shutdown.target initrd-switch-root.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=systemd-tmpfiles --prefix=/dev --create --boot
SuccessExitStatus=DATAERR CANTCREAT
LoadCredential=tmpfiles.extra
I'm not sure why this is failing, perhaps modules.devname is missing from the initrd? We might need a Flatcar dev build that prints more info before those services run, to see what the file contents are at that time. Both services are supposed to run from the initrd. |
Actually: I'm seeing that on QEMU the services run a second time after switch root. Could you figure out why this doesn't happen in Azure/VMware? |
On the baremetal ARM64 machine:
Seems that the mapping is correct. |
The file itself is correct, but something must be going wrong with the systemd units that create the dev nodes based on that file. I'll leave it to you to investigate. |
Hello, After various retries, I think I found the culprit: systemd service If I restart the service, the links are correctly created and udev shows the correct events (module loaded) when trying to access /dev/vhost-net with open/close (via the golang program). The problem is that the See below, file created at
Maybe set a systemd-tmpfiles-setup-dev.service |
That wouldn't be correct since |
Adrian found out that the way we pull in ¹ Edit: Note that even before my change we pulled it in for the PXE path to loop mount |
Wait, maybe we should be using Edit: That still leaves the question if we need to start the kmod service in the initrd, but if we don't start it at least systemd-tmpfiles-setup-dev.service will run in the final system to process its generated files |
In Fedora with a newer systemd version this is what I see as definition:
If |
Update the bootengine commit to use the fix from: flatcar/bootengine#85 Fixes: flatcar/Flatcar#1336 Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Update the bootengine commit id to use the fix from: flatcar/bootengine#85 Fixes: flatcar/Flatcar#1336 Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Update the bootengine commit id to use the fix from: flatcar/bootengine#85 Fixes kubevirt vm creation by ensuring that /dev/vhost-net static node gets created Fixes: flatcar/Flatcar#1336 Signed-off-by: Adrian Vladu <avladu@cloudbasesolutions.com>
Description
If K8S + kubevirt is installed on Flatcar 3850.0.0+, the allocatable vhost-net devices are 0.
Impact
This issue blocks the creation of k8s kubevirt VMs: no VMs can be created if there are no allocatable vhost-net devices.
Environment and steps to reproduce
My environment was a k8s cluster created on baremetal ARM64 using the stock Flatcar 3850.0.0 image and automation from https://github.com/cloudbase/BMK/tree/flatcar_sysext.
Expected behavior
$ kubectl get node -A -o yaml | grep -i vhost-net
allocatable: devices.kubevirt.io/vhost-net: 1k
Additional information
If the node is rebooted, the vhost-net allocatable devices are back to the expected size.
Sometimes the issue cannot be reproduced, which suggests a race condition.
The issue is not present on the current stable or beta releases.
When trying to debug this issue, I saw that the kubevirt implementation tries to open the /dev/vhost-net device file in order for the vhost-net kernel module to be autoloaded. I created a small golang test program and can confirm that opening the device file does not autoload the kernel module. More debugging is needed to determine whether the 6.6 Linux kernel has module autoloading disabled.