Guest has no networking for over five minutes #13806

Closed

basak opened this issue Jul 23, 2024 · 6 comments
Comments

basak commented Jul 23, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 20.04
  • The output of "snap list --all lxd core20 core22 core24 snapd":
Name    Version        Rev    Tracking       Publisher   Notes
core20  20240416       2318   latest/stable  canonical✓  base
lxd     4.0.9-a29c6f1  24061  4.0/stable/…   canonical✓  -
snapd   2.63           21759  latest/stable  canonical✓  snapd
  • The output of "lxc info":
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- resources_system
- usedby_consistency
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- storage_rsync_compression
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_state_vlan
- gpu_sriov
- migration_stateful
- disk_state_quota
- storage_ceph_features
- gpu_mig
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- network_counters_errors_dropped
- image_source_project
- database_leader
- instance_all_projects
- ceph_rbd_du
- qemu_metrics
- gpu_mig_uuid
- event_project
- instance_allow_inconsistent_copy
- image_restrictions
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICDTCCAZOgAwIBAgIRAOXahcfYtFH8jfU8YOuk7W0wCgYIKoZIzj0EAwMwNzEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEXMBUGA1UEAwwOcm9vdEByYmFz
    YWstZ3UwHhcNMjQwNzE3MTI1MDQ0WhcNMzQwNzE1MTI1MDQ0WjA3MRwwGgYDVQQK
    ExNsaW51eGNvbnRhaW5lcnMub3JnMRcwFQYDVQQDDA5yb290QHJiYXNhay1ndTB2
    MBAGByqGSM49AgEGBSuBBAAiA2IABFf7GfV68UQKmYTy8xt18QbYEft9M6GrNntW
    dOJxfQ7bvFovAl7LZVlNpQBjkFaJMvIBmSAQ269LGhHU6N8Qu1cphqMfdsJPINfy
    NkjDgZi9O4TkQDc3nMrvbyOxi/w+h6NjMGEwDgYDVR0PAQH/BAQDAgWgMBMGA1Ud
    JQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwLAYDVR0RBCUwI4IJcmJhc2Fr
    LWd1hwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUCMQDg
    bYwCNXcFr+bxykiQWkNX0G2kyc4V/IDBeIbTDVxI81UQOf+FduENoQmgitZ78q4C
    MHqQBUeZt7HcwNaXqmmwgoLsqqx2evkeqckTYbP3uB6xkC1tkt9Iz3K8038keVtl
    aA==
    -----END CERTIFICATE-----
  certificate_fingerprint: 0af2558a40594017a40c81ec6b4b5e60ec13db79bb7eb977c24031a0ca0aa1fd
  driver: lxc | qemu
  driver_version: 4.0.12 | 7.1.0
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-189-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: rbasak-gu
  server_pid: 16495
  server_version: 4.0.9
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.41.0
    remote: false
  - name: zfs
    version: 0.8.3-1ubuntu12.17
    remote: false
  - name: ceph
    version: 15.2.17
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.17
    remote: true
  - name: dir
    version: "1"
    remote: false

Issue description

When I boot images:debian/sid (image fingerprint 67e221357f18), networking does not come up in the guest. I see link-local IPv6 addresses but no IPv4 address configured. I assume it's supposed to work with networkd, since /etc/systemd/network/eth0.network exists, but systemd-networkd is not running.

I tried the same thing on a 24.04 host with LXD snap 6.1-0d4d89b and it works fine, so I assume this is an issue with either the older LXD or the older host OS.


Steps to reproduce

  1. Configure the default profile with security.nesting=true (I'm using this because I'm also trying to use Oracular containers on this host system.)
  2. lxc launch images:67e221357f18 ns
  3. lxc exec ns bash
  4. ip a, wait a while, retry, etc. (consolidated as a command sketch below)

Expected result: an IPv4 address is configured on eth0. Actual result: no IPv4 address is configured.
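
As a consolidated command sequence (a sketch; the profile key in step 1 is assumed to be set with lxc profile set):

lxc profile set default security.nesting true
lxc launch images:67e221357f18 ns
lxc exec ns -- ip a   # repeat after a delay; no IPv4 address appears on eth0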

NOTE: taking other actions inside the container seems to trigger socket activation of systemd-networkd, and that fixes things. So does waiting about five minutes. In my test, systemctl status systemd-networkd reports that the service started 6m43s after I started the instance. My issue is that automated use expects networking in the container to come up promptly with no further action.
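
A couple of checks inside the container make the networkd state visible (a sketch, using the instance name from the steps above):

lxc exec ns -- systemctl is-active systemd-networkd      # reports inactive until socket activation or the ~5 minute delay
lxc exec ns -- systemctl status systemd-networkd --no-pager
lxc exec ns -- networkctl status eth0                    # shows whether networkd has configured the interface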

Information to attach

  • Any relevant kernel output (dmesg)
Jul 22 11:35:51 rbasak-gu kernel: [427908.488584] lxdbr0: port 1(veth92757f2f) entered blocking state
Jul 22 11:35:51 rbasak-gu kernel: [427908.488585] lxdbr0: port 1(veth92757f2f) entered disabled state
Jul 22 11:35:51 rbasak-gu kernel: [427908.491402] device veth92757f2f entered promiscuous mode
Jul 22 11:35:51 rbasak-gu kernel: [427908.708017] audit: type=1400 audit(1721648151.993:5549): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxd-ns_</var/snap/lxd/common/lxd>" pid=144136 comm="apparmor_parser"
Jul 22 11:35:52 rbasak-gu kernel: [427908.739468] phys9Vej0T: renamed from vethb4e2977f
Jul 22 11:35:52 rbasak-gu kernel: [427908.778049] eth0: renamed from phys9Vej0T
Jul 22 11:35:52 rbasak-gu kernel: [427908.779367] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jul 22 11:35:52 rbasak-gu kernel: [427908.779397] lxdbr0: port 1(veth92757f2f) entered blocking state
Jul 22 11:35:52 rbasak-gu kernel: [427908.779399] lxdbr0: port 1(veth92757f2f) entered forwarding state
  • Container log (lxc info NAME --show-log)
Name: ns
Location: none
Remote: unix://
Architecture: x86_64
Created: 2024/07/22 11:35 UTC
Status: Running
Type: container
Profiles: default
Pid: 144137
Ips:
  eth0:	inet	10.69.70.168	veth92757f2f
  eth0:	inet6	fd42:6955:2443:e5c3:216:3eff:fe7e:7a76	veth92757f2f
  eth0:	inet6	fe80::216:3eff:fe7e:7a76	veth92757f2f
  lo:	inet	127.0.0.1
  lo:	inet6	::1
Resources:
  Processes: 8
  CPU usage:
    CPU usage (in seconds): 5
  Memory usage:
    Memory (current): 120.77MB
    Memory (peak): 182.50MB
  Network usage:
    eth0:
      Bytes received: 52.12kB
      Bytes sent: 64.24kB
      Packets received: 387
      Packets sent: 565
    lo:
      Bytes received: 0B
      Bytes sent: 0B
      Packets received: 0
      Packets sent: 0

Log:

lxc ns 20240722113552.621 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ns 20240722113552.627 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc ns 20240722113552.961 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ns 20240722113552.965 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc ns 20240722113552.100 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1252 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc ns 20240722120706.132 WARN     conf - conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ns 20240722120706.132 WARN     conf - conf.c:lxc_map_ids:3598 - newgidmap binary is missing
  • Container configuration (lxc config show NAME --expanded)
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Debian sid amd64 (20240722_0002)
  image.os: Debian
  image.release: sid
  image.serial: "20240722_0002"
  image.type: squashfs
  image.variant: default
  security.nesting: "true"
  volatile.base_image: 67e221357f182d0f73c6e8ba1971d2d9dd8b18237c31ffbb959ac40eb9f43092
  volatile.eth0.host_name: veth92757f2f
  volatile.eth0.hwaddr: 00:16:3e:7e:7a:76
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 3b457acc-4747-46f5-b626-faa6b0e756ce
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)

The only relevant line seems to be:

t=2024-07-22T11:35:52+0000 lvl=info msg="Started container" action=start created=2024-07-22T11:35:42+0000 ephemeral=false instance=ns instanceType=container project=default stateful=false used=1970-01-01T00:00:00+0000

I've skipped the following since it's trivially reproducible on a fresh Focal VM.

  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)
tomponline (Member) commented Jul 23, 2024

LXD 4.0.x is currently in security maintenance mode only. It won't be getting updates to support newer guest containers, I'm afraid. Please can you switch to the latest LTS, which is 5.21/stable.
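
On a snap host that switch would be something like:

sudo snap refresh lxd --channel=5.21/stable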

tomponline (Member) commented Jul 23, 2024

It could be an issue with cgroup v2 not being present on the host, which the newer image expects.

basak (Author) commented Jul 23, 2024

> It won't be getting updates to support newer guest containers, I'm afraid.

In that case, any chance we can predict the failure and refuse to launch such an image, please, rather than failing in unpredictable ways afterwards?

> Please can you switch to the latest LTS, which is 5.21/stable.

I can do that, but I'm trying to use LXD to implement stable builds for git-ubuntu users, and I can't control or change which LXD version they are running. Is there any way I can detect in advance that it's not going to work, so I can give the user a useful error message?

tomponline (Member) commented:

> In that case, any chance we can predict the failure and refuse to launch such an image, please, rather than failing in unpredictable ways afterwards?

That would require feature development, which is no longer happening for the 4.0.x series.

We plan one more release to update the default remotes to the new image server, and after that it's critical security fixes only.

tomponline (Member) commented:

> I can do that, but I'm trying to use LXD to implement stable builds for git-ubuntu users, and I can't control or change which LXD version they are running. Is there any way I can detect in advance that it's not going to work, so I can give the user a useful error message?

Because containers share the host kernel and the cgroup layout, it's a function of the host plus the container guest whether they work together. The big change was the switch to unified cgroups (cgroup v2), which means that running newer guests on older hosts can upset some systemd services, because they rely heavily on cgroups.
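
For what it's worth, a minimal check for that from a wrapper script could look like this (just a filesystem check, not an LXD API):

stat -fc %T /sys/fs/cgroup                       # prints cgroup2fs on a unified (cgroup v2) host
test -f /sys/fs/cgroup/cgroup.controllers && echo "unified cgroup v2" || echo "cgroup v1 / hybrid"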

Does ubuntu:focal work for you on a Focal host with LXD 4.0?

basak (Author) commented Jul 23, 2024

> Does ubuntu:focal work for you on a Focal host with LXD 4.0?

Yes, that works fine. But it doesn't solve the general case of wanting to use LXD with its default configuration on (e.g.) an LTS release, with the Ubuntu development release in a container to do builds.

I'll file a separate issue for the general case - thanks.
