
Missing state for ignite linux nodes #909

Closed
robertvolkmann opened this issue Jun 9, 2022 · 16 comments · Fixed by #910
Comments

@robertvolkmann
Contributor

I'm struggling with the ignite runtime. Deploying the topology

name: clab
prefix: ""

mgmt:
  network: bridge

topology:
  kinds:
    linux:
      image: weaveworks/ignite-ubuntu:20.04
      kernel: weaveworks/ignite-kernel:5.10.51
      runtime: ignite
      sandbox: weaveworks/ignite:v0.10.0
  nodes:
    host1:
      kind: linux
    host2:
      kind: linux
  links:
    - endpoints: ["host1:eth1", "host2:eth1"]

results in

+---+-------+--------------+------------------------------------------+-------+-------+---------------+--------------+
| # | Name  | Container ID |                  Image                   | Kind  | State | IPv4 Address  | IPv6 Address |
+---+-------+--------------+------------------------------------------+-------+-------+---------------+--------------+
| 1 | host1 | 068eb650696f | docker.io/weaveworks/ignite-ubuntu:20.04 | linux |       | 172.17.0.3/24 | N/A          |
| 2 | host2 | 5e78e5702106 | docker.io/weaveworks/ignite-ubuntu:20.04 | linux |       | 172.17.0.2/24 | N/A          |
+---+-------+--------------+------------------------------------------+-------+-------+---------------+--------------+

The State column for the containers is empty, and the interface eth1 is down in both ignite containers.

$ docker attach ignite-378c53bc930e70bf
# ip link show eth1
5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether aa:c1:ab:e8:80:dc brd ff:ff:ff:ff:ff:ff

Any ideas?

@hellt
Member

hellt commented Jun 9, 2022

Hi,
@networkop was the one who delivered this feature to containerlab; maybe he has some bright ideas on how to troubleshoot this further.

@networkop
Contributor

hey @robertvolkmann,
What's the reason you want to run a standard Linux image with the ignite runtime? Just want to make sure I understand your use case.
It looks like the docker containers never transition to the running state, or ignite fails before it can detect that transition. Can you please collect the following logs:

  • the output of running containerlab deploy with --debug flag
  • docker logs for both of the deployed containers (or docker inspect if they are not running)

@robertvolkmann
Contributor Author

Hi @networkop,
I want to use a Linux image that needs systemd and docker. For this issue, I switched to ignite-ubuntu to rule out any side effects.

The containers transition to the running state:

$ docker ps 
CONTAINER ID   IMAGE                       COMMAND                  CREATED         STATUS         PORTS     NAMES
8bd4ce0138f9   weaveworks/ignite:v0.10.0   "/usr/local/bin/igni…"   6 minutes ago   Up 6 minutes             ignite-74ecc4da1bea3877
829b210b93a1   weaveworks/ignite:v0.10.0   "/usr/local/bin/igni…"   6 minutes ago   Up 6 minutes             ignite-57624908c99174c8

But ignite reports a warning:

$ sudo ignite ps --all
VM ID			IMAGE				KERNEL					SIZE	CPUS	MEMORY		CREATED		STATUS		IPS		PORTS	NAME
57624908c99174c8	weaveworks/ignite-ubuntu:20.04	weaveworks/ignite-kernel:5.10.51	4.0 GB	1	512.0 MB	2m53s ago	*Up <nil>	172.17.0.2		host1
74ecc4da1bea3877	weaveworks/ignite-ubuntu:20.04	weaveworks/ignite-kernel:5.10.51	4.0 GB	1	512.0 MB	2m50s ago	*Up <nil>	172.17.0.3		host2
WARN[0000] The symbol * on the VM status indicates that the VM manifest on disk may not be up-to-date with the actual VM status from the container runtime 

Below are the collected logs:

$ sudo containerlab --topo topo.yaml deploy --debug
[sudo] password for robert: 
INFO[0000] Containerlab v0.27.1 started                 
DEBU[0000] envN runtime var value is                    
DEBU[0000] Running runtime.Init with params &{Timeout:2m0s GracefulShutdown:false Debug:true KeepMgmtNet:false} and &{Network: Bridge: IPv4Subnet: IPv4Gw: IPv6Subnet: IPv6Gw: MTU: ExternalAccess:<nil>} 
DEBU[0000] Runtime: Docker                              
DEBU[0000] detected docker network mtu value - 1500     
DEBU[0000] initialized a runtime with params &{config:{Timeout:120000000000 GracefulShutdown:false Debug:true KeepMgmtNet:false} Client:0xc00073af80 mgmt:0xc00073ae00} 
DEBU[0000] template variables: <nil>                    
DEBU[0000] topology:
name: clab
prefix: ""

mgmt:
  network: bridge

topology:
  kinds:
    linux:
      image: weaveworks/ignite-ubuntu:20.04
      kernel: weaveworks/ignite-kernel:5.10.51
      runtime: ignite
      sandbox: weaveworks/ignite:v0.10.0
  nodes:
    host1:
      kind: linux
    host2:
      kind: linux
  links:
    - endpoints: ["host1:eth1", "host2:eth1"]

 
DEBU[0000] method initMgmtNetwork was called mgmt params &{Network:bridge Bridge: IPv4Subnet: IPv4Gw: IPv6Subnet: IPv6Gw: MTU:1500 ExternalAccess:<nil>} 
DEBU[0000] New mgmt params are &{Network:bridge Bridge: IPv4Subnet:172.20.20.0/24 IPv4Gw: IPv6Subnet:2001:172:20:20::/64 IPv6Gw: MTU:1500 ExternalAccess:0xc000243d6f} 
INFO[0000] Parsing & checking topology file: topo.yaml  
DEBU[0000] Runtime: Docker                              
DEBU[0000] node config: &{ShortName:host1 LongName:host1 Fqdn:host1.clab.io LabDir:/home/robert/src/test/clab-clab/host1 Index:0 Group: Kind:linux StartupConfig: StartupDelay:0 EnforceStartupConfig:false ResStartupConfig: Config:<nil> ResConfig: NodeType: Position: License: Image:weaveworks/ignite-ubuntu:20.04 Sysctls:map[] User: Entrypoint: Cmd: Exec:[] Env:map[] Binds:[] PortBindings:map[] PortSet:map[] NetworkMode: MgmtNet: MgmtIntf: MgmtIPv4Address: MgmtIPv4PrefixLength:0 MgmtIPv6Address: MgmtIPv6PrefixLength:0 MacAddress: ContainerID: TLSCert: TLSKey: TLSAnchor: NSPath: Publish:[] ExtraHosts:[] Labels:map[] Endpoints:[] Sandbox:weaveworks/ignite:v0.10.0 Kernel:weaveworks/ignite-kernel:5.10.51 Runtime:ignite CPU:0 CPUSet: Memory: HostRequirements:{SSE3:false} DeploymentStatus: Extras:<nil>} 
DEBU[0000] node config: &{ShortName:host2 LongName:host2 Fqdn:host2.clab.io LabDir:/home/robert/src/test/clab-clab/host2 Index:1 Group: Kind:linux StartupConfig: StartupDelay:0 EnforceStartupConfig:false ResStartupConfig: Config:<nil> ResConfig: NodeType: Position: License: Image:weaveworks/ignite-ubuntu:20.04 Sysctls:map[] User: Entrypoint: Cmd: Exec:[] Env:map[] Binds:[] PortBindings:map[] PortSet:map[] NetworkMode: MgmtNet: MgmtIntf: MgmtIPv4Address: MgmtIPv4PrefixLength:0 MgmtIPv6Address: MgmtIPv6PrefixLength:0 MacAddress: ContainerID: TLSCert: TLSKey: TLSAnchor: NSPath: Publish:[] ExtraHosts:[] Labels:map[] Endpoints:[] Sandbox:weaveworks/ignite:v0.10.0 Kernel:weaveworks/ignite-kernel:5.10.51 Runtime:ignite CPU:0 CPUSet: Memory: HostRequirements:{SSE3:false} DeploymentStatus: Extras:<nil>} 
DEBU[0000] lab Conf: &{Name:clab Prefix:0xc00007da70 Mgmt:0xc00073ae00 Topology:0xc0001da390} 
DEBU[0000] Ensuring image weaveworks/ignite-ubuntu:20.04 exists, or importing it... 
DEBU[0000] Found image with UID 55833905a13a73a4        
DEBU[0000] Ensuring image weaveworks/ignite-kernel:5.10.51 exists, or importing it... 
DEBU[0000] Found image with UID e4dfc48fd3d52eb5        
DEBU[0000] Ensuring image weaveworks/ignite:v0.10.0 exists, or importing it... 
DEBU[0000] Found image with UID 3e97632f4f14abe8        
DEBU[0000] Number of vcpu: 12                           
INFO[0000] Creating lab directory: /home/robert/src/test/clab-clab 
DEBU[0000] error while trying to access file /root/.ssh/authorized_keys: stat /root/.ssh/authorized_keys: no such file or directory 
DEBU[0000] no public keys found                         
DEBU[0000] Checking if docker network "bridge" exists   
DEBU[0000] network "bridge" was found. Reusing it...    
DEBU[0000] Docker network "bridge", bridge name "docker0" 
DEBU[0000] Disable RPF check on the docker host         
DEBU[0000] Enable LLDP on the linux bridge docker0      
DEBU[0000] Disabling TX checksum offloading for the docker0 bridge interface... 
DEBU[0000] Installing iptables rules for bridge "docker0" 
DEBU[0000] scheduling nodes with dynamic IPs...         
DEBU[0000] Worker 2 received node: &{ShortName:host1 LongName:host1 Fqdn:host1.clab.io LabDir:/home/robert/src/test/clab-clab/host1 Index:0 Group: Kind:linux StartupConfig: StartupDelay:0 EnforceStartupConfig:false ResStartupConfig: Config:0xc000a4c720 ResConfig: NodeType: Position: License: Image:weaveworks/ignite-ubuntu:20.04 Sysctls:map[net.ipv6.conf.all.disable_ipv6:0] User: Entrypoint: Cmd: Exec:[] Env:map[CLAB_INTFS:1 CLAB_LABEL_CLAB_NODE_GROUP: CLAB_LABEL_CLAB_NODE_KIND:linux CLAB_LABEL_CLAB_NODE_LAB_DIR:/home/robert/src/test/clab-clab/host1 CLAB_LABEL_CLAB_NODE_NAME:host1 CLAB_LABEL_CLAB_NODE_TYPE: CLAB_LABEL_CLAB_TOPO_FILE:/home/robert/src/test/topo.yaml CLAB_LABEL_CONTAINERLAB:clab] Binds:[] PortBindings:map[] PortSet:map[] NetworkMode: MgmtNet: MgmtIntf: MgmtIPv4Address: MgmtIPv4PrefixLength:0 MgmtIPv6Address: MgmtIPv6PrefixLength:0 MacAddress: ContainerID: TLSCert: TLSKey: TLSAnchor: NSPath: Publish:[] ExtraHosts:[] Labels:map[clab-mgmt-net-bridge:docker0 clab-node-group: clab-node-kind:linux clab-node-lab-dir:/home/robert/src/test/clab-clab/host1 clab-node-name:host1 clab-node-type: clab-topo-file:/home/robert/src/test/topo.yaml containerlab:clab] Endpoints:[{Node:0xc00096c300 EndpointName:eth1 MAC:aa:c1:ab:1e:b6:da}] Sandbox:weaveworks/ignite:v0.10.0 Kernel:weaveworks/ignite-kernel:5.10.51 Runtime:ignite CPU:0 CPUSet: Memory: HostRequirements:{SSE3:false} DeploymentStatus: Extras:<nil>} 
DEBU[0000] Ensuring kernel weaveworks/ignite-kernel:5.10.51 exists, or importing it... 
DEBU[0000] Found kernel with UID 60b43c86f815f640       
DEBU[0000] Ensuring image weaveworks/ignite-ubuntu:20.04 exists, or importing it... 
DEBU[0000] Found image with UID 55833905a13a73a4        
INFO[0003] Networking is handled by "docker-bridge"     
INFO[0003] Started Firecracker VM "57624908c99174c8" in a container with ID "829b210b93a1857303c5a8ea311dcfd769f85953778a5f60e7abc63359820b41" 
DEBU[0003] creating links...                            
DEBU[0003] Worker 2 received node: &{ShortName:host2 LongName:host2 Fqdn:host2.clab.io LabDir:/home/robert/src/test/clab-clab/host2 Index:1 Group: Kind:linux StartupConfig: StartupDelay:0 EnforceStartupConfig:false ResStartupConfig: Config:0xc000a4c798 ResConfig: NodeType: Position: License: Image:weaveworks/ignite-ubuntu:20.04 Sysctls:map[net.ipv6.conf.all.disable_ipv6:0] User: Entrypoint: Cmd: Exec:[] Env:map[CLAB_INTFS:1 CLAB_LABEL_CLAB_NODE_GROUP: CLAB_LABEL_CLAB_NODE_KIND:linux CLAB_LABEL_CLAB_NODE_LAB_DIR:/home/robert/src/test/clab-clab/host2 CLAB_LABEL_CLAB_NODE_NAME:host2 CLAB_LABEL_CLAB_NODE_TYPE: CLAB_LABEL_CLAB_TOPO_FILE:/home/robert/src/test/topo.yaml CLAB_LABEL_CONTAINERLAB:clab] Binds:[] PortBindings:map[] PortSet:map[] NetworkMode: MgmtNet: MgmtIntf: MgmtIPv4Address: MgmtIPv4PrefixLength:0 MgmtIPv6Address: MgmtIPv6PrefixLength:0 MacAddress: ContainerID: TLSCert: TLSKey: TLSAnchor: NSPath: Publish:[] ExtraHosts:[] Labels:map[clab-mgmt-net-bridge:docker0 clab-node-group: clab-node-kind:linux clab-node-lab-dir:/home/robert/src/test/clab-clab/host2 clab-node-name:host2 clab-node-type: clab-topo-file:/home/robert/src/test/topo.yaml containerlab:clab] Endpoints:[{Node:0xc00096c900 EndpointName:eth1 MAC:aa:c1:ab:a1:a9:7c}] Sandbox:weaveworks/ignite:v0.10.0 Kernel:weaveworks/ignite-kernel:5.10.51 Runtime:ignite CPU:0 CPUSet: Memory: HostRequirements:{SSE3:false} DeploymentStatus: Extras:<nil>} 
DEBU[0003] Worker 1 terminating...                      
DEBU[0003] Worker 0 terminating...                      
DEBU[0003] Ensuring kernel weaveworks/ignite-kernel:5.10.51 exists, or importing it... 
DEBU[0003] Found kernel with UID 60b43c86f815f640       
DEBU[0003] Ensuring image weaveworks/ignite-ubuntu:20.04 exists, or importing it... 
DEBU[0003] Found image with UID 55833905a13a73a4        
INFO[0006] Networking is handled by "docker-bridge"     
INFO[0006] Started Firecracker VM "74ecc4da1bea3877" in a container with ID "8bd4ce0138f9bac99a83ccfe53a61a04db8340078171237a7c6e1e78871d28a8" 
DEBU[0006] Worker 2 terminating...                      
DEBU[0006] Link worker 0 received link: link [host1:eth1, host2:eth1] 
INFO[0006] Creating virtual wire: host1:eth1 <--> host2:eth1 
DEBU[0006] Link worker 0 terminating...                 
DEBU[0006] containers created, retrieving state and IP addresses... 
DEBU[0006] Filterstring: containerlab=clab              
DEBU[0006] enriching nodes with IP information...       
DEBU[0006] Exported topology data using /etc/containerlab/templates/export/auto.tmpl template 
DEBU[0006] Running postdeploy actions for Linux 'host2' node 
DEBU[0006] Running postdeploy actions for Linux 'host1' node 
DEBU[0006] Filterstring: containerlab=clab              
INFO[0006] Adding containerlab host entries to /etc/hosts file 
+---+-------+--------------+------------------------------------------+-------+-------+---------------+--------------+
| # | Name  | Container ID |                  Image                   | Kind  | State | IPv4 Address  | IPv6 Address |
+---+-------+--------------+------------------------------------------+-------+-------+---------------+--------------+
| 1 | host1 | 829b210b93a1 | docker.io/weaveworks/ignite-ubuntu:20.04 | linux |       | 172.17.0.2/24 | N/A          |
| 2 | host2 | 8bd4ce0138f9 | docker.io/weaveworks/ignite-ubuntu:20.04 | linux |       | 172.17.0.3/24 | N/A          |
+---+-------+--------------+------------------------------------------+-------+-------+---------------+--------------+
$ docker logs ignite-74ecc4da1bea3877 
WARN[0000] Got an error while trying to set up networking, but retrying: interface "eth1" (mode "tc-redirect") is still not found 
INFO[0001] Moving IP address 172.17.0.3/16 (255.255.0.0) with gateway 172.17.0.1 from container to VM 
INFO[0001] Adding tc-redirect for "eth1"                
INFO[0001] Starting DHCP server for interface "br_eth0" (172.17.0.3) 
INFO[0001] Called startVMM(), setting up a VMM on /var/lib/firecracker/vm/74ecc4da1bea3877/firecracker.sock 
DEBU[0001] Creating FIFO /var/lib/firecracker/vm/74ecc4da1bea3877/firecracker_metrics.fifo 
DEBU[0001] Creating FIFO /var/lib/firecracker/vm/74ecc4da1bea3877/firecracker_log.fifo 
INFO[0001] refreshMachineConfiguration: [GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate:Uninitialized HtEnabled:0xc000237863 MemSizeMib:0xc000237858 VcpuCount:0xc000237850} 
INFO[0001] PutGuestBootSource: [PUT /boot-source][204] putGuestBootSourceNoContent  
INFO[0001] Attaching drive /dev/mapper/ignite-74ecc4da1bea3877, slot 1, root true. 
INFO[0001] Attached drive /dev/mapper/ignite-74ecc4da1bea3877: [PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent  
INFO[0001] Attaching NIC vm_eth0 (hwaddr 96:a1:9b:af:71:40) at index 1 
INFO[0001] Attaching NIC vm_eth1 (hwaddr aa:c1:ab:a1:a9:7c) at index 2 
INFO[0001] startInstance successful: [PUT /actions][204] createSyncActionNoContent 
... Skipped boot logs ...
Ubuntu 20.04.3 LTS 74ecc4da1bea3877 ttyS0

@networkop
Contributor

Got it, thanks. So far I can't see anything obvious. I'll try to reproduce this over the weekend.
Just out of curiosity, what happens if you manually do ip link set eth1 up?

@networkop
Contributor

Just managed to reproduce this. It looks like the only thing that failed is that ignite did not detect the transition to the running state and hence didn't report it back to containerlab.
However, the interfaces are plugged in correctly, and I was able to ping across eth1 after running ip link set up and ip addr add.

@networkop
Contributor

I'll troubleshoot this a bit over the weekend and maybe do a PR in weaveworks/ignite.

@hellt
Member

hellt commented Jun 10, 2022

I noticed that ignite itself has been updated upstream.
Maybe it's worth checking if bumping it up makes things more favorable?

@robertvolkmann
Contributor Author

containerlab already uses the latest ignite version. There has been no new ignite release since mid-2021.

@hellt
Member

hellt commented Jun 10, 2022

ah, crap.
I wonder if weaveworks is doing ok these days...

@robertvolkmann
Contributor Author

robertvolkmann commented Jun 10, 2022

One contributor moved from weaveworks to VMware in mid-2021, but he still contributes.

@robertvolkmann
Contributor Author

Maybe this behavior was introduced by:
weaveworks/ignite#787
weaveworks/ignite#808

@networkop
Contributor

networkop commented Jun 11, 2022

I think I've worked out what's happening. When containerlab starts nodes in the ignite runtime, we execute:

https://github.com/weaveworks/ignite/blob/2dbcdd6637277eaef0d253a8c1774e5736c6289c/pkg/operations/start.go#L42

which is a function that runs a goroutine in the background and waits for the right condition to occur before transitioning the state of the ignite VM to running:

https://github.com/weaveworks/ignite/blob/2dbcdd6637277eaef0d253a8c1774e5736c6289c/pkg/operations/start.go#L214

Inside containerlab, this translates to the need to keep both values returned by the StartContainer method, as is done for cvx:

intf, err := c.runtime.StartContainer(ctx, cID, c.cfg)

Since we don't do the same for linux nodes and discard the returned value, Go's GC cleans up all dependent goroutines and we never get to check that the ignite VMs have transitioned to running.

_, err = l.runtime.StartContainer(ctx, cID, l.cfg)

So I see there are three options:

  • ignore the fact that we can't properly check that state transition (probably worth documenting this)
  • create a new linux_ignite node that would not discard the first argument and would save it for later. This obv sucks since we
  • implement the following pattern for all node types. Maybe there's a way to generalize it to make it less ignite-specific, but it looks like no other runtime uses this (they all return nil as the first value in all cases)
    https://github.com/srl-labs/containerlab/blob/main/nodes/cvx/cvx.go#L86-L88
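The difference between discarding and keeping that first return value can be sketched as a minimal, self-contained Go program. All names below (vmChannels, startContainer) are illustrative stand-ins, not ignite's actual API:

```go
package main

import (
	"fmt"
	"time"
)

// vmChannels is a stand-in for ignite's operations.VMChannels:
// the background goroutine reports the VM's spawn result on it.
type vmChannels struct {
	SpawnFinished chan error
}

// startContainer mimics runtime.StartContainer: it returns an opaque
// interface value that, for the ignite runtime, carries the channel
// used to signal the transition to the running state.
func startContainer() (interface{}, error) {
	ch := &vmChannels{SpawnFinished: make(chan error, 1)}
	go func() {
		time.Sleep(10 * time.Millisecond) // simulate VM boot
		ch.SpawnFinished <- nil           // signal "running"
	}()
	return ch, nil
}

func main() {
	// The cvx pattern: keep the returned value instead of writing
	// `_, err = ...`, type-assert it, and wait for the signal.
	intf, err := startContainer()
	if err != nil {
		panic(err)
	}
	if vmChans, ok := intf.(*vmChannels); ok {
		if err := <-vmChans.SpawnFinished; err != nil {
			fmt.Println("VM failed to start:", err)
			return
		}
	}
	fmt.Println("VM transitioned to running")
}
```

With `_, err = startContainer()` nothing ever receives from SpawnFinished, so the running transition is never observed, which matches the empty State column in the deploy output above.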

what do you all think?

@networkop
Contributor

The core reason for all of the above is that a bunch of things occur after the container is started and before the microVM is started, so it's not safe to assume that a running container implies a successfully running microVM.

@hellt
Member

hellt commented Jun 11, 2022

@networkop since linux kind is the only one (so far) that can benefit from ignite runtime, can't we add

if vmChans, ok := intf.(*operations.VMChannels); ok {
    c.vmChans = vmChans
}

for its Deploy stage and be good with it?

@networkop
Contributor

Yep, good idea. I'll do a PR.

@hellt
Member

hellt commented Jun 11, 2022

@robertvolkmann, @networkop fixed this particular issue, but during my tests I noticed that topology deletion doesn't work.

So if you're okay with manually deleting ignite containers, you can pull the beta build with the ignite fix:

docker run --rm -v $(pwd):/workspace ghcr.io/oras-project/oras:v0.12.0 pull ghcr.io/srl-labs/clab-oci:05d8ddd7
