When running a container via the containerd-driver plugin in bridge mode, there is no IP address (127.0.0.1/8) on the lo interface #10014

Closed
lisongmin opened this issue Feb 11, 2021 · 6 comments · Fixed by #13428

Comments

@lisongmin

Nomad version

nomad --version
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

Operating system and Environment details

Ubuntu 18.04.3 LTS

Issue

When running a container via the containerd-driver plugin in bridge mode, there is no IP address (127.0.0.1/8) on the lo interface, so the app cannot listen on 127.0.0.1.

root@bypass-route:/# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 2e:f4:d0:00:17:83 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.2/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2cf4:d0ff:fe00:1783/64 scope link 
       valid_lft forever preferred_lft forever

When switching to the docker driver, the lo interface is fine:

root@9b68f72d354e:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if118: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 32:2c:db:6a:dc:f0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.107/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever

I originally reported the problem on nomad-driver-containerd, and @shishir-a412ed suggested filing it here.

In my testing, the problem can be resolved by setting the lo link in the netns up. Can this be handled in Nomad? Thanks.

sudo ip -netns 2eb4b098-e328-a8c5-6a5b-000f640a029f link set lo up

Reproduction steps

Job file (if appropriate)

job "test2" {
  datacenters = ["dc1"]

  group "test2" {

    network {
      mode = "bridge"
    }

    task "test2" {
      driver = "containerd-driver"
      config {
        image           = "docker.io/library/ubuntu:20.04"
        command         = "sleep"
        args            = ["600s"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
@tgross
Member

tgross commented Mar 24, 2021

I was able to confirm this same behavior with the containerd plugin, along with an exec job. That suggests there's something about how we're setting up the network namespace in the shared executor with the libcontainer.Config or in the allocation runner's network_manager_linux code. I'll mark this as a bug for roadmapping.
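
For reference, the manual fix lisongmin describes is simply bringing the lo link up from inside the allocation's network namespace. Below is a minimal Go sketch of that operation, assuming the github.com/vishvananda/netns and github.com/vishvananda/netlink packages and a namespace registered under /var/run/netns (which is why `ip -netns <alloc-id>` works against it); it is purely illustrative and not Nomad's executor or network_manager_linux code.

// loup.go: bring the loopback link up inside a named network namespace.
// Illustrative only: the equivalent of `ip -netns <name> link set lo up`.
package main

import (
	"fmt"
	"os"
	"runtime"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: loup <netns-name>")
		os.Exit(1)
	}
	nsName := os.Args[1] // e.g. the allocation ID

	// setns(2) affects the calling thread, so pin this goroutine to one OS thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	origin, err := netns.Get()
	if err != nil {
		panic(err)
	}
	defer origin.Close()
	defer netns.Set(origin) // return to the host namespace on exit

	target, err := netns.GetFromName(nsName)
	if err != nil {
		panic(err)
	}
	defer target.Close()

	if err := netns.Set(target); err != nil {
		panic(err)
	}

	// Inside the target namespace: find lo and set it administratively up.
	lo, err := netlink.LinkByName("lo")
	if err != nil {
		panic(err)
	}
	if err := netlink.LinkSetUp(lo); err != nil {
		panic(err)
	}
	fmt.Printf("lo is up in netns %s\n", nsName)
}

Whatever the eventual fix looks like, whether in the executor, the network manager, or at the CNI level, it needs to perform the equivalent of this during namespace setup.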


Reproduction

Here's a reproduction with your containerd job:

$ sudo ip --netns 874be434-466f-9f58-dbf9-97104a28c99e addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether f6:1d:1b:4f:ff:52 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.133/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f41d:1bff:fe4f:ff52/64 scope link
       valid_lft forever preferred_lft forever

From an exec driver job:

jobspec
job "test3" {
  datacenters = ["dc1"]

  group "group" {

    network {
      mode = "bridge"
      port "www" {
        to = "8000"
      }
    }

    task "group" {
      driver = "exec"

      config {
        command = "python"
        args    = ["-m", "SimpleHTTPServer"]
      }
    }
  }
}
$ nomad alloc exec d45 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 06:a1:13:0f:69:16 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.134/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4a1:13ff:fe0f:6916/64 scope link
       valid_lft forever preferred_lft forever

A job with the docker driver, from the nomad job init -short example job:

$ sudo nsenter --net=/proc/9464/ns/net
root@linux# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
35: eth0@if36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

Docker on its own:

$ docker run -it --rm busybox:1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
37: eth0@if38: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

@dcarbone

This issue persists with Nomad v1.1.6.

@victors2709

This problem also occurs with the podman plugin.
I think this is related to containernetworking/cni#97.
It works when using a CNI network that includes the loopback plugin:
{ "cniVersion": "0.4.0", "name": "brloopup", "plugins": [ { "type": "bridge", "bridge": "nomad", "ipMasq": true, "isGateway": true, "forceAddress": true, "ipam": { "type": "host-local", "ranges": [ [ { "subnet": "172.26.64.0/20" } ] ], "routes": [ { "dst": "0.0.0.0/0" } ] } }, { "type": "loopback" }, { "type": "firewall", "backend": "iptables", "iptablesAdminChainName": "NOMAD-ADMIN" }, { "type": "portmap", "capabilities": {"portMappings": true}, "snat": true } ] }

It works for service jobs, but it is not supported with a Consul Connect sidecar:
* Consul Connect sidecar requires bridge network, found "cni/brloopup" in group "api"

@primeos-work

primeos-work commented May 2, 2022

I also hit this issue on "Rocky Linux 8.5 (Green Obsidian)" with the most recent Nomad version and drivers (tested with the "podman" and "exec" drivers and Consul Connect).
((But I guess this adds little value as the issue is already confirmed.))

I've seen two workarounds so far:

  1. https://github.com/input-output-hk/bitte/pull/83/files
  2. https://discuss.hashicorp.com/t/consul-connect-envoy-without-docker/4824/7

I also tested the first approach via a similar Python script.

Code
#!/usr/bin/env python3

# dnf install python3
# dnf install python3-inotify # https://github.com/seb-m/pyinotify
# An alternative package (code not compatible): https://pypi.org/project/inotify/

import os

import pyinotify


class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        netns = event.name
        print (f"New netns: {netns}")
        os.system(f"ip -n {netns} link set lo up")


def main():
    handler = EventHandler()
    # Instantiate a new WatchManager (will be used to store watches).
    wm = pyinotify.WatchManager()
    # Associate this WatchManager with a Notifier (will be used to report and
    # process events).
    notifier = pyinotify.Notifier(wm, handler)
    # Watch /run/netns/ for IN_CREATE events (newly created network namespaces).
    wm.add_watch('/run/netns/', pyinotify.IN_CREATE)
    # Loop forever and handle events.
    notifier.loop()


if __name__ == '__main__':
    main()

But of course those workarounds are super hacky and it'd be great if this could be fixed properly.

h0tw1r3 added a commit to h0tw1r3/nomad that referenced this issue Jun 19, 2022
CNI changed how to bring up the interface in v0.2.0.
Support was moved to a new loopback plugin.

containernetworking/cni#121

Fixes hashicorp#10014
tgross pushed a commit that referenced this issue Jun 20, 2022
CNI changed how to bring up the interface in v0.2.0.
Support was moved to a new loopback plugin.

containernetworking/cni#121

Fixes #10014
@tgross tgross added this to the 1.3.2 milestone Jun 20, 2022
@tgross
Member

tgross commented Jun 20, 2022

Fixed by #13428, which will ship in Nomad 1.3.2 (+ backports)

tbehling pushed a commit that referenced this issue Jun 29, 2022
CNI changed how to bring up the interface in v0.2.0.
Support was moved to a new loopback plugin.

containernetworking/cni#121

Fixes #10014
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 19, 2022