When running a container via the containerd-driver plugin in bridge mode, there is no IP address (127.0.0.1/8) on the lo interface #10014

Closed
lisongmin opened this issue Feb 11, 2021 · 6 comments · Fixed by #13428

Comments

@lisongmin

Nomad version

nomad --version
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

Operating system and Environment details

Ubuntu 18.04.3 LTS

Issue

When running a container via the containerd-driver plugin in bridge mode, there is no IP address (127.0.0.1/8) on the lo interface, so the app cannot listen on 127.0.0.1.

root@bypass-route:/# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 2e:f4:d0:00:17:83 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.2/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2cf4:d0ff:fe00:1783/64 scope link 
       valid_lft forever preferred_lft forever

When switching to the docker driver, the lo interface is fine:

root@9b68f72d354e:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if118: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 32:2c:db:6a:dc:f0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.107/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever

I originally reported the problem on nomad-driver-containerd, and @shishir-a412ed suggested filing it here.

In my testing, the problem can be resolved by setting the lo link in the netns up. Can this be handled in Nomad? Thanks.

sudo ip -netns 2eb4b098-e328-a8c5-6a5b-000f640a029f link set lo up

Reproduction steps

Job file (if appropriate)

job "test2" {
  datacenters = ["dc1"]

  group "test2" {

    network {
      mode = "bridge"
    }

    task "test2" {
      driver = "containerd-driver"
      config {
        image           = "docker.io/library/ubuntu:20.04"
        command         = "sleep"
        args            = ["600s"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
@tgross
Member

tgross commented Mar 24, 2021

I was able to confirm this same behavior with the containerd plugin, along with an exec job. That suggests there's something about how we're setting up the network namespace in the shared executor with the libcontainer.Config or in the allocation runner's network_manager_linux code. I'll mark this as a bug for roadmapping.
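
For reference, the manual fix lisongmin describes is simply bringing the lo link up from inside the allocation's network namespace. Below is a minimal Go sketch of that operation, assuming the github.com/vishvananda/netns and github.com/vishvananda/netlink packages and a namespace registered under /var/run/netns (which is why `ip -netns <alloc-id>` works against it); it is purely illustrative and not Nomad's executor or network_manager_linux code.

// loup.go: bring the loopback link up inside a named network namespace.
// Illustrative only: the equivalent of `ip -netns <name> link set lo up`.
package main

import (
	"fmt"
	"os"
	"runtime"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: loup <netns-name>")
		os.Exit(1)
	}
	nsName := os.Args[1] // e.g. the allocation ID

	// setns(2) affects the calling thread, so pin this goroutine to one OS thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	origin, err := netns.Get()
	if err != nil {
		panic(err)
	}
	defer origin.Close()
	defer netns.Set(origin) // return to the host namespace on exit

	target, err := netns.GetFromName(nsName)
	if err != nil {
		panic(err)
	}
	defer target.Close()

	if err := netns.Set(target); err != nil {
		panic(err)
	}

	// Inside the target namespace: find lo and set it administratively up.
	lo, err := netlink.LinkByName("lo")
	if err != nil {
		panic(err)
	}
	if err := netlink.LinkSetUp(lo); err != nil {
		panic(err)
	}
	fmt.Printf("lo is up in netns %s\n", nsName)
}

Whatever the eventual fix looks like, whether in the executor, the network manager, or at the CNI level, it needs to perform the equivalent of this during namespace setup.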


Reproduction

Here's a reproduction with your containerd job:

$ sudo ip --netns 874be434-466f-9f58-dbf9-97104a28c99e addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether f6:1d:1b:4f:ff:52 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.133/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::f41d:1bff:fe4f:ff52/64 scope link
       valid_lft forever preferred_lft forever

From an exec driver job:

jobspec
job "test3" {
  datacenters = ["dc1"]

  group "group" {

    network {
      mode = "bridge"
      port "www" {
        to = "8000"
      }
    }

    task "group" {
      driver = "exec"

      config {
        command = "python"
        args    = ["-m", "SimpleHTTPServer"]
      }
    }
  }
}
$ nomad alloc exec d45 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 06:a1:13:0f:69:16 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.26.64.134/20 brd 172.26.79.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::4a1:13ff:fe0f:6916/64 scope link
       valid_lft forever preferred_lft forever

A job with the docker driver, from the nomad job init -short example job:

$ sudo nsenter --net=/proc/9464/ns/net
root@linux# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
35: eth0@if36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

Docker on its own:

$ docker run -it --rm busybox:1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
37: eth0@if38: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

@dcarbone

This issue persists with Nomad v1.1.6.

@victors2709

This problem also occurs with the podman plugin.
I think this is related to containernetworking/cni#97.
It works when using a CNI network that includes the loopback plugin:
{ "cniVersion": "0.4.0", "name": "brloopup", "plugins": [ { "type": "bridge", "bridge": "nomad", "ipMasq": true, "isGateway": true, "forceAddress": true, "ipam": { "type": "host-local", "ranges": [ [ { "subnet": "172.26.64.0/20" } ] ], "routes": [ { "dst": "0.0.0.0/0" } ] } }, { "type": "loopback" }, { "type": "firewall", "backend": "iptables", "iptablesAdminChainName": "NOMAD-ADMIN" }, { "type": "portmap", "capabilities": {"portMappings": true}, "snat": true } ] }

It works for service jobs, but it is not supported with a Consul Connect sidecar:
* Consul Connect sidecar requires bridge network, found "cni/brloopup" in group "api"

@primeos-work

primeos-work commented May 2, 2022

I also hit this issue on "Rocky Linux 8.5 (Green Obsidian)" with the most recent Nomad version and drivers (tested with the "podman" and "exec" drivers and Consul Connect).
((But I guess this adds little value as the issue is already confirmed.))

I've seen two workarounds so far:

  1. https://github.com/input-output-hk/bitte/pull/83/files
  2. https://discuss.hashicorp.com/t/consul-connect-envoy-without-docker/4824/7

I also tested the first approach via a similar Python script.

Code
#!/usr/bin/env python3

# dnf install python3
# dnf install python3-inotify # https://github.com/seb-m/pyinotify
# An alternative package (code not compatible): https://pypi.org/project/inotify/

import os

import pyinotify


class EventHandler(pyinotify.ProcessEvent):
    def process_IN_CREATE(self, event):
        netns = event.name
        print (f"New netns: {netns}")
        os.system(f"ip -n {netns} link set lo up")


def main():
    handler = EventHandler()
    # Instantiate a new WatchManager (will be used to store watches).
    wm = pyinotify.WatchManager()
    # Associate this WatchManager with a Notifier (will be used to report and
    # process events).
    notifier = pyinotify.Notifier(wm, handler)
    # Watch /run/netns/ for IN_CREATE events (newly created network namespaces).
    wm.add_watch('/run/netns/', pyinotify.IN_CREATE)
    # Loop forever and handle events.
    notifier.loop()


if __name__ == '__main__':
    main()

But of course those workarounds are super hacky and it'd be great if this could be fixed properly.

h0tw1r3 added a commit to h0tw1r3/nomad that referenced this issue Jun 19, 2022
CNI changed how to bring up the interface in v0.2.0.
Support was moved to a new loopback plugin.

containernetworking/cni#121

Fixes hashicorp#10014
tgross pushed a commit that referenced this issue Jun 20, 2022
CNI changed how to bring up the interface in v0.2.0.
Support was moved to a new loopback plugin.

containernetworking/cni#121

Fixes #10014
@tgross tgross added this to the 1.3.2 milestone Jun 20, 2022
@tgross
Member

tgross commented Jun 20, 2022

Fixed by #13428, which will ship in Nomad 1.3.2 (+ backports)

tbehling pushed a commit that referenced this issue Jun 29, 2022
CNI changed how to bring up the interface in v0.2.0.
Support was moved to a new loopback plugin.

containernetworking/cni#121

Fixes #10014
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 19, 2022