Unable to set extra_hosts when using consul-connect (bridged networking) #7746

Closed
spuder opened this issue Apr 19, 2020 · 20 comments · Fixed by #10766

Comments

@spuder
Contributor

spuder commented Apr 19, 2020

Version

Nomad = 0.11.0
CNI Plugins = 0.8.4
Docker = 19.03.7, build 7141c199a2
Docker API = 1.40
OS = Ubuntu 18.04 (4.15.0-91-generic)

Problem

If you attempt to run a job that uses extra_hosts while using bridged networking, you will receive the following error.

      config {
        image = "bash"
        extra_hosts = [
          "foobar.example.com:127.0.0.1"
        ]
      }

failed to create container: API error (400): conflicting options: custom host-to-IP mapping and the network mode

This is a major problem because it means no Consul Connect enabled job can use the custom hosts option.

I did find one related issue in Docker where --net=host and --add-host were mutually exclusive before Docker API version 1.12. I'm not sure which Docker API version Nomad is using, but 1.40 is the latest.

curl --unix-socket /var/run/docker.sock http://localhost/version | jq .ApiVersion
"1.40"

Steps to reproduce

Submit the following job

job "bash" {
  datacenters = ["dc1"]
  group "api" {
    network {
      mode = "bridge"
    }
    task "bash" {
      driver = "docker"
      config {
        image = "bash"
        args = ["/bin/sleep", "100000000"]
        extra_hosts = [
          "foobar.example.com:127.0.0.1"
        ]
      }
    }
  }
}

Workarounds

  1. Don't use extra hosts
  2. Bake the host entry into the docker container (Please add --add-host=[], --net options to docker build moby/moby#10324)
  3. Update nomad to docker api 1.12 or newer?

Possibly Related:

@spuder spuder changed the title from "Bridged networking not compatible with docker extra_hosts" to "Unable to set extra_hosts when using consul-connect (bridged networking)" Apr 19, 2020
@Gufran

Gufran commented Apr 22, 2020

I can confirm that this problem is not limited to just the extra_hosts attribute but also affects dns_servers and other DNS options.

I'm running Nomad v0.10.5 and Docker v18.06.1 with API version 1.38 and minimum version 1.12.

It looks like the problem was fixed in Docker API version v1.12.0, see:

Nomad is developed against Docker version 1.8.2 and 1.9 (Official docs), meaning API version 1.20 and above (See Docker version matrix).

For the time being I am unable to run connect enabled jobs with custom DNS servers because of this problem. The error I get is conflicting options: dns and the network mode.

@Gufran

Gufran commented Apr 22, 2020

I tried to run a docker container using the command line and I am able to use --net=bridge with --dns=<ip> on the same machine where Nomad throws an error:

docker run --rm -it --net bridge --dns 1.1.1.1 ubuntu:20.04 cat /etc/resolv.conf
search us-east-1.compute.internal
nameserver 1.1.1.1
options timeout:2 attempts:5

docker run --rm -it --net bridge --dns 8.8.8.8 ubuntu:20.04 cat /etc/resolv.conf
search us-east-1.compute.internal
nameserver 8.8.8.8
options timeout:2 attempts:5

docker run --rm -it --net bridge --dns 8.8.8.8 --dns 1.1.1.1 ubuntu:20.04 cat /etc/resolv.conf
search us-east-1.compute.internal
nameserver 8.8.8.8
nameserver 1.1.1.1
options timeout:2 attempts:5

@Gufran

Gufran commented Apr 23, 2020

So I captured the traffic between Nomad and the Docker socket, and it turns out that the network mode is container, not bridge.
I don't understand everything yet, but I suspect it has to do with the fact that Nomad is using CNI plugins to set up networking and there is an intermediate container acting as the network bridge and gateway.

The new information for me is that the network mode specified in the jobspec is used for some other purpose. I tried to run a container with this new configuration, i.e. --net container:container-id --dns 1.1.1.1, and it failed with the same error: docker: Error response from daemon: conflicting options: dns and the network mode.
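The behaviour observed so far can be summarised in a small Go sketch. This is only a model of what the thread reports, not Docker's actual source: the hostConfig type and validate function are hypothetical names, and the only facts taken from the thread are the two error strings and the observation that they appear when the network mode is container:<id>.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// hostConfig is a hypothetical, trimmed-down stand-in for the options
// sent to the Docker API when a container is created.
type hostConfig struct {
	NetworkMode string   // e.g. "bridge", "host", "container:<id>"
	DNS         []string // values passed with --dns
	ExtraHosts  []string // values passed with --add-host
}

// Error strings copied from the messages reported in this issue.
var (
	errConflictDNS   = errors.New("conflicting options: dns and the network mode")
	errConflictHosts = errors.New("conflicting options: custom host-to-IP mapping and the network mode")
)

// validate models the observed behaviour: a container that joins another
// container's network namespace cannot carry its own DNS or host entries.
func validate(hc hostConfig) error {
	if !strings.HasPrefix(hc.NetworkMode, "container:") {
		return nil
	}
	if len(hc.DNS) > 0 {
		return errConflictDNS
	}
	if len(hc.ExtraHosts) > 0 {
		return errConflictHosts
	}
	return nil
}

func main() {
	// Plain bridge mode with --dns works, as in the docker run tests above.
	fmt.Println(validate(hostConfig{NetworkMode: "bridge", DNS: []string{"1.1.1.1"}}))
	// container:<id> with --dns is rejected.
	fmt.Println(validate(hostConfig{NetworkMode: "container:abc123", DNS: []string{"1.1.1.1"}}))
}
```

If this model is right, it would explain why the same --dns flag works from the command line with --net bridge but fails when Nomad creates the container.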

@Gufran

Gufran commented Apr 23, 2020

Did some more digging. Now I'm certain that this is because of the CNI based network setup.

Here is the call trace of network setup before the allocation is started:

In my opinion the final call to cni.Setup() should also be given the DNS configuration if specified in the jobspec. something like

dnsConfig := cni.DNS{
  Servers: []string{"1.1.1.1"},
  Searches: []string{},
  Options: []string{},
}

b.cni.Setup(ctx,
            alloc.ID,
            spec.Path,
            cni.WithCapabilityPortMap(getPortMapping(alloc)),
            cni.WithCapabilityDNS(dnsConfig))

should do the job just fine.

I can try this change locally in a while, but it'd be great if someone who knows the codebase can verify the correctness of this patch in the meantime.

@Gufran

Gufran commented Apr 23, 2020

@nickethier could you offer some insight here please?

@shoenig shoenig added the theme/dependencies and theme/driver/docker labels Apr 28, 2020
@nickethier
Member

Hey all, I missed this one when linking issues, but the DNS part of this issue is merged and will be in the next major release. See: #7661

We're still evaluating the extra_hosts option, as it's not something CNI supports directly. Under bridge mode, the docker tasks are using network-mode=container:, which I don't think works with the docker extra_hosts flag. The linked issue is for network_mode=host specifically.

@nickethier
Member

With regard to the extra_hosts option, would using a template block to write out an /etc/hosts file work? It's definitely not an ideal solution, but it might be a workaround in the interim.

@spuder
Contributor Author

spuder commented Apr 29, 2020

That's a great idea. I think that may be a viable workaround.

127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters

task "app" {
      driver = "docker"
      config {
        image = "<%= ENV['CI_REGISTRY_IMAGE'] %>:<%= ENV['CI_COMMIT_SHA'] %>"
        volumes = [
          "local/etc/hosts:/etc/hosts",
.....


      template {
        data = <<EOH
127.0.0.1	localhost
127.0.1.1 dev-vault.example.com
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
        EOH
        destination = "local/etc/hosts"
      }

@donjon-matter

Maybe my problem is somewhat similar, so I hope I can post it here.
When running Nomad with Consul Connect, the /etc/hosts file may look like this:

127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback

But I expected something like:

127.0.0.1	localhost
**172.0.0.2	abcxyzaaa**
::1	localhost ip6-localhost ip6-loopback

The bold line is the hostname entry of the Docker container. My application runs on Java, and a lot of libraries rely on the hostname, which causes errors when they try to resolve abcxyzaaa.

@tgross
Member

tgross commented Jan 27, 2021

Looks like we've identified a workaround for the upstream issue. I'm going to mark this as a docs issue so that we can provide some official guidance in the networking and/or Connect docs for folks.

@eihli

eihli commented Jan 29, 2021

The workaround requires another workaround in the case of wanting to use the special host-gateway string in Docker's add-host. moby/moby#40007

I'm wanting to use extra_hosts = ["host.docker.internal:host-gateway"] but I'm hitting this error. So although templating in /etc/hosts is a workaround, it comes with the additional complexity of getting the address of the host gateway into the container.

@Oloremo
Contributor

Oloremo commented Feb 4, 2021

This is an issue for us: we need the app running inside the container to be able to resolve the randomly assigned container hostname. We're running containers in bridge mode, and it seems like nothing really works for that case.

We tried to template the /etc/hosts file with {{ env "HOSTNAME" }}, but it returns nothing for some reason, while other environment variables work just fine.

Any ideas or workarounds are welcome.

@Ilhicas
Contributor

Ilhicas commented Feb 10, 2021

Just to leave a comment: this breaks a lot of Java-based applications, which rely on hostname -i resolution that can't be done here. We are hitting this issue and combining the template with hostname -I to resolve the address and fix it in /etc/hosts, but this is not a viable or generic solution. It requires a lot of tooling in the running image, and it also means changing the entrypoint to run this at start time, which is far from ideal.

@Legogris

What is the workaround actually? Templating into /etc doesn't work in docker.

@spuder
Contributor Author

spuder commented Feb 20, 2021

The workaround is to create a new etc/hosts file at some arbitrary location, like the Nomad path '/local/etc/hosts', and then do a volume mount to overwrite '/etc/hosts' with '/local/etc/hosts'.

 "local/etc/hosts:/etc/hosts",

@tgross tgross removed this from Needs Roadmapping in Nomad - Community Issues Triage Mar 4, 2021
@Oloremo
Contributor

Oloremo commented Mar 8, 2021

containerd driver added that: Roblox/nomad-driver-containerd#69

@tgross tgross self-assigned this Jun 9, 2021
@tgross tgross linked a pull request Jun 16, 2021 that will close this issue
@tgross tgross added this to the 1.1.2 milestone Jun 16, 2021
@tgross tgross removed the theme/dependencies and theme/docs labels Jun 16, 2021
@tgross
Member

tgross commented Jun 16, 2021

The workaround is to create a new etc/hosts file at some arbitrary location like the nomad path '/local/etc/hosts' then doing a volume mount to overwrite '/etc/hosts' with '/local/etc/hosts'

#10766 will do that for the docker driver, and provides infrastructure for community task drivers to do the same. The exec/java driver has some complications on that (see #10768).

@DejfCold

Hi, @tgross!
I've encountered this on v1.1.3.
What are all the requirements for the extra_hosts to be added? Just that it's group.network.mode=bridge and some task.config.extra_host=["hostname:ip"] is present?

I'm asking because I do have that and it's kinda not working.

To be exact, I have: (To be clear, I'm just asking what are the requirements for it to work, not how exactly should I fix my thing ... although that would be also appreciated :) )

// stuff
    group "freeipa" {
        network {
            mode = "bridge"
        }
        service {
            name = "freeipa"
            port = "443"
            connect {
                sidecar_service {}
            }
        }
        task "freeipa" {
            resources {
                memory = 2000
            }
            driver = "docker"
            config {
                image = "freeipa/freeipa-server:centos-8"
                args = [ "ipa-server-install", "-U", "-r", "DC1.CONSUL", "--no-ntp" ]
                sysctl = {
                    "net.ipv6.conf.all.disable_ipv6" = "0"
                }
                extra_hosts = ["freeipa.ingress.dc1.consul:127.0.0.1"]
            }
            env {
                HOSTNAME = "freeipa.ingress.dc1.consul"
                PASSWORD = "testtest"
            }
        }
    }
// stuff

which results in

[root@63ec9326ae5a /]# cat /etc/hosts
# this file was generated by Nomad
127.0.0.1 localhost
::1 localhost
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

# this entry is the IP address and hostname of the allocation
# shared with tasks in the task group's network
172.26.65.239 63ec9326ae5a
[root@63ec9326ae5a /]# 

@tgross
Member

tgross commented Aug 14, 2021

Hi @DejfCold!

Just that it's group.network.mode=bridge and some task.config.extra_host=["hostname:ip"] is present?

The requirements from driver.go#L963-L972 are:

  • group.network.mode = "bridge"
  • task.config.extra_hosts = ["hostname:ip"]
  • task.config.network_mode is left unset.

Your jobspec there looks ok to me. The tests in mount_unix_test.go look to cover this use case well. So this looks like it might be a bug. While I wrote this feature I'm no longer at HashiCorp as a Nomad maintainer, so I'd recommend opening a new issue describing the problem so that the maintainers will be sure to see it. Thanks!
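For readers who want the three requirements in one place, they can be written as a single predicate. This is a sketch only: the taskNetwork type and extraHostsApplied function are hypothetical names, and the code is not taken from driver.go.

```go
package main

import "fmt"

// taskNetwork is a hypothetical summary of the jobspec fields that the
// requirements above refer to.
type taskNetwork struct {
	GroupMode       string   // group.network.mode
	TaskNetworkMode string   // task.config.network_mode ("" when unset)
	ExtraHosts      []string // task.config.extra_hosts
}

// extraHostsApplied reports whether all three listed requirements hold.
func extraHostsApplied(t taskNetwork) bool {
	return t.GroupMode == "bridge" &&
		t.TaskNetworkMode == "" &&
		len(t.ExtraHosts) > 0
}

func main() {
	// Values taken from the freeipa jobspec posted above.
	job := taskNetwork{
		GroupMode:  "bridge",
		ExtraHosts: []string{"freeipa.ingress.dc1.consul:127.0.0.1"},
	}
	fmt.Println(extraHostsApplied(job))
}
```

The posted jobspec satisfies this predicate, which is consistent with the suspicion that the missing entry is a bug rather than a configuration problem.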

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2022