conflicting options: dns and the network mode (bridge) #11857

Closed
baxor opened this issue Jan 14, 2022 · 2 comments · Fixed by #12229
Labels
theme/docs Documentation issues and enhancements theme/networking

@baxor
baxor commented Jan 14, 2022

This seems to have been fixed in #8600, but we are still running into the described behavior.

Nomad version

1.1.9+ent

Operating system and Environment details

Issue

failed to create container: API error (400): conflicting options: dns and the network mode

Reproduction steps

Jobspec is:

job "serviceA-server" {
  datacenters = ["us-east-1"]
  type        = "service"

  update {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "10m"
    auto_revert      = true
    canary           = 0
    stagger          = "30s"
  }

  constraint {
    attribute = "${node.class}"
    value     = "applications"
  }

  group "serviceA-server" {
    count = 2

    network {
      mode = "bridge"
      port "http"{}

      port "check" {
        static = 9999
      }
   }

    # this task will actually run the nginx server
    task "serviceA" {
      driver = "docker"

      config {
        "dns_servers": [
            "172.17.0.1"
       ]
        "dns_search_domains": [
            "service.organization.env"
        ],
        image = "__IMAGE__"

        # the check port need to be static inside the container so our static VTS config (nginx/conf.d/health.conf) works out of the box.
        # this will not affect the external network port (on the host/outside the container)
        ports = ["http", "check"]
      }

      # default http{} nginx configuration
      # sets up logging and other sane defaults
      template {
        source        = "local::_infrastructure/nginx.conf"
        destination   = "/local/nginx/conf.d/http.conf"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }

      # HTTP health check

      resources {
        cpu    = 500
        memory = 256
      }
    }

    service {
      name = "serviceA"
      port = "http"

      connect {
        sidecar_service {}
      }

      check {
        name     = "http"
        type     = "http"
        port     = "http"
        path     = "/"
        interval = "5s"
        timeout  = "2s"
      }
    }
  }
}

Removing the dns_servers entry in Job.TaskGroup[0].Task[0].Config allows the job to be submitted without error.

Expected Result

Running task

Actual Result

failed to create container: API error (400): conflicting options: dns and the network mode

Job file (if appropriate)

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@tgross
Member

tgross commented Mar 8, 2022

Hi @baxor! I was able to confirm this behavior but tl;dr this is a documentation miss and not a bug. Here's my Nomad 1.1.9+ent cluster:

$ nomad server members
Name                  Address        Port  Status  Leader  Protocol  Build      Datacenter  Region
nomad-server0.global  192.168.56.10  4648  alive   false   2         1.1.9+ent  dc1         global
nomad-server1.global  192.168.56.20  4648  alive   true    2         1.1.9+ent  dc1         global
nomad-server2.global  192.168.56.30  4648  alive   false   2         1.1.9+ent  dc1         global

It looks like the jobspec you have here has been translated from JSON, because it doesn't work as written: the argument names are quoted and use the wrong assignment operator, and the service name is invalid.
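For comparison, the JSON-style keys in the original `config` block would look roughly like this once written as HCL arguments. This is a sketch that only fixes the syntax: with the group's network in `mode = "bridge"`, the Docker driver's `dns_servers` option still triggers the same `conflicting options` error, as the minimal reproduction shows.

```hcl
# Task config from the original report, rewritten as HCL (syntax fix only).
# This parses, but dns_servers still conflicts with bridge networking.
config {
  image              = "__IMAGE__"
  ports              = ["http", "check"]
  dns_servers        = ["172.17.0.1"]
  dns_search_domains = ["service.organization.env"]
}
```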

But I've managed to reduce it to a minimal reproduction:

jobspec
job "example" {
  datacenters = ["dc1"]

  group "group" {

    network {
      mode = "bridge"
      port "http" {}
    }

    task "task" {
      driver = "docker"

      config {
        image       = "busybox:1"
        args        = ["httpd", "-v", "-f", "-p", "8001", "-h", "/local"]
        ports       = ["http"]
        dns_servers = ["172.17.0.1"]
      }

      template {
        data        = "<html>hello, world</html>"
        destination = "local/index.html"
      }

      resources {
        cpu    = 128
        memory = 128
      }
    }
  }
}

And after running that I get the following allocation status as you've described:

Recent Events:
Time                       Type            Description
2022-03-08T10:57:32-05:00  Killing         Sent interrupt. Waiting 5s before force killing
2022-03-08T10:57:32-05:00  Not Restarting  Error was unrecoverable
2022-03-08T10:57:32-05:00  Driver Failure  failed to create container: API error (400): conflicting options: dns and the network mode
2022-03-08T10:57:32-05:00  Task Setup      Building Task Directory
2022-03-08T10:57:24-05:00  Received        Task received by client

I've also stood up a current development build of Nomad and I see identical behavior, so it doesn't look like this was an error in how we built Nomad 1.1.9+ent or something similarly embarrassing. 😀

So what's going on? I took a look at the code in #8600 and realized that you've got a task with bridge networking, but you're setting the DNS options on the Docker task instead of on the network block. If we instead move the config to the network.dns block:

job "example" {
  datacenters = ["dc1"]

  group "group" {

    network {
      mode = "bridge"
      port "http" {}
      dns {
        servers = ["172.17.0.1"]
      }
    }

    task "task" {
      driver = "docker"

      config {
        image       = "busybox:1"
        args        = ["httpd", "-v", "-f", "-p", "8001", "-h", "/local"]
        ports       = ["http"]
      }

      template {
        data        = "<html>hello, world</html>"
        destination = "local/index.html"
      }

      resources {
        cpu    = 128
        memory = 128
      }
    }
  }
}

Then everything works just fine:

$ nomad alloc status 675
...

Recent Events:
Time                       Type        Description
2022-03-08T11:21:31-05:00  Started     Task started by client
2022-03-08T11:21:31-05:00  Task Setup  Building Task Directory
2022-03-08T11:21:30-05:00  Received    Task received by client

$ nomad alloc exec 675 cat /etc/resolv.conf
search fios-router.home
nameserver 172.17.0.1

So this looks like it's a documentation issue. We need to make it clearer that the Docker DNS options should not be used with bridge networking mode. I'll open a PR with some patches to the docs.
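Applied to the original report, a minimal sketch of the fix (not run against that cluster) would remove `dns_servers` and `dns_search_domains` from the task's `config` block and declare them on the group's `network` block instead, using the `servers` and `searches` fields of the `dns` block:

```hcl
network {
  mode = "bridge"
  port "http" {}

  port "check" {
    static = 9999
  }

  # DNS settings live on the group network, not in the Docker task config,
  # because every task in the group shares this bridge network namespace.
  dns {
    servers  = ["172.17.0.1"]
    searches = ["service.organization.env"]
  }
}
```

The task's `config` block then keeps only `image` and `ports`.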

@tgross tgross self-assigned this Mar 8, 2022
@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Mar 8, 2022
@tgross tgross added theme/docs Documentation issues and enhancements and removed type/bug labels Mar 8, 2022
Nomad - Community Issues Triage automation moved this from In Progress to Done Mar 8, 2022
@github-actions
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 10, 2022