Unable to announce CNI provided IP address in consul #8801

Closed
Tetha opened this issue Sep 1, 2020 · 4 comments · Fixed by #9095
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/networking type/enhancement
Comments

@Tetha

Tetha commented Sep 1, 2020

Hello,

I'm trying to implement a private overlay network for our
containers based on weave and CNI, and I'm entirely stuck on
the consul integration. This might be connected to #8698, but I
have too many moving parts I'm not familiar with to say so with confidence.

Nomad version

0.12.3 (CLI, Servers, Clients)

Operating system and Environment details

  • Nomad servers based on CentOS 8
  • Nomad clients based on CentOS 7, running Docker + Weave

Issue

Bear with me if this is a bit longer, but this is confusing. ;)

Basically, my understanding of how this should work would be the following:

  • I deploy the weave net on the nomad clients and join the weave agents to
    form the weave mesh. This works. I can launch containers via the docker
    CLI and those containers get private IPs from the weave net. If I send traffic
    from one container to a container on another nomad client, I only see encrypted
    weave traffic in my packet dumps on the network. That's fine, since weave is out
    of scope here anyway.

  • Second, I configure nomad >= 0.12 to recognize the CNI plugin of weave. For this, I've
    deployed the standard CNI plugins and a weave-provided CNI configuration, covering the weave network
    and port mapping, into the directory configured in the nomad client:

     $ cat /etc/cni/net.d/10-weave.conflist
     {
         "cniVersion": "0.3.0",
         "name": "weave",
         "plugins": [
             {
                 "name": "weave",
                 "type": "weave-net",
                 "hairpinMode": true
             },
             {
                 "type": "portmap",
                 "capabilities": {"portMappings": true},
                 "snat": true
             }
         ]
     }
     $ cat /etc/nomad/client.hcl
     ...
     client {
         enabled = true
         cni_config_dir = "/etc/cni/net.d"
         cni_path = "/opt/cni/bin"
     }
     ...
    

From here, I can pretty much see that nomad is fingerprinting the network (and complains about
network speeds):

     [DEBUG] client.fingerprint_mgr: detected CNI network: name=weave
     ...
     [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=weave
     [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/weave/speed
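
The name in the "detected CNI network" log line matches the top-level "name" field of the conflist, and the same name appears after the "cni/" prefix in the network mode used below. As a rough sanity check, a minimal Python sketch can read a conflist and print the plugin chain and the network mode that would correspond to it. The helper names (summarize_conflist, nomad_network_mode) are made up for illustration, and the "cni/<name>" rule is inferred from the log line and job spec in this thread, not from Nomad's source.

```python
import json

# The conflist from this thread, embedded so the sketch is self-contained.
EXAMPLE_CONFLIST = """
{
    "cniVersion": "0.3.0",
    "name": "weave",
    "plugins": [
        {"name": "weave", "type": "weave-net", "hairpinMode": true},
        {"type": "portmap", "capabilities": {"portMappings": true}, "snat": true}
    ]
}
"""


def summarize_conflist(text):
    """Return (network name, ordered list of plugin types) for a CNI conflist."""
    conf = json.loads(text)
    plugin_types = [plugin["type"] for plugin in conf.get("plugins", [])]
    return conf["name"], plugin_types


def nomad_network_mode(text):
    """Hypothetical helper: build the group-level network mode string.

    Assumption: the mode is "cni/" plus the conflist's top-level name,
    as observed in this thread (name=weave -> mode "cni/weave").
    """
    name, _ = summarize_conflist(text)
    return f"cni/{name}"


if __name__ == "__main__":
    name, plugin_types = summarize_conflist(EXAMPLE_CONFLIST)
    print("network:", name)
    print("plugin chain:", " -> ".join(plugin_types))
    print("network mode:", nomad_network_mode(EXAMPLE_CONFLIST))
```

To check a file on a client instead, the contents of /etc/cni/net.d/10-weave.conflist could be read and passed in as the text argument.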
  • Third (which took a bit of trial and error), I can use the group-level networking stanza in order to create
    a weave-joined container by using the network mode "cni/weave". My job spec is rather simple at the moment for troubleshooting:

     job "networking-test" {
       datacenters = ["dc1"]
     
       group "test" {
         count = 6
         spread {
           attribute = "${node.unique.id}"
         }
         meta {
           FORCE = 14
         }
     
         network {
           mode = "cni/weave"
           //port "http" {}
         }
     
         //service {
           //name = "my-test-service"
           //address_mode = "driver"
           //port = 8080
         //}
     
         task "test" {
           driver = "docker"
     
           config {
             image = "debian"
             //network_mode = "weave"
             entrypoint = [
               "/bin/bash",
               "-c",
               "while true; do sleep 100; done"
             ]
           }
         }
       }
     }
    

If I deploy this job into my cluster, I can nomad exec into the containers on different nomad clients.
Those containers have exactly one network interface with a private IP from the weave net assigned,
can ping / send traffic to each other, and the traffic is properly
encrypted with the weave net. As a negative test, I've also undeployed the entire weave net, and if
I keep the network mode as cni/weave, nomad just complains about missing networks and does not schedule
any allocations.

So, overall, at this point, nomad is correctly integrated with the weave net and correctly utilizes the
weave net CNI plugin in order to add the containers to the private weave net. And if I disregard the current
documentation situation, that was actually simple and painless.

And this is where my current issue starts:

Now, I want to register the weave private IP provided by the CNI plugin in consul as a service, so the services
can discover each other's private IPs and communicate under the protection of the weave encryption.

As usual, I'd have to add a service stanza (commented out in the earlier jobspec), with or without a name, in
order to tell nomad to register a consul service, probably at the task group level (not 100% sure about this).
The port would be fixed, because each container gets its own IP, so I don't need to worry about port mapping
and packing ports closely:

   service {
       name = "my-test-service"
       port = 8080
   }

I'd have expected this to register one consul service per allocation containing the weave private IP and port 8080.

In reality, nomad registers the host IP and port 8080 for each allocation.
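
One way to confirm what actually ends up in consul is to query the catalog endpoint (/v1/catalog/service/<name>) and compare each registered address against the overlay range. The Python sketch below does that under a few assumptions: the consul agent listens on 127.0.0.1:8500, the weave net uses Weave's default allocation range of 10.32.0.0/12, and the function names are invented for illustration. The sample entries are made-up data shaped like a catalog response, not output from a real cluster.

```python
import ipaddress
import json
import urllib.request

# Assumption: Weave Net's default allocation range. Adjust this if the mesh
# was launched with a different --ipalloc-range.
WEAVE_RANGE = ipaddress.ip_network("10.32.0.0/12")


def effective_endpoints(catalog_entries):
    """Return the (address, port) a client would use for each instance.

    An empty ServiceAddress means consumers fall back to the node Address.
    """
    endpoints = []
    for entry in catalog_entries:
        address = entry.get("ServiceAddress") or entry["Address"]
        endpoints.append((address, entry["ServicePort"]))
    return endpoints


def outside_overlay(endpoints, overlay=WEAVE_RANGE):
    """Return the endpoints whose address is not inside the overlay range."""
    return [
        (address, port)
        for address, port in endpoints
        if ipaddress.ip_address(address) not in overlay
    ]


def fetch_catalog(service, consul="http://127.0.0.1:8500"):
    """Fetch catalog entries for a service from a consul agent (assumed URL)."""
    url = f"{consul}/v1/catalog/service/{service}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Made-up sample data: two instances on host IPs, one on an overlay IP.
SAMPLE_ENTRIES = [
    {"Address": "192.168.1.10", "ServiceAddress": "192.168.1.10", "ServicePort": 8080},
    {"Address": "192.168.1.11", "ServiceAddress": "10.32.0.5", "ServicePort": 8080},
    {"Address": "192.168.1.12", "ServiceAddress": "", "ServicePort": 8080},
]

if __name__ == "__main__":
    for address, port in outside_overlay(effective_endpoints(SAMPLE_ENTRIES)):
        print(f"registered outside the weave range: {address}:{port}")
```

Against a live cluster, fetch_catalog("my-test-service") would replace SAMPLE_ENTRIES. With the behavior described here, every instance would be reported as outside the weave range.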

I have also attempted to fiddle around with the address_mode of the service, as well as the network mode
of the docker driver.

  • Modifying address_mode to "driver" or "auto" didn't really change the observable behavior of the
    registered consul services. They remain registered as the host IPs.
  • If I modify the network_mode in the config of the docker driver to "weave", my container ends up with
    two network interfaces - one registered by weave with a weave private IP, and another one with an IP of
    the docker / nomad bridge. The service in consul does not change. And even though I can attach
    the weave net like this, this is pushing more networking into the docker daemon, away from
    nomad + the CNI plugins.

At this point I'm confused, because going by other parts of the documentation, this should "just work".

Can you offer me some direction where I'm doing something wrong? Or, if you need further information
(nomad client logs, docker container details, ..) let me know.

@shoenig shoenig added stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/networking type/enhancement labels Sep 3, 2020
@shoenig
Member

shoenig commented Sep 3, 2020

Thanks for the detailed write-up @Tetha ! Announcing CNI addresses in Consul is something we're already working on and plan to ship in an upcoming release. Stay tuned!

@shoenig shoenig added this to the 0.13 milestone Sep 3, 2020
@Tetha
Author

Tetha commented Sep 4, 2020

Hello,

thanks for the information.

Do you have a rough estimate of when 0.13 is going to drop? This might end up being a blocking issue for our internal nomad-prod rollout.

@tcdev0

tcdev0 commented Sep 10, 2020

With a lot of trial and error I came up with this configuration.
All services get a private weave IP address, which is registered in consul and reachable from the Traefik LB.
I don't know if it's meant to work that way, but for me it does.

job "service" {

  datacenters = ["dc1"]
  type = "service"

  group "service-web" {
    count = 5

    task "service-web" {

      driver = "docker"

      config {
        image = "nginx"
        network_mode = "weave"
        port_map {
          http = 80
        }
      }

      service {
        name = "service-web"
        tags = [
          "traefik.enable=true",
          "traefik.http.routers.service-web.entrypoints=websecure",
          "traefik.http.routers.service-web.rule=Host(`service-web.io`)",
          "traefik.http.routers.service-web.tls=true",
          "traefik.http.routers.service-web.middlewares=securedheaders@file",
        ]
        address_mode = "driver"
        port = "80"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "4s"
          timeout  = "2s"
          address_mode="driver"
        }
      }

      resources {
        cpu    = 300 # MHz
        memory = 128 # MB
        network {
          mode = "cni/weave"
        }
      }

    }
  }
}

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2022
4 participants