
Changing the default host_network is broken in 1.7.2 #19726

Open
SamMousa opened this issue Jan 12, 2024 · 5 comments
Comments

@SamMousa
Contributor

SamMousa commented Jan 12, 2024

Nomad version

v1.7.2

Operating system and Environment details

Ubuntu 22.04.3 LTS

Issue

We have a tiny cluster of 3 nodes running both the client and the server.
There's a private network on a VLAN, which we configured as the host network default.
Each server also has a public IP which we configured as a host network called public.

In the configuration below I duplicated the config for the private network and named the copy private. That workaround actually solves the problem, which should also give an indication of where the issue is coming from. (Is default a special name, perhaps?)

client {
  enabled = true
  servers = ["127.0.0.1"]
  host_network "default" {
    cidr = "192.168.40.0/24"
    reserved_ports = "4646-4648"
  }
  host_network "private" {
    cidr = "192.168.40.0/24"
    reserved_ports = "4646-4648"

  }
  host_network "public" {
    interface = "enp7s0"
  }
}

Reproduction steps

Use the above or a similar configuration and try to expose a port on the host_network "default"; note that it will be exposed on the public network instead.
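As a minimal sketch of the failing case (the job and port names here are hypothetical; only the host_network value matters), a network block like this should, per the report, bind the port to the public interface instead of the 192.168.40.0/24 subnet:

```hcl
group "repro" {
  network {
    port "http" {
      static       = 12345       # hypothetical static port
      host_network = "default"   # expected: 192.168.40.0/24; observed: public interface
    }
  }
}
```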

Edit: note that our cluster runs several jobs with similar config and I'm not seeing this issue for all of them. (This could be because the others have not changed yet and they might therefore break when they get rescheduled or something.)

Expected Result

It should work, or, if changing the host network named default is no longer supported, it should fail hard instead of silently doing the wrong thing.

Actual Result

It exposes the port(s) on the wrong network.

Job file (if appropriate)

job "whoami" {
  datacenters = ["dc1"]
  namespace = "cluster"
  type = "service"
  update {
    max_parallel      = 5
    health_check      = "checks"
    min_healthy_time  = "10s"
    healthy_deadline  = "5m"
    progress_deadline = "10m"
    auto_revert       = true
    #auto_promote      = true
    canary            = 0
    #stagger           = "0s"
  }

  // Only 1 proxy per host maximum.
  spread {
    attribute = node.unique.id
  }

  group "demo" {
    count = 2

    network {
       port "http" {
          static = 12344
          to = 80
          host_network = "private"

       }
    }

    ephemeral_disk {
      size = 15
    }


    service {
      name = "whoami-demo"
      port = "http"
      provider = "nomad"
      tags = [
        "admin-service",
        "traefik.enable=true",
        "traefik.http.routers.whoami.rule=Host(`whoami.somehost.eu`)",
        "traefik.http.routers.whoami.middlewares=sso@file"
      ]
      check {
        type     = "http"
        path     = "/"
        interval = "5s"
        timeout  = "1s"
        method   = "GET"
      }
    }

    task "server" {
      logs {
        max_files     = 10
        max_file_size = 1
      }
      driver = "docker"

      config {
        image = "traefik/whoami"
        ports = ["http"]
      }
      resources {
        cpu    = 100
        memory = 64
      }

    }

  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@jrasell jrasell added this to Needs Triage in Nomad - Community Issues Triage via automation Jan 12, 2024
@tgross
Member

tgross commented Jan 17, 2024

Hi @SamMousa!

This smells a little bit like #18097 which we fixed in #18096, but it's not quite the same thing. I could use a couple clarifications to help narrow it down:

  • The issue title says "broken in 1.7.2". Is this a regression from a previous version where this configuration worked?
  • The configuration you have that doesn't work is as follows, correct? If so, by any chance does the interface enp7s0 also have an IP address on the subnet 192.168.40.0/24?
client {
  enabled = true
  servers = ["127.0.0.1"]
  host_network "default" {
    cidr = "192.168.40.0/24"
    reserved_ports = "4646-4648"
  }

  host_network "public" {
    interface = "enp7s0"
  }
}

@SamMousa
Contributor Author

Yes, it worked in (I think) 1.6.x.
Currently I see this issue when updating jobs; existing jobs are not affected.

The local subnet is a VLAN, so it is technically the same interface, but with a suffix: something like enp7s0@private. The reason I use CIDR for selection is that I had no success using such an interface name.

@SamMousa
Contributor Author

However, it all works when I simply name the network (in the Nomad agent config) something other than default, so I don't think it's related to the selection logic.

@hashworks

hashworks commented May 29, 2024

I have the same issue on 1.7.7 and 1.8.0 clients. It doesn't matter if I use cidr or interface to change the default host_network.

This is quite a problem, since I now need to adjust all jobs so they don't use the default network.

@hashworks

To fix this, one needs to set network_interface to the "default" interface one wants services to use when they don't specify a specific host_network.

Specifies the name of the interface to force network fingerprinting on. When run in dev mode, this defaults to the loopback interface. When not in dev mode, the interface attached to the default route is used.

Note: This does not affect outgoing connections.
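A minimal sketch of that workaround, assuming the setup from earlier in the thread (the VLAN interface name here is hypothetical; substitute the actual interface carrying the 192.168.40.0/24 subnet):

```hcl
client {
  enabled = true
  servers = ["127.0.0.1"]

  # Force network fingerprinting onto the private VLAN interface, so jobs
  # that don't specify a host_network bind to the private subnet.
  network_interface = "enp7s0.40"   # hypothetical VLAN interface name

  host_network "public" {
    interface = "enp7s0"
  }
}
```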

3 participants