
Changing the default host_network is broken in 1.7.2 #19726

Open
SamMousa opened this issue Jan 12, 2024 · 5 comments
Comments

@SamMousa
Contributor

SamMousa commented Jan 12, 2024

Nomad version

v1.7.2

Operating system and Environment details

Ubuntu 22.04.3 LTS

Issue

We have a tiny cluster of 3 nodes running both the client and the server.
There's a private network on a VLAN, which we configured as the host network default.
Each server also has a public IP which we configured as a host network called public.

In the configuration below I duplicated the config for the private network and named the copy private. That workaround actually solves the problem, which should also give an indication of where the issue is coming from. (Is default a special name, perhaps?)

client {
  enabled = true
  servers = ["127.0.0.1"]
  host_network "default" {
    cidr = "192.168.40.0/24"
    reserved_ports = "4646-4648"
  }
  host_network "private" {
    cidr = "192.168.40.0/24"
    reserved_ports = "4646-4648"

  }
  host_network "public" {
    interface = "enp7s0"
  }
}

Reproduction steps

Use the above or a similar configuration and try to expose a port on the host_network "default"; note that it will be exposed on the public network instead.
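As a minimal sketch of the failing case (the job and port names here are hypothetical; only the host_network value matters), a network block like this should, per the report, bind the port to the public interface instead of the 192.168.40.0/24 subnet:

```hcl
group "repro" {
  network {
    port "http" {
      static       = 12345       # hypothetical static port
      host_network = "default"   # expected: 192.168.40.0/24; observed: public interface
    }
  }
}
```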

Edit: note that our cluster runs several jobs with similar config and I'm not seeing this issue for all of them. (This could be because the others have not changed yet and they might therefore break when they get rescheduled or something.)

Expected Result

It should work, or, if changing the host network named default is no longer supported, it should fail hard instead of silently doing the wrong thing.

Actual Result

It exposes the port(s) on the wrong network.

Job file (if appropriate)

job "whoami" {
  datacenters = ["dc1"]
  namespace = "cluster"
  type = "service"
  update {
    max_parallel      = 5
    health_check      = "checks"
    min_healthy_time  = "10s"
    healthy_deadline  = "5m"
    progress_deadline = "10m"
    auto_revert       = true
    #auto_promote      = true
    canary            = 0
    #stagger           = "0s"
  }

  // Only 1 proxy per host maximum.
  spread {
    attribute = node.unique.id
  }

  group "demo" {
    count = 2

    network {
       port "http" {
          static = 12344
          to = 80
          host_network = "private"

       }
    }

    ephemeral_disk {
      size = 15
    }


    service {
      name = "whoami-demo"
      port = "http"
      provider = "nomad"
      tags = [
        "admin-service",
        "traefik.enable=true",
        "traefik.http.routers.whoami.rule=Host(`whoami.somehost.eu`)",
        "traefik.http.routers.whoami.middlewares=sso@file"
      ]
      check {
        type     = "http"
        path     = "/"
        interval = "5s"
        timeout  = "1s"
        method   = "GET"
      }
    }

    task "server" {
      logs {
        max_files     = 10
        max_file_size = 1
      }
      driver = "docker"

      config {
        image = "traefik/whoami"
        ports = ["http"]
      }
      resources {
        cpu    = 100
        memory = 64
      }

    }

  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@jrasell jrasell added this to Needs Triage in Nomad - Community Issues Triage via automation Jan 12, 2024
@tgross
Member

tgross commented Jan 17, 2024

Hi @SamMousa!

This smells a little bit like #18097 which we fixed in #18096, but it's not quite the same thing. I could use a couple clarifications to help narrow it down:

  • The issue title says "broken in 1.7.2". Is this a regression from a previous version where this configuration worked?
  • The configuration you have that doesn't work is as follows, correct? If so, by any chance does the interface enp7s0 also have an IP address on the subnet 192.168.40.0/24?
client {
  enabled = true
  servers = ["127.0.0.1"]
  host_network "default" {
    cidr = "192.168.40.0/24"
    reserved_ports = "4646-4648"
  }

  host_network "public" {
    interface = "enp7s0"
  }
}

@SamMousa
Contributor Author

Yes, it worked in (I think) 1.6.x.
Currently I see this issue when updating jobs; existing jobs are not affected.

The local subnet is a VLAN, so it is technically the same interface, but with a suffix: something like enp7s0@private. The reason I use CIDR for selection is that I had no success using such an interface name.

@SamMousa
Contributor Author

However, it all works when I simply name the network (in the Nomad agent config) something other than default, so I don't think it's related to the selection logic.

@hashworks

hashworks commented May 29, 2024

I have the same issue on 1.7.7 and 1.8.0 clients. It doesn't matter if I use cidr or interface to change the default host_network.

This is quite a problem, since I now need to adjust all jobs so they don't use the default network.

@hashworks

To fix this, one needs to set network_interface to the "default" interface one wants services to use when they don't specify a specific host_network.

Specifies the name of the interface to force network fingerprinting on. When run in dev mode, this defaults to the loopback interface. When not in dev mode, the interface attached to the default route is used.

Note: This does not affect outgoing connections.
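A minimal sketch of that workaround, assuming the setup from earlier in the thread (the VLAN interface name here is hypothetical; substitute the actual interface carrying the 192.168.40.0/24 subnet):

```hcl
client {
  enabled = true
  servers = ["127.0.0.1"]

  # Force network fingerprinting onto the private VLAN interface, so jobs
  # that don't specify a host_network bind to the private subnet.
  network_interface = "enp7s0.40"   # hypothetical VLAN interface name

  host_network "public" {
    interface = "enp7s0"
  }
}
```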

3 participants