Updating job with host_network, no effect #9728

Closed
cpl opened this issue Jan 5, 2021 · 2 comments · Fixed by #9937
Labels: stage/accepted, theme/networking, type/bug

cpl commented Jan 5, 2021

Nomad version

Nomad v1.0.1 (c9c68aa)

Operating system and Environment details

Linux 5.4.0-54-generic Ubuntu 20.04 x86_64 GNU/Linux
Hetzner Cloud Instance

Issue

Updating a running job to add or remove host_network on a port does not seem to have any effect.

The change is only applied if the job is stopped first and then run again.

Reproduction steps

Set up a Nomad + Consul cluster with the following Nomad client config:

datacenter = "de01"
data_dir = "/opt/nomad/data"

bind_addr = "{{ GetPrivateInterfaces | include \"name\" \"ens10\" | attr \"address\" }}"


client {
  network_interface = "ens10"

  host_network "public" {
    interface = "eth0"
    reserved_ports = "22,80,443,8080"
  }

  enabled = true
}

Then:

  1. Start the countdash job with host_network = "public" commented out (#host_network = "public").
  2. Uncomment host_network = "public".
  3. Run/update the job: nomad job run countdash.nomad

(Steps 1 and 2 work the other way around too.)
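
For anyone repeating the toggle several times, it can be scripted. This is only a minimal sketch: the function name toggle_host_network is made up for illustration, and it assumes the field sits on its own line in the job file, either as `#host_network = ...` or `host_network = ...`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nomad): flips the host_network line in a
# job file between commented and uncommented.
toggle_host_network() {
  file="$1"
  tmp="$(mktemp)" || return 1
  # 1) mark a commented line with a temporary '@@' prefix,
  # 2) comment out an active line,
  # 3) turn the marked line into an active one.
  sed -e 's/^\([[:space:]]*\)#host_network/\1@@host_network/' \
      -e 's/^\([[:space:]]*\)host_network/\1#host_network/' \
      -e 's/^\([[:space:]]*\)@@host_network/\1host_network/' \
      "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example use (requires a running cluster, so it is left commented out):
#   toggle_host_network countdash.nomad && nomad job run countdash.nomad
```

Running the helper twice returns the file to its starting state, which covers the "other way around" case as well.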

Job file (if appropriate)

job "countdash" {
   datacenters = ["de01"]
   group "api" {
     network {
       mode = "bridge"
     }

     service {
       name = "count-api"
       port = "9001"

       connect {
         sidecar_service {}
       }
     }

     task "web" {
       driver = "docker"
       config {
         image = "hashicorpnomad/counter-api:latest"
       }
     }
   }

   group "dashboard" {
     network {
       mode = "bridge"
       port "http" {
         #host_network = "public"
         static = 9002
         to     = 9002
       }
     }

     service {
       name = "count-dashboard"
       port = "9002"

       connect {
         sidecar_service {
           proxy {
             upstreams {
               destination_name = "count-api"
               local_bind_port = 8080
             }
           }
         }
       }
     }

     task "dashboard" {
       driver = "docker"
       env {
         COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
       }
       config {
         image = "hashicorpnomad/counter-dashboard:latest"
       }
     }
   }
 }

tgross (Member) commented Jan 27, 2021

I was able to reproduce this. I know we've seen similar bugs with Consul Connect-related updates (e.g. #9029), so it wouldn't surprise me if it's something like that here too.

Just as @cpl says, it doesn't make a difference whether we're adding or removing the field, and the field does work if it's there in the first place, so it's only the change that's the problem.


Reproduced on the current HEAD of 1.0.3-dev, using our development Vagrantfile (after applying the following diff to get more network interfaces):

diff --git a/Vagrantfile b/Vagrantfile
index f55b623b7..e64a99006 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -44,6 +44,10 @@ Vagrant.configure(2) do |config|
                        privileged: false,
                        path: './scripts/vagrant-linux-unpriv-bootstrap.sh'

+               vmCfg.vm.provider "virtualbox" do |_|
+                       vmCfg.vm.network :private_network, ip: LINUX_IP_ADDRESS
+               end
+
                # Expose the nomad api and ui to the host
                vmCfg.vm.network :forwarded_port, guest: 4646, host: 4646, auto_correct: true
                vmCfg.vm.network :forwarded_port, guest: 8500, host: 8500, auto_correct: true

Using the following Nomad configuration:

name = "standalone"

log_level    = "DEBUG"
enable_debug = true
data_dir     = "/var/nomad"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = true

  host_network "public" {
    cidr           = "10.199.0.200/24"
    reserved_ports = "22"
  }
}

The example job that @cpl provided works as-is (thank you for giving us a working reproduction job!). Run the job:

$ nomad job run ./example.nomad
==> Monitoring evaluation "90e3555a"
    Evaluation triggered by job "countdash"
==> Monitoring evaluation "90e3555a"
    Evaluation within deployment: "053316c8"
    Allocation "1038b74d" created: node "d45a0381", group "api"
    Allocation "5fa7526c" created: node "d45a0381", group "dashboard"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "90e3555a" finished with status "complete"

See the resulting address:

$ nomad alloc status 5fa
...
Allocation Addresses (mode = "bridge")
Label                           Dynamic  Address
*http                           yes      10.0.2.15:9002 -> 9002
*connect-proxy-count-dashboard  yes      10.0.2.15:21001 -> 21001

Note the resulting containers so that we can verify tasks are not changed:

$ docker ps
CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS         PORTS     NAMES
3bd7d3556b5d   hashicorpnomad/counter-api:latest          "./counting-service"     7 seconds ago    Up 7 seconds             web-1038b74d-1f4f-f4db-60f8-729603066114
e18aecb5e827   envoyproxy/envoy:v1.11.2                   "/docker-entrypoint.…"   8 seconds ago    Up 7 seconds             connect-proxy-count-api-1038b74d-1f4f-f4db-60f8-729603066114
bf2cc6b05863   hashicorpnomad/counter-dashboard:latest    "./dashboard-service"    8 seconds ago    Up 7 seconds             dashboard-5fa7526c-abaf-360d-842c-6cc406cca55d
8e03d74a8807   envoyproxy/envoy:v1.11.2                   "/docker-entrypoint.…"   9 seconds ago    Up 8 seconds             connect-proxy-count-dashboard-5fa7526c-abaf-360d-842c-6cc406cca55d
071da8cdcdff   gcr.io/google_containers/pause-amd64:3.1   "/pause"                 10 seconds ago   Up 9 seconds             nomad_init_5fa7526c-abaf-360d-842c-6cc406cca55d
a85419e88f40   gcr.io/google_containers/pause-amd64:3.1   "/pause"                 10 seconds ago   Up 9 seconds             nomad_init_1038b74d-1f4f-f4db-60f8-729603066114

Uncomment the host network field and run the job again:

$ nomad job run ./example.nomad
==> Monitoring evaluation "4a528566"
    Evaluation triggered by job "countdash"
==> Monitoring evaluation "4a528566"
    Evaluation within deployment: "40fa584e"
    Allocation "1038b74d" modified: node "d45a0381", group "api"
    Allocation "5fa7526c" modified: node "d45a0381", group "dashboard"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "4a528566" finished with status "complete"

We're told the allocations are being modified in place. But if we check the allocation status we see no change to the network address:

$ nomad alloc status 5fa
...
Allocation Addresses (mode = "bridge")
Label                           Dynamic  Address
*http                           yes      10.0.2.15:9002 -> 9002
*connect-proxy-count-dashboard  yes      10.0.2.15:21001 -> 21001
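
The before/after comparison of the address table can be done mechanically rather than by eye. A rough sketch, assuming the table layout shown in the output above; the helper name alloc_addresses is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `nomad alloc status` output on stdin and prints
# one "label host-address" pair per row of the "Allocation Addresses" table.
alloc_addresses() {
  awk '
    /^Allocation Addresses/ { in_table = 1; next }  # table starts here
    in_table && /^Label/    { next }                # skip the header row
    in_table && NF == 0     { in_table = 0 }        # a blank line ends the table
    in_table                { print $1, $3 }        # label and host address
  '
}

# Example use (requires a running cluster, so it is left commented out):
#   before="$(nomad alloc status 5fa | alloc_addresses)"
#   nomad job run ./example.nomad
#   after="$(nomad alloc status 5fa | alloc_addresses)"
#   [ "$before" = "$after" ] && echo "addresses unchanged"
```

With the two outputs in this comment, both snapshots reduce to the same two rows, which is the symptom being reported.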

The containers are unchanged as well:

$ docker ps
CONTAINER ID   IMAGE                                      COMMAND                  CREATED              STATUS              PORTS     NAMES
3bd7d3556b5d   hashicorpnomad/counter-api:latest          "./counting-service"     About a minute ago   Up About a minute             web-1038b74d-1f4f-f4db-60f8-729603066114
e18aecb5e827   envoyproxy/envoy:v1.11.2                   "/docker-entrypoint.…"   About a minute ago   Up About a minute             connect-proxy-count-api-1038b74d-1f4f-f4db-60f8-729603066114
bf2cc6b05863   hashicorpnomad/counter-dashboard:latest    "./dashboard-service"    About a minute ago   Up About a minute             dashboard-5fa7526c-abaf-360d-842c-6cc406cca55d
8e03d74a8807   envoyproxy/envoy:v1.11.2                   "/docker-entrypoint.…"   About a minute ago   Up About a minute             connect-proxy-count-dashboard-5fa7526c-abaf-360d-842c-6cc406cca55d
071da8cdcdff   gcr.io/google_containers/pause-amd64:3.1   "/pause"                 About a minute ago   Up About a minute             nomad_init_5fa7526c-abaf-360d-842c-6cc406cca55d
a85419e88f40   gcr.io/google_containers/pause-amd64:3.1   "/pause"                 About a minute ago   Up About a minute             nomad_init_1038b74d-1f4f-f4db-60f8-729603066114
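
The same kind of check works for the containers. A small sketch, assuming the default `docker ps` table with the container ID in the first column; container_ids is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: reads `docker ps` table output on stdin and prints the
# container IDs (first column, header row skipped) in sorted order.
container_ids() {
  awk 'NR > 1 && NF > 0 { print $1 }' | sort
}

# Example use (requires Docker, so it is left commented out):
#   before="$(docker ps | container_ids)"
#   nomad job run ./example.nomad
#   after="$(docker ps | container_ids)"
#   [ "$before" = "$after" ] && echo "no containers were replaced"
```

Comparing IDs instead of whole rows ignores the CREATED and STATUS columns, which change between runs even when the containers do not.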

The server logs show the job being evaluated as an in-place update:

2021-01-27T20:19:23.657Z [DEBUG] http: request complete: method=PUT path=/v1/jobs duration=3.589379ms
2021-01-27T20:19:23.658Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=4a528566-5fdf-6b15-4356-a6757cede82c job_id=countdash namespace=default results="Total changes: (place 0) (destructive 0) (inplace 2) (stop 0)
Created Deployment: "40fa584e-7a41-a2d8-5e07-acb3b388a2a4"
Desired Changes for "api": (place 0) (inplace 1) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)
Desired Changes for "dashboard": (place 0) (inplace 1) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)"
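
To pull the scheduler's decision out of a longer debug log, the "Total changes" summary can be parsed directly. A minimal sketch, assuming the log line format shown above; total_changes is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads server debug log text on stdin and prints the
# count for one change kind (place, destructive, inplace or stop) taken from
# the first "Total changes:" summary line.
total_changes() {
  kind="$1"
  # In basic regex syntax "(" is literal and "\(...\)" captures the number.
  sed -n "s/.*Total changes:.*(${kind} \([0-9][0-9]*\)).*/\1/p" | head -n 1
}

# Example use (log path is an assumption, so it is left commented out):
#   total_changes inplace < /var/log/nomad.log
```

For the log lines in this comment, the helper reports 2 for inplace and 0 for destructive, matching the "modified" wording in the CLI output.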

But there are no changes logged on the client.

tgross added the stage/accepted label and removed the stage/needs-investigation label on Jan 27, 2021
tgross removed their assignment on Jan 27, 2021
nickethier self-assigned this on Feb 1, 2021
tgross added this to the 1.0.4 milestone on Feb 2, 2021
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2022