Expose stanza is removed from job spec on stop/start. #15871

Closed
dpewsey opened this issue Jan 25, 2023 · 2 comments
dpewsey commented Jan 25, 2023

Nomad version

Output from nomad version
Nomad v1.4.3 (f464aca)

Operating system and Environment details

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Nomad config:

datacenter                   = "eu1"
data_dir                     = "/opt/nomad"
log_level                    = "DEBUG"
log_file                     = "/var/log/nomad.log"
log_json                     = true
server {
  enabled                    = true
  bootstrap_expect           = 5
}

client {
  enabled                    = true
}

server_join {
  retry_join                 = ["192.168.1.1","192.168.1.2","192.168.1.13","192.168.1.4","192.168.1.5"]
  retry_max                  = 3
  retry_interval             = "15s"
}

acl {
  enabled                    = true
}

consul {
  address                    = "127.0.0.1:8500"
  grpc_address               = "127.0.0.1:8502"
  server_service_name        = "nomad"
  client_service_name        = "nomad-client"
  auto_advertise             = true
  server_auto_join           = true
  client_auto_join           = true
  token                      = ""
}

vault {
  enabled                    = true
  address                    = "https://vault.domain.com"
  create_from_role           = "nomad-cluster"
  token                      = ""
}

ui {
  enabled                    =  true

  consul {
    ui_url                   = "https://consul.domain.com/ui"
  }

  vault {
    ui_url                   = "https://vault.domain.com/ui"
  }
}

plugin "docker" {
  config {
    logging {
      type = "loki"
    }
    auth {
      config = "/root/.docker/config.json"
    }
  }
}

telemetry {
  collection_interval        = "5s"
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}

Consul config

datacenter                 = "eu1"
data_dir                   = "/opt/consul"
log_level                  = "DEBUG"
node_name                  = "server-1"
advertise_addr             = "192.168.1.1"
encrypt                    = ""

tls {
  defaults {
    ca_file                = ""
    ca_path                = ""
    cert_file              = ""
    key_file               = ""
    verify_incoming        = true
    verify_outgoing        = true
  }
  internal_rpc {
    verify_server_hostname = true
  }
}

auto_encrypt {
  allow_tls                = true
}

retry_join                 =  ["192.168.1.1","192.168.1.2","192.168.1.13","192.168.1.4","192.168.1.5"]

acl {
  enabled                  = true
  default_policy           = "allow"
  enable_token_persistence = true
}

performance {
  raft_multiplier          = 1
}

server                     = true
bootstrap_expect           = 5
bind_addr                  = "192.168.1.1"
client_addr                = "0.0.0.0"

# Enable service mesh
connect {
  enabled                  = true
}

# Addresses and ports
addresses {
  grpc                     = "127.0.0.1"
  https                    = "0.0.0.0"
  dns                      = "127.0.0.1"
}

ports {
  grpc                     = 8502
  grpc_tls                 = 8503
  http                     = 8500
  https                    = 8443
  dns                      = 8600
}

# DNS Recursion
recursors = ["1.1.1.1"]

ui_config {
  enabled                  = true
}

Issue

When stopping and then starting a Nomad job that has an expose stanza on its sidecar proxy, the expose stanza gets removed from the job spec in Nomad and the /metrics path is no longer exposed. The only way to "fix" the issue is to redeploy the job.

Reproduction steps

Deploy the job, stop it, and then start it again (see the sketch below).
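Roughly, the cycle looks like this (the file name and the UI-based restart are illustrative, not exact; the commands are standard Nomad CLI):

# Initial deploy: the expose block is accepted and /metrics is reachable
nomad job run service.nomad

# Confirm the expose block is present in the stored job spec
nomad job inspect service | grep -i expose

# Stop the job, then start it again (e.g. via the "Start" button in the web UI)
nomad job stop service

# After the restart, the same check no longer finds the expose block
nomad job inspect service | grep -i expose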

Expected Result

The /metrics endpoint remains exposed and accessible, and the expose block is still present in the Nomad job spec.

Actual Result

The expose block gets removed from the job spec


Job file (if appropriate)

job "service" {
  datacenters = ["eu1"]

  group "frontends" {
    count = 2

    network {
      mode = "bridge"
      port "http" { to = "8080"}
      port "metrics" {}
    }    
    service {
      name = "service"
      port = "http"
      tags = ["http","addr:${NOMAD_HOST_ADDR_metrics}","prometheus"]
      meta {
        metrics_port = "${NOMAD_HOST_PORT_metrics}"
        nomad_alloc_index = "${NOMAD_ALLOC_INDEX}"
        nomad_job_name = "${NOMAD_JOB_NAME}"
      }
      check {
        type     = "http"
        path     = "/ping"
        interval = "10s"
        timeout  = "2s"
      }
      connect {
        sidecar_service {
          tags = ["service-frontend"]
          proxy {
            expose {
              path {
                path            = "/metrics"
                protocol        = "http"
                local_path_port = 8080
                listener_port   = "metrics"
              }
            }
          }
        }
      }
    }

    task "service-frontend" {
      driver = "docker"

      config {
        image = ""
        command = "bundle"
        args = ["exec", "puma", "-C", "config/puma.rb"]
        ports = ["http"]
      }

      resources {
        memory = 513
      }
    }
  }

  group "sidekiq" {
    count = 4

    update {
      max_parallel = 1
    }
    network {
      mode = "bridge"
      port "http" { to = "9359"}
      port "metrics" {}
    }
    service {
      name = "service-sidekiq"
      port = "http"
      tags = ["http","addr:${NOMAD_HOST_ADDR_metrics}","prometheus"]
      meta {
        metrics_port = "${NOMAD_HOST_PORT_metrics}"
        nomad_alloc_index = "${NOMAD_ALLOC_INDEX}"
        nomad_job_name = "${NOMAD_JOB_NAME}"
      }
      connect {
        sidecar_service {
          tags = ["service"]
          proxy {
            expose {
              path {
                path            = "/metrics"
                protocol        = "http"
                local_path_port = 9359
                listener_port   = "metrics"
              }
            }
          }
        }
      }
    }

    task "sidekiq" {
      driver = "docker"

      kill_timeout = "15s"
      
      config {
        image = ""
        command = "bundle"
        args = ["exec", "sidekiq", "-t", "10"]   
        ports = ["http"] 
      }

      resources {
        memory = 256
      }
    }
  }
}
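For completeness, this is roughly how the exposed metrics path is checked after the initial deploy (the allocation ID, client address, and port are placeholders):

# Find the dynamically assigned "metrics" host port for an allocation
nomad alloc status <alloc-id> | grep -i metrics

# The expose path should answer on that host address/port
curl http://<client-address>:<metrics-host-port>/metrics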

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

No relevant logs discovered


lgfa29 commented Jan 26, 2023

Hi @dpewsey 👋

Thanks for the report. It seems like we may be dropping this field as the allocation data is routed between the client and the server. I will see if I can find where this is happening.
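As a rough way to narrow down where the field is dropped (just a sketch, not a confirmed diagnosis), it may help to compare the server's stored job spec with the job copy carried by a running allocation after the stop/start (the allocation ID is a placeholder):

# Server-side view of the job after the restart
nomad job inspect service | grep -ic expose

# Job copy embedded in an allocation
nomad alloc status -json <alloc-id> | grep -ic expose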


lgfa29 commented Jan 26, 2023

Oh, I think this is a duplicate of #11304 and #12174, so I'm going to close this one. Feel free to add any more context in those issues 🙂

lgfa29 closed this as not planned on Jan 26, 2023.