health checks fail when using variables in service tags #2969

Closed
hsmade opened this issue Aug 4, 2017 · 12 comments
@hsmade (Contributor) commented Aug 4, 2017

Nomad version

Nomad v0.6.0

Operating system and Environment details

Linux ubuntu-xenial 4.4.0-87-generic (vagrant)

Issue

When variables are used in service tags, jobs never become healthy.

Reproduction steps

Run the job file below and check nomad status myjob: the job never becomes healthy. If you remove the tag, or replace it with something that is not a variable, it works (a sketch of the working variant follows the job file). In Consul itself, the service is healthy regardless.

Job file (if appropriate)

job "myjob" {
  type = "service"
  datacenters = ["dc1"]
  update { 
    max_parallel     = 1
    health_check     = "checks" 
    min_healthy_time = "3s"
    healthy_deadline = "10m" 
    auto_revert      = true
  }
  group "frontend" {
    count = 5
    task "webserver" {
      driver = "docker"
      config {
        image = "abiosoft/caddy"
        volumes = ["/demo/webroot:/srv:ro"]
        args = ["log stdout"]
        port_map { web = 2015 }
      }
      resources {
        network {
          port "web" {}
        }
        memory = 10
      }
      meta {
        test = "test"
      }
      service {
        name = "simple-service"
        port = "web"
        tags = ["${NOMAD_DC}"]
        check {
          type = "http"
          path = "/"
          port = "web"
          timeout = "1s"
          interval = "10s"
        }
      }
    }
  }
}
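
For comparison, a minimal sketch of the service stanza variant that reportedly works on 0.6.0, assuming a literal tag in place of the variable (the tag value itself is made up):

service {
  name = "simple-service"
  port = "web"
  tags = ["static-tag"]  # any literal string; only the interpolated form triggers the bug
  check {
    type = "http"
    path = "/"
    port = "web"
    timeout = "1s"
    interval = "10s"
  }
}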
@schmichael (Member) commented:

Caddy appears to be returning a 404 which causes the health check to fail, but there does appear to be a bug around service tags that can break checks as well. Investigating.

@hsmade (Contributor, Author) commented Aug 4, 2017

That's because you don't have the mount I specified; sorry about that. In my setup there is an index.html in that mount.

@dadgar (Contributor) commented Aug 5, 2017

Thanks for the report. Reproduced, and we will have a fix in 0.6.1!

dadgar added a commit that referenced this issue Aug 7, 2017
Fixes an issue in which the allocation health watcher was checking allocation health based on un-interpolated services and checks. Changes the interface for retrieving check information from Consul to retrieving all registered services and checks by allocation. In the future this will allow us to output nicer messages.

Fixes #2969
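
To illustrate the mismatch described in the commit (a sketch, assuming the job above is placed in datacenter dc1):

# What Nomad registers in Consul after interpolation (assuming NOMAD_DC = dc1):
tags = ["dc1"]

# What the pre-fix allocation health watcher compared against:
tags = ["${NOMAD_DC}"]

Since the literal template string never matches the interpolated tags on the registered service, the watcher never considered the allocation healthy, which would explain why the deployment stalled even though Consul reported the service as healthy.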
@dadgar (Contributor) commented Aug 10, 2017

For those who are running into this issue, you can use these binaries, or wait for 0.6.1, which will be out in a week or two.

darwin_amd64.zip
linux_amd64.zip
windows_amd64.zip

@stevenscg commented:

@dadgar FYI: I was having problems with checks that use ${NOMAD_ADDR_fpm} in check > args on 0.6.0. No such problems running the new binary on a test system. 👍
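
For reference, a minimal sketch of the kind of check stanza described. The check name and script path here are hypothetical; only the ${NOMAD_ADDR_fpm} interpolation in args comes from the report above:

check {
  type     = "script"
  name     = "fpm-check"                 # hypothetical name
  command  = "/usr/local/bin/check-fpm"  # hypothetical script
  args     = ["${NOMAD_ADDR_fpm}"]       # interpolated host:port of the fpm port label
  interval = "10s"
  timeout  = "2s"
}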

@tino commented Aug 12, 2017

@dadgar this does indeed seem to work. However, the status command seems to have disappeared:

⌘ ./nomad status
Usage: nomad [-version] [-help] [-autocomplete-(un)install] <command> [<args>]

Available commands are:
    agent                 Runs a Nomad agent
    agent-info            Display status information about the local agent
    alloc-status          Display allocation status information and metadata
    client-config         View or modify client configuration details
    deployment            Interact with deployments
    eval-status           Display evaluation status and placement failure reasons
    fs                    Inspect the contents of an allocation directory
    init                  Create an example job file
    inspect               Inspect a submitted job
    job                   Interact with jobs
    keygen                Generates a new encryption key
    keyring               Manages gossip layer encryption keys
    logs                  Streams the logs of a task.
    node-drain            Toggle drain mode on a given node
    node-status           Display status information about nodes
    operator              Provides cluster-level tools for Nomad operators
    plan                  Dry-run a job update to determine its effects
    run                   Run a new job or update an existing job
    server-force-leave    Force a server into the 'left' state
    server-join           Join server nodes together
    server-members        Display a list of known servers and their status
    stop                  Stop a running job
    validate              Checks if a given job specification is valid
    version               Prints the Nomad version

⌘ ./nomad -v
Nomad v0.6.0-dev (1f3966e65e6faa5f3395f7d85a6ec5ffa03d8a80+CHANGES)

@stevenscg commented:

@tino Instead of "nomad status {stuff}", do "nomad job status {stuff}". My fingers really didn't want to make the transition, but I like where it's headed.

@tino commented Aug 13, 2017

@stevenscg Ah, thanks. I guess that needs some documentation.

I can see where it is headed, but I'm not sure I like it. After run, I already need multiple commands to find out what's going on when things don't go smoothly. Moving essential commands under a subcommand doesn't make that easier... 🤔

@shantanugadgil (Contributor) commented:
Changing nomad status to nomad job status is indeed a bit of muscle memory to retrain. 😀

@dadgar (Contributor) commented Aug 14, 2017

@stevenscg @tino @shantanugadgil You are all on the bleeding edge :) nomad status will be coming back. It will change from only showing job status to becoming a router into the appropriate status command: pass a job name and it shows job status, an allocation ID routes to alloc-status, and so on.
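
As described, the intended routing would look something like this (the allocation ID below is hypothetical):

⌘ nomad status myjob     # a job name routes to job status
⌘ nomad status 4d2e7f4c  # an allocation ID routes to alloc-status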

@hsmade (Contributor, Author) commented Aug 15, 2017 via email

@github-actions (bot) commented:
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Dec 10, 2022.