Can't run script health-checks for Consul-connect enabled Nomad jobs. #8952

Closed
zhelezovartem opened this issue Sep 23, 2020 · 3 comments · Fixed by #8977
Labels: theme/consul, theme/docs (Documentation issues and enhancements), type/enhancement


@zhelezovartem

Nomad version

Nomad v0.12.5 (514b0d6)

Operating system and Environment details

Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

Also tested on Amazon Linux 2, with the same behavior.

Issue

I wanted to set up script health checks for Consul Connect enabled jobs.
I started with a simple job that has a script health check and does not use Consul Connect. The check passed. Then I enabled Consul Connect for this job, and after that the health check started failing.

Reproduction steps

  1. Run the job without Consul Connect: the check passes.
  2. Run the job with Consul Connect: the check fails.

Job file

Job file, Consul Connect disabled:

job "example" {
  datacenters = ["eu-central-1"]
  group "cache" {
    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }
      service {
        name = "example"
        port = "db"
        check {
          type     = "script"
          name     = "test"
          command  = "true"
          interval = "60s"
          timeout  = "5s"
        }
      }
      resources {
        cpu    = 500
        memory = 256
        network {
          mbits = 10
          port  "db"  {}
        }
      }
    }
  }
}

Job file, Consul Connect enabled:

job "example" {
  datacenters = ["eu-central-1"]
  group "cache" {
    network {
      mode = "bridge"
    }
    service {
      name = "example"
      port = "6379"
      connect {
        sidecar_service {}
      }
      check {
        type     = "script"
        name     = "test"
        command  = "true"
        interval = "60s"
        timeout  = "5s"
      }
    }
    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
      }
      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Nomad Client logs

There is nothing about this health check in the Nomad client logs, but I can see a warning in the Consul client logs:

[WARN]  agent: Check missed TTL, is now critical: check=_nomad-check-c39120b3a292cd4511306eac4c8e425dfe70872a

Nomad Server logs

There is nothing about this health check in the Nomad server logs.

@shoenig
Member

shoenig commented Sep 23, 2020

Hi @zhelezovartem thanks for reporting. Can you try setting the task parameter for the check in the Connect case?

I think that when the service is defined at the group level, setting the task is required for script checks; otherwise Nomad does not know which task driver to use to execute the check. Since the check never executes, its result never gets reported to Consul, and Consul complains about the missed TTL.

Nomad should both document and provide validation on job submission to check for this case.

shoenig added the theme/docs and type/enhancement labels Sep 23, 2020
@zhelezovartem
Author

Hi @shoenig, thanks a lot for the great explanation. I followed your advice and now it works. The working job config looks like this:

job "example" {
  datacenters = ["eu-central-1"]
  group "cache" {
    network {
      mode = "bridge"
    }
    service {
      name = "example"
      port = "6379"
      connect {
        sidecar_service {}
      }
      check {
        type     = "script"
        task     = "redis"
        name     = "test"
        command  = "true"
        interval = "60s"
        timeout  = "5s"
      }
    }
    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
      }
      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

When inspecting the Nomad logs, it was not clear to me that the check was never executed. Maybe it would be good to add some sort of warning to the Nomad logs for such cases, something like "health check defined but will never be executed"? What do you think?

shoenig self-assigned this Sep 28, 2020
shoenig added a commit that referenced this issue Sep 28, 2020
When defining a script-check in a group-level service, Nomad needs to
know which task is associated with the check so that it can use the
correct task driver to execute the check.

This PR fixes two bugs:
1) validate service.task or service.check.task is configured
2) make service.check.task inherit service.task if it is itself unset

Fixes #8952
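For a group-level service, point 2 above means the task can also be named once on the service itself rather than on each check. The sketch below is the working job from this thread rewritten that way; it assumes a Nomad release that includes this fix and is otherwise unchanged from the reporter's config.

```hcl
job "example" {
  datacenters = ["eu-central-1"]

  group "cache" {
    network {
      mode = "bridge"
    }

    service {
      name = "example"
      port = "6379"

      # Assumes a Nomad release with this fix: checks that leave
      # `task` unset inherit it from the service.
      task = "redis"

      connect {
        sidecar_service {}
      }

      check {
        type = "script"
        # No `task` here: inherited from service.task ("redis"), so the
        # script is executed through the redis task's docker driver.
        name     = "test"
        command  = "true"
        interval = "60s"
        timeout  = "5s"
      }
    }

    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
      }
      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```

Per point 1, a group-level script check with neither setting (like the original Connect-enabled job file) should now be rejected when the job is submitted, instead of silently missing its TTL in Consul.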
roaks3 pushed a commit that referenced this issue Oct 7, 2020
fredrikhgrelland pushed a commit to fredrikhgrelland/nomad that referenced this issue Oct 22, 2020
@github-actions

github-actions bot commented Nov 1, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked this as resolved and limited conversation to collaborators Nov 1, 2022
jorgemarey pushed a commit to jorgemarey/nomad that referenced this issue Nov 1, 2023