health: detect missing task checks #7366

Closed
wants to merge 3 commits into from

Commits on Mar 17, 2020

  1. tests: add a check for failing service checks

    Add tests that check for failing or missing service checks in Consul
    updates.
    Mahmood Ali committed Mar 17, 2020
    b9e3e12
  2. health: detect missing task checks

    Fixes a bug where an allocation is considered healthy while some of its
    tasks are being restarted and, as a result, their checks aren't tracked
    by the Consul agent client.
    
    The underlying problem is that allocation registration in the Consul
    agent/client code is mutable: a task's services are removed from Consul
    before the task is stopped or restarted, so that load balancers (LBs)
    can drain it gracefully. The downside is that the health tracker may
    consider the allocation healthy while one of its tasks is down, since
    only the checks that remain registered are evaluated.
    
    This patch takes the simplest approach to the problem: it compares the
    number of expected checks against the number of registered checks.
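    A minimal Go sketch of the count-based comparison, assuming a
    simplified model in which each registered check is reduced to its
    status string. The `CheckStatus` type and the `checksHealthy` function
    are hypothetical illustrations, not Nomad's health tracker code. The
    example reproduces the failure mode above: one of two tasks is
    restarting, the only registered check passes, yet the allocation is
    not reported healthy because an expected check is missing.

```go
package main

import "fmt"

// CheckStatus is a hypothetical, simplified stand-in for the status
// string Consul reports for a health check.
type CheckStatus string

const (
	StatusPassing  CheckStatus = "passing"
	StatusCritical CheckStatus = "critical"
)

// checksHealthy reports whether an allocation's checks look healthy.
// expected is the number of checks the allocation's tasks define, and
// registered holds the statuses of the checks currently found in Consul.
// A restarting task has had its services (and their checks) deregistered,
// so the count no longer matches and the allocation is not healthy yet.
func checksHealthy(expected int, registered []CheckStatus) bool {
	if len(registered) != expected {
		return false
	}
	for _, s := range registered {
		if s != StatusPassing {
			return false
		}
	}
	return true
}

func main() {
	// Two tasks with one check each; the second task is restarting, so
	// only the first task's (passing) check is registered.
	fmt.Println(checksHealthy(2, []CheckStatus{StatusPassing}))
	// Both checks registered and passing.
	fmt.Println(checksHealthy(2, []CheckStatus{StatusPassing, StatusPassing}))
}
```
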
    
    I don't anticipate any discrepancy between the counters: `sreg.Checks`
    should contain only the checks that the Nomad agent explicitly
    registered, with unexpected or unrelated checks filtered out:
    https://github.com/hashicorp/nomad/blob/0ecda992317d3300e1c1da05170f8bba18410357/command/agent/consul/client.go#L1138-L1147
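    To illustrate why the counts should agree, here is a sketch of that
    kind of filtering, assuming the agent reports a flat list of check IDs
    and Nomad keeps a set of the IDs it registered itself. The
    `filterRegisteredChecks` helper and the check IDs are hypothetical;
    the actual logic lives in the linked `client.go`.

```go
package main

import "fmt"

// filterRegisteredChecks keeps only the agent-reported check IDs that
// appear in registeredIDs, preserving the agent's order. Checks added to
// the Consul agent by anything else are dropped, so they cannot inflate
// the count compared against the expected number of checks.
// Hypothetical helper for illustration only.
func filterRegisteredChecks(agentChecks []string, registeredIDs map[string]struct{}) []string {
	var out []string
	for _, id := range agentChecks {
		if _, ok := registeredIDs[id]; ok {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Illustrative IDs: two checks registered by Nomad, one added by an operator.
	registered := map[string]struct{}{
		"_nomad-check-web": {},
		"_nomad-check-db":  {},
	}
	agent := []string{"_nomad-check-web", "operator-disk-check", "_nomad-check-db"}
	fmt.Println(filterRegisteredChecks(agent, registered))
}
```
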
    
    A better approach would have been to strictly compare the found check
    IDs against an immutable list of expected IDs. Sadly, this requires
    significant changes to both the task runner service hooks and the
    Consul hooks, which I'm not comfortable making so close to cutting a
    new release.
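    For comparison, a sketch of the ID-based alternative, assuming the
    expected check IDs are known up front as an immutable list and the
    found checks are available as a map from check ID to status. The
    `expectedChecksHealthy` helper and the IDs are hypothetical. Unlike a
    count, it reports exactly which checks are missing, and an unrelated
    extra check cannot mask a missing one.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedChecksHealthy reports whether every expected check ID is present
// in found with status "passing", and returns the expected IDs that are
// missing, sorted. Found checks that are not expected are ignored.
// Hypothetical helper for illustration only.
func expectedChecksHealthy(expected []string, found map[string]string) (bool, []string) {
	healthy := true
	var missing []string
	for _, id := range expected {
		status, ok := found[id]
		if !ok {
			missing = append(missing, id)
			healthy = false
			continue
		}
		if status != "passing" {
			healthy = false
		}
	}
	sort.Strings(missing)
	return healthy, missing
}

func main() {
	expected := []string{"web-http", "db-tcp"}
	// db-tcp is missing and an unrelated check is present: the check count
	// (2) matches, but the ID comparison still reports the gap.
	found := map[string]string{"web-http": "passing", "unrelated": "passing"}
	healthy, missing := expectedChecksHealthy(expected, found)
	fmt.Println(healthy, missing)
}
```
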
    Mahmood Ali committed Mar 17, 2020
    0047882

Commits on Mar 18, 2020

  1. more targeted health tests

    Mahmood Ali committed Mar 18, 2020
    a263a4e