Fix health checking for ephemeral poststart tasks #11945

Merged
merged 5 commits into from
Feb 2, 2022

Contributor

@beautifulentropy beautifulentropy commented Jan 27, 2022

This PR fixes the logic in the Nomad client's alloc health tracker, which
erroneously marks existing healthy allocations with dead poststart ephemeral
tasks as unhealthy, even if those tasks already succeeded during a previous
deployment. For a repro, see #10058 (comment).

Currently, users have to work around this by inserting a sleep after
short-lived ephemeral tasks. This change ensures that a poststart lifecycle
task which has succeeded, even before 'min_healthy_time' has elapsed, no
longer causes the whole allocation to be marked as unhealthy.

I've attempted to include test coverage that sticks to your existing conventions.

  • Tracker will not attempt to evaluate the health of poststart ephemeral tasks
    which have already succeeded, regardless of how long they ran.
  • Add a mock and helper for deployments of allocations with poststart tasks

Fixes #9254
Fixes #10058

@beautifulentropy beautifulentropy changed the title Fix health checking for ephemeral tasks Fix health checking for ephemeral poststart tasks Jan 27, 2022
Member

@tgross tgross left a comment

LGTM! Thanks @beautifulentropy!

@tgross
Member

tgross commented Feb 2, 2022

This will ship in the next major release (1.3.0) and get backported to 1.2.x and 1.1.x. Thanks again!

@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 19, 2022
Labels

  • backport/1.1.x: backport to 1.1.x release line
  • backport/1.2.x: backport to 1.2.x release line
Development

Successfully merging this pull request may close these issues.

  • unhealthy deployment with poststart lifecycle
  • Both tasks marked as unhealthy if only one fails.
4 participants