Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement alloc runner task restart hook #9869

Merged
merged 3 commits into from
Jan 22, 2021
Merged

Conversation

tgross
Copy link
Member

@tgross tgross commented Jan 21, 2021

Fixes #9176

Most allocation hooks don't need to know when a single task within the
allocation is restarted. The check watcher for group services triggers the
alloc runner to restart all tasks, but the alloc runner's Restart method
doesn't trigger any of the alloc hooks, including the group service hook. The
result is that after the first time a check triggers a restart, we'll never
restart the tasks of an allocation again.

This commit adds a RunnerTaskRestartHook interface so that alloc runner
hooks can act if a task within the alloc is restarted. The only implementation
is in the group service hook, which will force a re-registration of the
alloc's services and fix check restarts.

@vercel vercel bot temporarily deployed to Preview – nomad-storybook-and-ui January 21, 2021 19:54 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad January 21, 2021 19:54 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad January 21, 2021 21:05 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad-storybook-and-ui January 21, 2021 21:05 Inactive
@tgross tgross marked this pull request as ready for review January 21, 2021 21:16
@vercel vercel bot temporarily deployed to Preview – nomad-storybook-and-ui January 21, 2021 21:16 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad January 21, 2021 21:16 Inactive
@tgross tgross requested a review from shoenig January 21, 2021 21:16
@vercel vercel bot temporarily deployed to Preview – nomad January 21, 2021 21:17 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad-storybook-and-ui January 21, 2021 21:17 Inactive
Most allocation hooks don't need to know when a single task within the
allocation is restarted. The check watcher for group services triggers the
alloc runner to restart all tasks, but the alloc runner's `Restart` method
doesn't trigger any of the alloc hooks, including the group service hook. The
result is that after the first time a check triggers a restart, we'll never
restart the tasks of an allocation again.

This commit adds a `RunnerTaskRestartHook` interface so that alloc runner
hooks can act if a task within the alloc is restarted. The only implementation
is in the group service hook, which will force a re-registration of the
alloc's services and fix check restarts.
@vercel vercel bot temporarily deployed to Preview – nomad January 21, 2021 21:37 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad-storybook-and-ui January 21, 2021 21:37 Inactive
Copy link
Member

@shoenig shoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the warning can be removed from https://www.nomadproject.io/docs/job-specification/check_restart ?

e2e/consul/check_restart.go Outdated Show resolved Hide resolved
@vercel vercel bot temporarily deployed to Preview – nomad-storybook-and-ui January 22, 2021 15:34 Inactive
@tgross tgross merged commit a5b926c into master Jan 22, 2021
@tgross tgross deleted the b-group-check-restart branch January 22, 2021 15:55
@tgross tgross added this to the 1.0.3 milestone Jan 22, 2021
@carmstrong
Copy link

@tgross We see this was flagged as part of the 1.0.3 milestone, but the release notes suggest 1.0.3 is only security fixes. Should we expect this in 1.0.4?

@tgross
Copy link
Member Author

tgross commented Feb 1, 2021

The 1.0.3 milestone got re-named to 1.0.4 once the 1.0.3 release went out. Because that was a security patch we weren't able to rename the milestone ahead of disclosing it. So this will go out in 1.0.4.

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unhealthy task is only restarted once despite restart policy
3 participants