Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allocrunner: fix health check monitoring for Consul services #16402

Merged
merged 6 commits into from
Mar 10, 2023

Conversation

lgfa29
Copy link
Contributor

@lgfa29 lgfa29 commented Mar 9, 2023

Services must be canonicalized and have its values interpolated before comparing with the values returned by Consul.

Close #16382
Close #16307

@lgfa29 lgfa29 added this to the 1.5.1 milestone Mar 9, 2023
@lgfa29 lgfa29 requested review from shoenig and jrasell March 9, 2023 00:18
@lgfa29 lgfa29 force-pushed the b-fix-consul-health-check-interpolation branch from 3394881 to 0765b15 Compare March 9, 2023 00:19
lgfa29 added a commit that referenced this pull request Mar 9, 2023
nomad/structs/structs.go Outdated Show resolved Hide resolved
client/allochealth/tracker.go Outdated Show resolved Hide resolved
Services must be interpolated to replace runtime variables before they
can be compared against the values returned by Consul.
@lgfa29
Copy link
Contributor Author

lgfa29 commented Mar 10, 2023

After all my failed attempts to refactor the service interpolation so it's closer to the Service struct I decided to take a step back and understand this part of the code better.

Then I realized that my initial implementation was quite wrong. As the name suggests, each TaskEnv holds information for a specific task (or nothing task-related for a group). Since allocHealthWatcherHook is an alloc runner hook we actually need one TaskEnv per task (+1 for the group).

allocHealthWatcherHook also implements the Update() hook, so the allocation may change while the hook is active, in which case we need to update the allocation in the taskenv.Builder and recreate the TaskEnv.

And since taskenv.Builder mutates itself in several different ways, we can't share instances across hooks. taskenv.Builder also mutate themselves in ways that cannot be undone (for example, calling UpdateAlloc(alloc, nil) will not reset the task to empty, it will just ignore the change).

So allocHealthWatcherHook now takes a taskenv.Builder factory to creates a fresh builder when the allocation updates to prevent spurious task information at the group level TaskEnv.

@lgfa29 lgfa29 requested a review from tgross March 10, 2023 15:23
Copy link
Member

@shoenig shoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; just one suggestion you could ignore. my thinking is we want folks to complain about bugs if we are validating / canonicalizing things incorrectly. if we try to do the right thing they may just ignore what is otherwise a serious bug

Comment on lines 525 to 531
if env == nil {
// This is not expected to happen, but guard against a nil
// task environment by using the group environment since it has
// most of the same values.
t.logger.Warn("task environment not found, using group level environment for variable interpolation",
"alloc_id", t.alloc.ID, "task", service.TaskName)
env = t.taskEnvs[""]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we [should] already guard against this in Validate/Canonicalize; instead of trying to make things maybe work in an unexpected situation I think we should just do nothing, e.g.

if env != nil {
 // assign
} else {
  logger.Error("you found a bug!")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum...good point. Changed to this in 925d347

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.4.x backport to 1.4.x release line backport/1.5.x backport to 1.5.x release line
Projects
None yet
3 participants