Backport of scheduler: fix reconciliation of reconnecting allocs into release/1.5.x #16647

hc-github-team-nomad-core

Backport

This PR is auto-generated from #16609 to be assessed for backporting due to the inclusion of the label backport/1.5.x.

The below text is copied from the body of the original PR.


When a disconnected client reconnects, the allocReconciler must find the
allocations that were created to replace the original disconnected
allocations.

Previously, this search covered only a subset of the non-terminal
untainted allocations. If a replacement allocation was not in that
subset, the reconciler didn't stop it, leaving the job in an
inconsistent state.

This inconsistency is resolved only in a later job evaluation, but by
that point the allocation is already considered reconnected, so the
reconnection-specific logic is not applied, leading to unexpected
outcomes.

This commit fixes the problem by running the reconciliation logic for
reconnecting allocations earlier in the process, leaving the rest of
the reconciler oblivious to reconnecting allocations.

It also uses the full set of allocations to search for replacements,
stopping them even if they are not in the untainted set.
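
The idea of searching the full allocation set can be illustrated with a
minimal sketch. This is not the code from the PR: the `Alloc` type, its
fields, and `findReplacements` are hypothetical simplifications, and the
sketch assumes a replacement records the ID of the allocation it
replaced and that only direct replacements need to be found.

```go
package main

// Alloc is a hypothetical, simplified stand-in for an allocation.
// It is NOT Nomad's structs.Allocation.
type Alloc struct {
	ID            string
	PreviousAlloc string // assumed: ID of the allocation this one replaced
	Terminal      bool   // assumed: true once the allocation has stopped
}

// findReplacements scans every allocation in `all` (not only an
// untainted subset) and returns the non-terminal allocations that were
// created to replace one of the reconnecting allocations.
func findReplacements(reconnecting, all []Alloc) []Alloc {
	// Index the reconnecting allocations by ID for constant-time lookup.
	ids := make(map[string]struct{}, len(reconnecting))
	for _, a := range reconnecting {
		ids[a.ID] = struct{}{}
	}

	var toStop []Alloc
	for _, a := range all {
		if a.Terminal {
			continue // already stopped, nothing left to do
		}
		if _, ok := ids[a.PreviousAlloc]; ok {
			toStop = append(toStop, a)
		}
	}
	return toStop
}
```

Because the search input is the full set, a replacement is returned
regardless of which filtered group it would otherwise fall into; only
allocations that have already stopped are skipped.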

The SystemScheduler is not affected by this bug because disconnected
clients don't trigger replacements: every eligible client is already
running an allocation.

Closes #15483

@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/b-reconciler-reconnect-fail/physically-key-sunfish branch from 0caaf5a to 6c528b9 Compare March 24, 2023 23:39
@hc-github-team-nomad-core hc-github-team-nomad-core merged commit 325a930 into release/1.5.x Mar 24, 2023
@hc-github-team-nomad-core hc-github-team-nomad-core deleted the backport/b-reconciler-reconnect-fail/physically-key-sunfish branch March 24, 2023 23:39