Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of scheduler: ensure dup alloc names are fixed before plan submit. into release/1.6.x #18891

Merged

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #18873 to be assessed for backporting due to the inclusion of the label backport/1.6.x.

🚨

Warning automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@jrasell
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/nomad/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


This change fixes a bug within the generic scheduler which meant duplicate alloc indexes (names) could be submitted to the plan applier and written to state. The bug originates from the placements calculation notion that names of allocations being replaced are blindly copied to their replacement. This is not correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor duplicate tracking. This can be used when computing placements to ensure duplicate are found, and a new name picked before the plan is submitted. The name index tracking is now passed from the reconciler to the generic scheduler via the results, so this does not have to be regenerated, or another data structure used.

closes #10727

Reviewer Notes

The new test TestServiceSched_JobModify_ProposedDuplicateAllocIndex mimics the reproduction behaviour and can be used to see how the code change has fixed the bug.

This reproduction can be used to test the code change in a manual way. The reproduction (in my testing) worked 100% of the time.


Overview of commits

@hashicorp-cla
Copy link

hashicorp-cla commented Oct 27, 2023

CLA assistant check
All committers have signed the CLA.

@vercel vercel bot temporarily deployed to Preview – nomad October 27, 2023 13:21 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad-storybook-and-ui October 27, 2023 13:28 Inactive
This change fixes a bug within the generic scheduler which meant
duplicate alloc indexes (names) could be submitted to the plan
applier and written to state. The bug originates from the
placements calculation notion that names of allocations being
replaced are blindly copied to their replacement. This is not
correct in all cases, particularly when dealing with canaries.

The fix updates the alloc name index tracker to include minor
duplicate tracking. This can be used when computing placements to
ensure duplicate are found, and a new name picked before the plan
is submitted. The name index tracking is now passed from the
reconciler to the generic scheduler via the results, so this does
not have to be regenerated, or another data structure used.
@jrasell jrasell force-pushed the backport/gh-10727-sched-fix-method/duly-awake-calf branch from 3212b1a to 32a2c38 Compare October 27, 2023 15:25
@jrasell jrasell marked this pull request as ready for review October 27, 2023 15:26
@jrasell jrasell merged commit 41339b3 into release/1.6.x Oct 27, 2023
23 of 24 checks passed
@jrasell jrasell deleted the backport/gh-10727-sched-fix-method/duly-awake-calf branch October 27, 2023 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants