Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

system: re-evaluate node on feasability changes #11007

Merged
merged 4 commits into from
Aug 10, 2021

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Aug 5, 2021

Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.

This changes causes the reschedule to re-evaluate the node if any
attribute used in feasability checks changes.

Fixes #8448

Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.

This changes causes the reschedule to re-evaluate the node if any
attribute used in feasability checks changes.
@notnoop notnoop self-assigned this Aug 5, 2021
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great fix! Sad how long this has been outstanding but happy to see it closed. I think a few more fields need to included and/or documented why they aren't included.

It's unfortunate how this logic depends on logic faraway in feasibility checking... wish there was some way to test them together.

Comment on lines 229 to 237
// check fields used by the feasability checkers, through direct
// or interpolated constraints.
return !(original.ID == updated.ID &&
original.Datacenter == updated.Datacenter &&
original.Name == updated.Name &&
original.NodeClass == updated.NodeClass &&
reflect.DeepEqual(original.Attributes, updated.Attributes) &&
reflect.DeepEqual(original.Meta, updated.Meta) &&
reflect.DeepEqual(original.Drivers, updated.Drivers))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, some other fields that might need to be considered:

  • NodeResources
  • ReservedResources
  • HostVolumes

It might also be worth documenting that DrainStrategy and SchedulingEligibility do not need to be considered here since they're only mutated through other endpoints that will emit node evals of their own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on HostVolumes!.

Resource constraints are handled differently: they are handled by blocking evals that got addressed in #5900 .

@vercel vercel bot temporarily deployed to Preview – nomad August 10, 2021 20:19 Inactive
@notnoop notnoop merged commit aad6d40 into main Aug 10, 2021
@notnoop notnoop deleted the b-system-reeval-node-attrs branch August 10, 2021 21:17
notnoop pushed a commit that referenced this pull request Aug 26, 2021
Fix a bug where system jobs may fail to be placed on a node that
initially was not eligible for system job placement.

This changes causes the reschedule to re-evaluate the node if any
attribute used in feasibility checks changes.

Fixes #8448
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

System jobs don't get re-evaluated when node attributes change
2 participants