core: emit node evals only for sys jobs in dc #12955

Merged
merged 1 commit into from
Jul 6, 2022


schmichael
Member

Whenever a node joins the cluster, either for the first time or after
being `down`, we emit an evaluation for every system job to ensure all
applicable system jobs are running on the node.

This patch adds an optimization to skip creating evaluations for system
jobs not in the current node's DC. While the scheduler performs the same
feasibility check, skipping the creation of the evaluation altogether
saves disk, network, and memory.
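
The idea can be illustrated with a minimal Go sketch. This is not the code from the patch: the `Node`, `Job`, and `Eval` types and the `nodeEvals` and `inDatacenter` functions are hypothetical, simplified stand-ins, and the sketch assumes a job's datacenters are matched by exact string comparison.

```go
package main

import "fmt"

// Node, Job, and Eval are hypothetical, pared-down stand-ins used only
// for illustration; they are not the real structs.
type Node struct {
	ID         string
	Datacenter string
}

type Job struct {
	ID          string
	Type        string
	Datacenters []string
}

type Eval struct {
	JobID  string
	NodeID string
}

// inDatacenter reports whether dc appears in the job's datacenter list.
// Assumption: plain exact string matching.
func inDatacenter(job Job, dc string) bool {
	for _, d := range job.Datacenters {
		if d == dc {
			return true
		}
	}
	return false
}

// nodeEvals returns one evaluation per system job that lists the node's
// datacenter. Jobs for other datacenters are skipped up front instead of
// being left for the scheduler's feasibility check to reject later.
func nodeEvals(node Node, jobs []Job) []Eval {
	var evals []Eval
	for _, job := range jobs {
		if job.Type != "system" {
			continue // only system jobs get a per-node evaluation
		}
		if !inDatacenter(job, node.Datacenter) {
			continue // the optimization: no eval is created at all
		}
		evals = append(evals, Eval{JobID: job.ID, NodeID: node.ID})
	}
	return evals
}

func main() {
	node := Node{ID: "node-1", Datacenter: "dc1"}
	jobs := []Job{
		{ID: "logging", Type: "system", Datacenters: []string{"dc1", "dc2"}},
		{ID: "metrics", Type: "system", Datacenters: []string{"dc2"}},
		{ID: "web", Type: "service", Datacenters: []string{"dc1"}},
	}
	for _, e := range nodeEvals(node, jobs) {
		fmt.Printf("eval for job %q on node %q\n", e.JobID, e.NodeID)
	}
}
```

In this sketch only the `logging` job produces an evaluation for `node-1`; the `metrics` job is filtered out before any evaluation is created, which is where the disk, network, and memory savings would come from.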

Member

@tgross tgross left a comment

LGTM! This feels like a good place to look for additional optimizations in reducing evals too!

@schmichael
Member Author

@tgross I was hoping to just get a PR up, but an issue is probably a good idea: #12981

I'm tempted to base further work off this PR and get them all merged in a group (hence the Draft status)... but that might just be complicating things for no reason. Let me know if you have a preference for how to handle a batch of small eval improvements.

@tgross
Member

tgross commented May 13, 2022

> I'm tempted to base further work off this PR and get them all merged in a group (hence the Draft status)... but that might just be complicating things for no reason. Let me know if you have a preference for how to handle a batch of small eval improvements.

Yeah sorry I may have jumped the gun in reviewing here knowing that you've got an active investigation underway. But batching them up seems fine. Ideally each fix is a discrete commit within the PR to help reviewability?

I'll dismiss my review for now and re-review as needed.

@tgross tgross self-requested a review May 13, 2022 12:43
@schmichael schmichael marked this pull request as ready for review July 6, 2022 17:36
@schmichael schmichael added the backport/1.2.x, backport/1.3.x, and backport/1.1.x labels and removed the backport/1.1.x and backport/1.2.x labels Jul 6, 2022
@schmichael schmichael added this to the 1.3.2 milestone Jul 6, 2022
@schmichael
Member Author

schmichael commented Jul 6, 2022

Well, I went off to focus on #10446 and never pushed harder on eval optimizations, so let's just get this one shipped. At least past me had the foresight to file an issue for the other idea. #12981 will make a nice bite-sized optimization for somebody to sneak in someday.

Member

@tgross tgross left a comment

LGTM 👍

@robloxrob

Hooray! Christmas in July!

@github-actions

github-actions bot commented Nov 4, 2022

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2022
Labels
backport/1.3.x backport to 1.3.x release line
3 participants