Evaluation broker control API #11638

Closed
tgross opened this issue Dec 7, 2021 · 3 comments

Comments

@tgross
Member

tgross commented Dec 7, 2021

The Nomad engineering team would love to get some community feedback on a proposed set of features!

During incident response for Nomad or for Nomad workloads, operators may find that the scheduler can compound the ongoing incident by pushing forward with evaluations as best it can. We've recently been working on a bunch of improvements for incident response like bypassing shutdown_delay (#11448), making num_schedulers tunable via API (#11449), overriding evaluation priority on job registration (#11434), and an API to halt job registration (#11450).

Operators may want to have some sort of top-level "maintenance mode" where evaluation processing is paused or otherwise controlled while a problem is being debugged.

Eval Broker Pause/Resume

Evaluations are the unit of work for the Nomad scheduler. They are created...

  • When you register, dispatch, or scale a job.
  • When clients update the server with information about failed allocations.
  • When the scheduler can't place all allocations in a job and creates a "blocked" eval to try again later.
  • When the leader schedules a periodic job or garbage collection internal job.

The eval broker is the component on the leader that takes evaluations that have been written to raft and queues them for scheduling on one of the scheduler workers. Pausing the eval broker would prevent the scheduler queues from receiving new work and let the workers catch up, reducing their CPU, memory, and disk usage. But note that a pause also stops any scheduling work necessary to reschedule failed allocations and run periodic jobs! This isn't something to be done lightly; it's an emergency measure for operators.

For implementing Pause/Resume, there's an existing enabled flag which gets toggled whenever a server steps up/down from leadership (the eval broker only runs on the leader). We check this flag in a few places:

  • When Evals get upserted, they're enqueued from the FSM only on the leader, that is, only when enabled = true (ref fsm.go#L746)
  • Likewise, when an Eval is dequeued we check that enabled flag (ref eval_broker.go#L374-L377)
  • The leader has an eval restorer that enqueues all the pending evals (ref leader.go#L491-L517)
  • The leader has an eval reaper that dequeues dead evals; it will probably need to be turned off so that it doesn't generate new evals (ref leader.go#L781)

So if we write a new Scheduler Configuration entry to raft, then whenever we get that RPC or whenever a server assumes leadership, it can check that value and call SetEnabled only if the eval broker should be enabled.
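A minimal sketch of how that check could look. This is not Nomad's actual code: the PausableBroker type, the PauseEvalBroker field name, and the shouldEnable and applyConfig helpers are all hypothetical, and flushing the queue on disable is an assumption of the sketch.

```go
package main

import "sync"

// SchedulerConfig stands in for the Scheduler Configuration entry written to
// raft. PauseEvalBroker is a hypothetical field name used for illustration.
type SchedulerConfig struct {
	PauseEvalBroker bool
}

// PausableBroker is a toy eval broker holding only the state this sketch needs.
type PausableBroker struct {
	mu      sync.Mutex
	enabled bool
	queue   []string // pending eval IDs
}

// SetEnabled toggles the broker. Disabling it also drops the in-memory queue
// (an assumption here); the evals would still exist in the state store.
func (b *PausableBroker) SetEnabled(enabled bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.enabled = enabled
	if !enabled {
		b.queue = nil
	}
}

// Enqueue reports whether the eval was accepted. A disabled broker ignores it.
func (b *PausableBroker) Enqueue(evalID string) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.enabled {
		return false
	}
	b.queue = append(b.queue, evalID)
	return true
}

// Dequeue returns the next eval ID, or false if the broker is disabled or empty.
func (b *PausableBroker) Dequeue() (string, bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !b.enabled || len(b.queue) == 0 {
		return "", false
	}
	id := b.queue[0]
	b.queue = b.queue[1:]
	return id, true
}

// shouldEnable combines the two conditions: the broker runs only on the
// leader, and only when the operator has not paused it.
func shouldEnable(isLeader bool, cfg SchedulerConfig) bool {
	return isLeader && !cfg.PauseEvalBroker
}

// applyConfig is what a server would call both on a leadership transition and
// when a scheduler configuration update arrives.
func applyConfig(b *PausableBroker, isLeader bool, cfg SchedulerConfig) {
	b.SetEnabled(shouldEnable(isLeader, cfg))
}
```

Routing both the leadership transition and the configuration RPC through one helper keeps the "leader and not paused" rule in a single place, so a newly elected leader cannot accidentally un-pause the broker.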

Evaluation Purge/Delete

Another idea that's been considered is a nomad eval purge command, but this has a few sharp edges. If we flush the eval broker's queue, it'll immediately get filled back up again by the eval restorer on the leader. But if we delete all the pending/blocked evals from the state store, what happens to those that are in flight? So I think to do this safely we'd need to lock the eval broker (pausing it), find all the evals on the queue, delete them from the state store, and then flush the queue before unlocking it (unpausing it). That'll leave whatever evals are still in-flight in the scheduler, and any new evals/reblocks that come from the scheduler will block on Enqueue because we're holding the lock.
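The lock, delete, flush, unlock sequence could be sketched roughly as follows. The PurgeBroker and StateStore types are hypothetical stand-ins rather than Nomad's real structures, and in-flight evals are modeled simply as IDs that are no longer on the queue.

```go
package main

import "sync"

// StateStore is a toy stand-in for the raft-backed state store.
type StateStore struct {
	evals map[string]string // eval ID -> status
}

// PurgeBroker is a toy eval broker with a single lock guarding its queue.
type PurgeBroker struct {
	mu    sync.Mutex
	queue []string // eval IDs waiting for a scheduler worker
}

// Enqueue takes the same lock as Purge, so new evals and reblocks coming from
// the scheduler wait until a purge in progress has finished.
func (b *PurgeBroker) Enqueue(id string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.queue = append(b.queue, id)
}

// Purge holds the broker lock for the whole operation: it deletes every
// queued eval from the state store and then flushes the queue. Evals that a
// worker has already dequeued (in flight) are not on the queue and are left
// untouched. It returns the number of evals removed from the store.
func (b *PurgeBroker) Purge(store *StateStore) int {
	b.mu.Lock()
	defer b.mu.Unlock()

	purged := 0
	for _, id := range b.queue {
		if _, ok := store.evals[id]; ok {
			delete(store.evals, id)
			purged++
		}
	}
	b.queue = nil
	return purged
}
```

Deleting from the store before flushing, all under one lock, is what keeps a restorer-style refill from racing with the purge in this simplified model.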

Another option (or perhaps in addition?) would be to only allow deleting a single eval with a command like nomad eval delete :eval_id. If we ensure that nil evaluations are handled safely, we could delete the eval from the state store and the eval broker's queue, and it would be a no-op for the scheduler. This would be useful in the case where a particular job's evals are a "poison pill" that generates high scheduler workloads (e.g., a job with a very large jobspec or dispatch payload).
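To illustrate what "handled safely" might mean, here is a small sketch in which a worker looks the eval up before doing any work and treats a missing (nil) eval as a no-op. The Eval, EvalStore, and process names are invented for this example and do not correspond to Nomad's internals.

```go
package main

// Eval is a pared-down evaluation record.
type Eval struct {
	ID    string
	JobID string
}

// EvalStore is a toy state store keyed by eval ID.
type EvalStore struct {
	evals map[string]*Eval
}

// Get returns nil when the eval does not exist, for example after a delete.
func (s *EvalStore) Get(id string) *Eval {
	return s.evals[id]
}

// Delete removes a single eval; deleting an unknown ID is harmless.
func (s *EvalStore) Delete(id string) {
	delete(s.evals, id)
}

// process is a hypothetical scheduler worker step. It reports whether any
// scheduling work was done. A nil eval means the eval was deleted after it
// was queued, so the worker skips it instead of dereferencing nil.
func process(s *EvalStore, id string) bool {
	eval := s.Get(id)
	if eval == nil {
		return false // deleted eval: nothing to schedule
	}
	_ = eval.JobID // real scheduling work for the job would happen here
	return true
}
```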

Evaluation Force/Priority

A third idea we've considered is the option to bump an evaluation in the priority queue so that it's evaluated ahead of other evaluations. This could be a nomad eval priority command that updates the Priority field and raft indexes and forces the eval to be re-enqueued. A hypothetical nomad eval force command would do the same thing but be less fine-grained; it would simply set the eval to the highest priority.
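The re-prioritization could be sketched with Go's container/heap package. Everything here is illustrative: PriorityBroker, SetPriority, and Force are hypothetical names, the maximum priority of 100 is an assumption, and the raft index updates mentioned above are left out entirely.

```go
package main

import "container/heap"

// item is one queued eval; index tracks its position inside the heap.
type item struct {
	id       string
	priority int
	index    int
}

// evalHeap implements heap.Interface as a max-heap on priority.
type evalHeap []*item

func (h evalHeap) Len() int           { return len(h) }
func (h evalHeap) Less(i, j int) bool { return h[i].priority > h[j].priority }
func (h evalHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}
func (h *evalHeap) Push(x any) {
	it := x.(*item)
	it.index = len(*h)
	*h = append(*h, it)
}
func (h *evalHeap) Pop() any {
	old := *h
	n := len(old)
	it := old[n-1]
	old[n-1] = nil
	*h = old[:n-1]
	return it
}

// maxPriority is an assumed upper bound used by Force.
const maxPriority = 100

// PriorityBroker is a toy priority queue of eval IDs.
type PriorityBroker struct {
	h    evalHeap
	byID map[string]*item
}

func NewPriorityBroker() *PriorityBroker {
	return &PriorityBroker{byID: map[string]*item{}}
}

func (b *PriorityBroker) Enqueue(id string, priority int) {
	it := &item{id: id, priority: priority}
	b.byID[id] = it
	heap.Push(&b.h, it)
}

// SetPriority changes a queued eval's priority and restores heap order,
// which has the same effect as removing and re-enqueueing it.
func (b *PriorityBroker) SetPriority(id string, priority int) bool {
	it, ok := b.byID[id]
	if !ok {
		return false
	}
	it.priority = priority
	heap.Fix(&b.h, it.index)
	return true
}

// Force is the coarse variant: jump straight to the highest priority.
func (b *PriorityBroker) Force(id string) bool {
	return b.SetPriority(id, maxPriority)
}

// Dequeue returns the highest-priority eval ID, or false if the queue is empty.
func (b *PriorityBroker) Dequeue() (string, bool) {
	if b.h.Len() == 0 {
		return "", false
	}
	it := heap.Pop(&b.h).(*item)
	delete(b.byID, it.id)
	return it.id, true
}
```

Using heap.Fix avoids a separate remove and re-insert, so a priority change costs O(log n) in this model.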

@pikeas

pikeas commented Jun 21, 2022

Some eval control would be nice. I have an eval scheduled for the future:

$ nomad eval list|grep pending
ffb463ed  50        alloc-failure       <job>           pending   false

I've resolved the issue that caused this allocation to fail and would like a way to unblock re-deploying the job. Currently, there's no way to purge this pending eval: re-submitting the job is a no-op, setting count = 0 on the group is invalid, and stopping/starting the job via UI only restarts the other groups.

@jrasell
Member

jrasell commented Jul 6, 2022

Closing this issue as eval broker pause/un-pause as well as eval delete will ship in the next release.

@jrasell jrasell closed this as completed Jul 6, 2022
@lgfa29 lgfa29 modified the milestones: 1.3.x, 1.3.2 Aug 24, 2022
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 22, 2022