
[question] What is the best way to run a job on all nodes while draining them? #9857

Closed
ghost opened this issue Jan 20, 2021 · 5 comments

@ghost

ghost commented Jan 20, 2021

There is a job I would like to run on all machines from time to time; a simple maintenance script that will ensure the dependencies on my nodes are always up to date. To do this, I will first need to drain all nodes of all my production jobs.

While drained, I would like to run one instance of my maintenance job on every one of the nodes. When this is done, I will allow traffic back.

This question has two parts:

  • Is it possible to veto only a specific type of job when draining, so I can still schedule the maintenance job?
  • What is the best practice to run one instance of a parameterized job across all nodes?

NOTE: I believe a system job is not what I need here. I need to be able to run it on demand, not triggered by any events like the node becoming ready (e.g. a restart).

@tgross
Member

tgross commented Jan 20, 2021

Is it possible to veto only a specific type of job when draining, so I can still schedule the maintenance job?

Normally there are two "knobs" you have: drain and eligibility. Toggling eligibility is useful if you want to prevent scheduling of new tasks without draining the ones that are currently running. The case you have here, though, is that you want to run tasks on a node that isn't otherwise eligible for scheduling, and the scheduler isn't going to place workloads on ineligible nodes.
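
For reference, the two knobs from the CLI look roughly like this (the node ID is a placeholder):

    # Eligibility: stop new placements without touching running allocations
    nomad node eligibility -disable <node-id>
    nomad node eligibility -enable <node-id>

    # Drain: migrate allocations off the node entirely
    nomad node drain -enable <node-id>
    nomad node drain -disable <node-id>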

But what you're trying to do here might be possible with a little bit of cleverness, given the scenario you've described. You could give jobs a constraint on a node's metadata, and change that metadata as part of the update procedure you're running. So something like:

  • drain the node
  • update meta { ready_to_use = "0" }
  • restart the client
  • disable drain on the node
  • run the update job
  • update meta { ready_to_use = "1" }
  • restart the client
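
As a rough sketch of how the pieces fit together (the meta key follows the example above; gating the production jobs on ready_to_use = "1" is an assumption about how you'd structure them), the node meta lives in the client config and the production jobspecs carry the matching constraint:

    # Client agent config on each node (changing it requires a client restart):
    client {
      meta {
        ready_to_use = "1"
      }
    }

    # In each production jobspec, so it only lands on nodes marked ready:
    constraint {
      attribute = "${meta.ready_to_use}"
      value     = "1"
    }

The maintenance job simply omits that constraint, so it can still be placed while ready_to_use is "0".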

What is the best practice to run one instance of a parameterized job across all nodes?

There's a "system batch" scheduler type being worked on in #9160, likely to ship in Nomad 1.1. In the meantime you might be able to work around that with a batch job that has a count equal to the number of nodes and the distinct_hosts constraint.
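
A minimal sketch of that workaround, assuming a 5-node cluster and a hypothetical update script (job and task names are illustrative):

    job "maintenance" {
      type = "batch"

      # Force each of the count instances onto a different node
      constraint {
        operator = "distinct_hosts"
        value    = "true"
      }

      group "maintain" {
        count = 5   # assumption: set to the number of client nodes

        task "update-deps" {
          driver = "raw_exec"
          config {
            command = "/usr/local/bin/update-deps.sh"   # hypothetical script
          }
        }
      }
    }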

@ghost
Author

ghost commented Jan 20, 2021

Thank you, that makes sense.

I’m also wondering if there is a way to do this by shifting part of the administrative responsibilities to the node itself. In that case,

  • the administrative job could be run with highest priority,
  • this job would read the allocation, infer the node ID and mark "itself" as ineligible,
  • wait until all jobs in progress are finished,
  • complete the maintenance and
  • mark itself as eligible again.

If my understanding is correct, this requires that each node have rather broad management permissions on the cluster, but leaving that aside for a second, does this sound like a plausible scenario?

@tgross
Member

tgross commented Jan 21, 2021

If my understanding is correct, this requires that each node have rather broad management permissions on the cluster

You can scope this down a bit by giving the administrative job a Nomad ACL token that has only read-job, list-jobs, and node:write (sourcing the ACL token from a secrets store like Vault or whatever you're using). Those permissions aren't much worse than the information the node itself already has access to.
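
For reference, a minimal sketch of an ACL policy granting just those capabilities (the policy file name and namespace are assumptions):

    # maintenance-policy.hcl
    namespace "default" {
      capabilities = ["read-job", "list-jobs"]
    }

    node {
      policy = "write"
    }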

does this sound like a plausible scenario?

That could definitely work! You're relying on the notion that the jobs you care about will all finish, which is only going to be the case with batch workloads. But if that's the case then you're all set and don't need to worry about draining.
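
As a rough sketch of what that task could do, assuming jq is available in the task environment, the ACL token is exposed as NOMAD_TOKEN, and the actual maintenance step is a hypothetical placeholder:

    #!/usr/bin/env bash
    set -euo pipefail

    # Find the node this allocation is running on via the allocation API.
    NODE_ID=$(curl -sf -H "X-Nomad-Token: ${NOMAD_TOKEN}" \
      "${NOMAD_ADDR:-http://127.0.0.1:4646}/v1/allocation/${NOMAD_ALLOC_ID}" | jq -r '.NodeID')

    # Mark this node ineligible so no new work is placed on it.
    nomad node eligibility -disable "${NODE_ID}"

    # Wait for the other allocations on this node to finish, then run the maintenance.
    # (Polling for completion is elided; the update script is a placeholder.)
    /usr/local/bin/update-deps.sh

    # Mark the node eligible again so production jobs can return.
    nomad node eligibility -enable "${NODE_ID}"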

@ghost
Author

ghost commented Jan 21, 2021

Thank you very much. Once again, very insightful!

ghost closed this as completed Jan 21, 2021