set killmode/killsignal in systemd example #4305

insanejudge · 2018-05-17T01:50:52Z

The default (control-group) kill mode in systemd will kill the associated executors, leading to a commonly seen behavior of nomad client restarts losing all current allocations.

Setting the KillSignal to SIGINT seems to be a reasonable default, and allows things like leave_on_interrupt to function.
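For illustration, the two settings discussed here might appear in a unit file roughly as follows. This is a minimal sketch, not the exact example file from the repository: the binary path, config directory, and the `[Unit]`/`[Install]` values are assumptions and should be adjusted to the local install.

```ini
# Sketch of a nomad.service unit. Paths and the [Unit]/[Install]
# values are assumptions, not taken from the pull request.
[Unit]
Description=Nomad
Wants=network-online.target
After=network-online.target

[Service]
# Assumed binary and config locations.
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d
# Only signal the main nomad process on stop/restart, so the executors
# (other processes in the unit's control group) are left running.
KillMode=process
# Send SIGINT instead of systemd's default SIGTERM to the main process.
KillSignal=SIGINT
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With systemd's default `KillMode=control-group`, every process in the unit's control group is signalled on stop, which is the behavior this change is meant to avoid.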

dadgar · 2018-05-21T17:26:55Z

Thanks @insanejudge

onlyjob · 2018-08-14T18:54:03Z

I think KillMode=process can be very dangerous: in my case I need to ensure that no more than one instance of a particular job is running at any time (on any node), because it is an application that accesses a shared network folder assuming exclusive access (without locks). With KillMode=process, when I stop nomad (while its jobs continue to run), other nomad nodes start another instance of the job somewhere else, so more than one is running at the same time. Without KillMode=process the job is nicely terminated when nomad is stopped and the remaining nomad nodes promptly restart the job on another node, as expected.

Certain apps just must not be started more than once with the same shared data folder (e.g. MySQL and MariaDB, where this causes data (binary log) corruption) to avoid race conditions. With KillMode=process, is it possible to guarantee job exclusivity, i.e. that the job is not started without stopping the currently running one first?

insanejudge · 2018-08-14T21:46:11Z

There are excellent facilities in nomad for draining nodes in a controlled way, and you should definitely be doing that instead of unceremoniously killing all of the executors - that is not 'nicely terminated', and you have no guarantee that the job has actually exited before rescheduling.

Having a nomad node drain -enable -self as part of an ExecStop= could be argued for a default, but this would require a corresponding node drain -disable -self in ExecStartPost= and start becoming prescriptive about how you're gating scheduling.

In short - with KillMode=process you have control over how scheduling is gated; without it you are requiring awkward termination of every job when restarting for any reason (e.g. agent upgrade).
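The drain-on-stop idea described in this comment could be sketched as a systemd drop-in like the one below. This is only an illustration under stated assumptions: the drop-in path, the binary path, the `-deadline` value, and the timeout are hypothetical, and the commands assume the local agent API is reachable on its default address without extra credentials.

```ini
# /etc/systemd/system/nomad.service.d/drain.conf (hypothetical drop-in path)
[Service]
# Turn draining off again once the agent is back. The leading "-" tells
# systemd to ignore a failure, e.g. if the agent API is not reachable yet.
ExecStartPost=-/usr/bin/nomad node drain -disable -self -yes
# Drain this node before the agent is stopped. The deadline is an assumed
# value; pick one that fits how long the allocations need to migrate.
ExecStop=/usr/bin/nomad node drain -enable -self -yes -deadline 5m
# Allow the stop to take longer than the drain deadline.
TimeoutStopSec=6min
```

Because a failed `ExecStartPost=` is ignored in this sketch, the node could stay drained after a restart if the command runs before the agent is ready, so some retry or readiness check would likely be needed in practice. This is part of what makes the approach prescriptive about how scheduling is gated.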

schmichael · 2018-08-14T22:40:28Z

@insanejudge did a good job of describing why KillMode=process is the desired behavior.

> it is an application that accesses a shared network folder assuming exclusive access (without locks)

If you're also using Consul this is a perfect use case for distributed locks / leader election as there are a variety of circumstances in which Nomad could be running a single job more than once: https://www.consul.io/docs/guides/leader-election.html

In the future we're planning on adding the ability to automatically drain-on-shutdown which should also address your use case.

Also please open new issues instead of commenting on old closed pull requests.

Hope this helps! Thanks!

ahjohannessen · 2022-10-27T11:36:40Z

@schmichael Anything new with regards to automatic drain-on-shutdown for clients? I am thinking of distros like Flatcar / Fedora CoreOS that reboot automatically. Perhaps until Nomad supports this I could do as @insanejudge suggested:

> Having a nomad node drain -enable -self as part of an ExecStop= could be argued for a default, but this would require a corresponding node drain -disable -self in ExecStartPost= and start becoming prescriptive about how you're gating scheduling.

github-actions · 2023-02-25T02:18:00Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

insanejudge added 2 commits January 16, 2018 22:22

Merge remote-tracking branch 'origin' into systemd-killmode-patch-1

9d9cd73

dadgar merged commit 56c5ff0 into hashicorp:master May 21, 2018

onlyjob mentioned this pull request Aug 14, 2018

systemd unit file kills executors when nomad agent is stopped #4302

Closed

onlyjob mentioned this pull request Aug 15, 2018

Need a way to cleanly shut down nodes #2052

Closed

github-actions bot locked as resolved and limited conversation to collaborators Feb 25, 2023
