Template: Allow rolling restart for restart change_mode. #2202

Closed · cyrilgdn opened this issue Jan 16, 2017 · 12 comments · Fixed by #2227

Comments

cyrilgdn commented Jan 16, 2017

Nomad version

Nomad v0.5.2

Issue

We started using the template stanza with Consul.
It works well, but the restart change_mode causes significant downtime, even with multiple instances of the task.

This is especially bad with the Docker driver, which removes the Docker image and downloads it again during the restart (this seems to be linked to #1530).

It would be great if the restarts could (optionally) be done more smoothly, ideally based on the update strategy.

What do you think? Did I miss something?

Thank you!

dadgar (Contributor) commented Jan 17, 2017

@cyrilgdn Have you all set the splay to something larger than the default value? That can be used to avoid the thundering herd behavior you are experiencing!

cyrilgdn (Author)

@dadgar I've already tried using splay, but with multiple instances, all instances of the task still restart at the same time.

Here is my test Job:

job "template-test" {
    datacenters = ["dc1"]

    type = "batch"

    group "template-test" {
        count = 2
        task "template-test" {
            driver = "exec"

            config {
                command = "sh"
                args = ["-c", "sleep 5000; cat local/test.conf; exit 0"]
            }

            template {
                destination = "local/test.conf"
                data        = "{{ key \"configtest\" }}"
                # Wait a random duration, up to 2 minutes, before
                # restarting when the rendered template changes.
                splay       = "2m"
            }
        }
    }
}

In this case, when the value changes in Consul, Nomad waits for a random time (less than 2 minutes) and then restarts both instances at the same time.
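
For reference, splay behaves like an independent random delay per instance: each instance draws its own wait in [0, splay), so two instances can still land on nearly the same restart time. Here is a minimal sketch of that idea in Go (illustrative only, not consul-template's actual code; splayWait is a made-up name for the example):

package main

import (
	"fmt"
	"math/rand"
	"time"
)

// splayWait picks a uniformly random delay in [0, splay).
// Each instance draws independently, so collisions remain possible.
func splayWait(splay time.Duration) time.Duration {
	if splay <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(splay)))
}

func main() {
	wait := splayWait(2 * time.Minute)
	fmt.Printf("sleeping %s before restart\n", wait)
	time.Sleep(wait)
	// ...restart the task here...
}

Note that even with splay applied correctly, the draws are independent, so simultaneous restarts remain possible, just less likely.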

akaspin commented Jan 20, 2017

@cyrilgdn This issue is about rolling restarts, which means restarting instances one by one, not at random times.

cyrilgdn (Author)

@akaspin Yes, I know (given that I created the issue :)), that's what I'd like to do.

I only tried the splay option as an alternative way to avoid downtime.

akaspin commented Jan 22, 2017

OK. I finally implemented a solution (https://hub.docker.com/r/akaspin/docker-backstab/). It is designed for Docker.

Design (a rough sketch of the locking idea follows the list):

  1. Backstab reacts to changes in a provided Consul "trigger" template.
  2. When the trigger template changes, backstab acquires a lock in Consul and restarts the managed container.
  3. After the restart, backstab may wait for some time and/or wait for the managed container's health check.
  4. After the wait, backstab releases the lock.

For now, this implementation has been tested on a CoreOS cluster with Nomad, where I'm running Mesos.
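
A minimal sketch of the lock-then-restart step, using the official Consul Go client (github.com/hashicorp/consul/api); the lock key and the docker restart target are illustrative assumptions, not backstab's actual code:

package main

import (
	"log"
	"os/exec"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Only one holder of this lock restarts at a time, so instances
	// sharing the key restart one by one. The key is a placeholder.
	lock, err := client.LockKey("service/template-test/restart-lock")
	if err != nil {
		log.Fatal(err)
	}
	if _, err := lock.Lock(nil); err != nil {
		log.Fatal(err)
	}
	defer lock.Unlock()

	// Restart the managed container while holding the lock; the
	// container name is a placeholder.
	if err := exec.Command("docker", "restart", "my-container").Run(); err != nil {
		log.Fatal(err)
	}

	// Optionally sleep or poll a health check here before the lock
	// is released on return.
}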

dadgar (Contributor) commented Jan 22, 2017

@cyrilgdn You didn't miss something, I did 😄 The splay wasn't being applied correctly. It's fixed and will be in 0.5.3.

cyrilgdn (Author)

@dadgar Thanks for fixing the splay option!

But to me, the question raised in this issue remains valid.

In my case, if I have 2 instances of the same task (count = 2), I can set a large splay hoping that both will not restart at the same time, but this is not guaranteed.
It would be great to have a rolling restart option that guarantees, like a job upgrade, that all instances are restarted sequentially, to avoid any downtime.

dadgar (Contributor) commented Jan 31, 2017

@cyrilgdn Unfortunately, we will not be doing coordinated restarts on template changes. If this is a requirement for you, I would suggest building some external tooling to manage the restarts.

cyrilgdn (Author) commented Feb 1, 2017

@dadgar Thanks for your answer!

"Not be doing", as in never?

Maybe I missed or misunderstood something (we started using Nomad a few weeks ago and have only just tested the template stanza).
Am I the only one with this kind of problem? Does every other user run an external tool to avoid downtime on configuration changes?

multani (Contributor) commented Feb 1, 2017

@dadgar Wouldn't it be possible to reuse the restart {} stanza when the template's change_mode is set to restart?

It's probably possible to do something with external tooling, but restart in its current form is very basic compared to what's provided at the group level with the restart {} stanza, for example. I would even dare to say that it's actually too limited and will confuse users:

  • if change_mode = "restart" and splay is too low:
    • a task could be restarted while another task is still starting
    • all tasks could be restarted at the same time
    • in either case, this will most probably cause downtime
  • the only way to mitigate this is to increase splay enough that the probability of all tasks restarting at the same moment diminishes, but:
    • that makes the template stanza less attractive, as the time it takes for a task to pick up the new configuration may be as high as this (higher) splay value
    • it mitigates the problem we have with a lower splay value, but the problem is still there, and with bad luck we may still end up with all tasks restarting at the same time.

All in all, it seems that this defeats the purpose of having the template stanza in the first place, if the way to actually control what's happening is not to use it and instead to package Consul Template or some other external tooling inside the task itself.
Especially since Nomad already supports coordinated restarts, which, if I understand correctly, will get even better in the future.

dadgar (Contributor) commented Feb 1, 2017

In its current form, we cannot support that type of behavior. The plugin executes locally on every client, and there is no coordination between clients. I have referenced a GitHub issue on consul-template that could solve this.

If consul-template supported locking, then we could limit parallelism.
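
For illustration, here is one way such coordination could look, sketched with the official Consul Go client's semaphore (github.com/hashicorp/consul/api); the key prefix and the limit of 1 are assumptions for the example, not an existing consul-template feature:

package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// A semaphore with limit 1 allows at most one holder, so
	// restarts across clients proceed one at a time; raising the
	// limit would allow bounded parallelism. The prefix is an
	// illustrative placeholder.
	sema, err := client.SemaphorePrefix("service/template-test/restart-sema", 1)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := sema.Acquire(nil); err != nil {
		log.Fatal(err)
	}
	defer sema.Release()

	// ...restart this instance and wait for it to become healthy
	// before the semaphore is released on return...
}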

github-actions (bot)

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 16, 2022