
Nomad recreates service check when updating tag causing short service outage #4972

Closed
atillamas opened this issue Dec 7, 2018 · 3 comments

@atillamas

atillamas commented Dec 7, 2018

Nomad version

Nomad 0.8.6
Consul 1.4.0

Issue

When updating tags in a service definition, Nomad removes the service from Consul and adds it back again, causing a second or two of outage for the service when a service-aware load balancer such as Traefik or Fabio sits in front of it.

Reproduction steps

Launch a job, then update any tag and resubmit the job; watch Consul remove the service and re-add it.
Output from nomad plan:

+/- Job: "http-echo"
+/- Task Group: "service" (3 in-place update)
  +/- Task: "http-echo" (forces in-place update)
    +/- Service {
        AddressMode: "auto"
        Name:        "http-echo"
        PortLabel:   "echo"
      - Tags {
          Tags: "http-echo"
        - Tags: "test"
        }
        }
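
For reference, the plan above appears to correspond to removing the "test" tag from the service stanza of the job file further down; a minimal sketch of such an in-place tag edit (only the tags line differs from the submitted job):

service {
  name = "http-echo"
  tags = ["http-echo"]  # "test" removed; this alone triggers the deregister/re-register shown in the logs below
  port = "echo"
}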

Consul Client logs (if appropriate)

2018/12/07 16:57:23 [INFO] agent: Synced service "_nomad-task-xuzyvb67mvf6q5leq3yi5btumkotgox5"
2018/12/07 16:57:23 [INFO] agent: Synced service "_nomad-client-zry333abvkqsm7t2rkqisqad6repvom2"
2018/12/07 16:57:23 [INFO] agent: Synced service "_nomad-task-fs5wop7j37rrv4knqqc7xjbgi53iwi4t"
2018/12/07 16:58:56 [INFO] serf: EventMemberJoin: i-0c523360225ea4eff 10.0.60.42
2018/12/07 16:59:16 [INFO] agent: Synced service "_nomad-task-fs5wop7j37rrv4knqqc7xjbgi53iwi4t"
2018/12/07 16:59:16 [INFO] agent: Synced service "_nomad-task-xuzyvb67mvf6q5leq3yi5btumkotgox5"
2018/12/07 16:59:16 [INFO] agent: Synced service "_nomad-client-zry333abvkqsm7t2rkqisqad6repvom2"
2018/12/07 17:00:18 [INFO] agent: Synced service "_nomad-client-zry333abvkqsm7t2rkqisqad6repvom2"
2018/12/07 17:00:18 [INFO] agent: Synced service "_nomad-task-fs5wop7j37rrv4knqqc7xjbgi53iwi4t"
2018/12/07 17:00:18 [INFO] agent: Synced service "_nomad-task-xuzyvb67mvf6q5leq3yi5btumkotgox5"
2018/12/07 17:01:56 [INFO] agent: Deregistered service "_nomad-task-xuzyvb67mvf6q5leq3yi5btumkotgox5"
2018/12/07 17:01:56 [INFO] agent: Deregistered check "9393ef1fe371f81d2bbbd07c6ba250fec5ea120a"
2018/12/07 17:01:56 [INFO] agent: Synced service "_nomad-task-ybnuuldm4tfboct2n4osc56timxbfzys"
2018/12/07 17:01:56 [INFO] agent: Synced check "e39ea8ff2fe1cd62be832bb15ffeef27f774f655"
2018/12/07 17:01:57 [INFO] serf: EventMemberLeave: i-052ba8d86858b2ebf 10.0.60.179
2018/12/07 17:01:58 [INFO] agent: Synced check "e39ea8ff2fe1cd62be832bb15ffeef27f774f655"

Job file (if appropriate)

job "http-echo" {
    datacenters = ["eu-west-1"]
    group "service" {
        count = 3
    constraint {
        operator = "distinct_hosts"
        value = "true"
    }
        update {
                max_parallel = 1
                min_healthy_time = "10s"
                healthy_deadline = "5m"
        }
        task "http-echo" {
            driver = "docker"
            config {
                image = "hashicorp/http-echo"
                port_map {
                        echo = 5678
                }
                args  = ["-text", " version0: ${node.unique.name}"]
            }
            env {
        	TEST   = "something"
            }
            service {
                name = "http-echo"
		tags = [ 
			"http-echo",
			"test",
		]

                port = "echo"
                check {
                    type     = "http"
                    path     = "/"
                    interval = "5s"
                    timeout  = "2s"
                    port     = "echo"
                }
            }
            resources {
                cpu = 100
                network {
                    port "echo" { }
                }
            }
        }
    }
}
@atillamas atillamas changed the title Nomad recreates service check when updating tag causing outage Nomad recreates service check when updating tag causing short service outage Dec 7, 2018
@dadgar
Contributor

dadgar commented Dec 7, 2018

Thanks for reporting @atillamas!

This is known and we are hoping to have a fix early in the 0.9.X cycle. I am going to close this in favor of #4566

@dadgar dadgar closed this as completed Dec 7, 2018
@atillamas
Author

@dadgar thanks for the information. Glad to know it's being worked on.
Until then it seems possible to circumvent the problem by updating something that recreates the allocation, like a dummy env var. I cannot see any outage when updating the tag and a new/dummy env var at the same time.
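
A minimal sketch of that workaround against the job above (DEPLOY_BUMP is just a made-up placeholder name): change the tag and a dummy env var in the same submission, so the diff forces a new allocation via a rolling update instead of an in-place Consul sync:

env {
  TEST        = "something"
  DEPLOY_BUMP = "1"  # hypothetical dummy var; bump it alongside any tag change to force a new allocation
}

service {
  name = "http-echo"
  tags = ["http-echo", "test", "new-tag"]  # the tag change that would otherwise cause the blip
  port = "echo"
}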

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 27, 2022