Downtime when using canary deployments, but none when using rolling deploys #4672

Closed
mathematician opened this issue Sep 12, 2018 · 2 comments

Comments

@mathematician

Nomad version

Output from nomad version
nomad v0.8.4 (dbee1d7)

Operating system and Environment details

Ubuntu 16.04

Issue

We use Kong as our gateway to access our applications running on Nomad. Kong resolves each service through its Consul DNS SRV record. We see downtime when using a canary deployment with the following update stanza:

update {
    stagger           = "10s"
    max_parallel      = 1
    canary            = 2
    min_healthy_time  = "30s"
    healthy_deadline  = "1m"
    progress_deadline = "2m"
    auto_revert       = true
}

The downtime happens when we promote the deployment. However, when we use a normal rolling (stagger) update without canaries, we see zero downtime even when going up to 500 rps.
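For reference, the rolling update stanza that gives us zero downtime is roughly the same block with canary removed (canary defaults to 0, so allocations are simply replaced one at a time):

update {
  stagger           = "10s"
  max_parallel      = 1
  min_healthy_time  = "30s"
  healthy_deadline  = "1m"
  progress_deadline = "2m"
  auto_revert       = true
}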

Also, before we promote, Consul returns both versions of the service; we had expected only one version to be resolvable at a time.

This seems related to GitHub issue #4566.

Reproduction steps

  1. Run an HTTP application on Nomad
  2. Continuously make requests to the app: watch -n 0.1 curl application-url
  3. Deploy a new version of the application with the canary deployment strategy
  4. Once the canaries are healthy, promote the deployment and observe name resolution failures

Job file (if appropriate)

Job file used for canary deployment:

job "playground" {
  region      = "us-east-1"
  datacenters = ["us-east-1a", "us-east-1b", "us-east-1c"]
  type        = "service"

  update {
    stagger           = "10s"
    max_parallel      = 1
    canary            = 2
    min_healthy_time  = "30s"
    healthy_deadline  = "1m"
    progress_deadline = "2m"
    auto_revert       = true
  }

  group "playground" {
    count = 2

    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }

    task "playground" {
      driver = "docker"

      config {
        image = "playground:d15f57e"

        port_map {
          http = 3000
        }
      }

      resources {
        cpu    = 500
        memory = 512

        network {
          mbits = 10
          port "http" {}
        }
      }

      service {
        name = "playground"
        tags = ["us-east-1", "playground", "web"]
        port = "http"

        check {
          type     = "http"
          interval = "10s"
          timeout  = "2s"
          path     = "/"
        }
      }
    }
  }
}

dadgar commented Sep 12, 2018

@mathematician I am going to close this since it appears to be the same issue as the one you linked. You will have both versions in Consul; you can differentiate them using canary_tags.

Thanks for the extra detail, it will be helpful as we fix the feature 👍
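A sketch of how that could fit in the service stanza (the extra "canary" tag value is just an illustration):

service {
  name = "playground"
  tags = ["us-east-1", "playground", "web"]

  # Used in place of `tags` while this allocation is an unpromoted canary,
  # so a gateway like Kong can filter canaries out until promotion.
  canary_tags = ["us-east-1", "playground", "web", "canary"]

  port = "http"
}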

dadgar closed this as completed Sep 12, 2018
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 28, 2022