Downtime when using canary deployments, but none when using rolling deploys #4672

Closed
mathematician opened this issue Sep 12, 2018 · 2 comments

Comments

@mathematician

Nomad version

Output from nomad version
nomad v0.8.4 (dbee1d7)

Operating system and Environment details

Ubuntu 16.04

Issue

We use Kong as our gateway to access our applications running on Nomad. Kong resolves each service through its Consul DNS SRV record. We see downtime when using a canary deployment with the following update stanza:

update {
    stagger           = "10s"
    max_parallel      = 1
    canary            = 2
    min_healthy_time  = "30s"
    healthy_deadline  = "1m"
    progress_deadline = "2m"
    auto_revert       = true
}

The downtime happens when we promote the deployment. However, when we use a normal rolling (stagger) update without canaries, we see zero downtime even when going up to 500 rps.
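For reference, the rolling update stanza that gives us zero downtime is roughly the same block with canary removed (canary defaults to 0, so allocations are simply replaced one at a time):

update {
  stagger           = "10s"
  max_parallel      = 1
  min_healthy_time  = "30s"
  healthy_deadline  = "1m"
  progress_deadline = "2m"
  auto_revert       = true
}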

Also, before we promote, Consul returns both versions of the service; we had expected only one version to be resolvable at a time.

This seems related to GitHub issue #4566.

Reproduction steps

  1. Run an HTTP application on Nomad
  2. Continuously make requests to the app: watch -n 0.1 curl application-url
  3. Deploy a new version of the application with the canary deployment strategy
  4. Once the canaries are healthy, promote the deployment and observe name resolution failures

Job file (if appropriate)

Job file used for canary deployment:

job "playground" {
  region      = "us-east-1"
  datacenters = ["us-east-1a", "us-east-1b", "us-east-1c"]
  type        = "service"

  update {
    stagger           = "10s"
    max_parallel      = 1
    canary            = 2
    min_healthy_time  = "30s"
    healthy_deadline  = "1m"
    progress_deadline = "2m"
    auto_revert       = true
  }

  group "playground" {
    count = 2

    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }

    task "playground" {
      driver = "docker"

      config {
        image = "playground:d15f57e"

        port_map {
          http = 3000
        }
      }

      resources {
        cpu    = 500
        memory = 512

        network {
          mbits = 10
          port "http" {}
        }
      }

      service {
        name = "playground"
        tags = ["us-east-1", "playground", "web"]
        port = "http"

        check {
          type     = "http"
          interval = "10s"
          timeout  = "2s"
          path     = "/"
        }
      }
    }
  }
}

dadgar commented Sep 12, 2018

@mathematician I am going to close this since it appears to be the same issue as the one you linked. You will have both versions in Consul; you can differentiate them using canary_tags.

Thanks for the extra detail, it will be helpful as we fix the feature 👍
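A sketch of how that could fit in the service stanza (the extra "canary" tag value is just an illustration):

service {
  name = "playground"
  tags = ["us-east-1", "playground", "web"]

  # Used in place of `tags` while this allocation is an unpromoted canary,
  # so a gateway like Kong can filter canaries out until promotion.
  canary_tags = ["us-east-1", "playground", "web", "canary"]

  port = "http"
}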

dadgar closed this as completed Sep 12, 2018
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 28, 2022