
Replacement allocations during canary deployments are placed in the wrong datacenter #17651

Closed
lgfa29 opened this issue Jun 21, 2023 · 2 comments
Labels: stage/accepted, theme/scheduling, type/bug

Comments

lgfa29 (Contributor) commented Jun 21, 2023

Nomad version

Nomad v1.5.6
BuildDate 2023-05-19T18:26:13Z
Revision 8af70885c02ab921dedbdf6bc406a1e886866f80

The bug likely affects previous versions as well.

Operating system and Environment details

N/A

Issue

When a job that uses canary deployments changes its datacenters and an allocation for the previous version fails during the deployment, the replacement allocation is placed on a node that matches the new datacenters value instead of the original one.

Reproduction steps

  1. Start a Nomad cluster with clients in two datacenters. You can start a nomad agent -dev and run the job below to create some extra clients (example commands follow the jobfile).
Nomad clients jobfile
locals {
  # Adjust to the appropriate path.
  nomad_path = "/opt/hashicorp/nomad/1.5.6/nomad"

  client_config = <<EOF
data_dir   = "{{env "NOMAD_TASK_DIR"}}/data"
name       = "%s"
datacenter = "%s"

client {
  enabled = true

  server_join {
    retry_join = ["127.0.0.1"]
  }
}

server {
  enabled = false
}

ports {
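  # The third format() argument supplies the middle digit of each port below,
  # so every extra client gets unique http, rpc, and serf ports.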
  http = "46%d6"
  rpc  = "46%[3]d7"
  serf = "46%[3]d8"
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}
EOF
}

job "nomad" {
  group "clients" {
    task "client-dc2" {
      driver = "raw_exec"

      config {
        command = local.nomad_path
        args    = ["agent", "-config", "local/config.hcl"]
      }

      template {
        data        = format(local.client_config, "client-dc2", "dc2", 5)
        destination = "local/config.hcl"
      }
    }

    task "client-dc3" {
      driver = "raw_exec"

      config {
        command = local.nomad_path
        args    = ["agent", "-config", "local/config.hcl"]
      }

      template {
        data        = format(local.client_config, "client-dc3", "dc3", 6)
        destination = "local/config.hcl"
      }
    }
  }
}
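
For reference, a minimal sketch of step 1, assuming the jobfile above is saved as clients.nomad.hcl and the nomad binary is on the PATH:

# Start a single dev agent; the extra clients retry_join it on 127.0.0.1.
$ nomad agent -dev

# In a second terminal, register the clients job to launch the dc2 and dc3 agents.
$ nomad job run clients.nomad.hcl

# All clients and their datacenters should now be listed.
$ nomad node status
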
  2. Run the sample job below (see Job file at the end; the commands for steps 2-4 are sketched after the output). One of the allocations will keep failing.

  3. Update the job's datacenters value:

    job "sleep" {
    - datacenters = ["dc2"]
    + datacenters = ["dc3"]
  4. Run the job again and monitor its allocations:

    $ watch -n1 nomad job allocs sleep
    ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
    5db9c92b  74384522  sleep       0        run      pending  11s ago    11s ago
    191fcde0  74384522  sleep       1        run      running  58s ago    47s ago
    605c2ac1  a8201b46  sleep       0        run      running  1m16s ago  1m5s ago
    beb6b693  a8201b46  sleep       0        stop     failed   1m16s ago  11s ago
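
For completeness, a sketch of the commands behind steps 2-4 and of how to confirm where the replacement allocation landed. The jobfile name is an assumption, and the allocation and node IDs are taken from the output above; they will differ in your cluster.

# Step 2: register the sleep job (saved as sleep.nomad.hcl here).
$ nomad job run sleep.nomad.hcl

# Step 4: after editing datacenters, run the updated job to start the canary deployment.
$ nomad job run sleep.nomad.hcl

# Check which node the replacement allocation was placed on, and that node's datacenter.
$ nomad alloc status 5db9c92b | grep -i node
$ nomad node status 74384522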
    

Expected Result

The replacement allocation is created on the same node, and therefore in the same datacenter, as the original allocation.

Actual Result

The replacement allocation is created on a node in the new datacenter.

Job file (if appropriate)

job "sleep" {
  datacenters = ["dc2"]

  update {
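    # Canary deployment: one canary at a time, auto-reverting if the deployment fails.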
    max_parallel = 1
    canary       = 1
    auto_revert  = true
  }

  group "sleep" {
    count = 2
    task "sleep" {
      driver = "raw_exec"

      config {
        command = "/bin/bash"
        args    = ["${NOMAD_TASK_DIR}/script.sh"]
      }

      template {
        data        = <<EOF
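# Allocations with index greater than 0 fail on purpose; allocation 0 just keeps sleeping.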
while true; do
  if [ "$NOMAD_ALLOC_INDEX" -gt "0" ]; then
    echo "Boom"
    exit 1
  fi
  sleep 3
done
EOF
        destination = "${NOMAD_TASK_DIR}/script.sh"
      }
    }
  }
}
tgross (Member) commented Jun 23, 2023

@lgfa29 should this be closed by #17598, #17654, #17653, and #17652?

lgfa29 (Contributor, Author) commented Jun 23, 2023

Yes, sorry. I had a typo in FIxes 😅

lgfa29 closed this as completed Jun 23, 2023