Nomad 1.1.3 Issues with Handling Namespaces #11002

Closed
espey opened this issue Aug 4, 2021 · 7 comments · Fixed by #11010

Comments

@espey

espey commented Aug 4, 2021

Nomad version

Nomad v1.1.3+ent

Issue

We have a Nomad job that has a defined namespace and other jobs that live in the default namespace. When we deployed the job with the defined namespace, it was registered in the default namespace instead, so we ended up with two copies of the same job running. Something in the new version seems to be overriding the namespace in our jobspec. We downgraded Nomad to v1.1.2, the issue went away, and the job was deployed into the correct namespace. We suspect that this change caused the issue: #10875.

Reproduction steps

Run a Nomad 1.1.3 cluster and deploy a job with a defined namespace.

Expected Result

Nomad job running properly in the defined namespace.

Actual Result

Nomad job running in both the default and the defined namespaces.

Job file (if appropriate)

job "example-job" {
  namespace = "example-job"

  # {etc. job def}
}

@schmichael
Member

I was unable to reproduce the bug with either 1.1.3 OSS or Enterprise with the following jobspec:

job "example" {
  datacenters = ["dc1"]
  namespace = "foo"

  group "cache" {
    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        ports = ["db"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

I created the foo namespace with nomad namespace apply foo.

❓ Are you submitting the job via nomad run or the API?

nomad run explicitly sets the ?namespace=... query parameter to the jobspec's namespace, which ends up taking precedence over NOMAD_NAMESPACE and -namespace=...; see https://github.com/hashicorp/nomad/blob/v1.1.3/command/job_run.go#L216-L219

In hindsight, one or two ways of specifying the namespace might have been better than three. 😬
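As a rough illustration of the three sources mentioned above, here is a toy Go model (not Nomad's source; the function name resolveRunNamespace is hypothetical) of how nomad run ends up choosing the namespace it sends. It assumes the -namespace flag overrides NOMAD_NAMESPACE and that a namespace set in the jobspec overrides both, per the job_run.go lines linked above.

```go
package main

import "fmt"

// resolveRunNamespace is a toy model (hypothetical, not Nomad's source) of
// how `nomad run` picks the namespace it sends as ?namespace=. Assumptions:
//   - NOMAD_NAMESPACE is the lowest-priority source,
//   - the -namespace flag overrides the environment variable,
//   - a namespace set in the jobspec overrides both.
// An empty string means "not set"; with nothing set, "default" is used.
func resolveRunNamespace(envNS, flagNS, jobNS string) string {
	ns := envNS
	if flagNS != "" {
		ns = flagNS
	}
	if jobNS != "" {
		ns = jobNS
	}
	if ns == "" {
		ns = "default"
	}
	return ns
}

func main() {
	// NOMAD_NAMESPACE=default, no -namespace flag, jobspec has namespace = "foo".
	fmt.Println(resolveRunNamespace("default", "", "foo")) // foo
}
```

Because the jobspec value wins in this path, running the job with nomad run would not show the bug even with NOMAD_NAMESPACE set, which is consistent with the failed reproduction above.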

@RussellRollins
Contributor

I also helped respond to the incident this caused, so I can provide a bit more information:

❓ Are you submitting the job via nomad run or the API?

The API. Specifically, we have a Go chatbot that submits the job via EnforceRegister. That code looks something like this (simplified to remove excess implementation details):

// Errors are ignored here for brevity; real code should check them.
spec, _ := jobspec.Parse(strings.NewReader(jobSpecRaw))
plan, _, _ := client.Jobs().Plan(spec, true, nil)
_, _, _ = client.Jobs().EnforceRegister(spec, plan.JobModifyIndex, nil)

At the time of the incident, the go.mod versions were:

	github.com/hashicorp/nomad v1.0.5
	github.com/hashicorp/nomad/api v0.0.0-20210504145400-61a3b73d44a6

Unfortunately, like you, I have not been able to reproduce the problem outside of the full system, so at least some of the elided details must be significant, but I'm not sure which ones. We can share the source code for that chatbot with you privately, and I can keep working on a more minimal reproduction of the issue.

@schmichael
Member

So api.DefaultConfig() reads NOMAD_NAMESPACE if it is set, and if config.Namespace is set, the client sends it as the ?namespace= query parameter.

The query parameter then takes precedence over whatever is in the jobspec due to https://github.com/hashicorp/nomad/pull/10875/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ecR782

This matches how region is handled, but it is definitely a backward compatibility issue! We absolutely should have listed it in the changelog and upgrade docs, and we did not: https://www.nomadproject.io/docs/upgrade/upgrade-specific

If this explanation makes sense, I'll treat this as a documentation issue and get the changelog and upgrade guide fixed up ASAP.
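To make the regression concrete, here is a toy Go model of the server-side choice described in this comment. The function jobNamespace and its queryWins switch are hypothetical: queryWins=true approximates 1.1.3 after #10875, and false approximates the 1.1.2 behavior reported in this issue. It only tries to capture the case discussed here, where the jobspec sets a namespace; the fallbacks for an unset jobspec namespace are assumptions.

```go
package main

import "fmt"

// jobNamespace is a toy model (hypothetical, not Nomad's source) of how the
// server picks the namespace for a job submitted through the HTTP API.
//   queryNS:   the ?namespace= query parameter ("" if absent)
//   jobNS:     the namespace in the submitted jobspec ("" if unset)
//   queryWins: true models 1.1.3 after #10875; false models the 1.1.2
//              behavior reported in this thread.
// The fallbacks used when the jobspec sets no namespace are assumptions.
func jobNamespace(queryNS, jobNS string, queryWins bool) string {
	switch {
	case queryWins && queryNS != "":
		return queryNS
	case jobNS != "":
		return jobNS
	case queryNS != "":
		return queryNS
	default:
		return "default"
	}
}

func main() {
	// The client inherited NOMAD_NAMESPACE=default via api.DefaultConfig(),
	// so the request carries ?namespace=default.
	queryNS := "default"
	fmt.Println("1.1.2:", jobNamespace(queryNS, "example-job", false)) // 1.1.2: example-job
	fmt.Println("1.1.3:", jobNamespace(queryNS, "example-job", true))  // 1.1.3: default
}
```

With ?namespace=default inherited from the environment, the same jobspec lands in example-job under the first rule and in default under the second, which matches the duplicate jobs described at the top of the issue.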

@schmichael
Member

I used this little program with the above job file to exercise the behavior. When NOMAD_NAMESPACE is set, that is the namespace the job and allocs are created in. Without the env var set, the job's namespace is used.

https://gist.github.com/schmichael/e03fe40871edb64dacd6da9f7db4a152

@RussellRollins
Contributor

Aha! The aforementioned chatbot also runs in Nomad, in the default namespace.

https://www.nomadproject.io/docs/runtime/environment

Nomad sets the NOMAD_NAMESPACE environment variable in the chatbot's container automatically. Since the chatbot is in default, so is NOMAD_NAMESPACE, and because of the precedence you explained, that environment variable ends up being preferred over the jobspec. I think that fully explains the issue and why it only happens on 1.1.3.

I think this gives us a path forward: we will simply clear the Namespace field in the config returned by DefaultConfig().

@schmichael
Member

Thanks for the quick and thorough report! Sorry for the significant backward compat issue!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2022
3 participants