
Panic when referencing GCE instance external IP in HCL2 mode #10316

Closed
danmrichards opened this issue Apr 7, 2021 · 4 comments · Fixed by #10326
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/hcl type/bug

@danmrichards

Nomad version

Nomad v1.0.3 (08741d9f2003ec26e44c72a2c0e27cdf0eadb6ee)

Operating system and Environment details

Linux nomad-server-dev-us-east1-zgc5 5.4.0-1036-gcp #39-Ubuntu SMP Thu Jan 14 18:41:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Issue

I am seeing Nomad panic when attempting to reference the external IP address of a GCE instance operating as a Nomad client. The output of nomad node status -self -verbose on the client shows an attribute called unique.platform.gce.network.dev-us-east1-relay-vpc.external-ip.0, which is the external IP address of the instance.

When run in the default HCL2 mode, nomad job validate, nomad job plan and nomad job run all panic. If we add the -hcl1 flag, there is no panic: the job runs as expected, and the attribute is correctly parsed and passed to the container.

A similar panic was reported in another issue, albeit involving env references: #9956

Reproduction steps

  1. Use the job spec below
  2. nomad job run foo.nomad
  3. See panic

Expected Result

I expect the attribute to be parsed correctly, and the job spec to validate and then be planned and run.

Actual Result

Nomad panics if we do not add the -hcl1 flag.

Job file (if appropriate)

job "http-echo" {
  datacenters = ["us-east1-b","us-east1-c","us-east1-d"]

  group "echo" {
    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo:latest"
        args = [
          "-listen", "${attr.unique.platform.gce.network.dev-us-east1-relay-vpc.external-ip.0}:3000",
        ]
      }
    }
  }
}

Note that the network name here (dev-us-east1-relay-vpc) is specific to our infrastructure, so it must be changed to reproduce the issue elsewhere.

@notnoop notnoop self-assigned this Apr 7, 2021
@notnoop notnoop added stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/hcl labels Apr 7, 2021
@danmrichards (Author)

@notnoop I see you've raised a PR to fix the panic, thank you for that :)

In the meantime can you help me to understand why this works for HCL1 but not HCL2? Your PR notes reference an undefined variable, but to me it seems that the variable is defined. It's present in the node status output and works with the HCL1 parser.

Should I be using a different format or reference to get this value to work with HCL2?

@notnoop (Contributor) commented Apr 8, 2021

Hi @danmrichards ! Thank you for reporting the bug, I'm glad I could help.

I'm sorry for the confusion: the format you specified is fine. "Undefined variables" is a misnomer and a bit of a technicality; it refers to references that are undefined during the HCLv2 parsing phase. HCLv2 introduced a variables concept for templating purposes, with the ability to set values on the command line (e.g. nomad job run -var env=staging job.hcl, accessible in the job as ${var.env}). Unfortunately, this variable reference syntax conflicts with the env var and attribute interpolation that Nomad has long used. Additionally, environment and attribute values aren't known when the CLI parses the job, before it has even been submitted to the server. So I coined the term "undefined variables" for references that the HCLv2 parser should leave intact for later processing by the scheduler or client at runtime. Hope that clarifies it.
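To illustrate the distinction, here is a minimal, hypothetical job (not part of the original report) that mixes both kinds of reference. It uses an HCL2 input variable named env, which the CLI resolves at parse time, and the node attribute from this issue, which the parser must leave intact for the client. The variable name, its default value, and the use of http-echo's -text flag are assumptions for illustration only. On Nomad v1.0.3 the attribute reference below is exactly the one that triggers the panic reported here.

```hcl
# Hypothetical HCL2 input variable; its value is substituted by the CLI
# while parsing, e.g. `nomad job run -var env=staging foo.nomad`.
variable "env" {
  type    = string
  default = "dev"
}

job "http-echo" {
  datacenters = ["us-east1-b", "us-east1-c", "us-east1-d"]

  group "echo" {
    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo:latest"
        args = [
          # Known at parse time: resolved before the job is submitted.
          "-text", "hello from ${var.env}",
          # Not known at parse time: an "undefined variable" that must be
          # left intact and interpolated by the client once the task is placed.
          "-listen", "${attr.unique.platform.gce.network.dev-us-east1-relay-vpc.external-ip.0}:3000",
        ]
      }
    }
  }
}
```

Under this sketch, ${var.env} is replaced before the job reaches the server, while the ${attr...} reference only gets a value on the specific client node that runs the task.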

@danmrichards (Author)

Thanks for the explanation, that makes total sense. As you say, the variable I'm using here is referencing the external IP of the client node where the job is scheduled, so the server should leave it intact.

We're working around the issue by falling back to -hcl1 for now. Thanks for your help, I look forward to the patch making it into a release soon 🤞

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 21, 2022