
Network.dns variable interpolation does not actually work #12780

Closed
ghost opened this issue Apr 26, 2022 · 9 comments · Fixed by #12817
Labels
stage/accepted ("Confirmed, and intend to work on. No timeline commitment though."), theme/hcl, type/bug

Comments

@ghost

ghost commented Apr 26, 2022

Nomad version

Output from nomad version
Nomad v1.3.0-beta.1 (2eba643)

Operating system and Environment details

Linux (Tested on Debian Bullseye on ChromeOS and also Amazon Linux 2 in AWS)

Issue

#12021 does not appear to actually work. Looking at the PR, it appears that the interpolated DNS values are never used.
EDIT: network DNS variable interpolation works when placed within the task's config stanza, but not within the group.network stanza.
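
For clarity, a sketch of the two placements the EDIT refers to, using the same attribute (dns_servers is the Docker driver's task-level config option; per the discussion further down, it is not usable for tasks in bridge network mode):

# Works: inside the task's Docker driver config
config {
  dns_servers = ["${attr.unique.network.ip-address}"] # resolves to the node's IP
}

# Does not work: inside the group-level network block
network {
  dns {
    servers = ["${attr.unique.network.ip-address}"] # written out literally
  }
}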

Reproduction steps

1. nomad job run example.nomad (the modified version below)
2. exec into the container
3. cat /etc/resolv.conf

No variable interpolation takes place; the variable is just interpreted literally.

Expected Result

# cat /etc/resolv.conf
search lxd
nameserver 100.115.92.202 # This should be a local private IP that Nomad is bound to.

Actual Result

# cat /etc/resolv.conf
search lxd
nameserver ${attr.unique.network.ip-address}

Instead I just get the variable interpreted as a literal.

Job file (if appropriate)

Using a slightly modified version of example.nomad, the differences being
network.dns.servers = ["${attr.unique.network.ip-address}"] (set via a locals block)
and network.mode = "bridge":

locals {
  dns_servers = ["${attr.unique.network.ip-address}"]
}
# There can only be a single job definition per file. This job is named
# "example" so it will create a job with the ID and Name "example".

# The "job" stanza is the top-most configuration option in the job
# specification. A job is a declarative specification of tasks that Nomad
# should run. Jobs have a globally unique name, one or many task groups, which
# are themselves collections of one or many tasks.
#
# For more information and examples on the "job" stanza, please see
# the online documentation at:
#
#     https://www.nomadproject.io/docs/job-specification/job
#
job "example" {
  # The "region" parameter specifies the region in which to execute the job.
  # If omitted, this inherits the default region name of "global".
  # region = "global"
  #
  # The "datacenters" parameter specifies the list of datacenters which should
  # be considered when placing this task. This must be provided.
  datacenters = ["dc1"]

  # The "type" parameter controls the type of job, which impacts the scheduler's
  # decision on placement. This configuration is optional and defaults to
  # "service". For a full list of job types and their differences, please see
  # the online documentation.
  #
  # For more information, please see the online documentation at:
  #
  #     https://www.nomadproject.io/docs/schedulers
  #
  type = "service"

  # The "constraint" stanza defines additional constraints for placing this job,
  # in addition to any resource or driver constraints. This stanza may be placed
  # at the "job", "group", or "task" level, and supports variable interpolation.
  #
  # For more information and examples on the "constraint" stanza, please see
  # the online documentation at:
  #
  #     https://www.nomadproject.io/docs/job-specification/constraint
  #
  # constraint {
  #   attribute = "${attr.kernel.name}"
  #   value     = "linux"
  # }

  # The "update" stanza specifies the update strategy of task groups. The update
  # strategy is used to control things like rolling upgrades, canaries, and
  # blue/green deployments. If omitted, no update strategy is enforced. The
  # "update" stanza may be placed at the job or task group. When placed at the
  # job, it applies to all groups within the job. When placed at both the job and
  # group level, the stanzas are merged with the group's taking precedence.
  #
  # For more information and examples on the "update" stanza, please see
  # the online documentation at:
  #
  #     https://www.nomadproject.io/docs/job-specification/update
  #
  update {
    # The "max_parallel" parameter specifies the maximum number of updates to
    # perform in parallel. In this case, this specifies to update a single task
    # at a time.
    max_parallel = 1

    # The "min_healthy_time" parameter specifies the minimum time the allocation
    # must be in the healthy state before it is marked as healthy and unblocks
    # further allocations from being updated.
    min_healthy_time = "10s"

    # The "healthy_deadline" parameter specifies the deadline in which the
    # allocation must be marked as healthy after which the allocation is
    # automatically transitioned to unhealthy. Transitioning to unhealthy will
    # fail the deployment and potentially roll back the job if "auto_revert" is
    # set to true.
    healthy_deadline = "3m"

    # The "progress_deadline" parameter specifies the deadline in which an
    # allocation must be marked as healthy. The deadline begins when the first
    # allocation for the deployment is created and is reset whenever an allocation
    # as part of the deployment transitions to a healthy state. If no allocation
    # transitions to the healthy state before the progress deadline, the
    # deployment is marked as failed.
    progress_deadline = "10m"

    # The "auto_revert" parameter specifies if the job should auto-revert to the
    # last stable job on deployment failure. A job is marked as stable if all the
    # allocations as part of its deployment were marked healthy.
    auto_revert = false

    # The "canary" parameter specifies that changes to the job that would result
    # in destructive updates should create the specified number of canaries
    # without stopping any previous allocations. Once the operator determines the
    # canaries are healthy, they can be promoted which unblocks a rolling update
    # of the remaining allocations at a rate of "max_parallel".
    #
    # Further, setting "canary" equal to the count of the task group allows
    # blue/green deployments. When the job is updated, a full set of the new
    # version is deployed and upon promotion the old version is stopped.
    canary = 0
  }
  # The migrate stanza specifies the group's strategy for migrating off of
  # draining nodes. If omitted, a default migration strategy is applied.
  #
  # For more information on the "migrate" stanza, please see
  # the online documentation at:
  #
  #     https://www.nomadproject.io/docs/job-specification/migrate
  #
  migrate {
    # Specifies the number of task groups that can be migrated at the same
    # time. This number must be less than the total count for the group as
    # (count - max_parallel) will be left running during migrations.
    max_parallel = 1

    # Specifies the mechanism by which allocation health is determined. The
    # potential values are "checks" or "task_states".
    health_check = "checks"

    # Specifies the minimum time the allocation must be in the healthy state
    # before it is marked as healthy and unblocks further allocations from being
    # migrated. This is specified using a label suffix like "30s" or "15m".
    min_healthy_time = "10s"

    # Specifies the deadline in which the allocation must be marked as healthy
    # after which the allocation is automatically transitioned to unhealthy. This
    # is specified using a label suffix like "2m" or "1h".
    healthy_deadline = "5m"
  }
  # The "group" stanza defines a series of tasks that should be co-located on
  # the same Nomad client. Any task within a group will be placed on the same
  # client.
  #
  # For more information and examples on the "group" stanza, please see
  # the online documentation at:
  #
  #     https://www.nomadproject.io/docs/job-specification/group
  #
  group "cache" {
    # The "count" parameter specifies the number of the task groups that should
    # be running under this group. This value must be non-negative and defaults
    # to 1.
    count = 1

    # The "network" stanza specifies the network configuration for the allocation
    # including requesting port bindings.
    #
    # For more information and examples on the "network" stanza, please see
    # the online documentation at:
    #
    #     https://www.nomadproject.io/docs/job-specification/network
    #
    network {
      mode = "bridge"
      port "db" {
        to = 6379
      }
      dns {
        servers = local.dns_servers
      }
    }

    # The "service" stanza instructs Nomad to register this task as a service
    # in the service discovery engine, which is currently Consul. This will
    # make the service addressable after Nomad has placed it on a host and
    # port.
    #
    # For more information and examples on the "service" stanza, please see
    # the online documentation at:
    #
    #     https://www.nomadproject.io/docs/job-specification/service
    #
    service {
      name = "redis-cache"
      tags = ["global", "cache"]
      port = "db"

      # The "check" stanza instructs Nomad to create a Consul health check for
      # this service. A sample check is provided here for your convenience;
      # uncomment it to enable it. The "check" stanza is documented in the
      # "service" stanza documentation.

      # check {
      #   name     = "alive"
      #   type     = "tcp"
      #   interval = "10s"
      #   timeout  = "2s"
      # }

    }

    # The "restart" stanza configures a group's behavior on task failure. If
    # left unspecified, a default restart policy is used based on the job type.
    #
    # For more information and examples on the "restart" stanza, please see
    # the online documentation at:
    #
    #     https://www.nomadproject.io/docs/job-specification/restart
    #
    restart {
      # The number of attempts to run the job within the specified interval.
      attempts = 2
      interval = "30m"

      # The "delay" parameter specifies the duration to wait before restarting
      # a task after it has failed.
      delay = "15s"

      # The "mode" parameter controls what happens when a task has restarted
      # "attempts" times within the interval. "delay" mode delays the next
      # restart until the next interval. "fail" mode does not restart the task
      # if "attempts" has been hit within the interval.
      mode = "fail"
    }

    # The "ephemeral_disk" stanza instructs Nomad to utilize an ephemeral disk
    # instead of a hard disk requirement. Clients using this stanza should
    # not specify disk requirements in the resources stanza of the task. All
    # tasks in this group will share the same ephemeral disk.
    #
    # For more information and examples on the "ephemeral_disk" stanza, please
    # see the online documentation at:
    #
    #     https://www.nomadproject.io/docs/job-specification/ephemeral_disk
    #
    ephemeral_disk {
      # When sticky is true and the task group is updated, the scheduler
      # will prefer to place the updated allocation on the same node and
      # will migrate the data. This is useful for tasks that store data
      # that should persist across allocation updates.
      # sticky = true
      #
      # Setting migrate to true causes the allocation directory of a sticky
      # allocation to be migrated.
      # migrate = true
      #
      # The "size" parameter specifies the size in MB of shared ephemeral disk
      # between tasks in the group.
      size = 300
    }

    # The "affinity" stanza enables operators to express placement preferences
    # based on node attributes or metadata.
    #
    # For more information and examples on the "affinity" stanza, please
    # see the online documentation at:
    #
    #     https://www.nomadproject.io/docs/job-specification/affinity
    #
    # affinity {
    # attribute specifies the name of a node attribute or metadata
    # attribute = "${node.datacenter}"


    # value specifies the desired attribute value. In this example Nomad
    # will prefer placement in the "us-west1" datacenter.
    # value  = "us-west1"


    # weight can be used to indicate relative preference
    # when the job has more than one affinity. It defaults to 50 if not set.
    # weight = 100
    #  }


    # The "spread" stanza allows operators to increase the failure tolerance of
    # their applications by specifying a node attribute that allocations
    # should be spread over.
    #
    # For more information and examples on the "spread" stanza, please
    # see the online documentation at:
    #
    #     https://www.nomadproject.io/docs/job-specification/spread
    #
    # spread {
    # attribute specifies the name of a node attribute or metadata
    # attribute = "${node.datacenter}"


    # targets can be used to define desired percentages of allocations
    # for each targeted attribute value.
    #
    #   target "us-east1" {
    #     percent = 60
    #   }
    #   target "us-west1" {
    #     percent = 40
    #   }
    #  }

    # The "task" stanza creates an individual unit of work, such as a Docker
    # container, web application, or batch processing.
    #
    # For more information and examples on the "task" stanza, please see
    # the online documentation at:
    #
    #     https://www.nomadproject.io/docs/job-specification/task
    #
    task "redis" {
      # The "driver" parameter specifies the task driver that should be used to
      # run the task.
      driver = "docker"

      # The "config" stanza specifies the driver configuration, which is passed
      # directly to the driver to start the task. The details of configurations
      # are specific to each driver, so please see specific driver
      # documentation for more information.
      config {
        image = "redis:3.2"

        ports = ["db"]
      }

      # The "artifact" stanza instructs Nomad to download an artifact from a
      # remote source prior to starting the task. This provides a convenient
      # mechanism for downloading configuration files or data needed to run the
      # task. It is possible to specify the "artifact" stanza multiple times to
      # download multiple artifacts.
      #
      # For more information and examples on the "artifact" stanza, please see
      # the online documentation at:
      #
      #     https://www.nomadproject.io/docs/job-specification/artifact
      #
      # artifact {
      #   source = "http://foo.com/artifact.tar.gz"
      #   options {
      #     checksum = "md5:c4aa853ad2215426eb7d70a21922e794"
      #   }
      # }


      # The "logs" stanza instructs the Nomad client on how many log files and
      # the maximum size of those logs files to retain. Logging is enabled by
      # default, but the "logs" stanza allows for finer-grained control over
      # the log rotation and storage configuration.
      #
      # For more information and examples on the "logs" stanza, please see
      # the online documentation at:
      #
      #     https://www.nomadproject.io/docs/job-specification/logs
      #
      # logs {
      #   max_files     = 10
      #   max_file_size = 15
      # }

      # The "resources" stanza describes the requirements a task needs to
      # execute. Resource requirements include memory, cpu, and more.
      # This ensures the task will execute on a machine that contains enough
      # resource capacity.
      #
      # For more information and examples on the "resources" stanza, please see
      # the online documentation at:
      #
      #     https://www.nomadproject.io/docs/job-specification/resources
      #
      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
      }


      # The "template" stanza instructs Nomad to manage a template, such as
      # a configuration file or script. This template can optionally pull data
      # from Consul or Vault to populate runtime configuration data.
      #
      # For more information and examples on the "template" stanza, please see
      # the online documentation at:
      #
      #     https://www.nomadproject.io/docs/job-specification/template
      #
      # template {
      #   data          = "---\nkey: {{ key \"service/my-key\" }}"
      #   destination   = "local/file.yml"
      #   change_mode   = "signal"
      #   change_signal = "SIGHUP"
      # }

      # The "template" stanza can also be used to create environment variables
      # for tasks that prefer those to config files. The task will be restarted
      # when data pulled from Consul or Vault changes.
      #
      # template {
      #   data        = "KEY={{ key \"service/my-key\" }}"
      #   destination = "local/file.env"
      #   env         = true
      # }

      # The "vault" stanza instructs the Nomad client to acquire a token from
      # a HashiCorp Vault server. The Nomad servers must be configured and
      # authorized to communicate with Vault. By default, Nomad will inject
      # the token into the job via an environment variable and make the token
      # available to the "template" stanza. The Nomad client handles the renewal
      # and revocation of the Vault token.
      #
      # For more information and examples on the "vault" stanza, please see
      # the online documentation at:
      #
      #     https://www.nomadproject.io/docs/job-specification/vault
      #
      # vault {
      #   policies      = ["cdn", "frontend"]
      #   change_mode   = "signal"
      #   change_signal = "SIGHUP"
      # }

      # Controls the timeout between signalling a task that it will be killed
      # and killing the task. If not set, a default is used.
      # kill_timeout = "20s"
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

@ghost ghost added the type/bug label Apr 26, 2022
@shoenig
Member

shoenig commented Apr 26, 2022

@twunderlich-grapl can you try running the job but without using the locals indirection?

e.g.

dns {
  servers = ["${attr.unique.network.ip-address}"]
}

@ghost
Author

ghost commented Apr 26, 2022

Hi @shoenig,
I just tried that, no change.

Looking through the code, #12021 updates interpolateNetworks; however, that function is only called in the network_hook preRun function, and everything in that code seems specific to the hostname. I suspect interpolateNetworks may need to be called elsewhere as well.

That said, I haven't dug through the code in depth yet.

@shoenig
Member

shoenig commented Apr 26, 2022

Hmm alright thanks for trying @twunderlich-grapl ... I was hoping it was just an HCL parsing thing. Looks like we'll need to investigate further 🙂

@ghost
Author

ghost commented Apr 28, 2022

@shoenig,
I've been able to confirm that the interpolateNetworks function is fine; the issue is that the interpolated DNS values aren't actually used.

As a quick-and-dirty test I called the interpolateNetworks function elsewhere in the code (see #12817), and with that change dns.servers = ["${attr.unique.network.ip-address}"] is interpolated correctly and I get the expected private IP address.

Since what I did is clearly not production code, if someone can weigh in on where the best place to update the code would actually be, I can take this across the finish line.

@shantanugadgil
Contributor

I was trying to "variable-ize" the Docker bridge IP and happened to struggle with this. The following minimal job file shows that variable interpolation works in the Docker driver's config section, but not in the group network section.

With the dns_servers line enabled, the /etc/resolv.conf inside the container is set appropriately.

job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      # use either the "dns" block OR the "dns_servers". DO NOT use both

#      dns {
#        servers = ["${attr.driver.docker.bridge_ip}"]   # this DOES NOT work
#      }

      port "db" {
        to = 6379
      } 
    } 

    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"

        ports = ["db"]

#       dns_servers = ["${attr.driver.docker.bridge_ip}"]  # this works
      } 

      resources {
        cpu    = 500
        memory = 256
      } 
    } 
  } 
} 

@ghost
Author

ghost commented May 17, 2022

@shantanugadgil thanks for the find! For anyone who doesn't require bridge mode, it's certainly a functional workaround.
@shoenig in light of this, it feels like there are two paths:

  1. Update the docs to clarify that interpolation works ONLY in the config section. Currently the docs state that the group -> network stanza supports variable interpolation, which is incorrect.
  2. Merge in a fix along the lines of Fix network.dns interpolation #12817 that actually adds variable interpolation for the group -> network stanza (see the sketch after this list). Since the dns_servers option doesn't work for tasks in bridge mode, I suspect that this is still the way to go.
    2.5 Do both?
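
A minimal sketch of the group-level form that option 2 is meant to support, using the attribute from the original report (prior to the fix the value is written out literally, as shown in the Actual Result above):

group "cache" {
  network {
    mode = "bridge"
    dns {
      # with interpolation applied at the group network level, this should
      # resolve to the node's private IP (e.g. 100.115.92.202)
      servers = ["${attr.unique.network.ip-address}"]
    }
  }
}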

If you think option 1 is the way to go, I can put up a PR updating the docs pretty quickly.

Edited to add the notes about dns_servers not working for tasks in bridge mode.

@shoenig
Member

shoenig commented May 17, 2022

My apologies for the delay @twunderlich-grapl, you got us right on the 1.3 release and this fell off my radar. I'll take a look at your PR today.

@ghost
Author

ghost commented May 17, 2022

No worries, thanks for getting this merged in!

@shoenig shoenig added the stage/accepted ("Confirmed, and intend to work on. No timeline commitment though.") label and removed the stage/waiting-reply and stage/needs-investigation labels May 17, 2022
@github-actions

github-actions bot commented Oct 8, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 8, 2022