client: enable configuring enable_tag_override for services #7106

Merged: shoenig merged 1 commit into master from f-ctag-override on Feb 13, 2020

shoenig (Member) commented Feb 10, 2020

Consul service definitions support a feature that lets the tags
associated with a service be modified through the Catalog API,
overriding the value(s) configured in the agent's service definition.

To enable this feature, the enable_tag_override flag must be set to true
in the service definition.
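For reference, this is roughly what the flag looks like at the Consul level, without Nomad in the picture. A minimal sketch, assuming a local dev agent on localhost:8500; the helper name `service_definition` is hypothetical, while the field names `Name`, `Tags`, and `EnableTagOverride` follow Consul's agent service registration API.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a minimal Consul agent service definition
# with the EnableTagOverride flag.
#   $1: service name
#   $2: enable_tag_override (true|false), emitted as a JSON boolean
#   $3: tags, already rendered as a JSON array, e.g. '["original", "tags"]'
service_definition() {
  local name="$1" eto="$2" tags="$3"
  printf '{"Name": "%s", "Tags": %s, "EnableTagOverride": %s}\n' \
    "$name" "$tags" "$eto"
}
```

Registering it with the local agent would then look like `curl -s -X PUT localhost:8500/v1/agent/service/register -d "$(service_definition sleep true '["original", "tags"]')"`. Tags registered this way can later be overridden with a PUT to `/v1/catalog/register`, which is what `setTags` in the demo script does.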

Previously, Nomad did not allow configuring this flag, so Consul's default
value of false was always used. Now it is configurable through the
enable_tag_override parameter of the jobspec's service stanza.

Because Nomad itself acts as a state machine around the service definitions
of the tasks it manages, it's worth describing what happens when this feature
is enabled, and why.

Consider the basic case where there is no Nomad, and your service is provided
to Consul as a boring JSON file. The ultimate source of truth for the definition
of that service is the file, which is loaded into the local agent. Consul then
performs "anti-entropy", which synchronizes the agent's services into the Catalog
(stored only on the Consul servers). With enable_tag_override=true, the tags field
is then open to "external" modification through the Catalog API (rather than by
editing the service definition file or using the Agent API). The important
observation is that if the service definition ever changes (e.g. the file is
changed and the config reloaded, or the Agent API is used to modify the service),
those "external" tag values are thrown away, and the new service definition is
once again the source of truth.
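The behavior described above boils down to a small decision table. The sketch below is a toy model written for this description (`catalog_tags_after_sync` is a hypothetical name and this is not Consul's implementation), with tags passed as plain strings.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy model (hypothetical, not Consul's code) of which tags the Catalog
# holds after anti-entropy runs.
#   $1: enable_tag_override (true|false)
#   $2: did the agent's service definition change since the last sync? (yes|no)
#   $3: tags in the agent's service definition
#   $4: tags currently in the Catalog (possibly set externally)
catalog_tags_after_sync() {
  local eto="$1" definition_changed="$2" agent_tags="$3" catalog_tags="$4"
  if [[ "$definition_changed" == "yes" ]]; then
    # A new definition (file reload or Agent API call) is the source of
    # truth again, so externally set tags are discarded.
    echo "$agent_tags"
  elif [[ "$eto" == "true" ]]; then
    # Tags may be overridden through the Catalog API, so they are kept.
    echo "$catalog_tags"
  else
    # Default behavior: anti-entropy restores the agent's tags.
    echo "$agent_tags"
  fi
}
```

For example, `catalog_tags_after_sync true no 'original tags' 'some new tags'` keeps the external tags, while passing `false` (or `yes` for a changed definition) yields the agent's `original tags`.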

In the Nomad case, Nomad itself is the source of truth for the agent, in
the same way the JSON file was the source of truth in the example above.
That means any time Nomad sets a new service definition, any externally
configured tags are going to be replaced. When does this happen? Only on
major lifecycle events, for example when a task is modified because of an
updated job spec from the 'nomad job run <existing>' command. Otherwise,
Nomad's periodic re-syncs with Consul will no longer try to restore
the externally modified tag values (as long as enable_tag_override=true).
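The periodic re-sync rule can be sketched the same way: when enable_tag_override=true, a difference in tags alone is not a reason to re-register the service. This is a hypothetical illustration, not Nomad's actual implementation; `sync_required` and the `name|port|tags` encoding are made up for the example. Major lifecycle events such as a job update re-register the service regardless, so only the periodic check is modeled.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical periodic-sync check (not Nomad's actual code).
# Services are modeled as "name|port|tags" strings; tags must not contain "|".
#   $1: enable_tag_override (true|false)
#   $2: the service as Nomad wants it (from the jobspec)
#   $3: the service as currently registered in Consul
# Prints "yes" if the service would be re-registered, "no" otherwise.
sync_required() {
  local eto="$1" wanted="$2" existing="$3"
  if [[ "$eto" == "true" ]]; then
    # Ignore tags: swap the wanted tags (last field) for the existing ones
    # before comparing, so only non-tag fields can trigger an update.
    wanted="${wanted%|*}|${existing##*|}"
  fi
  if [[ "$wanted" == "$existing" ]]; then
    echo no
  else
    echo yes
  fi
}
```

With ETO set, externally modified tags survive (`no`), but a change to another field such as the port still triggers a re-registration, which then replaces the external tags as described above.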

Fixes #2057

shoenig (Member, Author) commented Feb 10, 2020

This comes with a little helper script (demo.sh) for manual checking:

#!/usr/bin/env bash

set -euo pipefail

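function slice {
  # Render the arguments as a JSON array of strings, e.g. a b -> ["a", "b"].
  # Requires bash 4.4+ for the @Q operator. Arguments must not contain
  # spaces or quotes, since every space is turned into ", ".
  local args=("${@}")
  local quotes
  quotes=$(echo "${args[@]@Q}" | tr "'" '"' | sed -e 's/ /, /g')
  echo "[${quotes}]"
}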

job="eto-example"
service="sleep" # just the one for now
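# default to empty so a missing argument reaches the "not a valid test case"
# branch instead of aborting on an unbound variable (set -u)
testcase="${1:-}"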
host=$(hostname)
nomadV=$(nomad version)
consulV=$(consul version | xargs | cut -d' ' -f1,2)

function serviceID {
  # Look up the Consul service ID of the named service (first instance only).
  curl -s "localhost:8500/v1/catalog/service/${1}" | jq -r '.[0].ServiceID'
}

function setTags {
  payload=$(cat <<EOM 
  {
    "Node": "${host}",
    "Address": "127.0.0.1",
    "DC": "dc1",
    "Service": {
      "ID": "$(serviceID "${service}")",
      "Service": "${service}",
      "EnableTagOverride": true,
      "Tags": $(slice "$@")
    }
  }
EOM
)
  tmp=$(mktemp)
  echo "${payload}" > "${tmp}"
  curl -XPUT localhost:8500/v1/catalog/register -d "@${tmp}"
}

function startExample {
  echo "[will start ${job} nomad job with enable_tag_override=${1}]"
  payload=$(cat <<EOM
job "${job}" {
  datacenters = ["dc1"]
  type = "service"
  group "group" {
    task "${service}" {
      driver = "raw_exec"
      config {
        command = "/bin/sleep"
        args = ["10000"]
      }
      service {
        name = "${service}"
        tags = ["original", "tags"]
        enable_tag_override = ${1}
      }
    }
  }
}
EOM
)
  tmp=$(mktemp)
  echo "${payload}" > "${tmp}"
  nomad job run "${tmp}"
}

function stopExample {
  nomad job stop "${job}"
}

function watchTags {
  watch "curl -s localhost:8500/v1/catalog/service/${service} | jq '.[0] | .ServiceID, .ServiceName, .ServiceTags'"
}

function showService {
  curl "localhost:8500/v1/catalog/service/${service}"
}

###################
### entry point ###
###################

echo "[setup] host:           ${host}"
echo "[setup] nomad version:  ${nomadV}"
echo "[setup] consul version: ${consulV}"
echo "[setup] action:         ${testcase}"

case "${testcase}" in
  "set-tags")
    echo "--- set-tags ---"
    setTags some new tags
    ;;
  "watch-tags")
    echo "--- watch-tags ---"
    watchTags
    ;;
  "show-service")
    echo "--- show-service ---"
    showService
    ;;
  "start-example")
    echo "--- start-example ---"
    startExample "${2:?usage: start-example true|false}"
    ;;
  "stop-example")
    echo "--- stop-example ---"
    stopExample
    ;;
  *)
    echo "not a valid test case"
    exit 1
    ;;
esac
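The `slice` helper above joins its quoted arguments and then turns every space into ", ", so a tag that itself contains a space (say `a b`) comes out as `"a, b"`. If such tags matter, a plain loop avoids the problem. A sketch only: `json_array` is a hypothetical replacement name, and it still assumes tags contain no double quotes or backslashes, since nothing is escaped.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical alternative to slice: emit each argument as one JSON string,
# so a tag containing spaces stays a single array element.
# Assumes no argument contains a double quote or backslash (not escaped).
json_array() {
  local out="" sep="" item
  for item in "$@"; do
    out+="${sep}\"${item}\""
    sep=", "
  done
  printf '[%s]\n' "$out"
}
```

`json_array some new tags` produces the same `["some", "new", "tags"]` as `slice`, while `json_array 'a b' c` gives `["a b", "c"]`.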

Usage outline

# compile nomad
# $ go install
#
# run nomad
# $ nomad agent -dev -log-level=INFO
#
# run consul
# $ consul agent -dev
#
# keep a tab watching the tags of our service
# $ ./demo.sh watch-tags
#
# create example job, with enable_tag_override=true
# $ ./demo.sh start-example true
#
# do a manual update on the tags via the Consul catalog
# $ ./demo.sh set-tags
#
# (the watch-tags tab should show the change)
# can also double check the entire service output
# $ ./demo.sh show-service
#
# now wait ~60 seconds for Consul anti-entropy to take place
# something like: [DEBUG] agent: Node info in sync
#
# the tags should not change (indicating Consul is respecting
# enable_tag_override, or ETO, in the service definition)
#
# now wait another ~30 seconds for Nomad's periodic resync
# to take place
# (there does not seem to be a nice log line indicating the
# periodic resync if nothing happens)
# just note that the tags never get reset to their original values

# to check that the ETO=false behavior is unchanged, do all of the above
# but with: start-example false
#
# note that Consul anti-entropy kicks in and restores the tags
# to their original values from the nomad jobspec (after about 1 minute
# for a tiny cluster size)
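The manual "watch and confirm the tags never reset" steps in the outline could also be scripted. A minimal sketch: `assert_stable` is a hypothetical helper, not part of demo.sh, that samples a command's output a given number of times and fails as soon as the output changes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command repeatedly and check its output never
# changes. Returns 0 if all samples match the first one, 1 otherwise.
#   $1: number of samples
#   $2: seconds to sleep between samples
#   rest: the command to run
assert_stable() {
  local samples="$1" interval="$2"
  shift 2
  local first current i
  first="$("$@")"
  for ((i = 1; i < samples; i++)); do
    sleep "$interval"
    current="$("$@")"
    if [[ "$current" != "$first" ]]; then
      echo "output changed: ${first} -> ${current}" >&2
      return 1
    fi
  done
  echo "output stable across ${samples} samples"
}
```

After `./demo.sh set-tags`, something like `assert_stable 4 30 bash -c "curl -s localhost:8500/v1/catalog/service/sleep | jq -c '.[0].ServiceTags'"` takes four samples about 90 seconds apart in total, spanning the ~60 second anti-entropy and ~30 second resync windows noted above. With `start-example false` it would be expected to fail once anti-entropy restores the original tags.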

endocrimes (Contributor) left a comment

This looks good to me - I like the fairly neat changes to the update detector.

shoenig merged commit 1ced8ba into master Feb 13, 2020
shoenig deleted the f-ctag-override branch February 13, 2020 18:34
github-actions (bot) commented:

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Jan 17, 2023
Development

Successfully merging this pull request may close these issues.

Add support for consul EnableTagOverride
3 participants