
Expose consul-template configuration parameters for operators to tune. #11707

Closed
DerekStrickland opened this issue Dec 20, 2021 · 1 comment · Fixed by #11606
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/template type/enhancement

Comments


DerekStrickland commented Dec 20, 2021

Use-cases

When Nomad clusters are experiencing Consul/Vault system degradation, or in situations where Operators have a higher tolerance for latency or connectivity loss (edge), a lack of control over consul-template fault tolerance behaviors can lead to a significant amount of allocation churn. consul-template exposes configuration options for fine-tuning retries, blocking queries, and startup fault tolerance. Having the ability to configure these options would be useful for Nomad Operators, especially when experiencing latency or instability when communicating with Consul.

Proposal

Expose consul-template configuration parameters for operators to tune. The consul-template configuration options to be exposed, and their usage, are:

  • max_stale - This is the maximum interval to allow "stale" data. By default, only the Consul leader will respond to queries. Requests to a follower or local agent will be forwarded to the leader. In large clusters with many requests, this is not as scalable, so this option allows any follower to respond to a query, so long as the last-replicated data is within these bounds. Higher values result in less cluster load but are more likely to have outdated data.
  • block_query_wait - This is the amount of time to perform a blocking query. Many endpoints in Consul support a feature known as "blocking queries". A blocking query is used to wait for a potential change using long polling. This reduces the load on Consul by avoiding making new requests to Consul when nothing has changed.
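Taken together, the two client-level knobs above could be set in a Nomad agent configuration roughly as follows. This is a sketch only: the placement under a client-level `template` block is an assumption based on this proposal, and the values are illustrative.

```hcl
# Hypothetical Nomad client agent configuration (placement assumed).
client {
  template {
    # Let any Consul follower answer queries whose replicated data is
    # no more than 87 seconds behind the leader.
    max_stale = "87s"

    # Cap each Consul blocking query at 60 seconds before re-issuing it.
    block_query_wait = "60s"
  }
}
```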
  • wait - This defines the minimum and maximum amount of time to wait for the cluster to reach a consistent state before rendering a template. This is useful to enable in systems that are experiencing a lot of flapping because it will reduce the number of times a template is rendered. This is configurable at both the client and the task level. The task-level setting can be used to override the global setting.
```hcl
wait {
  enabled = true
  min     = "5s"
  max     = "90s"
}
```
  • wait_bounds - This is a Nomad-specific configuration that enables Nomad Operators to set client level constraints that set bounds on individual jobspec configuration. This setting defines lower and upper bounds for per-template wait configuration on a given client. If the individual template configuration has a min lower than wait_bounds.min or a max greater than the wait_bounds.max, the bounds will be enforced, and the template wait will be adjusted before being sent to consul-template.
```hcl
wait_bounds {
  enabled = true
  min     = "5s"
  max     = "10s"
}
```
  • consul_retry - This controls the retry behavior when an error is returned from Consul. By default, Nomad will fail and reschedule an alloc when a template fails to render. This can lead to a significant eval -> alloc -> template render failure cycle in clusters where Consul is unstable.
```hcl
consul_retry {
  enabled = true

  # This specifies the number of attempts to make before giving up. Each
  # attempt adds the exponential backoff sleep time. Setting this to
  # zero will implement an unlimited number of retries.
  attempts = 12

  # This is the base amount of time to sleep between retry attempts. Each
  # retry sleeps for an exponent of 2 longer than this base. For 5 retries,
  # the sleep times would be: 250ms, 500ms, 1s, 2s, then 4s.
  backoff = "250ms"

  # This is the maximum amount of time to sleep between retry attempts.
  # When max_backoff is set to zero, there is no upper limit to the
  # exponential sleep between retry attempts.
  # If max_backoff is set to 10s and backoff is set to 1s, sleep times
  # would be: 1s, 2s, 4s, 8s, 10s, 10s, ...
  max_backoff = "1m"
}
```

  • `vault_retry` - This controls the retry behavior when an error is returned from Vault. By default, Nomad will fail and reschedule an alloc when a template fails to render. This can lead to a significant `eval` -> `alloc` -> `template` render failure cycle in clusters where Vault is unstable.

```hcl
vault_retry {
  # Same explanations as the consul_retry options above.
  enabled     = true
  attempts    = 12
  backoff     = "250ms"
  max_backoff = "1m"
}
```

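Since the `wait` setting is also configurable at the task level, a jobspec override might look like the following. This is a sketch: the `source` and `destination` fields are standard `template` block parameters, but the nested `wait` syntax shown here is an assumption based on this proposal.

```hcl
job "example" {
  group "web" {
    task "app" {
      template {
        source      = "local/config.ctmpl"
        destination = "local/config.ini"

        # Task-level wait, overriding the client-level default
        # (subject to any wait_bounds enforced by the client).
        wait {
          min = "10s"
          max = "60s"
        }
      }
    }
  }
}
```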
Related Issues

Closes #3866
Closes #2623

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2022