Improve consul fingerprinting #10688

Closed
shoenig opened this issue Jun 2, 2021 · 1 comment · Fixed by #10699
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/fingerprint, type/enhancement

Comments

shoenig commented Jun 2, 2021

Currently, Nomad agent will fingerprint these attributes about its configured Consul client:

consul.server
consul.version
consul.revision
unique.consul.name
consul.datacenter
consul.segment

Although consul.version reveals whether Consul is an enterprise build through its build metadata extension (+ent), this is not enough to decide whether enterprise features such as Namespaces can actually be used. Enterprise features may be gated behind licensing, in which case the related endpoints return an error. With how Nomad tries to use namespaces, this currently causes an error:

consul.sync: still unable to update services in Consul: failures=120 error="failed to query Consul namespaces: Unexpected response code: 500 (rpc error making call: Namespaces are currently disabled until all servers in the datacenter supports the feature)"

The Consul fingerprint should include all available features on Nomad client startup, so we can proactively avoid hitting endpoint(s) that will just return an error later on.
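To illustrate why the version string alone falls short, here is a minimal Go sketch. The names consulSKU and namespacesUsable are hypothetical, and the namespacesLicensed flag stands in for whatever feature information the fingerprint ends up collecting from Consul; it is not Nomad's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// consulSKU reports "ent" when the version string carries an "ent" build
// metadata extension (for example "1.9.5+ent"), and "oss" otherwise.
// The function name and the "oss"/"ent" labels are illustrative only.
func consulSKU(version string) string {
	parts := strings.SplitN(version, "+", 2)
	if len(parts) == 2 && strings.HasPrefix(parts[1], "ent") {
		return "ent"
	}
	return "oss"
}

// namespacesUsable combines the SKU with a separately discovered feature flag.
// How that flag is obtained is left open here (assumption); the point is that
// an enterprise build is necessary but not sufficient.
func namespacesUsable(version string, namespacesLicensed bool) bool {
	return consulSKU(version) == "ent" && namespacesLicensed
}

func main() {
	fmt.Println(consulSKU("1.9.5"))                   // oss
	fmt.Println(consulSKU("1.9.5+ent"))               // ent
	fmt.Println(namespacesUsable("1.9.5+ent", false)) // false
}
```

An enterprise build with Namespaces unlicensed or disabled is exactly the case that produces the 500 error above, which is why the sketch needs the second input.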

While we're at it, we can also detect whether Consul Connect is enabled, and avoid scheduling Connect tasks on nodes with improperly configured Consul agents. Before using Connect, Consul must be configured with connect.enabled = true and ports.grpc = <non-zero>. We can detect if Connect is enabled by querying the /v1/agent/self API and checking for a non-empty xDS.SupportedProxies response blob. We can check whether the gRPC port is open by making a TCP connection to the configured consul.grpc_address (or port 8502 if not configured).

We can detect if Connect is enabled with .DebugConfig.ConnectEnabled, and if gRPC is enabled with .DebugConfig.GRPCPort, both taken from the /v1/agent/self response. Items under DebugConfig are subject to change, but I suspect these two values are going to be pretty stable.
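A minimal Go sketch of reading those two values from an agent self response body. The agentSelf type, the connectReady helper, and the sample JSON are hand-written for illustration, and treating a port greater than zero as "gRPC enabled" is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentSelf models only the two DebugConfig fields named above.
// All other fields in the response are ignored by the decoder.
type agentSelf struct {
	DebugConfig struct {
		ConnectEnabled bool
		GRPCPort       int
	}
}

// connectReady decodes body and reports whether Connect is enabled and a
// positive gRPC port is configured (assumption: port > 0 means enabled).
func connectReady(body []byte) (bool, error) {
	var self agentSelf
	if err := json.Unmarshal(body, &self); err != nil {
		return false, err
	}
	return self.DebugConfig.ConnectEnabled && self.DebugConfig.GRPCPort > 0, nil
}

func main() {
	// Hand-written sample, not captured from a real agent.
	sample := []byte(`{"DebugConfig":{"ConnectEnabled":true,"GRPCPort":8502}}`)
	ok, err := connectReady(sample)
	fmt.Println(ok, err) // true <nil>
}
```

Because DebugConfig is subject to change, a missing field decodes to its zero value here, so the check fails closed and reports Connect as not ready.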

shoenig added the type/enhancement, theme/fingerprint, and stage/accepted labels Jun 2, 2021
shoenig added a commit that referenced this issue Jun 7, 2021
This PR changes Nomad's wrapper around the Consul NamespaceAPI so that
it will detect if the Consul Namespaces feature is enabled before making
a request to the Namespaces API. Namespaces are not enabled in Consul OSS,
and require a suitable license to be used with Consul ENT.

Previously Nomad would check for a 404 status code when making a request
to the Namespaces API to "detect" if Consul OSS was being used. This does
not work for Consul ENT with Namespaces disabled, which returns a 500.

Now we avoid requesting the namespace API altogether if Consul is detected
to be the OSS sku, or if the Namespaces feature is not licensed. Since
Consul can be upgraded from OSS to ENT, or a new license applied, we cache
the value for 1 minute, refreshing on demand if expired.

Fixes hashicorp/nomad-enterprise#575

Note that the ticket originally describes using attributes from #10688.
This turns out not to be possible due to a chicken-egg situation between
bootstrapping the agent and setting up the consul client. Also fun: the
Consul fingerprinter creates its own Consul client, because there is
[currently] no way to pass the agent's client through the fingerprint factory.
github-actions bot commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 19, 2022