Consul grpc_tls usage cannot deploy nomad jobs anymore #15266

Closed
suikast42 opened this issue Nov 16, 2022 · 5 comments · Fixed by #15309
suikast42 commented Nov 16, 2022

Nomad version

1.4.2

Consul version

1.14.0

I updated my Consul agent today. Consul 1.14.0 has a breaking change:

BREAKING CHANGES:
config: Add new ports.grpc_tls configuration option.

I removed the grpc flag from the config and replaced it with grpc_tls. After that, Nomad is not able to deploy jobs and fails with the error:

Placement Failure
Task Group "keycloak":

  • Constraint "${attr.consul.grpc} > 0": 1 nodes excluded by filter

jrasell commented Nov 16, 2022

Hi @suikast42 and thanks for raising this issue. We are currently looking into how this change impacts Nomad's integration and what fixes, if any, would be needed.

jrasell pinned this issue Nov 16, 2022
mikenomitch added this to the 1.4.3 milestone Nov 16, 2022
hc-github-team-consul-core pushed a commit to hashicorp/consul that referenced this issue Nov 16, 2022
See hashicorp/nomad#15266 for details.

I plan to submit a followup PR to update these docs once Nomad releases
a fix. At that point Nomad users will still need to be informed that
they must upgrade Nomad to a compatible version *before* upgrading
Consul to 1.14.

jrasell commented Nov 18, 2022

This has been investigated internally and we are currently working on building and testing a fix. A note has been added to the Consul upgrade guide which I will also post here:

The changes to Consul service mesh in version 1.14 are incompatible with Nomad 1.4.2 and earlier. If you operate Consul service mesh using Nomad 1.4.2 or earlier, do not upgrade to Consul 1.14 until hashicorp/nomad#15266 is fixed.

Overview

Consul 1.14.0 changed the way in which gRPC listeners are configured, particularly when using TLS. Prior to the change, a single listener was responsible for handling plain-text and encrypted gRPC requests. In 1.14.0 and beyond, separate listeners will be used for each, defaulting to 8502 and 8503 for plain-text and TLS respectively.

The change means that Nomad’s Consul Connect integration will not currently work when integrated with Consul clusters using TLS and running 1.14.0 or greater. Consul clusters that do not utilize TLS to protect communication are not affected and continue to work with Nomad’s Connect integration.

Required Changes

Consul 1.14.0 clusters running TLS fail to be fingerprinted correctly by Nomad due to the changes described above in combination with modifications to the object returned from Consul’s agent self endpoint. The fingerprint results in the Nomad node's consul.grpc attribute being set to -1, meaning gRPC is disabled. This is critical, as Nomad adds an implicit constraint (${attr.consul.grpc} > 0) to all Nomad tasks which configure a Connect Gateway or Connect Sidecar, so affected nodes are filtered out during placement.
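
To illustrate why a fingerprinted value of -1 blocks placement, here is a minimal sketch of the `${attr.consul.grpc} > 0` check. The function name `grpcConstraintSatisfied` and the map-based node attributes are assumptions made for this example; this is not Nomad's scheduler code.

```go
package main

import (
	"fmt"
	"strconv"
)

// grpcConstraintSatisfied mimics the "${attr.consul.grpc} > 0" constraint.
// Hypothetical helper: the name and the map representation of node
// attributes are illustrative assumptions, not Nomad internals.
func grpcConstraintSatisfied(attrs map[string]string) bool {
	raw, ok := attrs["consul.grpc"]
	if !ok {
		return false // attribute missing: the node cannot satisfy the constraint
	}
	port, err := strconv.Atoi(raw)
	if err != nil {
		return false // non-numeric value: treat as not satisfied
	}
	return port > 0
}

func main() {
	broken := map[string]string{"consul.grpc": "-1"}    // value produced by the faulty fingerprint
	healthy := map[string]string{"consul.grpc": "8502"} // value when the gRPC port is detected

	fmt.Println(grpcConstraintSatisfied(broken))  // false
	fmt.Println(grpcConstraintSatisfied(healthy)) // true
}
```

A node whose attribute is -1 fails the check, which matches the "1 nodes excluded by filter" message in the original report.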

Fingerprinter

The Nomad Consul fingerprinter identifies the gRPC port Consul has exposed using the DebugConfig.GRPCPort value from Consul’s /v1/agent/self endpoint. In Consul 1.14.0 and greater, this only represents the plain-text gRPC port, which is likely to be disabled in clusters running TLS. To identify the gRPC port when running TLS, Nomad will need to read the DebugConfig.GRPCTLSPort value from the agent self response instead, depending on whether TLS is enabled and on the Consul version being queried.
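
The port selection described above could look roughly like the following sketch. It only models the two DebugConfig fields named in the text; the type `agentSelf`, the function `pickGRPCPort`, the `tlsEnabled` parameter, and the sample payload are assumptions for illustration and are not taken from the actual fix.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentSelf models only the fields of the /v1/agent/self response that the
// discussion mentions. Hypothetical type for this sketch.
type agentSelf struct {
	DebugConfig struct {
		GRPCPort    int
		GRPCTLSPort int
	}
}

// pickGRPCPort prefers the TLS port when TLS is enabled and the field is
// set, otherwise falls back to the plain-text port. A Consul version older
// than 1.14 has no GRPCTLSPort field, so it decodes to 0 and the fallback
// applies. It returns -1 when no usable port is found.
func pickGRPCPort(self agentSelf, tlsEnabled bool) int {
	if tlsEnabled && self.DebugConfig.GRPCTLSPort > 0 {
		return self.DebugConfig.GRPCTLSPort
	}
	if self.DebugConfig.GRPCPort > 0 {
		return self.DebugConfig.GRPCPort
	}
	return -1
}

func main() {
	// Assumed sample payload: plain-text gRPC disabled, TLS gRPC on 8503.
	payload := []byte(`{"DebugConfig": {"GRPCPort": -1, "GRPCTLSPort": 8503}}`)

	var self agentSelf
	if err := json.Unmarshal(payload, &self); err != nil {
		panic(err)
	}
	fmt.Println(pickGRPCPort(self, true)) // 8503
}
```

How TLS is detected is left to the caller here, since the thread does not say which field Nomad uses for that decision.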

Consul gRPC Socket Hook

The consul_grpc_socket allocrunner hook is responsible for creating a UNIX socket to allow communication with the Consul gRPC endpoint from inside a Linux network namespace. It uses the consul.grpc_address value as the socket destination address. If, however, that value is an empty string, it concatenates the consul.address value with the hardcoded port 8502. Ideally we should pass the fingerprinted port to the hook to avoid hardcoding values.
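
A minimal sketch of that fallback, extended to accept a fingerprinted port, is shown below. The function `grpcDestination`, its parameters, and the assumption that consul.address is in host:port form are all illustrative and not taken from the hook's real implementation.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// defaultGRPCPort is the hardcoded fallback port mentioned in the discussion.
const defaultGRPCPort = 8502

// grpcDestination returns the address the socket should proxy to.
// Hypothetical helper: it uses grpcAddress when set; otherwise it takes the
// host from consulAddress (assumed to be "host:port") and joins it with the
// fingerprinted port, or with 8502 when no port was fingerprinted.
func grpcDestination(grpcAddress, consulAddress string, fingerprintedPort int) (string, error) {
	if grpcAddress != "" {
		return grpcAddress, nil
	}
	host, _, err := net.SplitHostPort(consulAddress)
	if err != nil {
		return "", fmt.Errorf("parsing consul address %q: %w", consulAddress, err)
	}
	port := defaultGRPCPort
	if fingerprintedPort > 0 {
		port = fingerprintedPort
	}
	return net.JoinHostPort(host, strconv.Itoa(port)), nil
}

func main() {
	// No explicit grpc_address, TLS gRPC port fingerprinted as 8503.
	addr, err := grpcDestination("", "127.0.0.1:8501", 8503)
	if err != nil {
		panic(err)
	}
	fmt.Println(addr) // 127.0.0.1:8503
}
```

Passing the port in as an argument keeps the hardcoded value as a last resort only, which is the direction the comment suggests.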

suikast42 commented

Regarding your comment in #15295 (comment):

The working Consul 1.13.3 server and agent configs are below (server first, then agent).

Consul 1.14.0 does not boot if grpc_tls is missing, so I only added
grpc_tls = 8503 to the ports section.
If I delete the grpc flag from the ports section, Nomad can't deploy jobs because the node attribute grpc is missing.

I hope that was the right answer.

{
    "node_name": "{{host_name}}",
    "datacenter": "{{data_center}}",
    "data_dir": "{{consul_data_dir}}",
    "server": true,
    "log_level" : "INFO",
    "bind_addr": "0.0.0.0",
    "advertise_addr": "{{host_ip}}",
    "client_addr": "0.0.0.0",
    "encrypt": "{{consul_encrypt_key}}",
    "ui_config": {
        "enabled" : true
    },
    "addresses": {
        "grpc" : "127.0.0.1"
    },
    "ports": {
        "http": -1,
        "grpc" : 8502,
        "https": 8501
    },
    "connect": {
        "enabled": true
    },
    "retry_join":{{masters | to_json }},
    "bootstrap_expect": {{masters|length}},

    "auto_encrypt": {
        "allow_tls": true
    },

    "performance": {
        "raft_multiplier": 1
    },

    "node_meta": {
        "node_type": "server"
    },
    "tls":{
      "defaults":{
        "ca_file": "{{cluster_intermediate_ca_bundle}}",
        "cert_file": "{{consul_cert}}",
        "key_file": "{{consul_cert_key}}",
        "verify_incoming": true,
        "verify_outgoing": true
      },
      "internal_rpc":{
         "verify_server_hostname": true
      }
    },
    "telemetry": {
      "disable_hostname" : true
    }
}

{
    "node_name": "{{host_name}}",
    "datacenter": "{{data_center}}",
    "data_dir": "{{consul_data_dir}}",
    "ports": {"https":8501},
    "bind_addr": "0.0.0.0",
    "advertise_addr": "{{host_ip}}",
    "auto_encrypt": {
        "tls": true
    },
    "retry_join":{{masters | to_json }},
    "encrypt": "{{consul_encrypt_key}}",
    "node_meta": {
        "node_type": "worker"
    },
    "ports": {
        "http": -1,
        "grpc" : 8502,
        "https": 8501
    },
    "connect": {
        "enabled": true
    },
    "tls":{
      "defaults":{
        "ca_file": "{{cluster_intermediate_ca_bundle}}",
        "cert_file": "{{consul_cert}}",
        "key_file": "{{consul_cert_key}}",
        "verify_incoming": false,
        "verify_outgoing": true
      },
      "internal_rpc":{
         "verify_server_hostname": true
      }
    },
    "telemetry": {
      "disable_hostname" : true
    }
}

tgross commented Nov 21, 2022

Hi folks, we've got a patch landed for this and it will ship in Nomad 1.4.3, which is coming out shortly.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2023