
connect native tasks using bridge networking and consul TLS need consul tls server name #10804

Closed
shoenig opened this issue Jun 23, 2021 · 3 comments · Fixed by #10805
Labels
stage/accepted Confirmed, and intend to work on. No timeline commitment though. theme/consul/connect Consul Connect integration type/bug

Comments

shoenig (Member) commented Jun 23, 2021

Nomad does some magic to eliminate the need for configuration when using Connect native tasks, even when Consul is restrictive with TLS and ACLs. When using bridge networking, however, it is still necessary to set CONSUL_TLS_SERVER_NAME; otherwise Consul rejects requests with a certificate validity error. Setting this automatically should make running Traefik with service mesh very easy once Traefik becomes Connect native (traefik/traefik#7407).

2021-06-23T15:34:09.089Z [ERROR] connect.watch: Watch errored: service=traefik type=connect_leaf error="Get "https://%2Falloc%2Ftmp%2Fconsul_http.sock/v1/agent/connect/ca/leaf/traefik": x509: certificate is valid for server.sfo.consul, localhost, not /alloc/tmp/consul_http.sock" retry=5s
job "traefik" {
  datacenters = ["dc1"]

  group "edge" {
    network {
      mode = "bridge"

      port "http" {
        static = 8080
        to     = 8080
      }
    }

    service {
      name = "traefik"
      port = 8080
      connect {
        native = true
      }
    }

    task "traefik" {
      driver = "docker"
      config {
        image = "shoenig/traefik:connect" # use the official image when it is ready
        args = [
          "--providers.consulcatalog.connectaware=true",
          "--providers.consulcatalog.connectbydefault=false",
          "--providers.consulcatalog.exposedbydefault=false",
          "--entrypoints.http=true",
          "--entrypoints.http.address=:8080",

          # Automatically configured by Nomad through CONSUL_* environment variables
          # as long as client consul.share_ssl is enabled
          # "--providers.consulcatalog.endpoint.address=<socket|address>"
          # "--providers.consulcatalog.endpoint.tls.ca=<path>"
          # "--providers.consulcatalog.endpoint.tls.cert=<path>"
          # "--providers.consulcatalog.endpoint.tls.key=<path>"
          # "--providers.consulcatalog.endpoint.token=<token>"
          # "--providers.consulcatalog.prefix=traefik",
        ]
      }

      env {
        # Currently required; this ticket will automate setting this variable
        CONSUL_TLS_SERVER_NAME = "localhost"
      }
    }
  }
}
@shoenig shoenig added the type/bug, stage/accepted, and theme/consul/connect labels Jun 23, 2021
@shoenig shoenig self-assigned this Jun 23, 2021
shoenig added a commit that referenced this issue Jun 23, 2021
…ive tasks

This PR makes it so that Nomad will automatically set the CONSUL_TLS_SERVER_NAME
environment variable for Connect native tasks running in bridge networking mode
where Consul has TLS enabled. Because of the use of a unix domain socket for
communicating with Consul when in bridge networking mode, the server name is
a file name instead of something compatible with the mTLS certificate Consul
will authenticate against. "localhost" is by default a compatible name, so Nomad
will set the environment variable to that.

Fixes #10804
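The behavior described in the commit message can be sketched roughly as follows. This is a hypothetical illustration, not Nomad's actual implementation; the function name and signature are made up. The idea is simply that when a Connect native task uses bridge networking and Consul has TLS enabled, API calls go over a unix domain socket whose path cannot match any name in the Consul certificate, so the server name is pinned to "localhost", which the certificate covers by default.

```go
package main

import "fmt"

// tlsServerNameEnv returns the extra environment to inject for a Connect
// native task (hypothetical sketch). When the task runs in bridge
// networking mode and Consul has TLS enabled, the Consul API is reached
// through a unix socket, so the TLS server name must be overridden to a
// name the Consul certificate actually covers.
func tlsServerNameEnv(bridgeNetworking, consulTLS bool) map[string]string {
	env := map[string]string{}
	if bridgeNetworking && consulTLS {
		env["CONSUL_TLS_SERVER_NAME"] = "localhost"
	}
	return env
}

func main() {
	fmt.Println(tlsServerNameEnv(true, true))  // variable set for bridge + TLS
	fmt.Println(tlsServerNameEnv(false, true)) // no override needed otherwise
}
```

With this in place, the manual env block in the job file above becomes unnecessary for bridge-mode Connect native tasks.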
shoenig added a commit that referenced this issue Jun 28, 2021
@shoenig shoenig added this to the 1.1.3 milestone Jun 28, 2021
apollo13 (Contributor) commented:

@shoenig Lovely, thank you so much for this. Sadly, it looks as if Connect support will not make it into the next Traefik release :/

apollo13 (Contributor) commented:

Hi @shoenig, while playing around and looking through the logs I see this caused by traefik:

Jul 23 19:31:08 nomad03 nomad[4441]:     2021-07-23T19:31:08.998+0200 [WARN]  client.alloc_runner.runner_hook: error proxying from Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="write unix /opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock->@: write: broken pipe" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=3710
Jul 23 19:31:09 nomad03 nomad[4441]:     2021-07-23T19:31:09.079+0200 [WARN]  client.alloc_runner.runner_hook: error proxying to Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="readfrom tcp 127.0.0.1:55182->127.0.0.1:8501: splice: connection reset by peer" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=579
Jul 23 19:31:09 nomad03 nomad[4441]:     2021-07-23T19:31:09.100+0200 [WARN]  client.alloc_runner.runner_hook: error proxying from Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="write unix /opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock->@: write: broken pipe" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=3479
Jul 23 19:31:09 nomad03 nomad[4441]:     2021-07-23T19:31:09.144+0200 [WARN]  client.alloc_runner.runner_hook: error proxying to Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="readfrom tcp 127.0.0.1:55190->127.0.0.1:8501: splice: connection reset by peer" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=576
Jul 23 19:31:23 nomad03 nomad[4441]:     2021-07-23T19:31:23.668+0200 [WARN]  client.alloc_runner.runner_hook: error proxying to Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="readfrom tcp 127.0.0.1:55286->127.0.0.1:8501: splice: connection reset by peer" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=544
Jul 23 19:31:23 nomad03 nomad[4441]:     2021-07-23T19:31:23.726+0200 [WARN]  client.alloc_runner.runner_hook: error proxying to Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="readfrom tcp 127.0.0.1:55292->127.0.0.1:8501: splice: connection reset by peer" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=573
Jul 23 19:31:23 nomad03 nomad[4441]:     2021-07-23T19:31:23.726+0200 [WARN]  client.alloc_runner.runner_hook: error proxying from Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="read tcp 127.0.0.1:55292->127.0.0.1:8501: read: connection reset by peer" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=3503
Jul 23 19:31:23 nomad03 nomad[4441]:     2021-07-23T19:31:23.916+0200 [WARN]  client.alloc_runner.runner_hook: error proxying from Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="write unix /opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock->@: write: broken pipe" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=3787
Jul 23 19:31:23 nomad03 nomad[4441]:     2021-07-23T19:31:23.996+0200 [WARN]  client.alloc_runner.runner_hook: error proxying to Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="readfrom tcp 127.0.0.1:55316->127.0.0.1:8501: splice: connection reset by peer" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=591
Jul 23 19:31:24 nomad03 nomad[4441]:     2021-07-23T19:31:24.057+0200 [WARN]  client.alloc_runner.runner_hook: error proxying from Consul: alloc_id=6cbd6909-11c5-c809-3185-3641863c6dc5 error="write unix /opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock->@: write: broken pipe" dest=127.0.0.1:8501 src_local=/opt/paas/data/nomad/alloc/6cbd6909-11c5-c809-3185-3641863c6dc5/alloc/tmp/consul_http.sock src_remote=@ bytes=3792

Now I am wondering whether this is worth logging as a warning. Do you also see this with your example above? I am also not sure whether Traefik is at fault or whether Consul is closing the connection. (Neither the Consul nor the Traefik logs show anything for me.)

github-actions bot commented:

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 17, 2022