Consul Connect: Multiple services with the same upstream #7833

Closed
varsanojidan opened this issue Apr 29, 2020 · 3 comments · Fixed by #10789
Labels: stage/accepted (confirmed, and intend to work on; no timeline commitment), theme/consul/connect (Consul Connect integration), type/enhancement

Comments

varsanojidan commented Apr 29, 2020

Hi, we recently ran into an issue when trying to build services around Consul Connect.

Setup
Deploy a Nomad job with multiple services (all in the same container), each with its own sidecar proxy and the same upstream destination and LocalBindPort.

Issue
Since each service has the same upstream, requests made within the container to that upstream could end up routing through any of the services' sidecars. As a result, we are seeing intermittent "Connection Refused" errors.

Our question is: is there a way around this issue (or a better design practice) that would let us avoid giving each service in the container its own individual upstream?

Any help would be greatly appreciated.

schmichael (Member) commented

Would you mind submitting the jobspec (or a minimal reproduction)?

...the same upstream destination/LocalBindPort.

I'm not sure why that works at all! If I'm understanding correctly, you end up with multiple sidecars, each one trying to bind to the same port inside the group's network namespace. Maybe I'm misunderstanding something, in which case the jobspec would clear things up.

Can you specify the upstream on only one service? If not, I'd love to hear more about your use case, as it may not be one we've fully considered.
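
For illustration, declaring the upstream on a single service might look like the fragment below. This is only a sketch: the service names, the destination name, and the port are placeholders rather than values from the reporter's job, and it assumes bridge networking so that every task in the group shares one network namespace.

```hcl
group "clients" {
  network {
    mode = "bridge"
  }

  # Only this service declares the upstream, so a single Envoy
  # listener binds 127.0.0.1:8999 in the group's network namespace.
  service {
    name = "client-a" # placeholder name
    connect {
      sidecar_service {
        proxy {
          upstreams {
            destination_name = "server" # placeholder destination
            local_bind_port  = 8999     # placeholder port
          }
        }
      }
    }
  }

  # This service still gets its own sidecar, but no upstream listener.
  service {
    name = "client-b" # placeholder name
    connect {
      sidecar_service {}
    }
  }
}
```

One trade-off to keep in mind: with this layout, every task in the group that dials 127.0.0.1:8999 would presumably reach the upstream through client-a's sidecar, so the destination sees the traffic as coming from client-a.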

shoenig (Member) commented Jun 18, 2021

A small reproduction (correct me if I'm on the wrong track, @varsanojidan!):

job "upstreams" {
  datacenters = ["dc1"]

  group "server" {
    network {
      mode = "bridge"
    }

    service {
      name = "server"
      port = 8999
      connect {
        sidecar_service {}
      }
    }

    task "server" {
      driver = "docker"
      config {
        image = "shoenig/simple-http:v1"
        args  = ["server"]
      }
    }
  }

  group "clients" {
    network {
      mode = "bridge"
    }

    service {
      name = "client-a"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "server"
              local_bind_port  = 8999
            }
          }
        }
      }
    }

    service {
      name = "client-b"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "server"
              local_bind_port  = 8999
            }
          }
        }
      }
    }

    task "client-a" {
      driver = "docker"
      config {
        image = "shoenig/simple-http:v1"
        args  = ["client"]
      }
    }

    task "client-b" {
      driver = "docker"
      config {
        image = "shoenig/simple-http:v1"
        args  = ["client"]
      }
    }
  }
}

The client requests in this case work because one Envoy instance is able to bind the listener, while the other fails with:

[2021-06-18 19:05:59.888][8][warning][config] [source/common/config/grpc_subscription_impl.cc:107] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) server:127.0.0.1:8999: cannot bind '127.0.0.1:8999': Address already in use

The problem isn't that both services connect to the same upstream. It is that both client services try to use the same bind address and port inside the group's network namespace. We could add validation that rejects such a job at submission time, since we know it is not going to work.
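
One way to avoid the collision, sketched against the reproduction above, is to give each client's upstream its own `local_bind_port`. Port 9000 is an arbitrary assumption here; any port that is free in the group's network namespace should do, and client-b would then need to reach the server at 127.0.0.1:9000 instead of 127.0.0.1:8999.

```hcl
service {
  name = "client-a"
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "server"
          local_bind_port  = 8999
        }
      }
    }
  }
}

service {
  name = "client-b"
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "server"
          # Assumed free port; it must differ from client-a's 8999 so
          # the two Envoy listeners do not compete for one address.
          local_bind_port = 9000
        }
      }
    }
  }
}
```

Both services still point at the same `destination_name`; only the local listener address differs, which is the property the proposed validation would check.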

shoenig assigned shoenig and unassigned schmichael Jun 18, 2021
shoenig added the stage/accepted and type/enhancement labels and removed the stage/waiting-reply and type/question labels Jun 18, 2021
shoenig added a commit that referenced this issue Jun 18, 2021
…group

This PR adds validation during job submission that Connect proxy upstreams
within a task group are using different listener addresses. Otherwise, a
duplicate envoy listener will be created and not be able to bind.

Closes #7833
Nomad - Community Issues Triage automation moved this from In Progress to Done Jun 21, 2021
github-actions (bot) commented

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Oct 18, 2022