
fix wrong control plane discovery when serf advertise address is in a network unreachable from clients #11895

Conversation

tantra35
Contributor

Suppose the following configuration, where we use Nomad federation:

  1. We have two Nomad regions (A and B).
  2. The control plane of both regions is located in a common network (10.168.0.0/16), while the RPC and HTTP addresses are located in private networks unique to each region (192.168.102.0/24 for region A, and 192.168.103.0/24 for region B).
  3. Nomad clients are located only in the private networks and can't communicate with the control plane through 10.168.0.0/16.
  4. On the control plane we advertise addresses like this:
    1. For region A, on the first server:
      advertise
      {
          http = "192.168.102.11"
          rpc =  "192.168.102.11"
          serf = "10.168.0.11"
      } 
      
    2. For region B, on the first server:
      advertise
      {
          http = "192.168.103.11"
          rpc =  "192.168.103.11"
          serf = "10.168.1.11"
      } 
      

In such a configuration, Nomad clients that discover the control plane through Consul get addresses from the 10.168.0.0/16 network, and of course nothing works, because they can't reach that network.
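
A minimal sketch of what such a client-side lookup sees, assuming the standard github.com/hashicorp/consul/api client and the default "nomad" server service name (this snippet is illustrative only, not part of the patch):

package main

// Illustrative only: list the Nomad server service from the Consul catalog
// and print the addresses a client would be handed. In the configuration
// above these come back from the 10.168.0.0/16 network.

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// "nomad" is assumed here as the server service name registered in Consul.
	services, _, err := client.Catalog().Service("nomad", "", nil)
	if err != nil {
		log.Fatal(err)
	}

	for _, svc := range services {
		// ServiceAddress falls back to the node address when empty.
		addr := svc.ServiceAddress
		if addr == "" {
			addr = svc.Address
		}
		fmt.Printf("%s -> %s:%d\n", svc.Node, addr, svc.ServicePort)
	}
}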

This patch tries to fix the situation by adding a new RPC method, RpcPeers, which returns the private RPC advertise addresses instead of the Serf ones (a client discovering the control plane through Consul currently gets the Serf advertise address, but it needs the RPC address).
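
A minimal sketch of the idea, with hypothetical names (serverParts, rpcPeers) rather than the actual patch code: each known server carries both a Serf address and an RPC advertise address, and the new endpoint hands clients only the RPC ones.

package main

// Hypothetical sketch of the RpcPeers idea, not the real Nomad types:
// return the RPC advertise addresses of the known servers instead of the
// Serf addresses that clients cannot reach.

import (
	"fmt"
	"net"
)

type serverParts struct {
	Name     string
	SerfAddr net.Addr // e.g. 10.168.0.11:4648, reachable only by other servers
	RPCAddr  net.Addr // e.g. 192.168.102.11:4647, reachable by clients
}

// rpcPeers mirrors the shape of Status.Peers but resolves to RPC addresses.
func rpcPeers(servers []*serverParts) []string {
	peers := make([]string, 0, len(servers))
	for _, s := range servers {
		peers = append(peers, s.RPCAddr.String())
	}
	return peers
}

func mustAddr(s string) net.Addr {
	a, err := net.ResolveTCPAddr("tcp", s)
	if err != nil {
		panic(err)
	}
	return a
}

func main() {
	servers := []*serverParts{
		{Name: "a-1", SerfAddr: mustAddr("10.168.0.11:4648"), RPCAddr: mustAddr("192.168.102.11:4647")},
		{Name: "a-2", SerfAddr: mustAddr("10.168.0.12:4648"), RPCAddr: mustAddr("192.168.102.12:4647")},
	}
	fmt.Println(rpcPeers(servers)) // [192.168.102.11:4647 192.168.102.12:4647]
}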
Member

@tgross tgross left a comment

Hi @tantra35!

The situation you've described here seems a little weird to me, because the existing Status.Peers endpoint should already be returning the RPC endpoints. Here's an example configuration:

advertise {
  http = "192.168.56.10"
  rpc  = "192.168.56.10"
  serf = "10.0.2.15"
}

And then if I curl the appropriate endpoint:

$ curl -H "X-Nomad-Token: $NOMAD_TOKEN" -s "http://localhost:4646/v1/status/peers"
["192.168.56.20:4647","192.168.56.10:4647","192.168.56.30:4647"]%

Can you provide a bit more information about the client configuration you have? Maybe that'd help us figure out what the issue is for you.

@tantra35
Contributor Author

tantra35 commented Jan 24, 2022

@tgross Hm, I just tested on my test stand and, as expected, peers returns the wrong addresses:

vagrant@consulnomad-11x-1:~/nomad$ curl -H "X-Nomad-Token: 5f1b4db3-8110-de8f-486e-3c8c806e1912" -s "http://localhost:4646/v1/status/peers"; echo
["10.168.0.12:4647","10.168.0.13:4647","10.168.0.11:4647"]

Our Nomad configs:

  1. server.hcl
    datacenter = "test"
    data_dir = "/var/lib/nomad/"
    
    disable_update_check = true
    
    enable_syslog = false
    log_level = "INFO"
    
    server
    {
      enabled          = true
    
      raft_protocol    = 3
      bootstrap_expect = 3
    
    
    }
    
  2. acl.hcl
    acl
    {
      enabled = true
      }
    
  3. consul.hcl
    consul
    {
      token = "018e6c3c-7c4a-4793-b9fd-d8cde8618269"
      }
    
  4. advertise.hcl (it differs per server)
    advertise
    {
            http = "192.168.102.11"
            rpc = "192.168.102.11"
            serf = "10.168.0.11"
    }
    

On this stand I brought up only the control plane, without any clients or anything else, just 3 control plane servers (Nomad + Consul). In case it matters, Consul also advertises a WAN address like this:

/opt/consul/consul agent -config-dir=/etc/consul -advertise=192.168.102.11 -advertise-wan=10.168.0.11

If it would be useful, I can provide my Vagrant stand.

@tantra35
Contributor Author

tantra35 commented Jan 25, 2022

@tgross I have rebuilt the test stand with only Nomad installed (Consul is not installed on that stand), and I don't understand why you get the right result and I don't.
The server config now holds the join settings:

datacenter = "test"
data_dir = "/var/lib/nomad/"

disable_update_check = true

enable_syslog = false
log_level = "INFO"

server
{
  enabled          = true

  raft_protocol    = 3
  bootstrap_expect = 3


  server_join {
    retry_join = ["192.168.102.11", "192.168.102.12", "192.168.102.13"]

  }
}

Then I try to get the server members (the tokens shown here are regenerated every time a new stand is started):

vagrant@consulnomad-11x-1:~/nomad$ nomad server members -token=4225fc30-0caf-6b23-7a2d-4f4961a8111a
Name                      Address      Port  Status  Leader  Protocol  Build   Datacenter  Region
consulnomad-11x-1.global  10.168.0.11  4648  alive   false   2         1.1.10  test        global
consulnomad-11x-2.global  10.168.0.12  4648  alive   true    2         1.1.10  test        global
consulnomad-11x-3.global  10.168.0.13  4648  alive   false   2         1.1.10  test        global

Then I try to call Status.Peers via the REST interface:

vagrant@consulnomad-11x-1:~/nomad$ curl -H "X-Nomad-Token: 4225fc30-0caf-6b23-7a2d-4f4961a8111a" -s "http://localhost:4646/v1/status/peers"; echo
["10.168.0.12:4647","10.168.0.11:4647","10.168.0.13:4647"]

And IMO this is absolutely expected behavior; I've never seen Nomad work any other way.

Maybe something changed between master and 1.1.10? But that's strange, because I don't see significant changes between the versions that would address this.

@tantra35 tantra35 requested a review from tgross January 25, 2022 11:24
@hashicorp-cla

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@tgross
Member

tgross commented Feb 17, 2023

I'm going to close this PR in favor of #16217. #16211 was opened recently, and that gave us the extra clues we needed to reproduce this problem and ensure we had the right fix. Sorry this got left for so long!
