Consul wan docs enhancement #1914

Closed
shakisha opened this issue Apr 2, 2016 · 31 comments
Labels
theme/operator-usability Replaces UX. Anything related to making things easier for the practitioner type/bug Feature does not function as expected

Comments

@shakisha

shakisha commented Apr 2, 2016

I have my datacenter dc1 with various nodes, and I've bootstrapped a second datacenter, dc2.

As described at this link (https://www.consul.io/docs/guides/datacenters.html), I need to run the command

consul join -wan 

but the documentation isn't clear on these things:

  1. What is the port number for the WAN, and should it be listening on the public WAN interface?
  2. How can I put this consul join -wan into the configuration file, so I don't have to run the command manually from the CLI?
  3. This part, "Of course, all server nodes must be able to talk to each other. Otherwise, the gossip protocol as well as RPC forwarding will not work.", refers to which datacenter?
@slackpad slackpad added the type/docs Documentation needs to be created/updated/clarified label Apr 12, 2016
@slackpad
Contributor

Hi @shakisha I'll leave this open as a reminder to improve the docs in this section. Here's some info that should help:

What is the port number for the WAN, and should it be listening on the public WAN interface?

The port you'd want to use for the join is the "Serf WAN" port which defaults to 8302. You'd need the Consul servers participating in the WAN to expose this port bound to an interface that's reachable from the other Consul servers (they need to form a fully connected mesh). You'll need to allow TCP and UDP access to this port through any firewalls.

How can I put this consul join -wan into the configuration file, so I don't have to run the command manually from the CLI?

The best way to do this is via the retry_join_wan config option - https://www.consul.io/docs/agent/options.html#retry_join_wan.
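
As a rough illustration of the option mentioned above (a sketch, not taken from the Consul docs), a server config that retries the WAN join on startup could be generated like this. The function name `wan_join_config`, the datacenter name `dc2`, and the address `203.0.113.10` are placeholders chosen for the example.

```python
import json


def wan_join_config(datacenter, remote_servers):
    """Build a minimal server config that retries the WAN join on startup."""
    return {
        "server": True,  # the WAN pool is made up of servers
        "datacenter": datacenter,
        # Addresses of servers in the other datacenter (placeholders here).
        "retry_join_wan": list(remote_servers),
    }


config = wan_join_config("dc2", ["203.0.113.10"])
print(json.dumps(config, indent=4))
```

The printed JSON can be merged into an existing config file; a real config would also need settings such as `data_dir` and `bind_addr`.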

This part, "Of course, all server nodes must be able to talk to each other. Otherwise, the gossip protocol as well as RPC forwarding will not work.", refers to which datacenter?

The gossip part is the "Serf WAN" port discussed above. You'll also need to expose the "Server RPC" port 8300/tcp in order for RPC forwarding to work. All servers participating in the WAN should be able to reach each other on 8300/tcp.
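
To check the TCP side of this from another server, a small probe like the one below may help. It is only a sketch: the helper names `tcp_port_open` and `check_wan_ports` are made up for this example, and it does not test UDP on 8302, which the Serf WAN gossip also needs.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


def check_wan_ports(host, ports=(8300, 8302), timeout=3.0):
    """Probe the Server RPC (8300) and Serf WAN (8302) TCP ports on host."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `check_wan_ports("203.0.113.10")` (a placeholder address) returns a dict mapping each port to `True` or `False`. A `False` for 8302 would match the "connection refused" seen when a WAN join fails.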

@shakisha
Author

Hi James @slackpad, thank you very much for your answer.
I'm still having trouble getting this Consul WAN setup to work properly. This is my configuration:

{
    "server": true,
    "datacenter": "lon01",
    "data_dir": "/var/consul",
    "log_level": "info",
    "enable_syslog": false,
    "encrypt": "blablabla",
    "bind_addr": "MY LAN INTERFACE",
    "retry_join_wan": ["IP ADDRESS OF REMOTE CONSUL SERVER"]
}

The remote Consul server has its bind address configured as its LAN address.

When I start Consul as a client on the WAN node, I get an error saying that WAN is only available in server mode.
After I switch to server mode, this is what I get:

    2016/04/12 17:55:31 [ERR] agent: failed to sync remote state: No cluster leader
    2016/04/12 17:55:32 [ERR] agent: coordinate update error: No cluster leader
    2016/04/12 17:55:35 [INFO] agent: (WAN) joining: [SERVERIP]
    2016/04/12 17:55:35 [INFO] agent: (WAN) joined: 0 Err: dial tcp SERVERIP:8302: getsockopt: connection refused
    2016/04/12 17:55:35 [WARN] agent: Join -wan failed: dial tcp SERVERIP:8302: getsockopt: connection refused, retrying in 30s

Where am I going wrong? I cannot find anything about this issue in the documentation ;(

@shakisha
Author

Hello @slackpad, I still haven't had any success with this :-(

Do you have any kind of update for me? Thanks a lot!

@slackpad
Contributor

Hi @shakisha I think it might be "bind_addr": "MY LAN INTERFACE" - is that interface reachable by the other server (that's the TCP connection failing to SERVERIP:8302)? If there are multiple interfaces then you can remove that config to bind to all of them, and then set advertise_addrs for serf_lan and serf_wan to the IP addresses that the server should advertise for others to contact it on.

@shakisha
Author

shakisha commented Apr 15, 2016

@slackpad: this is exactly the part of the docs that I didn't get.
The bind address is set to my LAN interface only, because I have 10 clients on the local LAN (datacenter0).

In datacenter "lon1" I have the configuration listed above and the connection refused error, but from tcpdump I'm pretty sure that the lon1 server is contacting the server in "datacenter0".

Should I remove the bind address entirely and use https://www.consul.io/docs/agent/options.html#advertise_addrs, or set only the advertise WAN address?

Will it be secure to expose only that port with encryption enabled?

UPDATE: I have tried putting these values in the configuration of datacenter0 and lon1:

   "advertise_addrs": {
       "serf_wan": "WAN_IP_ADDRESS:8302",
       "serf_lan": "LAN_IP_ADDRESS:8301",
       "rpc": "LAN_IP_ADDRESS:8300"
  }

but it didn't work; I still get the same "getsockopt: connection refused" (firewalls are disabled on both sides).

I've also tried removing bind_addr, but even with the serf_lan, serf_wan and rpc parameters, Consul refused to start because of multiple private IP addresses on the machine.
I'm desperate.

@slackpad slackpad added type/bug Feature does not function as expected and removed type/docs Documentation needs to be created/updated/clarified labels Apr 15, 2016
@slackpad
Contributor

Hi @shakisha and sorry for the trouble on this one. It looks like there's a bug where it's doing the private IP check, even though you've specified the advertise_addrs structure. The individual form of these configs looks like they should properly skip that check:

{
    "advertise_addr": "LAN_IP_ADDRESS",
    "advertise_addr_wan": "WAN_IP_ADDRESS",
    "bind_addr": "0.0.0.0"
}

You are right that this type of configuration will result in Consul binding to all interfaces on a machine so you'd want to enable encryption for the serf ports and TLS w/verification for the RPC port. Many folks bind everything to the LAN interface and use a NAT / VPN / tunneled connection to avoid exposing anything externally.

@slackpad
Contributor

Also noting here that we should allow the configuration of a different bind address for LAN and WAN interfaces - this isn't currently supported. Linking #473 since it's related.

@shakisha
Author

Hi @slackpad,

Many folks bind everything to the LAN interface and use a NAT / VPN / tunneled connection to avoid exposing anything externally

This is a GREAT idea, but how do I do that if the VPN is on the tun0 interface and I have to listen on both eth1 (LAN) and tun0 (VPN)?

@slackpad
Contributor

@shakisha I was thinking for the case where some other box provides the gateway - if the server itself has the tunnel then there's currently no way to support this other than bind 0.0.0.0 per #1914 (comment) and possibly some firewall rules to pare down the configuration. Once we get #473 and the option to bind (potentially multiple addresses) for each function (wan, lan, rpc) then this should get easier to configure.

@shakisha
Author

Thanks for your answer. But if I bind to 0.0.0.0, will it actually bind to all interfaces? Then I can just listen on all interfaces and use iptables to block unwanted ports. Tell me if I'm correct.

@slackpad
Contributor

Yes - this config should bind to all interfaces:

{
    "advertise_addr": "LAN_IP_ADDRESS",
    "advertise_addr_wan": "WAN_IP_ADDRESS",
    "bind_addr": "0.0.0.0"
}

The advertise addresses tell other hosts in the cluster how to contact the server.

@shakisha
Author

Can I just use this?

{
"bind_addr": "0.0.0.0"
}

Because I want to use the LAN and tun0 (VPN)... or do you advise using the tun0 address instead of WAN_IP_ADDRESS in this config?

@slackpad
Contributor

slackpad commented Apr 15, 2016

"advertise_addr": "LAN_IP_ADDRESS" and "advertise_addr_wan": "WAN_IP_ADDRESS" shouldn't affect any binding but they will prevent the "multiple private IP" error since you have several interfaces and Consul won't be able to decide which one to use.

@shakisha
Author

shakisha commented Apr 15, 2016

OK, but will this config listen for packets coming from the tun0 interface?

@slackpad
Contributor

It should - it will bind to everything!

@shakisha
Author

Hi all, are there any updates on this issue? :-)

@slackpad slackpad added this to the 0.7.4 milestone Nov 22, 2016
This was referenced Nov 30, 2016
@sean-
Contributor

sean- commented Dec 2, 2016

@shakisha please give the latest code in master a twirl and let us know. The syntax for selecting IP addresses, such as getting the first "usable" IP address on an interface, or any other manner of network craziness is likely possible now as a template evaluated address parameter can be passed to Consul addresses (e.g. -bind or bind_addr, or any other *_addr-like parameter):

-bind='{{ GetInterfaceIP "eth0" }}'
-bind='{{ GetAllInterfaces | include "network" "10.99.0.0/24" }}'
-bind='{{ GetDefaultInterfaces | include "network" "10.99.0.0/24" | sort "size,address" | attr "address" }}'
-bind='{{ GetAllInterfaces | exclude "rfc" "6890" | sort "type,size,address" | include "flags" "up|forwardable" | attr "address" }}'

Very few people should need to do anything as obscene as shown in the last example, but the functionality is there should you need it. I gave this issue a quick read and it seems like you could use this configuration enhancement to select the tun0 address for the bind_addr and then use a different parameter value for the advertise_addr. This doesn't solve your exact problem of listening on multiple interfaces or IPs, but it's on its way in the near future.

With the sockaddr command you can experiment with getting the right template syntax with the eval sub-command, for instance:

$ sockaddr eval 'GetAllInterfaces | include "network" "10.99.0.0/24" | sort "size,address" | attr "address"'
10.99.0.5
$ sockaddr eval 'GetInterfaceIP "eth0"'
10.99.0.5

There is now a configurable template language behind this that you can use to create a customizable heuristic that should allow you to get whatever it is that you need from your environment when using an immutable image (see hashicorp/go-sockaddr/template and cmd/sockaddr for examples and docs).

If this doesn't solve your issue, let us know.

@shakisha
Author

shakisha commented Dec 17, 2016

Hello @sean- and @slackpad,

Thanks for your answers. Let me explain my situation better, so we can take a big step toward improving this Consul version.
@sean- I only saw your reply yesterday; can you confirm that I can test directly with the 0.7.2 RC?

I actually have two datacenters:

DC1: a datacenter with only a single Consul machine, connected over the WAN.
DC2: a datacenter with 6 Consul machines, connected over the LAN, each with a public IP address.

Basically, from the DC1 datacenter, I want to query the DC2 datacenter to get a list of the public IP addresses of all the DC2 machines (to use with consul-template, another project of yours which I love).

So when running this command on the single consul machine of DC1:

curl http://localhost:8500/v1/catalog/nodes?dc=DC1

or

curl http://localhost:8500/v1/catalog/nodes?dc=DC2

I get in BOTH CASES

rpc error: failed to get conn: dial tcp IP_ADDRESS_OF_THE_NODE_ON_DC2:8300: getsockopt: connection refused
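
For scripting the same query, a minimal sketch using only the Python standard library could look like the following. The helper names `catalog_nodes_url` and `fetch_nodes` are invented for this example, and it assumes the agent's HTTP API is on localhost:8500, as in the curl commands above.

```python
import json
import urllib.parse
import urllib.request


def catalog_nodes_url(datacenter, base="http://localhost:8500"):
    """Build the catalog URL for listing the nodes of one datacenter."""
    query = urllib.parse.urlencode({"dc": datacenter})
    return f"{base}/v1/catalog/nodes?{query}"


def fetch_nodes(datacenter, base="http://localhost:8500", timeout=5.0):
    """Return the decoded JSON list of nodes for the given datacenter.

    urlopen raises urllib.error.HTTPError on an error status, which is
    presumably how a failed RPC forward would surface here.
    """
    url = catalog_nodes_url(datacenter, base)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_nodes("DC2")` on the DC1 server exercises the same cross-datacenter RPC forwarding as the curl command, so it fails in the same way while port 8300 is unreachable.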

These are the configurations that I used:
(@sean- I'm sorry that I didn't test your approach, e.g.
GetInterfaceIP "eth0"
but I don't know the exact syntax for putting it inside the configuration file; I don't like to use the command line :-) )

CONFIGURATION OF NODE ON DC1

{
    "server": true,
    "datacenter": "DC1",
    "data_dir": "/var/consul",
    "log_level": "WARN",
    "enable_syslog": false,
    "encrypt": "secretkey",
    "bind_addr": "private eth1",
    "advertise_addr_wan": "public ip of eth0",
    "retry_join_wan": ["public ip of node2 on dc2", "public ip of node3 on dc2", "public ip of node4 on dc2"]
}

CONFIGURATION OF NODE ON DC2

{
    "bootstrap_expect": 3,
    "server": true,
    "datacenter": "DC2",
    "data_dir": "/var/consul",
    "log_level": "WARN",
    "bind_addr": "private eth1",
    "serf_wan_bind": "public ip of eth0",
    "advertise_addr_wan": "public ip of eth0",
    "retry_join": ["private lan ip of node3", "private lan ip of node4"],
    "retry_join_wan": ["public ip of node1 on dc2", "public ip of node3 on dc2"],
    "encrypt": "secretkey"
}

@sean-
Contributor

sean- commented Dec 22, 2016

@shakisha Try with the following config snippets:

CONFIGURATION OF NODE ON DC1

{
    "server": true,
    "datacenter": "DC1",
    "data_dir": "/var/consul",
    "log_level": "WARN",
    "enable_syslog": false,
    "encrypt": "secretkey",
    "bind_addr": "{{ GetInterfaceIP \"eth1\" }}",
    "advertise_addr_wan": "{{ GetInterfaceIP \"eth0\" }}",
    "retry_join_wan": ["public ip of node2 on dc2", "public ip of node3 on dc2", "public ip of node4 on dc2"]
}

You will have to plug in the IP/DNS addresses for retry_join_wan. It would be more concise if you didn't want to pick out an IP address from a particular interface name because then you could use something like GetPublicIP or GetPublicInterfaces, or GetPrivateIP or GetPrivateInterfaces. The same type or style of configuration could be used for the second configuration block.

If you want to test and experiment via the CLI, you can via:

$ go get -u github.com/hashicorp/go-sockaddr/cmd/sockaddr
$ sockaddr eval 'GetInterfaceIP "eth0"'
$ sockaddr eval 'GetAllInterfaces | include "name" "eth0" | sort "type,size" | include "RFC" "6890" | include "flags" "up|forwardable" | attr "address"'

Just be sure to re-escape the double quotes before injecting the working template back into your Consul config. Keep us posted!
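
One way to avoid escaping the quotes by hand is to let a JSON serializer do it. This is only a sketch of that idea, using the `GetInterfaceIP "eth0"` template from above; the variable names are arbitrary.

```python
import json

# The template exactly as it works with `sockaddr eval`, wrapped in {{ }}.
template = '{{ GetInterfaceIP "eth0" }}'

# json.dumps escapes the inner double quotes as \" automatically.
fragment = json.dumps({"advertise_addr_wan": template}, indent=4)
print(fragment)
```

The printed fragment contains `\"eth0\"` inside the string value, which is the form the JSON config file needs.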

@shakisha
Author

shakisha commented Dec 22, 2016

@sean- doing now 👍

@shakisha
Author

Got

==> Error starting agent: Advertise WAN address resolution failed: Unable to parse address template "{{ GetInterfaceIP \"eth0\" | sort \"type,size\" | exclude \"RFC\" \"6890\" | include \"flags\" \"up|forwardable\" | attr \"address\" }}": unable to execute sockaddr input "{{ GetInterfaceIP \"eth0\" | sort \"type,size\" | exclude \"RFC\" \"6890\" | include \"flags\" \"up|forwardable\" | attr \"address\" }}": template: sockaddr.Parse:1:32: executing "sockaddr.Parse" at <"type,size">: wrong type for value; expected sockaddr.IfAddrs; got string

@shakisha
Author

@sean- and if I write it this way, I get:

 "bind_addr": "{{ GetInterfaceIP "eth1"| sort "type,size" | include "RFC" "6890" | include "flags" "up|forwardable" | attr "address" }}",
 "advertise_addr_wan": "{{ GetInterfaceIP "eth0" | sort "type,size" | exclude "RFC" "6890" | include "flags" "up|forwardable" | attr "address" }}",

Error decoding '/etc/consul/conf/config.json': invalid character 'e' after object key:value pair
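
The decoding error above comes from the unescaped double quotes inside the string value, which end the JSON string early. A quick way to catch this before starting the agent is to run the file through any JSON parser. The sketch below uses Python's `json` module, which is not the parser Consul uses but rejects the same input; `check_json` is a made-up helper name.

```python
import json


def check_json(text):
    """Return None if text is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Unescaped inner quotes: the string value ends right before eth1.
bad = '{ "bind_addr": "{{ GetInterfaceIP "eth1" }}" }'
# Inner quotes escaped with a backslash: valid JSON.
good = '{ "bind_addr": "{{ GetInterfaceIP \\"eth1\\" }}" }'

print(check_json(bad))
print(check_json(good))
```

The first call reports a parse error and the second prints `None`.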

@sean-
Contributor

sean- commented Dec 26, 2016

Comment updated. I think I copy/pasted the wrong set of examples in. The following two should work and be very close to the same in their result.

$ sockaddr eval 'GetInterfaceIP "eth0"'
$ sockaddr eval 'GetAllInterfaces | include "name" "eth0" | sort "type,size" | include "RFC" "6890" | include "flags" "up|forwardable" | attr "address"'

@shakisha
Author

shakisha commented Dec 26, 2016

@sean- should I put these inside the configuration file?

$ sockaddr eval 'GetInterfaceIP "eth0"'
$ sockaddr eval 'GetAllInterfaces | include "name" "eth0" | sort "type,size" | include "RFC" "6890" | include "flags" "up|forwardable" | attr "address"'

Like

    "bind_addr":  "sockaddr eval 'GetInterfaceIP "eth0"'",
    "advertise_addr_wan": "sockaddr eval 'GetAllInterfaces | include "name" "eth0" | sort "type,size" | include "RFC" "6890" | include "flags" "up|forwardable" | attr "address"'"

?

@sean-
Contributor

sean- commented Dec 30, 2016

Not quite, sorry. A few points (being explicit for future readers):

  • The sockaddr eval ... command is there merely for experimentation.
  • When you find the right template that works for sockaddr eval you can drop it into the actual Consul config.
  • I would use either the GetInterfaceIP "eth0" or I would use the fully-spelled out pipe-chain and wouldn't mix and match them.
  • It is still necessary to wrap the configuration value in the text/template delimiters of {{ and }}, respectively, in order to preserve backwards compatibility.

So for instance, you can pick one of the two following config blocks:

  "bind_addr":  "{{ GetInterfaceIP \"eth0\" }}",
  "advertise_addr_wan": "{{ GetInterfaceIP \"eth0\" }}"

or:

  "bind_addr":  "{{ GetAllInterfaces | include \"name\" \"eth0\" | sort \"type,size\" | include \"RFC\" \"6890\" | include \"flags\" \"up|forwardable\" | attr \"address\" }}",
  "advertise_addr_wan": "{{ GetAllInterfaces | include \"name\" \"eth0\" | sort \"type,size\" | include \"RFC\" \"6890\" | include \"flags\" \"up|forwardable\" | attr \"address\" }}"

Also, it isn't necessary to specify both bind_addr and advertise_addr_wan because both advertise_addr_lan and advertise_addr_wan will default to the specified bind_addr. I think the simplified config block that you want is to just specify the advertise_addr using one of the two config bits above and leave it at that. No need to make it any more complicated than it already is. Setting both bind_addr and advertise_addr is still required and will be through the rest of the 0.7.X series, but expect changes before 0.8.0 is released.

// Try to get an advertise address
if config.AdvertiseAddr != "" {
    ipStr, err := parseSingleIPTemplate(config.AdvertiseAddr)
    if err != nil {
        return nil, fmt.Errorf("Advertise address resolution failed: %v", err)
    }
    config.AdvertiseAddr = ipStr
    if ip := net.ParseIP(config.AdvertiseAddr); ip == nil {
        return nil, fmt.Errorf("Failed to parse advertise address: %v", config.AdvertiseAddr)
    }
} else if config.BindAddr != "0.0.0.0" && config.BindAddr != "" && config.BindAddr != "[::]" {
    config.AdvertiseAddr = config.BindAddr
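
For readers who don't read Go, the fallback order in the excerpt above can be paraphrased roughly as follows. This is a simplified sketch, not Consul's implementation: the function `resolve_advertise_addr` and its `private_ips` parameter are hypothetical, and the last branch (what happens with a wildcard bind) is an assumption based on the "Multiple private IPs found" error reported in this thread, since the excerpt is cut off before that point.

```python
import ipaddress

# Bind values that do not name a single usable address.
UNSPECIFIED = {"", "0.0.0.0", "[::]"}


def resolve_advertise_addr(advertise_addr, bind_addr, private_ips):
    """Pick the address to advertise, loosely following the excerpt above."""
    if advertise_addr:
        # An explicit advertise address wins; it must be a valid IP.
        # ip_address raises ValueError for anything that is not an IP.
        return str(ipaddress.ip_address(advertise_addr))
    if bind_addr not in UNSPECIFIED:
        # Otherwise a specific bind address is reused for advertising.
        return bind_addr
    # Assumption: with a wildcard bind, only a single private IP is unambiguous.
    if len(private_ips) == 1:
        return private_ips[0]
    if not private_ips:
        raise ValueError("No private IP found")
    raise ValueError("Multiple private IPs found. Please configure one")
```

Under these assumptions, binding to 0.0.0.0 on a host with both a LAN and a VPN address only works when an advertise address is set explicitly.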

@shakisha
Author

shakisha commented Feb 9, 2017

Thanks @sean-.
OK, two issues now:

  1. The "{{ GetInterfaceIP "eth0" }}'" template resolves to a strange IP address (10.9.2.0), which is not my eth0 IP.

  2. From the Consul WAN machine, I ran

curl http://localhost:8500/v1/catalog/nodes?dc=de01

and I got

rpc error: failed to get conn: dial tcp "an ip address of a consul wan machine of remote dc" :8300: getsockopt: connection refused

In iptables I have these rules:

#CONSUL
-A INPUT -i eth1 -p tcp -m tcp --dport 8300 -j ACCEPT
-A INPUT -i eth1 -p tcp -m tcp --dport 8301 -j ACCEPT
-A INPUT -i eth1 -p udp -m udp --dport 8301 -j ACCEPT
-A INPUT -i eth1 -p tcp -m tcp --dport 8400 -j ACCEPT
-A INPUT -i eth1 -p tcp -m tcp --dport 8500 -j ACCEPT
-A INPUT -i eth1 -p tcp -m tcp --dport 8600 -j ACCEPT
-A INPUT -i eth1 -p udp -m udp --dport 8600 -j ACCEPT

#CONSUL WAN 
-A INPUT -i eth0 -p tcp -m tcp --dport 8300 -j ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 8302 -j ACCEPT
-A INPUT -i eth0 -p udp -m udp --dport 8302 -j ACCEPT

@shakisha
Author

shakisha commented Feb 9, 2017

Basically, it seems that Consul doesn't listen on port 8300 for requests from the WAN machine.
I cannot find a solution.

@slackpad do you have any idea?

@shakisha
Author

shakisha commented Feb 9, 2017

Still debugging here.

Basically, the WAN requests work if I set the "bind" address to the WAN interface only.
If I set "bind" to 0.0.0.0, Consul 0.7.4 doesn't even start, failing with the error:

"Error starting agent: Failed to get advertise address: Multiple private IPs found. Please configure one"

I have also tried with
"serf_wan_bind": "ip of eth0",
"serf_lan_bind": "ip of eth1",
and nothing changed; still the same error.

The question is:
if I have "bind" set to the private LAN interface,
and "serf_wan_bind" and "advertise_addr_wan" correctly set to the public IP address,
why does consul members -wan work
while

curl http://localhost:8500/v1/catalog/nodes?dc=de01

gives me connection refused?

@shakisha
Author

shakisha commented Feb 9, 2017

OK, I have identified the issue and found a workaround.

If I set
"bind" to PRIVATE INTERFACE (LAN)
"serf_wan_bind": "PUBLIC INTERFACE",
"serf_lan_bind": "PRIVATE INTERFACE",
NOTHING WORKS

but if I set
"bind" to PUBLIC INTERFACE (WAN)
"serf_wan_bind": "PUBLIC INTERFACE",
"serf_lan_bind": "PRIVATE INTERFACE",

everything works perfectly (at the moment I get a lot of "Refuting a suspect message" log lines, but everything seems to work).

Basically, RPC forwarding is not working as expected.

-serf-wan-bind doesn't allow queries from other Consul WAN servers on port 8300

Why does this happen? @slackpad is this a bug?

@shakisha
Author

shakisha commented Feb 10, 2017

My odyssey with this issue continues...

I need to revise my previous comment. After some testing, if I set
"bind" to PUBLIC INTERFACE (WAN)
"serf_wan_bind": "PUBLIC INTERFACE",
"serf_lan_bind": "PRIVATE INTERFACE",

the Consul WAN machine works, but leader election and local members do not work :-(

The configuration file now has the following values:

"bind_addr": "PUBLIC IP",
"advertise_addr": "PRIVATE IP",
"serf_lan_bind": "PRIVATE IP",
"serf_wan_bind": "PUBLIC IP",
"advertise_addr_wan": "PUBLIC IP",
"translate_wan_addrs": true,

but leader election doesn't happen this way.

@slackpad slackpad removed this from the Triaged milestone Apr 18, 2017
@slackpad slackpad added the theme/operator-usability Replaces UX. Anything related to making things easier for the practitioner label May 25, 2017
@hanshasselberg
Member

Hello, we have much better docs these days: https://learn.hashicorp.com/consul/security-networking/datacenters and I hope they solve your problem. If that's not the case, feel free to open a new issue. Thanks for reporting!
