
Ambiguity in docs regarding the advertise and addresses configuration parameters and multi-datacenter deployments #304

Closed
F21 opened this issue Oct 19, 2015 · 21 comments
Labels
theme/docs Documentation issues and enhancements

Comments

@F21

F21 commented Oct 19, 2015

I am interested in setting up a multi-datacenter cluster. The documentation for configuring the advertise and addresses options is ambiguous.

In particular:

  • In the addresses section, it says that the rpc address should be accessible to all agents, but should only be accessible to cluster members. This sounds a bit contradictory. Should the rpc address be accessible to only agents within the datacenter the server is in?
  • In the addresses section, the serf address should only be exposed to other servers within the datacenter. But it also says that this address is used for gossiping, which according to the architecture docs is supposed to be used to talk to other servers in other data centers and regions.
  • In the advertise section, the rpc address should be accessible by all agents. Is this within a datacenter, a region or every single agent across all regions and data centers?
  • In the advertise section, the serf address should be reachable from all servers (I am assuming across all regions and data centers), but it contradicts the serf parameter within the addresses section which says it should only be reached by servers within their own data center.

Regarding multi-datacenter requirements, there are also some things that are not clear:

  • In the High-Level Overview section of the Architecture docs, this image does not show which data centers the servers reside in.
  • I understand that there are no regions in Consul, so in Consul's case it's recommended to have multiple data centers and a quorum of servers within each data center. In Nomad's case, the architecture docs state that there should be a quorum (around 3 or 5 servers) per region, but they do not state how many servers we should have in each data center. If we have 3 servers split across 3 data centers and a server dies, that whole data center would be out of action. If we have 5 servers, there is still 1 data center that is vulnerable. I think a specific deployment example for a multi-data center deployment within the docs would be immensely useful.
@dadgar dadgar added the theme/docs Documentation issues and enhancements label Oct 19, 2015
@F21
Author

F21 commented Oct 19, 2015

If someone from the Nomad team could answer those questions in the meantime, that would be awesome! 😄 I am keen to build a test virtualized cluster simulating a multi-data center deployment.

@cbednarski
Contributor

@F21 We can certainly make these docs clearer. To clarify a few things:

Terminology

  • A region is a cluster of Nomad server agents, which participate in consensus (via raft) and membership (via serf) for high availability.
  • A datacenter represents something like an AWS availability zone, a physical datacenter location, or some other fault domain or boundary.

Multi-datacenter

The Nomad server cluster operates only at the region level. A single region deployment can manage scheduling for multiple datacenters, and unlike Consul you do not deploy Nomad per datacenter.

We expect most organizations will only ever need to run one Nomad cluster worldwide (essentially a "global" region), which will simplify deployment. You may choose to create multiple regions for business or regulatory reasons, or for clusters larger than tens of thousands of nodes.

Network rules inside a region are as follows:

  • Client agents in a datacenter must be able to communicate over RPC with server agents in the same region.
  • Server agents in a region must be able to communicate over RPC with client agents in each datacenter in the same region.
  • Server agents in the same region must be able to communicate over serf (for cluster membership and failure detection).
  • Server agents in the same region must be able to communicate over RPC (so API calls can be forwarded to the leader server).
  • Agents in different datacenters do not need to communicate with each other.
  • Only the servers (typically 3 or 5 for quorum) participate in serf. Client agents do not communicate over this protocol.

Beyond that, datacenter is just a label on client nodes that is used for job placement.
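To make that concrete, a minimal client agent config might look like the sketch below. The region, datacenter, and values are hypothetical; the point is only that datacenter is a label, not a separate cluster.

```hcl
# Hypothetical client agent config: "datacenter" is just a label used for
# job placement; the agent still belongs to a single region-wide cluster.
region     = "global"
datacenter = "us-east-1a"

client {
  enabled = true
}
```

A job can then target that label with something like datacenters = ["us-east-1a"] in its job file.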

Multi-region

If your total fleet is very large (upwards of tens of thousands of machines) you may need to shard into multiple regions. In this case, each region will be represented by a cluster of 3-5 Nomad servers. These servers communicate with each other over serf, both intra-regionally and inter-regionally.

Additionally, servers in different regions will need to access each other's RPC address in order to forward jobs to another region.

These inter-region links are optional. You could have completely isolated regions, but you will not be able to forward jobs from Region A to Region B, for example.

Network rules for forwarding are as follows:

  • (as above) Server agents in the same region must be able to communicate over serf
  • (as above) Server agents in the same region must be able to communicate over RPC
  • Server agents in different regions must be able to communicate over serf
  • Server agents in different regions must be able to communicate over RPC

Serf (technical details)

Serf operates in LAN and WAN modes. LAN mode is used for (typically) physically local, low-latency gossip pools, while WAN is used for higher latency pools on a global level. There are actually two pools here -- one using LAN mode and another using WAN mode.

If you have 5 regions with 3 Nomad servers each, you will have 5 local serf rings with 3 nodes each, and one global serf ring with 15 nodes.
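To tie this back to the original question about addresses and advertise, a server agent config for this layout might look roughly like the sketch below. The IPs and interface choices are hypothetical and the ports shown are the Nomad defaults; treat it as an illustration of which audience each address serves, not as authoritative documentation.

```hcl
# Hypothetical server agent config for one server in a single region.
region     = "us"
datacenter = "us-east-1"

# addresses: what the agent binds to locally.
addresses {
  http = "0.0.0.0"   # HTTP API; expose to operators and tooling as needed
  rpc  = "10.0.1.5"  # must be reachable by clients and servers in this region
  serf = "10.0.1.5"  # must be reachable by other servers (LAN and WAN gossip)
}

# advertise: what the agent tells other members to use when contacting it.
advertise {
  http = "10.0.1.5:4646"
  rpc  = "10.0.1.5:4647"
  serf = "10.0.1.5:4648"
}

server {
  enabled          = true
  bootstrap_expect = 3
}
```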

@F21
Author

F21 commented Oct 21, 2015

@cbednarski Thanks! That was very useful.

Another question about multi-region deployments: there doesn't appear to be any setting for providing a list of server IPs for servers to join. Do servers automatically discover each other and join (as long as they can reach each other)?

@dadgar
Contributor

dadgar commented Oct 22, 2015

@F21: They do not automatically discover each other. You can join servers in two ways: you can use the nomad server-join [list of servers] command, or you can hit the HTTP API.
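For example (hypothetical addresses; the exact command name and API path can vary between Nomad versions, so check the docs for your release):

```sh
# Join the local server to an existing server's serf address.
nomad server-join 10.0.1.5:4648

# Or do the same through the local agent's HTTP API.
curl -X PUT "http://127.0.0.1:4646/v1/agent/join?address=10.0.1.5:4648"
```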

@F21
Author

F21 commented Oct 22, 2015

@dadgar Is this also something I can set in a configuration file?

@dadgar
Contributor

dadgar commented Oct 22, 2015

It is not something that can be set in the configuration file. The reason is that in most cases the IP to join is not known when writing the static config file. You can see how we overcome this when bootstrapping a cluster in this terraform file. The file launches three Nomad servers and then calls the HTTP API endpoint with the dynamic IPs of the other servers.

@F21
Author

F21 commented Oct 22, 2015

Probably getting a bit off-topic here, but wouldn't it be possible to have something like consul's start_join where you give it a list of known IPs in the config that it will attempt to contact and join when it starts?

@adrianlop
Contributor

+1, I agree with @F21; it's very easy to automate the cluster installation and startup with start_join in the config file.

@c4milo
Contributor

c4milo commented Dec 11, 2015

@F21, @poll0rz a PR that implements retry-join logic recently landed in master: #527
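For anyone landing here later, a retry-join setup might look roughly like the sketch below. The addresses are hypothetical, and the exact option names and placement depend on the Nomad version, so verify against the docs for your release.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  # Keep retrying these addresses until the join succeeds.
  retry_join     = ["10.0.1.5", "10.0.1.6", "10.0.1.7"]
  retry_interval = "15s"
}
```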

@c4milo
Contributor

c4milo commented Jan 5, 2016

@cbednarski, @dadgar I have another question: in Consul we have two gossip pools, LAN and WAN. In order to join servers in a WAN pool we use the consul join -wan <server1> flag. However, I don't see such a flag in Nomad. How do we create the WAN pool such that we can schedule jobs on multiple regions with one nomad run command?

@pires

pires commented Jan 8, 2016

@cbednarski that's a great explanation. I suggest adding it to the documentation.

@cigdono

cigdono commented May 2, 2016

I'm interested in getting a multi-region Nomad cluster set up for a POC. I have followed the steps outlined in the documentation and in this post above. Once I have stood everything up, I run the "nomad server-members" command and I see all my servers listed as being in the appropriate regions.

When I try submitting a job destined for region2 via the HTTP API to a server in region1, the job does not get forwarded to a server in region2. A scheduler in region1 fails to place the allocation and my evaluation ends up blocked. Your documentation implies that this should work. I have also looked at the code and there appears to be forwardRegion logic that looks like it should do the trick.

Can someone please document a minimal multi-region deployment?
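For reference, a job normally targets a non-default region through the region attribute in the job file; a minimal (hypothetical) sketch:

```hcl
job "example" {
  # Ask the server that receives this job to forward it to region2.
  region      = "region2"
  datacenters = ["dc1"]

  group "web" {
    task "app" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```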

@xiang

xiang commented Dec 21, 2016

I'm seeing the same behavior described above. Any more info on this?

@dadgar
Contributor

dadgar commented Jan 3, 2017

@xiang Can you expand on the issue you are facing? The issue immediately above yours was fixed, as it was a bug in the CLI.

@flyinprogrammer
Contributor

flyinprogrammer commented Sep 5, 2018

We expect most organizations will only ever need to run one Nomad cluster worldwide (essentially a "global" region), which will simplify deployment. You may choose to create multiple regions for business or regulatory reasons, or for clusters larger than tens of thousands of nodes.


Serf operates in LAN and WAN modes. LAN mode is used for (typically) physically local, low-latency gossip pools, while WAN is used for higher latency pools on a global level. There are actually two pools here -- one using LAN mode and another using WAN mode.

If you have 5 regions with 3 Nomad servers each, you will have 5 local serf rings with 3 nodes each, and one global serf ring with 15 nodes.


For example, you may have a US region with the us-east-1 and us-west-1 datacenters


As a Nomad operator and huge fan, these statements seem to contradict each other, on the basis that in my opinion there is nothing "physically local, low-latency" about a truly global deployment.

So are you guys suggesting or implying that most folks do not have a literal global deployment of infrastructure?

Maybe if we quantified what a "physically local, low-latency" network is, that would help?

Is ~65ms latency over a VPN between us-west-1 and us-east-1 in AWS considered "physically local, low-latency"?

What if it's over a direct connect with VPN backup?

What if instead of us-west-1, it's us-west-2 which has ~90ms latency?

Are we then also proposing to run only one Nomad server per datacenter, where a datacenter is a VPC in each AWS region? What if my region only consists of 2 datacenters, us-east-1 and us-west-1? How will quorum work in the event of a network split, and is my "global" infrastructure now down? And what if I have more than 5 datacenters in a region because I'm "multicloud"?


I realize these questions could come off as belligerent; however, I assure you that I just want to get on the same page about where we're going with this product and then show proof of success.

@shantanugadgil
Contributor

What an amazing coincidence to find this thread.
Only today I realized my folly: I set region to 'us' for my datacenters 'us-east-1' and 'us-west-2'.

All my infra is on private IPs and we have many AWS regions which are VPC peered, though not full mesh, and will not be full mesh.

The fact that the servers sharing a common region try to talk to each other on port 4647 hit my deployment plans.

Now I am planning to switch my region names to the AWS region names, so my regions will be 'us-east-1' instead of just 'us'.

Lesson learnt! 👍👍👍
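In config terms, the change described above amounts to roughly the following (hypothetical values):

```hcl
# Before: one Nomad region spanning peered VPCs, so servers in us-east-1 and
# us-west-2 tried to reach each other over ports 4647/4648.
# region = "us"

# After: one Nomad region per AWS region, so server-to-server traffic stays in-VPC.
region     = "us-east-1"
datacenter = "us-east-1a"
```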

@stale

stale bot commented May 10, 2019

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

@flyinprogrammer
Contributor

This is still an issue.

@shantanugadgil
Contributor

Speaking for myself:

I see a few problems with this issue:

  • This issue is about 3 years old and is more of a "documentation clarification" issue.
  • Many defaults have changed over the last 3 years (I don't know which of the specific points of confusion are still relevant).
  • Things like cloud-auto-join are a feature now, and this issue has a mix of questions.

In my opinion, this issue should be laid to rest; someone could open a PR for the doc changes, or create a new issue describing what is confusing or missing in the docs.

Thoughts?

@nickethier
Member

After reading through this issue for the first time, I would agree with @shantanugadgil. Initially there was a pretty broad scope to the questions this issue raised, and further discussion seems to have answered some of them while raising others.

I'm going to close this one. If anyone in this thread feels there is still some part of this issue that has not been resolved please open a new one. Thanks!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2022