
Ambiguity in docs regarding the advertise and addresses configuration parameters and multi-datacenter deployments #304

Closed
F21 opened this issue Oct 19, 2015 · 21 comments
Labels
theme/docs Documentation issues and enhancements

Comments

@F21

F21 commented Oct 19, 2015

I am interested in setting up a multi-datacenter cluster. The documentation for configuring the advertise and addresses options is ambiguous.

In particular:

  • In the addresses section, it says that the rpc address should be accessible to all agents, but should only be accessible to cluster members. This sounds a bit contradictory. Should the rpc address be accessible to only agents within the datacenter the server is in?
  • In the addresses section, the serf address should only be exposed to other servers within the datacenter. But it also says that this address is used for gossiping, which according to the architecture docs is supposed to be used to talk to other servers in other data centers and regions.
  • In the advertise section, the rpc address should be accessible by all agents. Is this within a datacenter, a region or every single agent across all regions and data centers?
  • In the advertise section, the serf address should be reachable from all servers (I am assuming across all regions and data centers), but it contradicts the serf parameter within the addresses section which says it should only be reached by servers within their own data center.

Regarding multi-datacenter requirements, there are also some things that are not clear:

  • In the High-Level Overview section of the Architecture docs, this image does not show which data centers the servers reside in.
  • I understand that there are no regions in Consul, so in Consul's case it's recommended to have multiple data centers and a quorum of servers within each data center. In Nomad's case, the architecture docs state that there should be a quorum (around 3 or 5 servers) per region, but they do not state how many servers we should have in each data center. If we have 3 servers split across 3 data centers and a server dies, that whole data center would be out of action. If we have 5 servers, there is still 1 data center that is vulnerable. I think a specific deployment example for a multi-data center deployment within the docs would be immensely useful.
@dadgar dadgar added the theme/docs Documentation issues and enhancements label Oct 19, 2015
@F21
Author

F21 commented Oct 19, 2015

If someone from the Nomad team could answer those questions in the meantime, that would be awesome! 😄 I am keen to build a test virtualized cluster simulating a multi-data center deployment.

@cbednarski
Contributor

@F21 We can certainly make these docs clearer. To clarify a few things:

Terminology

  • A region is a cluster of Nomad server agents, which participate in consensus (via raft) and membership (via serf) for high availability.
  • A datacenter represents something like an AWS availability zone, a physical datacenter location, or some other fault domain or boundary.

Multi-datacenter

The Nomad server cluster operates only at the region level. A single region deployment can manage scheduling for multiple datacenters, and unlike Consul you do not deploy Nomad per datacenter.

We expect most organizations will only ever need to run one Nomad cluster worldwide (essentially a "global" region), which will simplify deployment. You may choose to create multiple regions for business or regulatory reasons, or for clusters larger than tens of thousands of nodes.

Network rules inside a region are as follows:

  • Client agents in a datacenter must be able to communicate over RPC with server agents in the same region.
  • Server agents in a region must be able to communicate over RPC with client agents in each datacenter in the same region.
  • Server agents in the same region must be able to communicate over serf (for cluster membership and failure detection).
  • Server agents in the same region must be able to communicate over RPC (so API calls can be forwarded to the leader server).
  • Agents in different datacenters do not need to communicate with each other.
  • Only the servers (typically 3 or 5 for quorum) participate in serf. Client agents do not communicate over this protocol.

Beyond that, datacenter is just a label on client nodes that is used for job placement.
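To make that concrete, a minimal client agent config might look like the sketch below. The region, datacenter, and values are hypothetical; the point is only that datacenter is a label, not a separate cluster.

```hcl
# Hypothetical client agent config: "datacenter" is just a label used for
# job placement; the agent still belongs to a single region-wide cluster.
region     = "global"
datacenter = "us-east-1a"

client {
  enabled = true
}
```

A job can then target that label with something like datacenters = ["us-east-1a"] in its job file.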

Multi-region

If your total fleet is very large (upwards of tens of thousands of machines) you may need to shard into multiple regions. In this case, each region will be represented by a cluster of 3-5 Nomad servers. These servers communicate with each other over serf, both intra-regionally and inter-regionally.

Additionally, servers in different regions will need to access each other's RPC address in order to forward jobs to another region.

These inter-region links are optional. You could have completely isolated regions, but you will not be able to forward jobs from Region A to Region B, for example.

Network rules for forwarding are as follows:

  • (as above) Server agents in the same region must be able to communicate over serf
  • (as above) Server agents in the same region must be able to communicate over RPC
  • Server agents in different regions must be able to communicate over serf
  • Server agents in different regions must be able to communicate over RPC

Serf (technical details)

Serf operates in LAN and WAN modes. LAN mode is used for (typically) physically local, low-latency gossip pools, while WAN is used for higher latency pools on a global level. There are actually two pools here -- one using LAN mode and another using WAN mode.

If you have 5 regions with 3 Nomad servers each, you will have 5 local serf rings with 3 nodes each, and one global serf ring with 15 nodes.
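To tie this back to the original question about addresses and advertise, a server agent config for this layout might look roughly like the sketch below. The IPs and interface choices are hypothetical and the ports shown are the Nomad defaults; treat it as an illustration of which audience each address serves, not as authoritative documentation.

```hcl
# Hypothetical server agent config for one server in a single region.
region     = "us"
datacenter = "us-east-1"

# addresses: what the agent binds to locally.
addresses {
  http = "0.0.0.0"   # HTTP API; expose to operators and tooling as needed
  rpc  = "10.0.1.5"  # must be reachable by clients and servers in this region
  serf = "10.0.1.5"  # must be reachable by other servers (LAN and WAN gossip)
}

# advertise: what the agent tells other members to use when contacting it.
advertise {
  http = "10.0.1.5:4646"
  rpc  = "10.0.1.5:4647"
  serf = "10.0.1.5:4648"
}

server {
  enabled          = true
  bootstrap_expect = 3
}
```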

@F21
Author

F21 commented Oct 21, 2015

@cbednarski Thanks! That was very useful.

Another question about multi-region deployments: there doesn't appear to be any setting for providing a list of server IPs for servers to join. Do servers automatically discover each other and join (as long as they can reach each other)?

@dadgar
Contributor

dadgar commented Oct 22, 2015

@F21: They do not automatically discover each other. You can join servers in two ways: you can use the nomad server-join [list of servers] command, or you can hit the HTTP API.
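For example (hypothetical addresses; the exact command name and API path can vary between Nomad versions, so check the docs for your release):

```sh
# Join the local server to an existing server's serf address.
nomad server-join 10.0.1.5:4648

# Or do the same through the local agent's HTTP API.
curl -X PUT "http://127.0.0.1:4646/v1/agent/join?address=10.0.1.5:4648"
```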

@F21
Author

F21 commented Oct 22, 2015

@dadgar Is this also something I can set in a configuration file?

@dadgar
Contributor

dadgar commented Oct 22, 2015

It is not something that can be set in the configuration file. The reason is that in most cases the IP to join is not known when writing the static config file. You can see how we overcome this when bootstrapping a cluster in this terraform file. The file launches three Nomad servers and then calls the HTTP API endpoint with the dynamic IPs of the other servers.

@F21
Author

F21 commented Oct 22, 2015

Probably getting a bit off-topic here, but wouldn't it be possible to have something like consul's start_join where you give it a list of known IPs in the config that it will attempt to contact and join when it starts?

@adrianlop
Contributor

+1, I agree with @F21; it's very easy to automate the cluster installation and startup with start_join in the config file.

@c4milo
Contributor

c4milo commented Dec 11, 2015

@F21, @poll0rz a PR that implements retry-join logic recently landed in master: #527
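For anyone landing here later, a retry-join setup might look roughly like the sketch below. The addresses are hypothetical, and the exact option names and placement depend on the Nomad version, so verify against the docs for your release.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  # Keep retrying these addresses until the join succeeds.
  retry_join     = ["10.0.1.5", "10.0.1.6", "10.0.1.7"]
  retry_interval = "15s"
}
```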

@c4milo
Contributor

c4milo commented Jan 5, 2016

@cbednarski, @dadgar I have another question: in Consul we have two gossip pools, LAN and WAN. In order to join servers in a WAN pool we use the consul join -wan <server1> flag. However, I don't see such a flag in Nomad. How do we create the WAN pool such that we can schedule jobs on multiple regions with one nomad run command?

@pires

pires commented Jan 8, 2016

@cbednarski that's a great explanation. I suggest adding it to the documentation.

@cigdono

cigdono commented May 2, 2016

I'm interested in getting a multi-region Nomad cluster set up for a POC. I have followed the steps outlined in the documentation and in this post above. Once I have stood everything up, I run the "nomad server-members" command and I see all my servers listed as being in the appropriate regions.

When I try submitting a job destined for region2 via the HTTP API to a server in region1, the job does not get forwarded to a server in region2. A scheduler in region1 fails to place the allocation and my evaluation ends up blocked. Your documentation implies that this should work. I have also looked at the code and there appears to be forwardRegion logic that looks like it should do the trick.

Can someone please document a minimal multi-region deployment?
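For reference, a job normally targets a non-default region through the region attribute in the job file; a minimal (hypothetical) sketch:

```hcl
job "example" {
  # Ask the server that receives this job to forward it to region2.
  region      = "region2"
  datacenters = ["dc1"]

  group "web" {
    task "app" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```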

@xiang

xiang commented Dec 21, 2016

I'm seeing the same behavior described above. Any more info on this?

@dadgar
Contributor

dadgar commented Jan 3, 2017

@xiang Can you expand on the issue you are facing? The issue immediately above yours was fixed, as it was a bug in the CLI.

@flyinprogrammer
Contributor

flyinprogrammer commented Sep 5, 2018

We expect most organizations will only ever need to run one Nomad cluster worldwide (essentially a "global" region), which will simplify deployment. You may choose to create multiple regions for business or regulatory reasons, or for clusters larger than tens of thousands of nodes.


Serf operates in LAN and WAN modes. LAN mode is used for (typically) physically local, low-latency gossip pools, while WAN is used for higher latency pools on a global level. There are actually two pools here -- one using LAN mode and another using WAN mode.

If you have 5 regions with 3 Nomad servers each, you will have 5 local serf rings with 3 nodes each, and one global serf ring with 15 nodes.


For example, you may have a US region with the us-east-1 and us-west-1 datacenters


As a Nomad operator and huge fan, these statements seem to contradict each other, on the basis that in my opinion there is nothing "physically local, low-latency" about a truly global deployment.

So are you guys suggesting or implying that most folks do not have a literal global deployment of infrastructure?

Maybe if we quantified what a "physically local, low-latency" network is, that would help?

Is ~65ms latency over a VPN between us-west-1 and us-east-1 in AWS considered "physically local, low-latency"?

What if it's over a direct connect with VPN backup?

What if instead of us-west-1, it's us-west-2 which has ~90ms latency?

Are we then also proposing to run only one Nomad server per datacenter, where a datacenter is a VPC in each AWS region? What if my region only consists of 2 datacenters, us-east-1 and us-west-1? How will quorum work in the event of a network split, and is my "global" infrastructure now down? And what if I have more than 5 datacenters in a region because I'm "multicloud"?


I realize these questions could come off as belligerent; however, I assure you that I just want to get on the same page about where we're going with this product and then show proof of success.

@shantanugadgil
Contributor

What an amazing coincidence to find this thread.
Only today I realized my folly: I set region to 'us' for my datacenters 'us-east-1' and 'us-west-2'.

All my infra is on private IPs and we have many AWS regions which are VPC peered, though not full mesh, and will not be full mesh.

The fact that the servers sharing a common region try to talk to each other on port 4647 hit my deployment plans.

Now I am planning to switch my region names to the AWS region names, so my regions will be 'us-east-1' instead of just 'us'.

Lesson learnt! 👍👍👍
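In config terms, the change described above amounts to roughly the following (hypothetical values):

```hcl
# Before: one Nomad region spanning peered VPCs, so servers in us-east-1 and
# us-west-2 tried to reach each other over ports 4647/4648.
# region = "us"

# After: one Nomad region per AWS region, so server-to-server traffic stays in-VPC.
region     = "us-east-1"
datacenter = "us-east-1a"
```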

@stale

stale bot commented May 10, 2019

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

@flyinprogrammer
Contributor

This is still an issue.

@shantanugadgil
Contributor

Speaking for myself:

I see a few problems with this issue:

  • This issue is about 3 years old and is more of a "documentation clarification" issue.
  • Many defaults have changed over the last 3 years (I don't know which of the specific points of confusion are still relevant).
  • Things like cloud-auto-join are a feature now, and this issue has a mix of questions.

In my opinion, this issue should be laid to rest; someone could open a PR for the doc changes, or create a new issue describing what is confusing or missing in the docs.

Thoughts?

@nickethier
Member

After reading through this issue for the first time, I would agree with @shantanugadgil. Initially there was a pretty broad scope to the questions this issue raised, and further discussion seems to have answered some of them while raising others.

I'm going to close this one. If anyone in this thread feels there is still some part of this issue that has not been resolved please open a new one. Thanks!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 23, 2022