
Commit

Remove todos
dadgar committed Feb 10, 2017
1 parent 17ea89d commit 0c67c8b
Showing 8 changed files with 308 additions and 5 deletions.
3 changes: 1 addition & 2 deletions nomad/server.go
@@ -1058,7 +1058,6 @@ func (s *Server) GetConfig() *Config {
return s.config
}

- // TODO(alex) we need a outage guide
// peersInfoContent is used to help operators understand what happened to the
// peers.json file. This is written to a file called peers.info in the same
// location.
@@ -1082,5 +1081,5 @@ creating the peers.json file, and that all servers receive the same
configuration. Once the peers.json file is successfully ingested and applied, it
will be deleted.
- Please see https://www.consul.io/docs/guides/outage.html for more information.
+ Please see https://www.nomadproject.io/guides/outage.html for more information.
`
4 changes: 2 additions & 2 deletions website/source/docs/commands/operator-index.html.md.erb
@@ -16,9 +16,9 @@ as interacting with the Raft subsystem. This was added in Nomad 0.5.5.
~> Use this command with extreme caution, as improper use could lead to a Nomad
outage and even loss of data.

- See the [Outage Recovery](TODO alexdadgar) guide for some examples of how
+ See the [Outage Recovery](/guides/outage.html) guide for some examples of how
this command is used. For an API to perform these operations programmatically,
- please see the documentation for the [Operator](/docs/agent/http/operator.html)
+ please see the documentation for the [Operator](/guides/outage.html)
endpoint.

## Usage
2 changes: 1 addition & 1 deletion website/source/docs/http/operator.html.md
@@ -15,7 +15,7 @@ as interacting with the Raft subsystem. This was added in Nomad 0.5.5
~> Use this interface with extreme caution, as improper use could lead to a
Nomad outage and even loss of data.

- See the [Outage Recovery](/docs/guides/outage.html) guide for some examples of how
+ See the [Outage Recovery](/guides/outage.html) guide for some examples of how
these capabilities are used. For a CLI to perform these operations manually, please
see the documentation for the [`nomad operator`](/docs/commands/operator-index.html)
command.
116 changes: 116 additions & 0 deletions website/source/guides/cluster/automatic.html.md
@@ -0,0 +1,116 @@
---
layout: "guides"
page_title: "Automatically Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-automatic"
description: |-
  Learn how to automatically bootstrap a Nomad cluster using Consul. By having
  a Consul agent installed on each host, Nomad can automatically discover other
  clients and servers to bootstrap the cluster without operator involvement.
---

# Automatic Bootstrapping

To automatically bootstrap a Nomad cluster, we must leverage another HashiCorp
open source tool, [Consul](https://www.consul.io/). Bootstrapping Nomad is
easiest against an existing Consul cluster. The Nomad servers and clients
will become informed of each other's existence when the Consul agent is
installed and configured on each host. As an added benefit, integrating Consul
with Nomad provides service and health check registration for applications which
later run under Nomad.

Consul models infrastructure as datacenters, and multiple Consul datacenters can
be connected over the WAN so that clients can discover nodes in other
datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
running a Consul cluster in every Nomad datacenter and connecting them over the
WAN. Please refer to the Consul guide for both
[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single
datacenter and [connecting multiple Consul clusters over the
WAN](https://www.consul.io/docs/guides/datacenters.html).
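
As a rough sketch of the Consul side, joining two Consul datacenters over the
WAN can be as simple as issuing a WAN join from a server in one datacenter to a
server in the other (the address below is a placeholder):

```shell
# Hypothetical example: from a Consul server in one datacenter, join a
# Consul server in another datacenter over the WAN.
$ consul join -wan 10.1.2.3
```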

If a Consul agent is installed on the host prior to Nomad starting, the Nomad
agent will register with Consul and discover other nodes.

For servers, we must inform the cluster how many servers we expect to have. This
is required to form the initial quorum, since Nomad is unaware of how many peers
to expect. For example, to form a region with three Nomad servers, you would use
the following Nomad configuration file:

```hcl
# /etc/nomad.d/server.hcl
server {
  enabled          = true
  bootstrap_expect = 3
}
```

Save this configuration to disk, then start the agent:

```shell
$ nomad agent -config=/etc/nomad.d/server.hcl
```

A similar configuration is available for Nomad clients:

```hcl
# /etc/nomad.d/client.hcl
datacenter = "dc1"
client {
  enabled = true
}
```

The agent is started in a similar manner:

```shell
$ nomad agent -config=/etc/nomad.d/client.hcl
```

Notice that the above configurations include no IP or DNS addresses connecting
the clients and servers. This is because Nomad detected the presence of Consul
and used its service discovery to form the cluster.
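
To confirm that discovery happened through Consul, one option is to query the
local Consul catalog for the service names Nomad registers by default (shown in
the Internals section below). This is only a sketch of a sanity check, not a
required step:

```shell
# The nomad (server) and nomad-client services should appear in the
# Consul catalog once the agents have registered.
$ curl http://127.0.0.1:8500/v1/catalog/service/nomad
$ curl http://127.0.0.1:8500/v1/catalog/service/nomad-client
```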

## Internals

~> This section discusses the internals of the Consul and Nomad integration at a
very high level. Reading it is only recommended for those curious about the
implementation.

As discussed in the previous section, Nomad merges multiple configuration files
together, so the `-config` flag may be specified more than once:

```shell
$ nomad agent -config=base.hcl -config=server.hcl
```

In addition to merging configuration given on the command line, Nomad also
maintains its own internal configurations (called "default configs") which
include sane base defaults. One of those default configurations includes a
`consul` block with defaults for connecting to and integrating with Consul. In
essence, this internal configuration resembles the following:

```hcl
# You do not need to add this to your configuration file. This is an example
# that is part of Nomad's internal default configuration for Consul integration.
consul {
  # The address of the Consul agent.
  address = "127.0.0.1:8500"

  # The service names used to register the server and client with Consul.
  server_service_name = "nomad"
  client_service_name = "nomad-client"

  # Enables automatically registering the services.
  auto_advertise = true

  # Enables the server and client to bootstrap using Consul.
  server_auto_join = true
  client_auto_join = true
}
```

Please refer to the [Consul
documentation](/docs/agent/configuration/consul.html) for the complete set of
configuration options.
24 changes: 24 additions & 0 deletions website/source/guides/cluster/bootstrapping.html.md
@@ -0,0 +1,24 @@
---
layout: "guides"
page_title: "Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-bootstrap"
description: |-
  Learn how to bootstrap a Nomad cluster.
---

# Bootstrapping a Nomad Cluster

Nomad models infrastructure into regions and datacenters. Servers reside at the
regional layer and manage all state and scheduling decisions for that region.
Regions contain multiple datacenters, and clients are registered to a single
datacenter (and thus a region that contains that datacenter). For more details on
the architecture of Nomad and how it models infrastructure, see the [architecture
page](/docs/internals/architecture.html).

There are two strategies for bootstrapping a Nomad cluster:

1. [Automatic bootstrapping](/guides/cluster/automatic.html)
1. [Manual bootstrapping](/guides/cluster/manual.html)

Please refer to the specific documentation links above or in the sidebar for
more detailed information about each strategy.
28 changes: 28 additions & 0 deletions website/source/guides/cluster/federation.md
@@ -0,0 +1,28 @@
---
layout: "guides"
page_title: "Federating a Nomad Cluster"
sidebar_current: "guides-cluster-federation"
description: |-
  Learn how to join Nomad servers across multiple regions so users can submit
  jobs to any server in any region using global federation.
---

# Federating a Cluster

Because Nomad operates at a regional level, federation is part of Nomad core.
Federation enables users to submit jobs or interact with the HTTP API targeting
any region, from any server, even if that server resides in a different region.

Federating multiple Nomad clusters is as simple as joining servers. From any
server in one region, issue a join command to a server in a remote region:

```shell
$ nomad server-join 1.2.3.4:4648
```

Note that only one join command is required per region: servers across regions
discover the other servers in the cluster via the gossip protocol, so joining a
single known server is enough.
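
One way to verify the federation is to list the known server members and check
that servers from both regions appear; for example:

```shell
# Servers from every federated region should be listed with their region.
$ nomad server-members
```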

If bootstrapped via Consul and the Consul clusters in the Nomad regions are
federated, then federation occurs automatically.
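
Once regions are federated, any region can be targeted from any server. As an
illustrative example (the region name below is a placeholder), the global
`-region` flag selects the region a command is forwarded to:

```shell
# Query job statuses in the us-west region from a server in another region.
$ nomad status -region=us-west
```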
65 changes: 65 additions & 0 deletions website/source/guides/cluster/manual.html.md
@@ -0,0 +1,65 @@
---
layout: "guides"
page_title: "Manually Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-manual"
description: |-
  Learn how to manually bootstrap a Nomad cluster using the server-join
  command. This section also discusses Nomad federation across multiple
  datacenters and regions.
---

# Manual Bootstrapping

Manually bootstrapping a Nomad cluster does not rely on additional tooling, but
does require operator participation in the cluster formation process. When
bootstrapping, Nomad servers and clients must be started and informed of the
address of at least one Nomad server.

This creates a chicken-and-egg problem: one server must first be fully
bootstrapped and configured before the remaining servers and clients can join
the cluster. This requirement can add provisioning time as well as ordered
dependencies during provisioning.

First, we bootstrap a single Nomad server and capture its IP address. Once we
have that node's IP address, we place it in the configuration of the remaining
nodes.

For Nomad servers, this configuration may look something like this:

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  # This is the IP address of the first server we provisioned
  retry_join = ["<known-address>:4648"]
}
```

Alternatively, the address can be supplied after all of the servers have started
by running the [`server-join` command](/docs/commands/server-join.html) on each
server individually to cluster them. Each server needs to join only one other
server, and can then rely on the gossip protocol to discover the rest.

```shell
$ nomad server-join <known-address>
```

For Nomad clients, the configuration may look something like:

```hcl
client {
  enabled = true
  servers = ["<known-address>:4647"]
}
```

At this time, there is no equivalent of the `server-join` command for Nomad
clients.

The port corresponds to the RPC port. If no port is specified with the IP
address, the default RPC port of `4647` is assumed.

As servers are added or removed from the cluster, this information is pushed to
the client. This means only one server must be specified because, after initial
contact, the full set of servers in the client's region is shared with the
client.
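
As a quick sanity check after bootstrapping, you can verify that clients have
registered with the servers; for example:

```shell
# Lists the client nodes registered with the servers in the region.
$ nomad node-status
```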
71 changes: 71 additions & 0 deletions website/source/guides/cluster/requirements.html.md
@@ -0,0 +1,71 @@
---
layout: "guides"
page_title: "Nomad Client and Server Requirements"
sidebar_current: "guides-cluster-requirements"
description: |-
  Learn about Nomad client and server requirements such as memory and CPU
  recommendations, network topologies, and more.
---

# Cluster Requirements

## Resources (RAM, CPU, etc.)

**Nomad servers** may need to be run on large machine instances. We suggest
having 8+ cores, 32 GB+ of memory, 80 GB+ of disk, and significant network
bandwidth. The core count and network recommendations ensure high throughput,
as Nomad relies heavily on network communication and the servers manage all the
nodes in the region while performing scheduling. The memory and disk
requirements stem from the fact that Nomad stores all state in memory and
writes two snapshots of this data to disk. Thus, disk should be at least twice
the memory available to the server when deploying a high-load cluster.

**Nomad clients** support reserving resources on the node that should not be
used by Nomad. This should be used to target a specific resource utilization per
node and to reserve resources for applications running outside of Nomad's
supervision, such as Consul and the operating system itself.

Please see the [reservation configuration](/docs/agent/configuration/client.html#reserved) for
more detail.
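
As a sketch of what such a reservation might look like (the values below are
illustrative assumptions, not recommendations):

```hcl
client {
  enabled = true

  reserved {
    cpu            = 500         # MHz reserved for non-Nomad processes
    memory         = 512         # MB of memory reserved
    disk           = 1024        # MB of disk reserved
    reserved_ports = "22,8500"   # ports Nomad should not allocate to tasks
  }
}
```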

## Network Topology

**Nomad servers** are expected to have sub-10-millisecond network latencies
between each other to ensure liveness and high-throughput scheduling. Nomad
servers can be spread across multiple datacenters to achieve high availability,
provided they have low-latency connections between them.

For example, on AWS every region comprises multiple zones which have very low
latency links between them, so every zone can be modeled as a Nomad datacenter,
and every zone can run a single Nomad server; those servers can then be
connected to form a quorum and a region.
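
A minimal sketch of that layout, assuming hypothetical zone and region names:

```hcl
# One Nomad server per AWS availability zone; each zone is modeled as its
# own Nomad datacenter, and the three servers form a single region.
region     = "us-east"
datacenter = "us-east-1a"

server {
  enabled          = true
  bootstrap_expect = 3
}
```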

Nomad servers use Raft for state replication, and Raft, being strongly
consistent, needs a quorum of servers to function. We therefore recommend
running an odd number of Nomad servers in a region, usually 3 to 5. A cluster
of three servers can withstand the failure of one server, and a cluster of five
can withstand two. Adding more servers to the quorum adds more time to
replicate state and hence decreases throughput, so we do not recommend having
more than seven servers in a region.

**Nomad clients** do not have the same latency requirements as servers, since
they do not participate in Raft. Clients can therefore have 100+ millisecond
latency to their servers. This allows a single set of Nomad servers to service
clients spread geographically over a continent, or even the world, in the case
of a single "global" region with many datacenters.

## Ports Used

Nomad requires 3 different ports to work properly on servers and 2 on clients;
some use TCP only, while others use both TCP and UDP. The requirements for each
port are documented below, followed by a sketch of how the defaults can be
overridden.

* HTTP API (Default 4646). This is used by clients and servers to serve the HTTP
API. TCP only.

* RPC (Default 4647). This is used by servers and clients to communicate with
each other. TCP only.

* Serf WAN (Default 4648). This is used by servers to gossip over the WAN to
other servers. TCP and UDP.
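
If the defaults conflict with other software on the host, they can be changed
in the agent configuration. A minimal sketch using the agent's `ports` block
(the values shown are the defaults, so override only what you need to move):

```hcl
# These values are the defaults; override only the ones you need to change.
ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}
```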
