Revised NomadProject "Guides" Content Structure #5667

Merged · 3 commits · May 9, 2019

Changes from all commits
16 changes: 16 additions & 0 deletions website/source/guides/analytical-workloads/index.html.md
@@ -0,0 +1,16 @@
---
layout: "guides"
page_title: "Analytical Workloads on Nomad"
sidebar_current: "guides-analytical-workloads"
description: |-
  Guides for running analytical workloads on Nomad.
---

# Analytical Workloads on Nomad

Nomad is well-suited for analytical workloads, given its [performance](https://www.hashicorp.com/c1m/) and first-class support for
[batch scheduling](/docs/schedulers.html).

This section provides some best practices and guidance for running analytical workloads on Nomad.

Please navigate to the appropriate sub-sections for more information.
@@ -1,154 +1,153 @@
---
layout: "guides"
page_title: "Apache Spark Integration - Configuration Properties"
sidebar_current: "guides-spark-configuration"
sidebar_current: "guides-analytical-workloads-spark-configuration"
description: |-
Comprehensive list of Spark configuration properties.
---

# Spark Configuration Properties

Spark [configuration properties](https://spark.apache.org/docs/latest/configuration.html#available-properties)
are generally applicable to the Nomad integration. The properties listed below
are specific to running Spark on Nomad. Configuration properties can be set by
adding `--conf [property]=[value]` to the `spark-submit` command.
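
For example (a sketch; the class, jar, and distribution URL below are
placeholders), a cluster-mode submission that sets some of the properties
listed below might look like this:

```shell
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.nomad.cluster.monitorUntil=complete \
  --conf spark.nomad.sparkDistribution=https://example.com/spark/spark-2.1.0-bin-nomad.tgz \
  https://example.com/jars/spark-examples.jar 100
```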

- `spark.nomad.authToken` `(string: nil)` - Specifies the secret key of the auth
token to use when accessing the API. This falls back to the NOMAD_TOKEN environment
variable. Note that if this configuration setting is set and the cluster deploy
mode is used, this setting will be propagated to the driver application in the
job spec. If it is not set and an auth token is taken from the NOMAD_TOKEN
environment variable, the token will not be propagated to the driver which will
require the driver to pick up its token from an environment variable.

- `spark.nomad.cluster.expectImmediateScheduling` `(bool: false)` - Specifies
that `spark-submit` should fail if Nomad is not able to schedule the job
immediately.

- `spark.nomad.cluster.monitorUntil` `(string: "submitted")` - Specifies the
length of time that `spark-submit` should monitor a Spark application in cluster
mode. When set to `submitted`, `spark-submit` will return as soon as the
application has been submitted to the Nomad cluster. When set to `scheduled`,
`spark-submit` will return as soon as the Nomad job has been scheduled. When
set to `complete`, `spark-submit` will tail the output from the driver process
and return when the job has completed.

- `spark.nomad.datacenters` `(string: dynamic)` - Specifies a comma-separated
list of Nomad datacenters to use. This property defaults to the datacenter of
the first Nomad server contacted.

- `spark.nomad.docker.email` `(string: nil)` - Specifies the email address to
use when downloading the Docker image specified by
[spark.nomad.dockerImage](#spark-nomad-dockerImage). See the
[Docker driver authentication](/docs/drivers/docker.html#authentication)
docs for more information.

- `spark.nomad.docker.password` `(string: nil)` - Specifies the password to use
when downloading the Docker image specified by
[spark.nomad.dockerImage](#spark-nomad-dockerImage). See the
[Docker driver authentication](/docs/drivers/docker.html#authentication)
docs for more information.

- `spark.nomad.docker.serverAddress` `(string: nil)` - Specifies the server
address (domain/IP without the protocol) to use when downloading the Docker
image specified by [spark.nomad.dockerImage](#spark-nomad-dockerImage). Docker
Hub is used by default. See the
[Docker driver authentication](/docs/drivers/docker.html#authentication)
docs for more information.

- `spark.nomad.docker.username` `(string: nil)` - Specifies the username to use
when downloading the Docker image specified by
[spark.nomad.dockerImage](#spark-nomad-dockerImage). See the
[Docker driver authentication](/docs/drivers/docker.html#authentication)
docs for more information.

- `spark.nomad.dockerImage` `(string: nil)` - Specifies the `URL` for the
[Docker image](/docs/drivers/docker.html#image) to
use to run Spark with Nomad's `docker` driver. When not specified, Nomad's
`exec` driver will be used instead.

- `spark.nomad.driver.cpu` `(string: "1000")` - Specifies the CPU in MHz that
should be reserved for driver tasks.

- `spark.nomad.driver.logMaxFileSize` `(string: "1m")` - Specifies the maximum
size that Nomad should use for driver task log files.

- `spark.nomad.driver.logMaxFiles` `(string: "5")` - Specifies the number of log
files that Nomad should keep for driver tasks.

- `spark.nomad.driver.networkMBits` `(string: "1")` - Specifies the network
bandwidth that Nomad should allocate to driver tasks.

- `spark.nomad.driver.retryAttempts` `(string: "5")` - Specifies the number of
times that Nomad should retry driver task groups upon failure.

- `spark.nomad.driver.retryDelay` `(string: "15s")` - Specifies the length of
time that Nomad should wait before retrying driver task groups upon failure.

- `spark.nomad.driver.retryInterval` `(string: "1d")` - Specifies Nomad's retry
interval for driver task groups.

- `spark.nomad.executor.cpu` `(string: "1000")` - Specifies the CPU in MHz that
should be reserved for executor tasks.

- `spark.nomad.executor.logMaxFileSize` `(string: "1m")` - Specifies the maximum
size that Nomad should use for executor task log files.

- `spark.nomad.executor.logMaxFiles` `(string: "5")` - Specifies the number of
log files that Nomad should keep for executor tasks.

- `spark.nomad.executor.networkMBits` `(string: "1")` - Specifies the network
bandwidth that Nomad should allocate to executor tasks.

- `spark.nomad.executor.retryAttempts` `(string: "5")` - Specifies the number of
times that Nomad should retry executor task groups upon failure.

- `spark.nomad.executor.retryDelay` `(string: "15s")` - Specifies the length of
time that Nomad should wait before retrying executor task groups upon failure.

- `spark.nomad.executor.retryInterval` `(string: "1d")` - Specifies Nomad's retry
interval for executor task groups.

- `spark.nomad.job.template` `(string: nil)` - Specifies the path to a JSON file
containing a Nomad job to use as a template. This can also be set with
spark-submit's `--nomad-template` parameter.

- `spark.nomad.namespace` `(string: nil)` - Specifies the namespace to use. This
falls back first to the NOMAD_NAMESPACE environment variable and then to Nomad's
default namespace.

- `spark.nomad.priority` `(string: nil)` - Specifies the priority for the
Nomad job.

- `spark.nomad.region` `(string: dynamic)` - Specifies the Nomad region to use.
This property defaults to the region of the first Nomad server contacted.

- `spark.nomad.shuffle.cpu` `(string: "1000")` - Specifies the CPU in MHz that
should be reserved for shuffle service tasks.

- `spark.nomad.shuffle.logMaxFileSize` `(string: "1m")` - Specifies the maximum
size that Nomad should use for shuffle service task log files.

- `spark.nomad.shuffle.logMaxFiles` `(string: "5")` - Specifies the number of
log files that Nomad should keep for shuffle service tasks.

- `spark.nomad.shuffle.memory` `(string: "256m")` - Specifies the memory that
Nomad should allocate for the shuffle service tasks.

- `spark.nomad.shuffle.networkMBits` `(string: "1")` - Specifies the network
bandwidth that Nomad should allocate to shuffle service tasks.

- `spark.nomad.sparkDistribution` `(string: nil)` - Specifies the location of
the Spark distribution archive file to use.

- `spark.nomad.tls.caCert` `(string: nil)` - Specifies the path to a `.pem` file
containing the certificate authority that should be used to validate the Nomad
server's TLS certificate.

- `spark.nomad.tls.cert` `(string: nil)` - Specifies the path to a `.pem` file
containing the TLS certificate to present to the Nomad server.

- `spark.nomad.tls.trustStorePassword` `(string: nil)` - Specifies the path to a
`.pem` file containing the private key corresponding to the certificate in
[spark.nomad.tls.cert](#spark-nomad-tls-cert).
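
As a sketch of how several of these properties can combine (the class, jar,
image name, datacenters, and paths below are placeholders), a submission might
look like this:

```shell
spark-submit \
  --class com.example.MyApp \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.nomad.datacenters=dc1,dc2 \
  --conf spark.nomad.dockerImage=example/spark:2.1.0 \
  --conf spark.nomad.executor.cpu=2000 \
  --conf spark.nomad.tls.caCert=/etc/certs/nomad-ca.pem \
  --conf spark.nomad.sparkDistribution=https://example.com/spark/spark-2.1.0-bin-nomad.tgz \
  https://example.com/jars/my-app.jar
```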

@@ -1,24 +1,24 @@
---
layout: "guides"
page_title: "Apache Spark Integration - Customizing Applications"
sidebar_current: "guides-spark-customizing"
sidebar_current: "guides-analytical-workloads-spark-customizing"
description: |-
Learn how to customize the Nomad job that is created to run a Spark
application.
---

# Customizing Applications

There are two ways to customize the Nomad job that Spark creates to run an
application:

- Use the default job template and set configuration properties
- Use a custom job template

## Using the Default Job Template

The Spark integration will use a generic job template by default. The template
includes groups and tasks for the driver, executors and (optionally) the
[shuffle service](/guides/spark/dynamic.html). The job itself and the tasks that
are created have the `spark.nomad.role` meta value defined accordingly:

@@ -45,7 +45,7 @@ job "structure" {
}
}

# Shuffle service tasks are only added when enabled (as it must be when
# using dynamic allocation)
task "shuffle-service" {
meta {
@@ -56,38 +56,38 @@ job "structure" {
}
```

The default template can be customized indirectly by explicitly [setting
configuration properties](/guides/spark/configuration.html).

## Using a Custom Job Template

An alternative to using the default template is to set the
`spark.nomad.job.template` configuration property to the path of a file
containing a custom job template. There are two important considerations:

* The template must use the JSON format. You can convert an HCL jobspec to
JSON by running `nomad job run -output <job.nomad>` (see the example after this list).

* `spark.nomad.job.template` should be set to a path on the submitting
machine, not to a URL (even in cluster mode). The template does not need to
be accessible to the driver or executors.
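
For example (a sketch; the file, class, and jar names are placeholders), the
conversion and submission might look like this:

```shell
# Convert an HCL job spec into JSON without submitting it.
nomad job run -output template.nomad > template.json

# Point the Spark integration at the JSON template.
spark-submit \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.nomad.job.template=template.json \
  --class com.example.MyApp \
  https://example.com/jars/my-app.jar
```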

Using a job template, you can override Spark’s default resource utilization, add
additional metadata or constraints, set environment variables, add sidecar
tasks, and utilize the Consul and Vault integration. The template does
not need to be a complete Nomad job specification, since Spark will add
everything necessary to run your application. For example, your template
might set `job` metadata, but not contain any task groups, making it an
incomplete Nomad job specification but still a valid template to use with Spark.

To customize the driver task group, include a task group in your template that
has a task that contains a `spark.nomad.role` meta value set to `driver`.

To customize the executor task group, include a task group in your template that
has a task that contains a `spark.nomad.role` meta value set to `executor` or
`shuffle`.

The following template adds a `meta` value at the job level and an environment
variable to the executor task group:

```hcl
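# Representative sketch only: the full template body is collapsed in this diff,
# and all names and values below are illustrative.
job "template" {
  meta {
    "foo" = "bar"
  }

  group "executor-group-name" {
    task "executor-task-name" {
      meta {
        "spark.nomad.role" = "executor"
      }

      env {
        BAZ = "something"
      }
    }
  }
}
```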

@@ -122,5 +122,5 @@ The order of precedence for customized settings is as follows:

## Next Steps

Learn how to [allocate resources](/guides/spark/resource.html) for your Spark
applications.
@@ -1,25 +1,25 @@
---
layout: "guides"
page_title: "Apache Spark Integration - Dynamic Executors"
sidebar_current: "guides-spark-dynamic"
sidebar_current: "guides-analytical-workloads-spark-dynamic"
description: |-
Learn how to dynamically scale Spark executors based on the queue of pending
tasks.
---

# Dynamically Allocate Spark Executors

By default, the Spark application will use a fixed number of executors. Setting
`spark.dynamicAllocation.enabled` to `true` enables Spark to add and remove executors
during execution depending on the number of Spark tasks scheduled to run. As
described in [Dynamic Resource Allocation](http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation), dynamic allocation requires that `spark.shuffle.service.enabled` be set to `true`.
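
A minimal sketch of enabling both settings at submission time (the class and
jar below are placeholders):

```shell
spark-submit \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --class com.example.MyApp \
  https://example.com/jars/my-app.jar
```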

On Nomad, this adds an additional shuffle service task to the executor
task group. This results in a one-to-one mapping of executors to shuffle
services.

When the executor exits, the shuffle service continues running so that it can
serve any results produced by the executor. Due to the nature of resource
allocation in Nomad, the resources allocated to the executor tasks are not
freed until the shuffle service (and the application) has finished.
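
Because those resources stay allocated for the lifetime of the application, it
can be worth bounding the shuffle service explicitly with the
`spark.nomad.shuffle.*` properties, for example (values are illustrative):

```shell
spark-submit \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.nomad.shuffle.cpu=500 \
  --conf spark.nomad.shuffle.memory=256m \
  --class com.example.MyApp \
  https://example.com/jars/my-app.jar
```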
