document docker dangling container reaper #6762

Merged
merged 3 commits into from
Nov 22, 2019
Changes from 2 commits
48 changes: 45 additions & 3 deletions website/source/docs/drivers/docker.html.md
@@ -166,7 +166,7 @@ The `docker` driver supports the following configuration in the job spec. Only
}
```

* `logging` - (Optional) A key-value map of Docker logging options.
* `logging` - (Optional) A key-value map of Docker logging options.
Defaults to `json-file` with log rotation (`max-file=2` and `max-size=2m`).

```hcl
@@ -648,6 +648,13 @@ plugin "docker" {
image = true
image_delay = "3m"
container = true

dangling_containers {
enabled = true
dry_run = false
period = "5m"
creation_grace = "5m"
}
}

volumes {
@@ -690,7 +697,7 @@ plugin "docker" {
* `config`<a id="plugin_auth_file"></a> - Allows an operator to specify a
JSON file which is in the dockercfg format containing authentication
information for a private registry, from either (in order) `auths`,
`credHelpers` or `credsStore`.
`credHelpers` or `credsStore`.
* `helper`<a id="plugin_auth_helper"></a> - Allows an operator to specify a
[credsStore](https://docs.docker.com/engine/reference/commandline/login/#credential-helper-protocol)
-like script on $PATH to lookup authentication information from external
@@ -719,6 +726,16 @@ plugin "docker" {
* `container` - Defaults to `true`. This option can be used to disable Nomad
from removing a container when the task exits. Under a name conflict,
Nomad may still remove the dead container.
* `dangling_containers` stanza for controlling dangling container detection
and cleanup:
* `enabled` - Defaults to `true`. Enables dangling container handling.
* `dry_run` - Defaults to `false`. Enables a mode where Nomad logs
potential dangling containers without killing them.
* `period` - Defaults to `"5m"`. A time duration that controls the interval
between Nomad scans for dangling containers.
* `creation_grace` - Defaults to `"5m"`. A time duration that controls
how long a newly created container may run without being tracked by Nomad
before it is marked (and killed) as a dangling container.

* `volumes` stanza:
* `enabled` - Defaults to `true`. Allows tasks to bind host paths
@@ -894,7 +911,32 @@ need a higher degree of isolation between processes for security or other
reasons, it is recommended to use full virtualization like
[QEMU](/docs/drivers/qemu.html).

## Docker for Windows Caveats
## Caveats

### Dangling Containers

Nomad 0.10.2 introduces a detector and a reaper for dangling Docker containers:
containers that Nomad starts yet does not manage or track. Though rare, they
sometimes occur in heavily loaded clusters and lead to unexpectedly running
services, potentially with stale versions.

When the Docker daemon becomes unavailable as Nomad starts a task, it is possible
for Docker to successfully start the container and fails the API call with 500
Suggested change
for Docker to successfully start the container and fails the API call with 500
for Docker to successfully start the container but return a 500 error code from the API call.

error code. In such cases, Nomad retries and eventually aims to kill such
Suggested change
error code. In such cases, Nomad retries and eventually aims to kill such
In such cases, Nomad retries and eventually aims to kill such

containers. However, if the Docker Engine remains unhealthy, subsequent retries
and stop attempts may still fail, and the started container becomes a dangling
container that Nomad no longer manages.

The newly added reaper periodically scans for such containers. It only targets
containers that have a `com.hashicorp.nomad.allocation_id` label, or that match
Nomad's conventions for naming and bind-mounts (i.e. `/alloc`, `/secrets`, `/local`).
Containers that don't match Nomad container patterns are left untouched.

Operators can run the reaper in a dry run mode, where it only logs dangling
Suggested change
Operators can run the reaper in a dry run mode, where it only logs dangling
Operators can run the reaper in a dry-run mode, where it only logs dangling

container ids without killing them, or simply disable it through
Suggested change
container ids without killing them, or simply disable it through
container ids without killing them, or disable it by setting

the `gc.dangling_containers` config stanza.
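
As a minimal sketch of the dry-run mode: the option names and default
durations below are the ones listed under the plugin options, while the nesting
of `gc` inside a `config` block of the client's `plugin "docker"` stanza is an
assumption made for illustration. An agent configuration might look like:

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        # Keep detection on, but only log candidates instead of killing them.
        enabled        = true
        dry_run        = true

        # Scan interval and grace period, shown at their documented defaults.
        period         = "5m"
        creation_grace = "5m"
      }
    }
  }
}
```

Setting `dry_run` back to `false` would restore the default behavior of killing
detected containers.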

### Docker for Windows

Docker for Windows only supports running Windows containers. Because Docker for
Windows is relatively new and rapidly evolving you may want to consult the
11 changes: 11 additions & 0 deletions website/source/guides/upgrade/upgrade-specific.html.md
@@ -15,6 +15,16 @@ details provided for their upgrades as a result of new features or changed
behavior. This page is used to document those details separately from the
standard upgrade flow.

## Nomad 0.10.2

Nomad 0.10.2 addresses an issue occurring on heavily loaded clients, where
containers are started without being properly managed by Nomad. It introduces
a reaper that detects and kills such containers.

Operators may opt to run reaper in a dry mode or disabling it through a client config.
Suggested change
Operators may opt to run reaper in a dry mode or disabling it through a client config.
Operators may opt to run reaper in a dry-run mode or disabling it through a client config.


For more information, see [Docker Dangling containers][dangling-containers].
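
For operators who prefer to turn the reaper off during an upgrade, a minimal
sketch of the client configuration could look like the following. The `enabled`
option comes from the Docker driver's `dangling_containers` stanza; placing `gc`
inside a `config` block of the `plugin "docker"` stanza is an assumption made
for illustration.

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        # Disables dangling container detection and cleanup entirely.
        enabled = false
      }
    }
  }
}
```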

## Nomad 0.10.0

### Deployments
@@ -364,6 +374,7 @@ deleted and then Nomad 0.3.0 can be launched.

[drain-api]: /api/nodes.html#drain-node
[drain-cli]: /docs/commands/node/drain.html
[dangling-containers]: /docs/drivers/docker.html#dangling-containers
[hcl2]: https://github.com/hashicorp/hcl2
[lxc]: /docs/drivers/external/lxc.html
[migrate]: /docs/job-specification/migrate.html