docs: backport #12466 into release-1.1.x (#12717)
tgross authored Apr 20, 2022
1 parent c714639 commit 0ed1750
Showing 2 changed files with 23 additions and 15 deletions.
25 changes: 16 additions & 9 deletions website/content/docs/internals/plugins/csi.mdx
@@ -55,6 +55,11 @@
that perform both the controller and node roles in the same
instance. Not every plugin provider has or needs a controller; that's
specific to the provider implementation.

+Plugins mount and unmount volumes but are not in the data path once
+the volume is mounted for a task. Plugin tasks are needed when tasks
+using their volumes stop, so plugins should be left running on a Nomad
+client until all tasks using their volumes are stopped.

You should always run node plugins as Nomad `system` jobs and use the
`-ignore-system` flag on the `nomad node drain` command to ensure that the
node plugins are still running while the node is being drained. Use
@@ -65,7 +70,8 @@
region. Controller plugins can be run as `service` jobs.
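
As a concrete example of the drain recommendation above, a node can be drained while leaving `system` jobs (and therefore the node plugins) running; the node ID is a placeholder:

```shell
nomad node drain -enable -ignore-system <node-id>
```
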
Nomad exposes a Unix domain socket named `csi.sock` inside each CSI
plugin task, and communicates over the gRPC protocol expected by the
CSI specification. The `mount_dir` field tells Nomad where the plugin
-expects to find the socket file.
+expects to find the socket file. The path to this socket is exposed in
+the container as the `CSI_ENDPOINT` environment variable.
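
For illustration only, here is a minimal sketch of a node plugin job; the job name, image, plugin ID, datacenter, and `mount_dir` value are placeholder assumptions rather than values taken from this commit:

```hcl
job "csi-node-plugin" {
  datacenters = ["dc1"]  # placeholder datacenter
  type        = "system" # node plugins are typically run as system jobs

  group "node" {
    task "plugin" {
      driver = "docker"

      config {
        image      = "example/csi-driver:1.0.0" # placeholder plugin image
        privileged = true                       # node plugins generally need privileged mode to mount volumes
      }

      csi_plugin {
        id        = "example-csi" # placeholder plugin ID
        type      = "node"        # "controller", "node", or "monolith"
        mount_dir = "/csi"        # directory where the plugin expects to find csi.sock
      }
    }
  }
}
```

The plugin process itself can discover the socket path at runtime by reading the `CSI_ENDPOINT` environment variable that Nomad sets.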

### Plugin Lifecycle and State

@@ -94,7 +100,7 @@
server and waits for a response; the allocation's tasks won't start
until the volume has been claimed and is ready.
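
For context, the claiming side is declared in the job that uses the volume. A minimal sketch, assuming a CSI volume with the placeholder ID `example-volume` has already been registered with Nomad:

```hcl
job "example-app" {
  datacenters = ["dc1"] # placeholder datacenter

  group "app" {
    # Claim the registered CSI volume for this group's allocations.
    volume "data" {
      type      = "csi"
      source    = "example-volume" # placeholder volume ID
      read_only = false
    }

    task "web" {
      driver = "docker"

      config {
        image   = "busybox:1.34"
        command = "sleep"
        args    = ["3600"]
      }

      # The staged volume is bind-mounted into the task at this path.
      volume_mount {
        volume      = "data"
        destination = "/srv/data" # placeholder mount path
      }
    }
  }
}
```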

If the volume's plugin requires a controller, the server will send an
-RPC to the Nomad client where that controller is running. The Nomad
+RPC to any Nomad client where that controller is running. The Nomad
client will forward this request over the controller plugin's gRPC
socket. The controller plugin will make the requested volume available
to the node that needs it.
@@ -110,13 +116,14 @@
client, and the node plugin mounts the volume to a staging area in
the Nomad data directory. Nomad will bind-mount this staged directory
into each task that mounts the volume.

-This cycle is reversed when a task that claims a volume becomes terminal. The
-client will send an "unpublish" RPC to the server, which will send "detach"
-RPCs to the node plugin. The node plugin unmounts the bind-mount from the
-allocation and unmounts the volume from the plugin (if it's not in use by
-another task). The server will then send "unpublish" RPCs to the controller
-plugin (if any), and decrement the claim count for the volume. At this point
-the volume’s claim capacity has been freed up for scheduling.
+This cycle is reversed when a task that claims a volume becomes
+terminal. The client frees the volume locally by making "unpublish"
+RPCs to the node plugin. The node plugin unmounts the bind-mount from
+the allocation and unmounts the volume from the plugin (if it's not in
+use by another task). The client will then send an "unpublish" RPC to
+the server, which will forward it to the controller plugin (if
+any), and decrement the claim count for the volume. At this point the
+volume’s claim capacity has been freed up for scheduling.
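
To confirm that a volume's claims have been released, the volume status command lists the allocations still using it; the volume ID is a placeholder:

```shell
nomad volume status <volume-id>
```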

[csi-spec]: https://github.com/container-storage-interface/spec
[csi-drivers-list]: https://kubernetes-csi.github.io/docs/drivers.html
13 changes: 7 additions & 6 deletions website/content/docs/job-specification/csi_plugin.mdx
@@ -51,17 +51,18 @@
option.

## Recommendations for Deploying CSI Plugins

-CSI plugins run as Nomad jobs but after mounting the volume are not in the
-data path for the volume. Jobs that mount volumes write and read directly to
+CSI plugins run as Nomad tasks, but after mounting the volume are not in the
+data path for the volume. Tasks that mount volumes write and read directly to
the volume via a bind-mount and there is no communication between the job and
the CSI plugin. But when an allocation that mounts a volume stops, Nomad will
need to communicate with the plugin on that allocation's node to unmount the
volume. This has implications for how to deploy CSI plugins:

-* During node drains, jobs that claim volumes must be moved before the `node`
-  or `monolith` plugin for those volumes. You should run `node` or `monolith`
-  plugins as [`system`][system] jobs and use the `-ignore-system` flag on
-  `nomad node drain` to ensure that the plugins are running while the node is
+* If you are stopping jobs on a node, you must stop tasks that claim
+  volumes before stopping the `node` or `monolith` plugin for those
+  volumes. You should run `node` or `monolith` plugins as
+  [`system`][system] jobs and use the `-ignore-system` flag on `nomad
+  node drain` to ensure that the plugins are running while the node is
being drained.

* Only one plugin instance of a given plugin ID and type (controller or node)
