CSI: move node unmount to server-driven RPCs #7596

Merged: 4 commits, Apr 2, 2020

Conversation

tgross (Member) commented on Apr 2, 2020

Fixes #7180

If a volume-claiming alloc stops and the CSI Node plugin that serves that alloc's volumes is missing, there's no way for the allocrunner hook to send the NodeUnpublish and NodeUnstage RPCs. Added complications:

  • The CSI spec requires that any ControllerUnpublish RPC happens after those two RPCs.
  • We don't want to stop volume-claiming jobs on Nomad client restart or plugin update/stop because that would be a big change from existing Nomad job behavior (and generally unpleasant for operators).
  • Nomad's implementation of volume staging makes the mountpoint path specific not just to the plugin ID but also to the volume ID. That means we're not subject to the same "hang forever" condition that k8s demonstrates in CSI: shutdown consuming tasks when node plugins shutdown #7180 (comment).
  • If the Node plugin is restored before the task gives up on unmounting the volume, everything works. But this is a short window.
  • But if the plugin is down and a task using it stops and then the plugin comes back later, we'd want that state tracked so we remember the mount point and can unpublish.

This changeset addresses the issue with a redesign of the client-side CSI handling. Rather than unmounting in the alloc runner hook, the alloc runner hook will simply exit. When the server gets the Node.UpdateAlloc for the terminal allocation that had a volume claim, it creates a volume claim GC job. This job makes client RPCs to a new node plugin RPC endpoint, and only once that succeeds does it move on to making the client RPCs to the controller plugin. If the node plugin is unavailable, the GC job fails and is requeued.
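
As a rough sketch of that ordering (the names volumeClaimReap, nodeDetachVolume, controllerDetachVolume, and releaseClaim below are illustrative stand-ins, not the identifiers in this changeset):

```go
package gc

import "errors"

// Minimal stand-in types for illustration; the real code uses Nomad's
// structs.CSIVolume and structs.Allocation.
type volume struct {
	ID                 string
	ControllerRequired bool
}

type allocation struct{ ID string }

// Hypothetical RPC wrappers; in the real changeset these are client RPCs
// made from the server's volume claim GC job.
func nodeDetachVolume(v *volume, a *allocation) error {
	return errors.New("node plugin unavailable")
}
func controllerDetachVolume(v *volume, a *allocation) error { return nil }
func releaseClaim(v *volume, a *allocation) error           { return nil }

// volumeClaimReap sketches the ordering described above: node unmount
// first, then controller unpublish, then release the claim. Any error
// bubbles up so the GC eval fails and gets requeued for a later retry.
func volumeClaimReap(v *volume, a *allocation) error {
	// 1. Ask the client that ran the alloc to NodeUnpublish/NodeUnstage.
	//    If the node plugin is down, this fails and the whole GC retries.
	if err := nodeDetachVolume(v, a); err != nil {
		return err
	}

	// 2. Only after the node-side unmount succeeds do we call
	//    ControllerUnpublish, per the CSI spec's ordering requirement.
	if v.ControllerRequired {
		if err := controllerDetachVolume(v, a); err != nil {
			return err
		}
	}

	// 3. Finally, drop the claim in server state.
	return releaseClaim(v, a)
}
```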

We don't have to track any state about what the client has done, just that the allocs are terminal. When we do the "heartyeet" work in #2185, most of the plumbing will already be in place. And this gives us the opportunity to later add a Nomad command for operators to forcibly drop claims via the CLI.

There's a bit of work done here (probably easiest to review commit-by-commit):

  • added the client-side RPC endpoint (see the sketch after this list)
  • moved all the client unmount operations into that handler
  • reworked the GC job significantly so that the logic could be exercised in tests without making a bunch of RPCs
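
For reviewers, the shape of the new client endpoint is roughly the sketch below; the type and method names (CSI, NodeDetachVolume, unpublishVolume, unstageVolume) and the request fields are assumptions for illustration, not necessarily what's in the diff:

```go
package csiclient

import "fmt"

// CSI is a stand-in for the client's CSI RPC endpoint; the real endpoint
// would reach the node plugin client via the client's plugin manager.
type CSI struct{}

// NodeDetachVolumeRequest is an illustrative request shape; these field
// names are assumptions, not necessarily what the diff adds.
type NodeDetachVolumeRequest struct {
	PluginID string
	VolumeID string
	AllocID  string
}

// NodeDetachVolume sketches the new client RPC endpoint: it runs the same
// NodeUnpublish/NodeUnstage sequence the alloc runner hook used to run,
// but driven by the server's volume claim GC job instead of the client.
func (c *CSI) NodeDetachVolume(req *NodeDetachVolumeRequest, resp *struct{}) error {
	// Remove the per-alloc bind mount for this volume first.
	if err := c.unpublishVolume(req); err != nil {
		return err
	}
	// Then tear down the staging mount, if the plugin requires staging.
	return c.unstageVolume(req)
}

// The helpers below stand in for calls to the node plugin's
// NodeUnpublishVolume and NodeUnstageVolume RPCs.
func (c *CSI) unpublishVolume(req *NodeDetachVolumeRequest) error {
	return fmt.Errorf("NodeUnpublishVolume not implemented for %q", req.VolumeID)
}

func (c *CSI) unstageVolume(req *NodeDetachVolumeRequest) error {
	return fmt.Errorf("NodeUnstageVolume not implemented for %q", req.VolumeID)
}
```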

tgross added this to the 0.11.0 milestone on Apr 2, 2020
tgross requested a review from langmartin on April 2, 2020 02:54
tgross added a commit that referenced this pull request on Apr 2, 2020
tgross (Member, Author) commented on Apr 2, 2020

I just ran an end-to-end test of this where I deployed plugins, deployed jobs that need those plugins, removed the plugins, and then stopped the volume-claiming jobs. The jobs stopped normally, but the GC job failed and was requeued as expected; this also prevented a nomad system gc from completing its work. When I restored the plugins, the GCs completed and the claims were released as expected.

tgross (Member, Author) commented on Apr 2, 2020

Interesting failure mode that stops a system GC. I had a volume claim that wasn't released for some reason (TBD), but it looks like the node plugin work had been completed. So when the system GC fired, it hit an error. We need to make sure the node client endpoint handles that case gracefully.

    2020-04-02T14:33:06.100Z [DEBUG] core.sched: forced job GC
    2020-04-02T14:33:06.100Z [DEBUG] core.sched: job GC found eligible objects: jobs=2 evals=9 allocs=0
    2020-04-02T14:33:06.115Z [ERROR] worker: error invoking scheduler: error="failed to process evaluation: 2 errors occurred:
        * rpc error: node detach volume: rpc error: code = Internal desc = Could not unmount "/csi/per-alloc/1cda822e-bed8-0b90-92e6-24bbbf
Unmounting arguments: /csi/per-alloc/1cda822e-bed8-0b90-92e6-24bbbfdfe647/vol-0634a0d38d55012f9/rw-file-system-single-node-writer
Output: umount: /csi/per-alloc/1cda822e-bed8-0b90-92e6-24bbbfdfe647/vol-0634a0d38d55012f9/rw-file-system-single-node-writer: no mount point
        * rpc error: node detach volume: rpc error: code = Internal desc = Could not unmount "/csi/per-alloc/1cda822e-bed8-0b90-92e6-24bbbf
Unmounting arguments: /csi/per-alloc/1cda822e-bed8-0b90-92e6-24bbbfdfe647/vol-0634a0d38d55012f9/rw-file-system-single-node-writer
Output: umount: /csi/per-alloc/1cda822e-bed8-0b90-92e6-24bbbfdfe647/vol-0634a0d38d55012f9/rw-file-system-single-node-writer: no mount point
"

tgross force-pushed the f-csi-unmount-nodes-from-server branch from 8cd461f to a9e0d16 on April 2, 2020 18:31
client/csi_endpoint.go (review comment, resolved)
langmartin (Contributor) left a comment

Ok! I think it looks good

github-actions (bot) commented
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Jan 11, 2023