CSI: enforce usage at claim time #12112

tgross · 2022-02-23T20:12:56Z

When the scheduler checks feasibility for CSI volumes, the check is
fairly loose: earlier versions of the same job are not counted as
active claims. This allows the scheduler to place new allocations
for the new version of a job, under the assumption that we'll replace
the existing allocations and their volume claims.
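The scheduling-time rule can be made concrete with a small Go sketch. The `Claim` and `Volume` types and the `schedulerHasFreeWriteClaim` function are hypothetical simplifications, not Nomad's actual structs or methods. They model only the rule described above: write claims held by an earlier version of the same job are not counted.

```go
package main

import "fmt"

// Claim is a hypothetical, simplified record of a volume claim held by one
// allocation; JobID and JobVersion identify the job version that owns it.
type Claim struct {
	AllocID    string
	JobID      string
	JobVersion uint64
}

// Volume is a hypothetical stand-in for a CSI volume that allows at most
// MaxWriters concurrent write claims.
type Volume struct {
	MaxWriters  int
	WriteClaims map[string]Claim // keyed by allocation ID
}

// schedulerHasFreeWriteClaim models the loose, scheduling-time check: write
// claims held by earlier versions of the job being placed are not counted,
// on the assumption that those allocations are about to be replaced.
func schedulerHasFreeWriteClaim(v *Volume, jobID string, jobVersion uint64) bool {
	active := 0
	for _, c := range v.WriteClaims {
		if c.JobID == jobID && c.JobVersion < jobVersion {
			continue // expected to be released when the old alloc stops
		}
		active++
	}
	return active < v.MaxWriters
}

func main() {
	vol := &Volume{
		MaxWriters: 1,
		WriteClaims: map[string]Claim{
			"alloc-v1": {AllocID: "alloc-v1", JobID: "web", JobVersion: 1},
		},
	}
	fmt.Println(schedulerHasFreeWriteClaim(vol, "web", 2)) // true: v1's claim is ignored
	fmt.Println(schedulerHasFreeWriteClaim(vol, "db", 1))  // false: another job holds it
}
```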

But when the alloc runner claims the volume, we need to enforce the
active claims even if they're for allocations of an earlier version of
the job. Otherwise we'll try to mount a volume that's currently being
unmounted, and this will cause replacement allocations to frequently
fail.

This changeset corrects this behavior for both write claims and read
claims. I've broken it across a small set of commits for clarity.

Partial fix for #8609, but this will also require #12113 to handle this scenario gracefully on the client.

If a volume has been created but not yet claimed, its capabilities will be checked in `WriteSchedulable` at both scheduling time and claim time. We don't need to also check them in the `FreeWriteClaims` method.
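A minimal sketch of this division of labor: the capability check lives only in the schedulable check, and the free-claims check only counts claims. The lowercase `writeSchedulable` and `freeWriteClaims` methods are hypothetical stand-ins loosely named after `WriteSchedulable` and `FreeWriteClaims`, and the `Capabilities` and `MaxWriters` fields are assumptions for illustration. The access-mode strings match Nomad's CSI `access_mode` values.

```go
package main

import "fmt"

// AccessMode values mirror Nomad's CSI access-mode strings; the types and
// methods below are a hypothetical simplification, not Nomad's code.
type AccessMode string

const (
	SingleNodeReaderOnly  AccessMode = "single-node-reader-only"
	SingleNodeWriter      AccessMode = "single-node-writer"
	MultiNodeReaderOnly   AccessMode = "multi-node-reader-only"
	MultiNodeSingleWriter AccessMode = "multi-node-single-writer"
	MultiNodeMultiWriter  AccessMode = "multi-node-multi-writer"
)

type Volume struct {
	Schedulable  bool
	Capabilities []AccessMode // access modes the volume was created to support
	MaxWriters   int
	WriteClaims  map[string]bool // allocation IDs holding write claims
}

// writeSchedulable owns the capability check: the volume must be healthy
// and support at least one access mode that permits writes.
func (v *Volume) writeSchedulable() bool {
	if !v.Schedulable {
		return false
	}
	for _, m := range v.Capabilities {
		switch m {
		case SingleNodeWriter, MultiNodeSingleWriter, MultiNodeMultiWriter:
			return true
		}
	}
	return false
}

// freeWriteClaims only counts existing claims and does not repeat the
// capability check performed by writeSchedulable.
func (v *Volume) freeWriteClaims() bool {
	return len(v.WriteClaims) < v.MaxWriters
}

// canClaimWrite is the combined check, run at both scheduling and claim time.
func (v *Volume) canClaimWrite() bool {
	return v.writeSchedulable() && v.freeWriteClaims()
}

func main() {
	fresh := &Volume{Schedulable: true, Capabilities: []AccessMode{SingleNodeWriter}, MaxWriters: 1}
	readOnly := &Volume{Schedulable: true, Capabilities: []AccessMode{MultiNodeReaderOnly}, MaxWriters: 1}
	fmt.Println(fresh.canClaimWrite(), readOnly.canClaimWrite()) // true false
}
```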

This commit correctly enforces maximum volume claims for writers.

When the alloc runner makes a claim for a read-only volume, we only check that the volume is potentially schedulable and not that it actually has free read claims.

shoenig

LGTM!

In #12112 and #12113 we solved the problem of races in releasing volume claims, but there was a case that we missed. During a node drain with a controller attach/detach, we can hit a race where we call controller publish before the unpublish has completed. This is discouraged in the spec, but plugins are supposed to handle it safely. However, if the storage provider's API is slow enough and the plugin doesn't handle the case safely, the volume can get "locked" into a state where the provider's API won't detach it cleanly.

Check the claim before making any external controller publish RPC calls so that Nomad is responsible for the canonical information about whether a volume is currently claimed.

This has a couple of side effects that also had to be fixed here:

* Changing the order means that the volume will have a past claim without a valid external node ID because it came from the client, and this uncovered a separate bug where we didn't assert the external node ID was valid before returning it. Fall through to getting the ID from the plugins in the state store in this case. We avoided this originally because of concerns around plugins getting lost during node drain, but now that we've fixed that we may want to revisit it in future work.
* We should make sure we're handling `FailedPrecondition` cases from the controller plugin the same way we handle other retryable cases.
* Several tests had to be updated because they assumed we fail in a particular order that we no longer follow.

github-actions · 2022-10-18T02:46:14Z

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

tgross mentioned this pull request Feb 23, 2022

CSI: retry claims from client #12113

Merged

tgross added backport/1.0 theme/storage type/bug labels Feb 23, 2022

tgross added this to the 1.3.0 milestone Feb 23, 2022

tgross added backport/1.2 and removed backport/1.0 labels Feb 23, 2022

tgross added 4 commits February 23, 2022 15:45

csi: remove redundant schedulable check in FreeWriteClaims

3aedd9b

csi: enforce single-node reader check for read-only volumes

85c086e

changelog entry

7200e88

tgross force-pushed the csi-enforce-usage-at-claim-time branch from 78f2944 to 7200e88 Compare February 23, 2022 20:46

tgross marked this pull request as ready for review February 23, 2022 21:17

tgross requested review from lgfa29, DerekStrickland and shoenig February 23, 2022 21:19

tgross linked an issue Feb 23, 2022 that may be closed by this pull request

CSI: single access_mode is not enforced within a job #10157

Closed

tgross removed a link to an issue Feb 23, 2022

CSI: single access_mode is not enforced within a job #10157

Closed

shoenig approved these changes Feb 24, 2022

tgross merged commit 6b6b827 into main Feb 24, 2022

tgross deleted the csi-enforce-usage-at-claim-time branch February 24, 2022 14:37

tgross mentioned this pull request Mar 18, 2022

CSI: AccessMode not changed to <none> after claims released #11921

Closed

tgross mentioned this pull request Mar 25, 2022

CSI: reorder controller volume detachment #12387

Merged

lgfa29 added backport/1.1.x backport to 1.1.x release line backport/1.2.x backport to 1.1.x release line labels Apr 19, 2022

This was referenced Apr 19, 2022

Backport of CSI: enforce usage at claim time into release/1.2.x #12648

Merged

Backport of CSI: enforce usage at claim time into release/1.1.x #12649

Merged

lgfa29 removed stage/needs-backporting labels Apr 19, 2022

github-actions bot locked as resolved and limited conversation to collaborators Oct 18, 2022
