csi: volume claim garbage collection #7125

Merged: 4 commits from f-csi-volume-gc into f-csi-volumes, Feb 19, 2020
Conversation

tgross (Member) commented on Feb 11, 2020

When an alloc is marked terminal (and after node unstage/unpublish
have been called), the client syncs the terminal alloc state with the
server via `Node.UpdateAlloc` RPC.

For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC
handler at the server will emit an eval for a new core job to garbage
collect CSI volume claims. When this eval is handled on the core
scheduler, it will call a `volumeReap` method to release the claims
for all terminal allocs on the job.

The volume reap will issue a `ControllerUnpublishVolume` RPC for any
alloc that has volumes with a controller plugin. Once this returns (or
is skipped), the volume reap will send a new `CSIVolume.Claim` RPC
that releases the volume claim for that allocation in the state store,
making it available for scheduling again.

This same `volumeReap` method will be called from the core job GC,
which gives us a second chance to reclaim volumes during GC if there
were controller RPC failures.
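
To make the flow above concrete, here is a minimal, self-contained Go sketch of the reap logic. The types and helpers are simplified stand-ins (not Nomad's actual `structs.*` definitions or RPC signatures); only the overall ordering, the controller-unpublish step, and the retry-on-GC behavior follow the description above.

```go
package main

import "fmt"

// Simplified stand-ins for the real state-store types; the field and
// type names here are illustrative, not Nomad's internal structs.
type Volume struct {
	ID                 string
	ControllerRequired bool
	ReadAllocs         map[string]bool
	WriteAllocs        map[string]bool
}

type Allocation struct {
	ID       string
	Terminal bool
	Volumes  []*Volume
}

// controllerUnpublish stands in for the ControllerUnpublishVolume RPC
// to the controller plugin; here it always succeeds.
func controllerUnpublish(v *Volume, allocID string) error {
	fmt.Printf("ControllerUnpublishVolume: vol=%s alloc=%s\n", v.ID, allocID)
	return nil
}

// claimRelease stands in for the CSIVolume.Claim RPC that releases the
// claim in the state store, making the volume schedulable again.
func claimRelease(v *Volume, allocID string) {
	delete(v.ReadAllocs, allocID)
	delete(v.WriteAllocs, allocID)
}

// volumeReap mirrors the described flow: for each terminal alloc on the
// job, unpublish via the controller plugin (when one is involved) and
// then release the claim. A controller RPC failure leaves the claim in
// place so the core job GC gets a second chance at it later.
func volumeReap(allocs []*Allocation) {
	for _, alloc := range allocs {
		if !alloc.Terminal {
			continue
		}
		for _, vol := range alloc.Volumes {
			if vol.ControllerRequired {
				if err := controllerUnpublish(vol, alloc.ID); err != nil {
					continue // claim stays; retried during core job GC
				}
			}
			claimRelease(vol, alloc.ID)
		}
	}
}

func main() {
	vol := &Volume{
		ID:                 "vol1",
		ControllerRequired: true,
		ReadAllocs:         map[string]bool{},
		WriteAllocs:        map[string]bool{"alloc1": true},
	}
	volumeReap([]*Allocation{{ID: "alloc1", Terminal: true, Volumes: []*Volume{vol}}})
	fmt.Println("write claims remaining:", len(vol.WriteAllocs)) // 0
}
```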

tgross added this to the 0.11.0 milestone on Feb 11, 2020
tgross force-pushed the f-csi-volume-gc branch 2 times, most recently from 877e159 to ef11c8e, on February 13, 2020 at 14:02
tgross force-pushed the f-csi-volume-gc branch 6 times, most recently from e213e55 to 1942fc3, on February 13, 2020 at 20:57
tgross marked this pull request as ready for review on February 13, 2020 at 21:35
err = core.Process(eval)
require.NoError(t, err)

// Verify the claim was released
tgross (Member, Author) commented:
@langmartin wanted to flag this for you in particular because you implemented the state store work. As far as I can tell there's no real reason to bother with tracking PastAllocs with this implementation, and we can swap out the implementation of structs.CSIVolume.ClaimRelease with that of structs.CSIVolume.GCAlloc. Do you have any thoughts here?

langmartin (Contributor) replied:

PastAllocs at this point is there to serve as a debugging tool. If there's a user-perceptible gap of several minutes between the volumeReap pass that would detect the ReadAlloc + Terminal state and the eval reap that deletes the allocations, it's worth keeping both stages.

tgross (Member, Author) replied:

> PastAllocs at this point is there to serve as a debugging tool. If there's a user-perceptible gap of several minutes between the volumeReap pass that would detect the ReadAlloc + Terminal state and the eval reap that deletes the allocations, it's worth keeping both stages.

I was trying to wrap up the RFC section to explain this, but PastAllocs as it stands right now doesn't get us anything. When we get Node.UpdateAlloc for a terminal alloc, we can't move that alloc out of Read/WriteAllocs, because that would make the volume eligible for scheduling before we've released the claim. At the very least we'd need a PastReadAllocs and a PastWriteAllocs and check those during scheduling, but right now we're not looking at PastAllocs at all during scheduling.

Alternatively, we could add the alloc to PastAllocs but not remove it from ReadAllocs/WriteAllocs until it's GC'd, but I'm not sure that helps us in any way that isn't better served by checking whether the alloc is terminal.
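
To make the scheduling concern concrete, here is a toy Go sketch (not the real scheduler code) of a volume feasibility check that only consults the claim maps. Anything parked in a separate PastAllocs map is invisible to it, which is why moving a terminal alloc there before the claim is released would open the volume up too early.

```go
package main

import "fmt"

// Toy claim bookkeeping; PastAllocs is included only to show that the
// feasibility check below never reads it.
type volumeClaims struct {
	ReadAllocs  map[string]bool
	WriteAllocs map[string]bool
	PastAllocs  map[string]bool
}

// writeFeasible is a simplified stand-in for the scheduler's volume
// feasibility check: a single-writer volume is placeable only while
// WriteAllocs is empty.
func writeFeasible(c volumeClaims) bool {
	return len(c.WriteAllocs) == 0
}

func main() {
	// A terminal alloc moved to PastAllocs before its claim is released:
	// the volume now looks placeable even though the unpublish/release
	// steps may not have happened yet.
	c := volumeClaims{
		WriteAllocs: map[string]bool{},
		PastAllocs:  map[string]bool{"terminal-alloc": true},
	}
	fmt.Println("write feasible:", writeFeasible(c)) // true: too early
}
```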

langmartin (Contributor) replied on Feb 18, 2020:
I think the way you have it working is correct. Here's how I think it should go:

  1. UpdateAlloc informs the state store that the alloc is terminal
  2. The volume and eval are marked for GC
  3. If it's unclaimed, we ControllerUnpublish
  4. we ClaimRelease, and move the alloc to PastAllocs
  5. EvalGCThreshold later, the eval is garbage collected, and on alloc GC we delete it from PastAllocs

The gap between 4 & 5 is the user-perceptible bit that PastAllocs gets us, I think. I'm a bit fuzzy on the details of where the eval or job is marked for GC to actually reap the eval (and allocs), but I think that configured duration kicks in either way.

tgross (Member, Author) replied:
Ok, I think I get what you're proposing at least.

But when we GC the job/alloc we end up having to run the volume GC process anyway, because we don't have any guarantees they're not running concurrently (we can interleave transactions with the lengthy ControllerUnpublish). So PastAllocs might be useful but could just as easily be removed instantly depending on timing, which makes for an unreliable debugging instrument that we have to pay for in extra raft transactions.

tgross (Member, Author) added:
Also, without anything currently consuming PastAllocs I'm feeling extra-skeptical about its use at this stage of the design.

langmartin (Contributor) replied:
Ok, that's fair. We can always re-introduce it if we need to.

When an alloc is marked terminal (and after node unstage/unpublish
have been called), the client syncs the terminal alloc state with the
server via `Node.UpdateAlloc` RPC.

For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC
handler at the server will emit an eval for a new core job to garbage
collect CSI volume claims. When this eval is handled on the core
scheduler, it will call a `volumeReap` method to release the claims
for all terminal allocs on the job.

The volume reap will issue a `ControllerUnpublishVolume` RPC for any
node that has no alloc claiming the volume. Once this returns (or
is skipped), the volume reap will send a new `CSIVolume.Claim` RPC
that releases the volume claim for that allocation in the state store,
making it available for scheduling again.

This same `volumeReap` method will be called from the core job GC,
which gives us a second chance to reclaim volumes during GC if there
were controller RPC failures.
tgross merged commit 3cf905d into f-csi-volumes on Feb 19, 2020
tgross deleted the f-csi-volume-gc branch on February 19, 2020 at 14:05
github-actions bot locked this pull request as resolved and limited conversation to collaborators on Jan 17, 2023