
csi: move volume claim release into volumewatcher #7794

Merged 5 commits from f-csi-volume-watcher into master on Apr 30, 2020

Conversation

tgross (Member) commented Apr 23, 2020

Fixes #7629 (csi: controller plugin timeouts).

This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The Watcher performs a blocking query on updates to the CSIVolumes table and triggers reaping of volume claims.

This avoids tying up scheduling workers: volume claim workloads are immediately sent into their own loop rather than blocking scheduling workers in the core GC job on work like talking to CSI controllers.
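As a rough illustration of that pattern (not the PR's actual code: the StateStore interface, Volume struct, and method names below are hypothetical stand-ins for Nomad's state store and structs.CSIVolume), a leader-side watcher can run a single blocking query against the volumes table and hand changed volumes off for claim reaping without occupying a scheduling worker:

```go
package volumewatcher

import (
	"context"
	"time"
)

// Volume is a stand-in for structs.CSIVolume with only the fields this
// sketch needs.
type Volume struct {
	ID         string
	Namespace  string
	PastClaims []string
}

// StateStore is a hypothetical handle onto the CSIVolumes table. The real
// implementation uses Nomad's state store and blocking-query machinery.
type StateStore interface {
	// VolumesAtIndex blocks until the table index exceeds minIndex (or ctx
	// is done), then returns the changed volumes and the new index.
	VolumesAtIndex(ctx context.Context, minIndex uint64) ([]*Volume, uint64, error)
}

// Watcher runs only on the leader and triggers claim reaping for volumes
// whose state has changed.
type Watcher struct {
	state StateStore
	reap  func(ctx context.Context, vol *Volume) error
}

// run is the long-lived loop: every index bump on the CSIVolumes table
// wakes the blocking query, and volumes with past claims are handed off to
// their own reaping work instead of a core GC eval.
func (w *Watcher) run(ctx context.Context) {
	var lastIndex uint64
	for {
		vols, idx, err := w.state.VolumesAtIndex(ctx, lastIndex)
		if err != nil {
			if ctx.Err() != nil {
				return // leadership lost or server shutting down
			}
			time.Sleep(time.Second) // back off on transient errors
			continue
		}
		lastIndex = idx
		for _, vol := range vols {
			if len(vol.PastClaims) == 0 {
				continue // nothing to release for this volume
			}
			vol := vol
			go func() { _ = w.reap(ctx, vol) }() // per-volume loop, off the scheduler path
		}
	}
}
```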


Notes:

tgross force-pushed the f-csi-volume-watcher branch 18 times, most recently from 60bb6f3 to 75ed335, on April 27, 2020 20:12
This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.

This will avoid tying up scheduling workers by immediately sending
volume claim workloads into their own loop, rather than blocking the
scheduling workers in the core GC job doing things like talking to CSI
controllers.

(This commit does not include wiring-up the leader or removing the old
GC mechanism.)
Enable the volume watcher on leader step-up and disable it on leader
step-down.
tgross requested a review from langmartin April 28, 2020 20:26
tgross marked this pull request as ready for review April 28, 2020 20:27
The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.
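A minimal sketch of what that empty claim write looks like, assuming request and field shapes loosely modeled on Nomad's CSIVolumeClaimRequest (the names below are simplified stand-ins, not the exact structs): the request carries no allocation, so applying it changes no claim state, but it still bumps the CSIVolumes table index and unblocks the watcher's query.

```go
package volumewatcher

// CSIVolumeClaimMode is a stand-in for the claim mode enum in nomad/structs.
type CSIVolumeClaimMode int

const ClaimRelease CSIVolumeClaimMode = iota

// ClaimRequest is a simplified stand-in for structs.CSIVolumeClaimRequest.
type ClaimRequest struct {
	VolumeID  string
	Namespace string
	AllocID   string // left empty: no allocation is claiming or releasing
	Claim     CSIVolumeClaimMode
}

// emptyClaimBump builds the no-op claim write the GC path submits. Applying
// it through raft touches the volume's row and bumps the table index, which
// wakes any watcher blocked on an older index.
func emptyClaimBump(ns, volID string) *ClaimRequest {
	return &ClaimRequest{
		VolumeID:  volID,
		Namespace: ns,
		Claim:     ClaimRelease,
	}
}
```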
@@ -1156,33 +1158,35 @@ func (n *nomadFSM) applyCSIVolumeDeregister(buf []byte, index uint64) interface{
return nil
}

func (n *nomadFSM) applyCSIVolumeClaim(buf []byte, index uint64) interface{} {
tgross (Member Author):

Note for review: it's worth looking at this file without the diff; the diff really tangles this up because applyCSIVolumeBatchClaim is similar to applyCSIVolumeClaim. This should be entirely an addition.

@@ -1,11 +0,0 @@
package nomad
tgross (Member Author):

Note for review: this only supported the now-removed nomad/core_sched_test.go file.

@@ -0,0 +1,28 @@
package volumewatcher
tgross (Member Author):

Note for review: these interfaces exist solely to give us handles for RPC mocks in the tests (so we don't have to set up clients with CSI plugins in unit tests).
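A small sketch of that pattern (the interface and method names here are illustrative, not the ones added in this PR): the watcher depends only on a narrow RPC interface, so unit tests can swap in a recording mock instead of standing up a client with real CSI plugins.

```go
package volumewatcher

import "context"

// ClaimReleaser is the only RPC behavior the watcher needs; the server's
// real RPC handlers satisfy it in production.
type ClaimReleaser interface {
	ReleaseClaims(ctx context.Context, namespace, volumeID string) error
}

// MockClaimReleaser records calls so unit tests can assert on them without
// any CSI plugin or client setup.
type MockClaimReleaser struct {
	Calls []string
	Err   error
}

func (m *MockClaimReleaser) ReleaseClaims(ctx context.Context, namespace, volumeID string) error {
	m.Calls = append(m.Calls, namespace+"/"+volumeID)
	return m.Err
}
```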

return nil
}

for _, claim := range vol.PastClaims {
tgross (Member Author) commented Apr 28, 2020:

Note for review: from here down in this file we're almost entirely copying the code over from the nomad/core_sched.go file that's been removed. Changes mostly include better trace logging and naming/comments.
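For orientation, here is a heavily condensed sketch of what that copied-over reaping code does (the state names, interface, and checkpointing below are simplified stand-ins, not the real core_sched.go/volumewatcher code): each past claim is walked through node detach and then controller detach, with a checkpoint written after each step so a new leader can resume partway through.

```go
package volumewatcher

import "context"

// ClaimState is a simplified stand-in for the claim lifecycle states.
type ClaimState int

const (
	ClaimTaken ClaimState = iota
	NodeDetached
	ControllerDetached
)

// PastClaim is a stand-in for a claim whose allocation has stopped.
type PastClaim struct {
	AllocID string
	NodeID  string
	State   ClaimState
}

// detacher abstracts the node/controller RPCs and the raft checkpoint write.
type detacher interface {
	NodeDetach(ctx context.Context, volID, nodeID string) error
	ControllerDetach(ctx context.Context, volID, nodeID string) error
	Checkpoint(ctx context.Context, volID string, claim *PastClaim) error
}

// releasePastClaims walks each past claim through its detach steps,
// checkpointing after each transition so work is not repeated after a
// leader failover.
func releasePastClaims(ctx context.Context, d detacher, volID string, claims []*PastClaim) error {
	for _, claim := range claims {
		if claim.State == ClaimTaken {
			if err := d.NodeDetach(ctx, volID, claim.NodeID); err != nil {
				return err
			}
			claim.State = NodeDetached
			if err := d.Checkpoint(ctx, volID, claim); err != nil {
				return err
			}
		}
		if claim.State == NodeDetached {
			if err := d.ControllerDetach(ctx, volID, claim.NodeID); err != nil {
				return err
			}
			claim.State = ControllerDetached
			if err := d.Checkpoint(ctx, volID, claim); err != nil {
				return err
			}
		}
	}
	return nil
}
```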


// TODO: this is currently dead code; we'll call a public remove
// method on the Watcher once we have a periodic GC job
// remove stops watching a volume and should only be called when locked.
tgross (Member Author):

ref #7825

langmartin (Contributor) left a comment:

Ok! I've left a couple of non-blocking comments through here. This is a pile of work! I think it looks good and the reasoning is pretty straightforward.

claims[upd.VolumeID+upd.RequestNamespace()] = upd
}
w.f <- future
case <-timerCh:
langmartin (Contributor):

In general for this pattern, I'd expect that we'd want a batch max size or a timer, where either of those would apply the batch. I know the raft lib has some recent support for splitting large commits, but it seems like we're missing a batch max from this (and from deployments). I think we're good to go ahead, but we should consider using an interface to allow this and deployments to share the same batcher code later.

tgross (Member Author):

Good call. Spun out to #7838.
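For reference, a sketch of the variant langmartin describes (a hypothetical generalization, not the code in this PR or in #7838): updates are deduplicated per volume, and the batch is applied when either the window timer fires or the batch hits a maximum size, whichever comes first.

```go
package volumewatcher

import (
	"context"
	"time"
)

// claimUpdate is a stand-in for the claim requests being coalesced.
type claimUpdate struct {
	VolumeID  string
	Namespace string
}

// batchClaims coalesces updates per volume and flushes on whichever comes
// first: the batch window elapsing or the batch reaching maxSize.
func batchClaims(ctx context.Context, in <-chan claimUpdate, window time.Duration, maxSize int, flush func(map[string]claimUpdate)) {
	batch := map[string]claimUpdate{}
	timer := time.NewTimer(window)
	defer timer.Stop()

	doFlush := func() {
		if len(batch) == 0 {
			return
		}
		flush(batch)
		batch = map[string]claimUpdate{}
	}

	for {
		select {
		case <-ctx.Done():
			doFlush()
			return
		case upd := <-in:
			// dedupe per namespace+volume, as the claims map in the PR does
			batch[upd.Namespace+upd.VolumeID] = upd
			if len(batch) >= maxSize {
				doFlush() // size cap reached: don't wait for the timer
			}
		case <-timer.C:
			doFlush() // window elapsed: apply whatever has accumulated
			timer.Reset(window)
		}
	}
}
```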


// Kill everything associated with the watcher
if w.exitFn != nil {
w.exitFn()
langmartin (Contributor):

Is this safe to call on a follower?

tgross (Member Author):

Yeah.

  • If the follower was recently a leader, we drop the blocking query it was making. Any in-flight work will be canceled, except for client RPCs (which we can't yet cancel but which are idempotent).
  • If the follower was not recently a leader but we call the flush() on it anyway for some spurious reason, it's safe to call a cancel function twice (see the sketch below).
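To illustrate the second bullet: the discussion suggests exitFn wraps a context cancel, and a context.CancelFunc from the standard library is documented as safe to call more than once, so a spurious flush on a server that was never leader is harmless.

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	cancel()               // calling a CancelFunc again is a documented no-op
	fmt.Println(ctx.Err()) // prints "context canceled"
}
```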

langmartin (Contributor):

I'd convinced myself it was already, sorry; that was a note I made on the first pass and meant to remove before I hit the button. On first read I thought exitFn looked like it was actually running the RPCs.

}

// Start the long lived watcher that scans for allocation updates
w.Start()
langmartin (Contributor):

For the long term, I'm a little worried about starting a goroutine for every volume. It seems at least possible that some operators could set us up for more goroutines than we want running for these; maybe in a follow-up, if warranted, we could use a fixed pool?

tgross (Member Author):

I think that's a great idea. But it occurs to me we could also simply Stop() the goroutine when we've completed processing an incoming update (when we've determined either there are no more past claims to process or no more claims at all). This is a small change but will require some test reworking, so I'm going to spin it off into #7837.
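A sketch of the follow-up described for #7837 (illustrative only, not the merged code): each volume's goroutine exits as soon as an update shows there is nothing left to process, so the goroutine count tracks only volumes with outstanding work rather than every registered volume.

```go
package volumewatcher

import "sync"

// volumeUpdate is a stand-in for the volume snapshot delivered to a watcher.
type volumeUpdate struct {
	ID         string
	Claims     int // total current claims
	PastClaims int // claims whose allocations have stopped
}

// volumeWatcher is the per-volume goroutine discussed in the thread above.
type volumeWatcher struct {
	updates chan volumeUpdate
	wg      *sync.WaitGroup
	release func(volumeUpdate) // hands past claims off for release
}

// watch processes updates for one volume and stops itself once there are no
// past claims to process, or no claims at all.
func (vw *volumeWatcher) watch() {
	defer vw.wg.Done()
	for upd := range vw.updates {
		if upd.PastClaims == 0 || upd.Claims == 0 {
			return // done: free this goroutine until the volume changes again
		}
		vw.release(upd)
	}
}
```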

nomad/volumewatcher/volume_watcher.go: outdated review thread, resolved
tgross merged commit 775de0d into master Apr 30, 2020
tgross deleted the f-csi-volume-watcher branch April 30, 2020 13:13
tgross added a commit that referenced this pull request May 19, 2020
Following the new volumewatcher in #7794 and performance improvements
to it that landed afterwards, there's no particular reason we should
be threading claim releases through the GC eval rather than writing an
empty `CSIVolumeClaimRequest` with the mode set to
`CSIVolumeClaimRelease`, just as the GC evaluation would do.

Also, by batching up these raft messages, we can reduce the number of raft
writes by one and cross-server RPCs by one per volume we release claims on.
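A rough sketch of the shape of that write (the struct definitions below are simplified stand-ins for the types in nomad/structs, using the names the commit message mentions): one empty, release-mode claim request per volume, batched so a single raft apply and a single cross-server RPC cover all of them.

```go
package volumewatcher

// CSIVolumeClaimMode and the structs below are simplified stand-ins for the
// real types in nomad/structs.
type CSIVolumeClaimMode int

const CSIVolumeClaimRelease CSIVolumeClaimMode = iota

type CSIVolumeClaimRequest struct {
	VolumeID  string
	Namespace string
	Claim     CSIVolumeClaimMode // release mode, with no allocation attached
}

type CSIVolumeClaimBatchRequest struct {
	Claims []CSIVolumeClaimRequest
}

// buildReleaseBatch collects one release-mode claim per volume so a single
// raft write covers every volume whose claims are being released.
func buildReleaseBatch(volumes map[string]string) *CSIVolumeClaimBatchRequest {
	batch := &CSIVolumeClaimBatchRequest{}
	for volID, ns := range volumes {
		batch.Claims = append(batch.Claims, CSIVolumeClaimRequest{
			VolumeID:  volID,
			Namespace: ns,
			Claim:     CSIVolumeClaimRelease,
		})
	}
	return batch
}
```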