This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Replace per-PVC leader election with per-cluster #892

Merged — 8 commits merged into kubernetes-retired:master from the leader-election branch on Aug 10, 2018

Conversation

wongma7 (Contributor) commented Jul 30, 2018

continuing work in #837

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 30, 2018
wongma7 (Contributor Author) commented Jul 31, 2018

@orainxiong please review. The 1st commit is taken from your work in #837 (thanks); the 2nd commit is the replacement leader election. It creates an endpoints object in the kube-system namespace with a name equal to provisionerName, i.e. the value of the storage class provisioner field that the provisioner should watch, with "/" replaced since it is not allowed in object names. E.g. if the provisioner on the storage class is example.com/nfs, then the endpoints object will be kube-system/example.com-nfs.
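For readers following along, here is a rough, hypothetical sketch of this kind of per-cluster lock built on client-go's leaderelection package. It is not the code in this PR: the identity string, the timings, and the exact RunOrDie/callback signatures are assumptions (newer client-go releases pass a context.Context; this roughly follows the v8.x-era API).

// Hypothetical sketch, not this PR's implementation: derive the lock name from
// the provisioner name and run client-go leader election against an endpoints
// object in kube-system.
package main

import (
	"log"
	"strings"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/client-go/tools/record"
)

func main() {
	provisionerName := "example.com/nfs" // the storage class `provisioner` value

	// Endpoints names cannot contain "/", so example.com/nfs becomes
	// the object kube-system/example.com-nfs.
	lockName := strings.Replace(provisionerName, "/", "-", -1)

	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	broadcaster := record.NewBroadcaster()
	recorder := broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: lockName})

	lock := &resourcelock.EndpointsLock{
		EndpointsMeta: metav1.ObjectMeta{Namespace: "kube-system", Name: lockName},
		Client:        client.CoreV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity:      "provisioner-instance-1", // must be unique per instance, e.g. the pod name
			EventRecorder: recorder,
		},
	}

	leaderelection.RunOrDie(leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(stop <-chan struct{}) {
				// Only the elected leader provisions; the controller's Run would go here.
				<-stop
			},
			OnStoppedLeading: func() {
				log.Fatal("lost leadership, exiting")
			},
		},
	})
}

Whichever instance holds the kube-system/example.com-nfs lock leads; the others stay idle until the lease is lost.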

Some open questions:

  • should the namespace for the endpoints object be configurable instead of just kube-system? IMO it's okay to hardcode it as kube-system.
  • is it bad to give provisioners endpoints write access? (I don't think this is too big a deal, we already distribute clusterroles where provisioners have SECRETS access and it's up to users to audit it)
    • TODO for me: fix all the clusterrole.yamls throughout the repo (while at it, merge the serviceaccount+clusterrole+clusterrolebinding yamls) so that they include this new permission — roughly the rule sketched below
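For reference, the extra permission would look roughly like the rule below. This is an illustrative sketch, not the yaml shipped by this PR; the ClusterRole name and the exact verb list are assumptions.

# Hypothetical sketch of the new permission, not the yaml from this PR:
# the provisioner needs to create and keep renewing its leader-election
# endpoints object.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: example-provisioner-runner   # illustrative name
rules:
  # ...existing rules for persistentvolumes, persistentvolumeclaims,
  # storageclasses, events, etc. stay as they are...
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]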

/cc @cofyc

@k8s-ci-robot k8s-ci-robot requested a review from cofyc July 31, 2018 18:59
@wongma7 wongma7 force-pushed the leader-election branch 3 times, most recently from ec85e58 to e906dce Compare July 31, 2018 20:09
wongma7 (Contributor Author) commented Jul 31, 2018

Also a TODO for me tomorrow: fix the NFS docs, I think they reference hostpath DaemonSets and such somewhere, which won't be possible anymore :p

orainxiong commented Aug 1, 2018

@wongma7

LGTM!

You have a better workaround, and I have closed the related issues in the external-provisioner repo.

If possible, I will measure the performance improvement from this PR with the same test case of provisioning 100 PVCs at once.

BTW, as far as I know, the leaderelection code inside k8s.io/client-go at the current revision v8.0.0 can probably cause two leaders to overlap. That is, two instances of external-provisioner may be active at the same time when the original leader's call to tryAcquireOrRenew takes a really long time, since there is no timeout bounding tryAcquireOrRenew. The problem has been fixed in the master branch of k8s.io/client-go:

// From the renew loop in client-go master: tryAcquireOrRenew runs in its own
// goroutine, and the attempt is abandoned once timeoutCtx (a context bounded
// by RenewDeadline in the surrounding loop) expires, so a hung renewal can no
// longer keep a stale leader alive.
err := wait.PollImmediateUntil(le.config.RetryPeriod, func() (bool, error) {
	done := make(chan bool, 1)
	go func() {
		defer close(done)
		done <- le.tryAcquireOrRenew()
	}()

	select {
	case <-timeoutCtx.Done():
		return false, fmt.Errorf("failed to tryAcquireOrRenew %s", timeoutCtx.Err())
	case result := <-done:
		return result, nil
	}
}, timeoutCtx.Done())

I hope it works.

wongma7 (Contributor Author) commented Aug 1, 2018

I don't really understand the bug; how common is it? How can tryAcquireOrRenew get stuck? I don't want to bump all the k8s.io/* dependencies to master. :/ Since upstream did not think it important enough to cherry-pick, I am inclined to live with the bug for now.

@wongma7 wongma7 changed the title from "[WIP] Replace per-PVC leader election with per-cluster" to "Replace per-PVC leader election with per-cluster" Aug 1, 2018
wongma7 (Contributor Author) commented Aug 1, 2018

I will probably tag lib v5.0.0 when this merges and then rebuild and push every image. This is a breaking change to the library's required RBAC policies, so every provisioner out there will stop working if the cluster has RBAC configured. IDK how much chaos it will cause.

wongma7 (Contributor Author) commented Aug 1, 2018

BTW, I also want to make the leader election opt-out, i.e. allow the author to do leader election at some higher level (in their main, where they Run the controller) and keep the controller ignorant of it, if they want. But this can be added later.

@@ -141,10 +142,6 @@ type ProvisionController struct {
// when multiple controllers are running: they race to lock (lead) every PVC
// so that only one calls Provision for it (saving API calls, CPU cycles...)
Contributor review comment: this comment should be updated

pvName := ctrl.getProvisionedVolumeNameForClaim(claim)
volume, err := ctrl.client.CoreV1().PersistentVolumes().Get(pvName, metav1.GetOptions{})
if err == nil && volume != nil {
_, exists, err := ctrl.volumes.GetByKey(fmt.Sprintf("%s/%s", namespace, pvName))
Contributor review comment: If we check volume existence in the cache.Store, perhaps we should wait until the informers are fully synced before running any controller logic.

go ctrl.claimController.Run(stopCh)
go ctrl.volumeController.Run(stopCh)
go ctrl.classController.Run(stopCh)

cofyc (Contributor) commented Aug 3, 2018:

cache.WaitForCacheSync(stopCh, ctrl.claimInformer.HasSynced, ctrl.volumeInformer.HasSynced, ctrl.classInformer.HasSynced)

Contributor follow-up: Oh, we should use controller.HasSynced instead; ctrl.xxxInformer is optional:
cache.WaitForCacheSync(stopCh, ctrl.claimController.HasSynced, ctrl.volumeController.HasSynced, ctrl.classController.HasSynced)

Contributor Author (wongma7): Good idea, thank you.

While I was looking at this code, I also think it is a bug that we call Run on the SharedInformers. Users of the lib should be able to Run the SharedInformers whenever they want... may as well fix it while we are here IMO. (A rough sketch of the intended pattern is below.)
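A self-contained sketch of that pattern (not the library's code; the informer setup here is only an example): whoever owns the informers Runs them, and the consumer just blocks on WaitForCacheSync before reading the caches.

// Illustrative sketch: the informers' owner starts them (via the factory),
// and the consumer waits for the caches to sync instead of calling Run on
// shared informers it does not own.
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	stopCh := make(chan struct{})
	defer close(stopCh)

	factory := informers.NewSharedInformerFactory(client, 15*time.Minute)
	claimInformer := factory.Core().V1().PersistentVolumeClaims().Informer()
	volumeInformer := factory.Core().V1().PersistentVolumes().Informer()

	// The owner of the shared informers (here, "the user of the lib") starts them.
	factory.Start(stopCh)

	// The consumer only waits for the caches instead of calling Run itself.
	if !cache.WaitForCacheSync(stopCh, claimInformer.HasSynced, volumeInformer.HasSynced) {
		panic("timed out waiting for caches to sync")
	}

	// Now it is safe to read from the caches, e.g. check whether a PV exists.
	_, exists, _ := volumeInformer.GetStore().GetByKey("some-pv-name")
	fmt.Println("pv in cache:", exists)
}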

wongma7 (Contributor Author) commented Aug 9, 2018

Last call for review @cofyc

I will merge this tomorrow and tag a release. I plan to fix the NFS e2e tests in a separate PR; I've revamped them so they're not so fragile.

Other than the e2e testing, I've done some local testing. Probably not sufficient, but it can't be helped. Anyway, the code is identical to the controller-manager's, so we should be okay.

cofyc (Contributor) commented Aug 10, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2018
k8s-ci-robot (Contributor): New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2018
wongma7 (Contributor Author) commented Aug 10, 2018

/lgtm
rebased for controller_test.go

k8s-ci-robot (Contributor): @wongma7: you cannot LGTM your own PR.

In response to this:

/lgtm
rebased for controller_test.go

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wongma7 wongma7 added the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2018
@wongma7 wongma7 merged commit 2db4446 into kubernetes-retired:master Aug 10, 2018
weherdh (Contributor) commented Aug 15, 2018

@wongma7 Hi, this change breaks OpenShift testing... OpenShift does not have the PodSecurityPolicy resource. Could you please also add deployment files for OpenShift, and provide instructions for deploying nfs-provisioner on OpenShift? If needed, I'd like to open a bug for this. Thanks

wongma7 (Contributor Author) commented Aug 15, 2018

@funky81 please try the new RBAC with the latest release v3.0.0-k8s1.11

@weherdh sorry, I removed the SCC without a replacement: https://github.com/kubernetes-incubator/external-storage/pull/892/files#diff-fbc2b7e3391a05df13aa2ae2e9e9831a. Please feel free to open a bug so I can track it. In OpenShift, instead of creating a PSP we create an SCC, and it should work, right?
