This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Replace per-PVC leader election with per-cluster #892

Merged — 8 commits merged into kubernetes-retired:master from the leader-election branch on Aug 10, 2018

Conversation

wongma7 (Contributor) commented Jul 30, 2018

continuing work in #837

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 30, 2018
wongma7 (Contributor Author) commented Jul 31, 2018

@orainxiong please review. The 1st commit is taken from your work in #837 (thanks); the 2nd commit is the replacement leader election. It creates an endpoints object in the kube-system namespace with a name equal to provisionerName, i.e. the value of the storage class provisioner field that the provisioner should watch, with "/" replaced since it is not allowed in object names. E.g. if the provisioner on the storage class is example.com/nfs, then the endpoints object will be kube-system/example.com-nfs.
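For readers following along, here is a rough, hypothetical sketch of this kind of per-cluster lock built on client-go's leaderelection package. It is not the code in this PR: the identity string, the timings, and the exact RunOrDie/callback signatures are assumptions (newer client-go releases pass a context.Context; this roughly follows the v8.x-era API).

// Hypothetical sketch, not this PR's implementation: derive the lock name from
// the provisioner name and run client-go leader election against an endpoints
// object in kube-system.
package main

import (
	"log"
	"strings"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/client-go/tools/record"
)

func main() {
	provisionerName := "example.com/nfs" // the storage class `provisioner` value

	// Endpoints names cannot contain "/", so example.com/nfs becomes
	// the object kube-system/example.com-nfs.
	lockName := strings.Replace(provisionerName, "/", "-", -1)

	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	broadcaster := record.NewBroadcaster()
	recorder := broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: lockName})

	lock := &resourcelock.EndpointsLock{
		EndpointsMeta: metav1.ObjectMeta{Namespace: "kube-system", Name: lockName},
		Client:        client.CoreV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity:      "provisioner-instance-1", // must be unique per instance, e.g. the pod name
			EventRecorder: recorder,
		},
	}

	leaderelection.RunOrDie(leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(stop <-chan struct{}) {
				// Only the elected leader provisions; the controller's Run would go here.
				<-stop
			},
			OnStoppedLeading: func() {
				log.Fatal("lost leadership, exiting")
			},
		},
	})
}

Whichever instance holds the kube-system/example.com-nfs lock leads; the others stay idle until the lease is lost.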

Some open questions:

  • should the namespace for the endpoints object be configurable instead of just kube-system? IMO it's okay to hardcode it as kube-system.
  • is it bad to give provisioners endpoints write access? (I don't think this is too big a deal, we already distribute clusterroles where provisioners have SECRETS access and it's up to users to audit it)
    • TODO for me: fix all the clusterrole.yamls throughout the repo (while at it, merge the serviceaccount+clusterrole+clusterrolebinding yamls) so that they include this new permission — roughly the rule sketched below
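For reference, the extra permission would look roughly like the rule below. This is an illustrative sketch, not the yaml shipped by this PR; the ClusterRole name and the exact verb list are assumptions.

# Hypothetical sketch of the new permission, not the yaml from this PR:
# the provisioner needs to create and keep renewing its leader-election
# endpoints object.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: example-provisioner-runner   # illustrative name
rules:
  # ...existing rules for persistentvolumes, persistentvolumeclaims,
  # storageclasses, events, etc. stay as they are...
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]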

/cc @cofyc

@k8s-ci-robot k8s-ci-robot requested a review from cofyc July 31, 2018 18:59
@wongma7 wongma7 force-pushed the leader-election branch 3 times, most recently from ec85e58 to e906dce Compare July 31, 2018 20:09
wongma7 (Contributor Author) commented Jul 31, 2018

Also a TODO for me tomorrow: fix the NFS docs, I think they reference hostpath DaemonSets and such somewhere, which won't be possible anymore :p

orainxiong commented Aug 1, 2018

@wongma7

LGTM!

You have a better workaround, and I have closed the related issues in the external-provisioner repo.

If possible, I will measure the performance improvement from this PR with the same test case of provisioning 100 PVCs at once.

BTW, as far as I know, the leaderelection code inside k8s.io/client-go at the current revision v8.0.0 can probably cause two leaders to overlap. That is, two instances of external-provisioner may be active at the same time when the original leader's call to tryAcquireOrRenew takes a really long time, since there is no timeout bounding tryAcquireOrRenew. The problem has been fixed in the master branch of k8s.io/client-go:

// From the renew loop in client-go master: tryAcquireOrRenew runs in its own
// goroutine, and the attempt is abandoned once timeoutCtx (a context bounded
// by RenewDeadline in the surrounding loop) expires, so a hung renewal can no
// longer keep a stale leader alive.
err := wait.PollImmediateUntil(le.config.RetryPeriod, func() (bool, error) {
	done := make(chan bool, 1)
	go func() {
		defer close(done)
		done <- le.tryAcquireOrRenew()
	}()

	select {
	case <-timeoutCtx.Done():
		return false, fmt.Errorf("failed to tryAcquireOrRenew %s", timeoutCtx.Err())
	case result := <-done:
		return result, nil
	}
}, timeoutCtx.Done())

I hope it works.

wongma7 (Contributor Author) commented Aug 1, 2018

I don't really understand the bug; how common is it? How can tryAcquireOrRenew get stuck? I don't want to bump all the k8s.io/* dependencies to master. :/ Since upstream did not think it important enough to cherry-pick, I am inclined to live with the bug for now.

@wongma7 wongma7 changed the title from "[WIP] Replace per-PVC leader election with per-cluster" to "Replace per-PVC leader election with per-cluster" Aug 1, 2018
wongma7 (Contributor Author) commented Aug 1, 2018

I will probably tag lib v5.0.0 when this merges and then rebuild and push every image. This is a breaking change to the library's required RBAC policies, so every provisioner out there will stop working if the cluster has RBAC configured. IDK how much chaos it will cause.

wongma7 (Contributor Author) commented Aug 1, 2018

BTW, I also want to make the leader election opt-out, i.e. allow the author to do leader election at some higher level (in their main, where they Run the controller) and keep the controller ignorant of it, if they want. But this can be added later.

@@ -141,10 +142,6 @@ type ProvisionController struct {
// when multiple controllers are running: they race to lock (lead) every PVC
// so that only one calls Provision for it (saving API calls, CPU cycles...)
Contributor review comment: this comment should be updated

pvName := ctrl.getProvisionedVolumeNameForClaim(claim)
volume, err := ctrl.client.CoreV1().PersistentVolumes().Get(pvName, metav1.GetOptions{})
if err == nil && volume != nil {
_, exists, err := ctrl.volumes.GetByKey(fmt.Sprintf("%s/%s", namespace, pvName))
Contributor review comment: If we check volume existence in the cache.Store, perhaps we should wait until the informers are fully synced before running any controller logic.

go ctrl.claimController.Run(stopCh)
go ctrl.volumeController.Run(stopCh)
go ctrl.classController.Run(stopCh)

cofyc (Contributor) commented Aug 3, 2018:

cache.WaitForCacheSync(stopCh, ctrl.claimInformer.HasSynced, ctrl.volumeInformer.HasSynced, ctrl.classInformer.HasSynced)

Contributor follow-up: Oh, we should use controller.HasSynced instead; ctrl.xxxInformer is optional:
cache.WaitForCacheSync(stopCh, ctrl.claimController.HasSynced, ctrl.volumeController.HasSynced, ctrl.classController.HasSynced)

Contributor Author (wongma7): Good idea, thank you.

While I was looking at this code, I also think it is a bug that we call Run on the SharedInformers. Users of the lib should be able to Run the SharedInformers whenever they want... may as well fix it while we are here IMO. (A rough sketch of the intended pattern is below.)
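A self-contained sketch of that pattern (not the library's code; the informer setup here is only an example): whoever owns the informers Runs them, and the consumer just blocks on WaitForCacheSync before reading the caches.

// Illustrative sketch: the informers' owner starts them (via the factory),
// and the consumer waits for the caches to sync instead of calling Run on
// shared informers it does not own.
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	stopCh := make(chan struct{})
	defer close(stopCh)

	factory := informers.NewSharedInformerFactory(client, 15*time.Minute)
	claimInformer := factory.Core().V1().PersistentVolumeClaims().Informer()
	volumeInformer := factory.Core().V1().PersistentVolumes().Informer()

	// The owner of the shared informers (here, "the user of the lib") starts them.
	factory.Start(stopCh)

	// The consumer only waits for the caches instead of calling Run itself.
	if !cache.WaitForCacheSync(stopCh, claimInformer.HasSynced, volumeInformer.HasSynced) {
		panic("timed out waiting for caches to sync")
	}

	// Now it is safe to read from the caches, e.g. check whether a PV exists.
	_, exists, _ := volumeInformer.GetStore().GetByKey("some-pv-name")
	fmt.Println("pv in cache:", exists)
}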

wongma7 (Contributor Author) commented Aug 9, 2018

Last call for review @cofyc

I will merge this tomorrow and tag a release. I plan to fix the NFS e2e tests in a separate PR; I've revamped them so they're not so fragile.

Other than the e2e testing, I've done some local testing. Probably not sufficient, but it can't be helped. Anyway, the code is identical to the controller-manager's, so we should be okay.

cofyc (Contributor) commented Aug 10, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2018
k8s-ci-robot (Contributor): New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2018
wongma7 (Contributor Author) commented Aug 10, 2018

/lgtm
rebased for controller_test.go

k8s-ci-robot (Contributor): @wongma7: you cannot LGTM your own PR.

In response to this:

/lgtm
rebased for controller_test.go

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wongma7 wongma7 added the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2018
@wongma7 wongma7 merged commit 2db4446 into kubernetes-retired:master Aug 10, 2018
weherdh (Contributor) commented Aug 15, 2018

@wongma7 Hi, this change breaks OpenShift testing... OpenShift does not have the PodSecurityPolicy resource. Could you please also add deployment files for OpenShift, and provide instructions for deploying nfs-provisioner on OpenShift? If needed, I'd like to open a bug for this. Thanks

wongma7 (Contributor Author) commented Aug 15, 2018

@funky81 please try the new RBAC with the latest release v3.0.0-k8s1.11

@weherdh sorry, I removed the SCC without a replacement: https://github.com/kubernetes-incubator/external-storage/pull/892/files#diff-fbc2b7e3391a05df13aa2ae2e9e9831a. Please feel free to open a bug so I can track it. In OpenShift, instead of creating a PSP we create an SCC, and it should work, right?
