This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Rewrite controller #812

Closed
wongma7 opened this issue Jun 15, 2018 · 8 comments
Assignees: wongma7
Labels: area/lib, help wanted, lifecycle/rotten

Comments

@wongma7
Contributor

wongma7 commented Jun 15, 2018

I will fill this in later. Basically, it should be rewritten to reflect current best practices.

@wongma7 wongma7 self-assigned this Jun 15, 2018
@wongma7 wongma7 added the help wanted label Jun 15, 2018
@cofyc
Contributor

cofyc commented Jun 16, 2018

I'm glad to help.

@wongma7
Contributor Author

wongma7 commented Jun 19, 2018

Doing a li'l audit of the code, here are some tasks:

  • I'm not convinced the informers need changing; as far as I can tell we are using shared informers correctly, we're just initializing them ourselves instead of using the pre-built versions at e.g. coreinformers.PersistentVolumeInformer. We should move to those versions anyway for future-proofing and to cut verbosity. Beyond that there's not much to be done here: every provisioner instance needs its own set of informers, and they can't "share" their informer cache with one another, inefficient as that is.
  • Work queues. Maybe make threadiness configurable. (A rough sketch of the informer/work-queue setup follows this comment.)
  • Make the resync period a lot higher than 15 seconds, or make it parameterizable. See Scalability: Increase resync period or make parameterizable kubernetes-csi/external-provisioner#100 (comment).
  • Clean up the retry and exponential backoff logic. IIRC something similar happened upstream. Also, our retry period is tied to the resync period, so we need to decouple them.
  • Remove per-PVC leader election and add per-class leader election. The idea behind per-PVC was to cheaply enable "HA". I highly doubt anybody is deploying provisioners this way. If they are, too bad! Only nfs-provisioner should be affected.
  • Doing the above allows us to remove all the event-watching logic. Yeah, we use Kubernetes Events as... an event bus. Not the best idea.

I am working on work queues, resync period, & retry period. Somebody is working on leader election.
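
A minimal sketch of that direction, assuming client-go's pre-built shared informer factory and rate-limited work queue; the kubeconfig path, the 15-minute resync period, the threadiness value, and the processNextItem helper are illustrative placeholders rather than anything from the actual refactor:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Placeholder kubeconfig path; in-cluster config would also work.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Resync period parameterized instead of hard-coded to 15 seconds.
	resyncPeriod := 15 * time.Minute
	factory := informers.NewSharedInformerFactory(client, resyncPeriod)

	// Pre-built PVC informer instead of a hand-rolled ListWatch.
	claimInformer := factory.Core().V1().PersistentVolumeClaims().Informer()

	// Rate-limited work queue: retries are driven by the queue's rate
	// limiter, not by the resync period.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	claimInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Configurable worker count ("threadiness").
	threadiness := 4
	for i := 0; i < threadiness; i++ {
		go func() {
			for processNextItem(queue) {
			}
		}()
	}
	<-stopCh
}

// processNextItem is a placeholder for the provisioning loop body.
func processNextItem(queue workqueue.RateLimitingInterface) bool {
	key, quit := queue.Get()
	if quit {
		return false
	}
	defer queue.Done(key)
	// Real provisioning logic would go here; on failure, AddRateLimited
	// re-queues the key with exponential backoff, and Forget resets it.
	fmt.Println("processing", key)
	queue.Forget(key)
	return true
}
```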

@wongma7
Contributor Author

wongma7 commented Jun 19, 2018

Actually I'm just going to work on resync period and retry for now.
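
As a minimal sketch of what decoupling the retry backoff from the resync period could look like, assuming client-go's workqueue rate limiters; newClaimQueue and the specific delay values are made-up placeholders:

```go
package provisioner

import (
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

// newClaimQueue builds a work queue whose retry delays come from its own
// rate limiter rather than from the informer resync period: per-item
// exponential backoff combined with an overall token bucket. The base and
// max delays here are arbitrary placeholders.
func newClaimQueue() workqueue.RateLimitingInterface {
	limiter := workqueue.NewMaxOfRateLimiter(
		workqueue.NewItemExponentialFailureRateLimiter(15*time.Second, 1000*time.Second),
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
	)
	return workqueue.NewRateLimitingQueue(limiter)
}
```

With retries handled by the queue's own rate limiter, the resync period can be raised well past 15 seconds without slowing down retries of failed claims.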

@orainxiong

@wongma7

I have created PR #837 to do the following two major pieces of work:

  • Modify the leader election implementation from per-PVC to per-class
  • Leverage the informer cache rather than talking directly to the API server

In my scenario, those are the root causes of 'external-provisioner' getting throttled. (A sketch of both changes together follows this comment.)

I'm not sure whether this is the right way to fix the problem. If there are any problems, please let me know. :-)
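
A hypothetical sketch of how those two changes could fit together against a recent client-go: a per-class lease lock for leader election, with reads served from the informer's lister instead of direct API calls. The class name, lock name, namespace, lease timings, and POD_NAME environment variable are illustrative placeholders and not necessarily what #837 actually does:

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// One lock per storage class instead of one per PVC.
	className := "example-class" // placeholder
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "provisioner-" + className, // placeholder lock name
			Namespace: "kube-system",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	factory := informers.NewSharedInformerFactory(client, 15*time.Minute)
	claimLister := factory.Core().V1().PersistentVolumeClaims().Lister()

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				factory.Start(ctx.Done())
				factory.WaitForCacheSync(ctx.Done())
				// Reads are served from the informer cache instead of
				// hitting the API server directly, which avoids the
				// client-side throttling described above.
				claims, err := claimLister.PersistentVolumeClaims("default").List(labels.Everything())
				if err != nil {
					klog.Error(err)
					return
				}
				klog.Infof("provisioner for class %s sees %d claims", className, len(claims))
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				klog.Fatalf("lost leadership for class %s", className)
			},
		},
	})
}
```

Whichever replica holds the lease for a class runs the provisioning loop for that class; the others simply wait, replacing the per-PVC races discussed earlier in the thread.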

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Apr 24, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 24, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
