This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Rewrite controller #812

Closed
wongma7 opened this issue Jun 15, 2018 · 8 comments
Assignees: wongma7
Labels: area/lib, help wanted, lifecycle/rotten

Comments

@wongma7
Contributor

wongma7 commented Jun 15, 2018

I will fill this in later. Basically, it should be rewritten to reflect current best practices.

@wongma7 wongma7 self-assigned this Jun 15, 2018
@wongma7 wongma7 added the help wanted label Jun 15, 2018
@cofyc
Contributor

cofyc commented Jun 16, 2018

I'm glad to help.

@wongma7
Contributor Author

wongma7 commented Jun 19, 2018

Doing a li'l audit of the code, here are some tasks:

  • I'm not convinced the informers need changing; as far as I can tell we are using shared informers correctly, we're just initializing them ourselves instead of using the pre-built versions at e.g. coreinformers.PersistentVolumeInformer. We should move to those versions anyway for future-proofing and to cut verbosity. Beyond that there's not much to be done here: every provisioner instance needs its own set of informers, and they can't "share" their informer cache with one another, inefficient as that is.
  • Work queues. Maybe make threadiness configurable. (A rough sketch of the informer/work-queue setup follows this comment.)
  • Make the resync period a lot higher than 15 seconds, or make it parameterizable. See Scalability: Increase resync period or make parameterizable kubernetes-csi/external-provisioner#100 (comment).
  • Clean up the retry and exponential backoff logic. IIRC something similar happened upstream. Also, our retry period is tied to the resync period, so we need to decouple them.
  • Remove per-PVC leader election and add per-class leader election. The idea behind per-PVC was to cheaply enable "HA". I highly doubt anybody is deploying provisioners this way. If they are, too bad! Only nfs-provisioner should be affected.
  • Doing the above allows us to remove all the event-watching logic. Yeah, we use Kubernetes Events as... an event bus. Not the best idea.

I am working on work queues, resync period, & retry period. Somebody is working on leader election.
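
A minimal sketch of that direction, assuming client-go's pre-built shared informer factory and rate-limited work queue; the kubeconfig path, the 15-minute resync period, the threadiness value, and the processNextItem helper are illustrative placeholders rather than anything from the actual refactor:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Placeholder kubeconfig path; in-cluster config would also work.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Resync period parameterized instead of hard-coded to 15 seconds.
	resyncPeriod := 15 * time.Minute
	factory := informers.NewSharedInformerFactory(client, resyncPeriod)

	// Pre-built PVC informer instead of a hand-rolled ListWatch.
	claimInformer := factory.Core().V1().PersistentVolumeClaims().Informer()

	// Rate-limited work queue: retries are driven by the queue's rate
	// limiter, not by the resync period.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	claimInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Configurable worker count ("threadiness").
	threadiness := 4
	for i := 0; i < threadiness; i++ {
		go func() {
			for processNextItem(queue) {
			}
		}()
	}
	<-stopCh
}

// processNextItem is a placeholder for the provisioning loop body.
func processNextItem(queue workqueue.RateLimitingInterface) bool {
	key, quit := queue.Get()
	if quit {
		return false
	}
	defer queue.Done(key)
	// Real provisioning logic would go here; on failure, AddRateLimited
	// re-queues the key with exponential backoff, and Forget resets it.
	fmt.Println("processing", key)
	queue.Forget(key)
	return true
}
```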

@wongma7
Contributor Author

wongma7 commented Jun 19, 2018

Actually I'm just going to work on resync period and retry for now.
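
As a minimal sketch of what decoupling the retry backoff from the resync period could look like, assuming client-go's workqueue rate limiters; newClaimQueue and the specific delay values are made-up placeholders:

```go
package provisioner

import (
	"time"

	"golang.org/x/time/rate"
	"k8s.io/client-go/util/workqueue"
)

// newClaimQueue builds a work queue whose retry delays come from its own
// rate limiter rather than from the informer resync period: per-item
// exponential backoff combined with an overall token bucket. The base and
// max delays here are arbitrary placeholders.
func newClaimQueue() workqueue.RateLimitingInterface {
	limiter := workqueue.NewMaxOfRateLimiter(
		workqueue.NewItemExponentialFailureRateLimiter(15*time.Second, 1000*time.Second),
		&workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
	)
	return workqueue.NewRateLimitingQueue(limiter)
}
```

With retries handled by the queue's own rate limiter, the resync period can be raised well past 15 seconds without slowing down retries of failed claims.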

@orainxiong

@wongma7

I have created PR #837 to do the following two major pieces of work:

  • Modify the leader election implementation from per-PVC to per-class
  • Leverage the informer cache rather than talking directly to the API server

In my scenario, those are the root causes of 'external-provisioner' getting throttled. (A sketch of both changes together follows this comment.)

I'm not sure whether this is the right way to fix the problem. If there are any problems, please let me know. :-)
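
A hypothetical sketch of how those two changes could fit together against a recent client-go: a per-class lease lock for leader election, with reads served from the informer's lister instead of direct API calls. The class name, lock name, namespace, lease timings, and POD_NAME environment variable are illustrative placeholders and not necessarily what #837 actually does:

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		klog.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// One lock per storage class instead of one per PVC.
	className := "example-class" // placeholder
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      "provisioner-" + className, // placeholder lock name
			Namespace: "kube-system",
		},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("POD_NAME")},
	}

	factory := informers.NewSharedInformerFactory(client, 15*time.Minute)
	claimLister := factory.Core().V1().PersistentVolumeClaims().Lister()

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				factory.Start(ctx.Done())
				factory.WaitForCacheSync(ctx.Done())
				// Reads are served from the informer cache instead of
				// hitting the API server directly, which avoids the
				// client-side throttling described above.
				claims, err := claimLister.PersistentVolumeClaims("default").List(labels.Everything())
				if err != nil {
					klog.Error(err)
					return
				}
				klog.Infof("provisioner for class %s sees %d claims", className, len(claims))
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				klog.Fatalf("lost leadership for class %s", className)
			},
		},
	})
}
```

Whichever replica holds the lease for a class runs the provisioning loop for that class; the others simply wait, replacing the per-PVC races discussed earlier in the thread.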

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Apr 24, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 24, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
