[WIP] 🌱 Controller Lifecycle Management #1180
Conversation
Welcome @kevindelgado!
Hi @kevindelgado. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed d4debf9 → aca96ee
Adding WIP: the recent change to prevent controllers from starting more than once (#1139) breaks this. Need to find a way to allow conditional runnables to set the controller's started field back to false upon CRD uninstall.
pkg/manager/internal.go
Outdated
	}

	cm.startedLeader = true
}

func (cm *controllerManager) startConditionalRunnables() {
hi @kevindelgado, thanks for this PR. It seems a bit chaotic and involves a set of semi-related changes, like the removal of informers, which IIRC is not something we have agreed on and is definitely something for a distinct PR.
As for the conditionally-started runnables idea, a much simpler approach would be to just check whatever you want to check before adding them to the manager:
if clusterHasMyCRD() {
	controller.New()
}
This also has the advantage that you can check anything, not just whatever happens to be implemented for this in the manager.
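(For illustration, a minimal sketch of that suggestion, assuming a hypothetical CRD serving "widgets" under example.com/v1; clusterHasMyCRD, widgetReconciler, and setupWidgetController are made-up names, and the Reconcile/controller.New signatures follow recent controller-runtime releases, so adjust for older versions.)

```go
package setup

import (
	"context"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// widgetReconciler is a stand-in for the real reconciler.
type widgetReconciler struct{}

func (r *widgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// Real reconciliation logic would go here.
	return reconcile.Result{}, nil
}

// clusterHasMyCRD reports whether the example.com/v1 API group serves the
// "widgets" resource, i.e. whether the CRD is installed.
func clusterHasMyCRD(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	resources, err := dc.ServerResourcesForGroupVersion("example.com/v1")
	if err != nil {
		// Treat "group/version not served" as "CRD not installed".
		return false, nil
	}
	for _, r := range resources.APIResources {
		if r.Name == "widgets" {
			return true, nil
		}
	}
	return false, nil
}

// setupWidgetController registers the controller with the manager only when
// the CRD is present at startup.
func setupWidgetController(mgr manager.Manager, cfg *rest.Config) error {
	ok, err := clusterHasMyCRD(cfg)
	if err != nil || !ok {
		return err
	}
	// controller.New adds the controller to the manager; a real setup would
	// also call Watch on the returned controller for the CRD's type.
	_, err = controller.New("widget-controller", mgr, controller.Options{
		Reconciler: &widgetReconciler{},
	})
	return err
}
```

The check runs once at startup, which is exactly the limitation the rest of this thread discusses: it does not react to the CRD being installed or removed later.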
Thanks for the feedback @alvaroaleman!
We’re definitely open to better ways of doing this, but we are currently doing what you suggest: outside of controller-runtime we conditionally start the manager based on the existence of the CRD. A couple of issues we see are:
- Increased complexity, specifically when multiple controllers are abstracted to appear as a single controller, or when the manager runs multiple controllers that watch a mixture of CRDs that are always installed on the cluster and CRDs that are only sometimes installed.
- This approach still isn’t great for fluidly handling CRD install/uninstall/reinstall (rather than just waiting for install before starting), especially when low latency matters immediately upon CRD installation.
/cc @DirectXMan12 in case they can better explain our desire to push this functionality further down the stack into c-r.
This sounds to me like you should probably build a metacontroller that watches CRDs and then starts and stops the actual controllers.
If you continue to think this requires changes in controller-runtime, the first step would be a design doc that describes the use case, why this is problematic with how controller-runtime currently works, what changes would be required, and possible alternatives, so we can properly discuss this. We have the /designs folder for this.
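(A rough sketch of the metacontroller direction suggested above, assuming the real controller runs in its own child manager so it can be started and stopped with a context; crdName, widgets.example.com, and newWidgetManager are placeholders, the manager/Runnable signatures assume a recent controller-runtime release, and none of this is code from this PR.)

```go
package metacontroller

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// crdName is the CRD the real controller depends on (placeholder name).
const crdName = "widgets.example.com"

// crdReconciler watches CustomResourceDefinitions and starts/stops a child
// manager hosting the real controller. With the default of one concurrent
// reconcile, the cancel field needs no extra locking.
type crdReconciler struct {
	client client.Client
	cfg    *rest.Config
	cancel context.CancelFunc // non-nil while the child manager is running
}

func (r *crdReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	if req.Name != crdName {
		return reconcile.Result{}, nil
	}

	var crd apiextensionsv1.CustomResourceDefinition
	err := r.client.Get(ctx, req.NamespacedName, &crd)
	switch {
	case apierrors.IsNotFound(err):
		// CRD was uninstalled: stop the child manager if it is running.
		if r.cancel != nil {
			r.cancel()
			r.cancel = nil
		}
		return reconcile.Result{}, nil
	case err != nil:
		return reconcile.Result{}, err
	}

	// CRD exists: start the child manager exactly once.
	if r.cancel == nil {
		mgr, err := newWidgetManager(r.cfg)
		if err != nil {
			return reconcile.Result{}, err
		}
		childCtx, cancel := context.WithCancel(context.Background())
		r.cancel = cancel
		go func() {
			// Runs until the CRD is removed and the context is cancelled.
			_ = mgr.Start(childCtx)
		}()
	}
	return reconcile.Result{}, nil
}

// newWidgetManager is a placeholder for building the manager that hosts the
// real controller; its details are out of scope for this sketch.
func newWidgetManager(cfg *rest.Config) (manager.Manager, error) {
	return manager.New(cfg, manager.Options{})
}
```

Running the child controller in its own manager is what makes stop/restart possible without any controller-runtime changes, at the cost of a second cache and client per gated controller.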
Sure thing. We’re discussing alternatives and working to create a more compelling story for this, and we will have a design doc up if/when that is the case.
Sorry for the lack of context here; when learning the codebase it was initially easier to express ideas through concrete code than in prose.
While the code has been helpful to steer discussions and now makes it easier to formalize into a design doc, I definitely understand that it appears chaotic and disorganized from the outside. Feel free to close if you want to reduce clutter in the repo; I can always submit a new one if necessary.
I think we can probably move a lot of this into a separate helper runnable that manages its own sub-runnables and manages the start/stop of things. We'd still need a way to start/stop caches though, and possibly some other minor tweaks.
At the very least I think it'd be useful to have some basic hooks for being better able to start/stop controllers in CR, especially for multi-cluster use cases, where you don't necessarily want to (or can't) degrade performance for all clusters just to deal with a change in one. It also provides a motivating minimal use case for starting/stopping caches.
I think one of the things to keep in mind here is that if we go with the approach of "oh, you should just build it as a meta-controller", we need to make sure the hooks are there to do it cleanly, and we need to make sure those hooks are documented as such so that they don't get removed later down the road.
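(A loose sketch of the "helper runnable" idea, under the assumption that sub-runnables can be stopped by cancelling a per-runnable context; runnableGroup and its method names are invented for illustration, and the context-based Runnable interface of recent controller-runtime releases is assumed. As the comment above notes, stopping a controller cleanly would still need cache/informer hooks; this only shows the bookkeeping shell.)

```go
package runnables

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// runnableGroup is a single Runnable added to the manager that owns a set of
// named sub-runnables and can start/stop each one by cancelling its context.
type runnableGroup struct {
	mu      sync.Mutex
	parent  context.Context               // set once the manager starts the group
	cancels map[string]context.CancelFunc // running sub-runnables
	pending map[string]manager.Runnable   // added before the group was started
}

func newRunnableGroup() *runnableGroup {
	return &runnableGroup{
		cancels: map[string]context.CancelFunc{},
		pending: map[string]manager.Runnable{},
	}
}

// Start implements manager.Runnable; the manager calls it exactly once.
func (g *runnableGroup) Start(ctx context.Context) error {
	g.mu.Lock()
	g.parent = ctx
	for name, r := range g.pending {
		g.startLocked(name, r)
	}
	g.pending = map[string]manager.Runnable{}
	g.mu.Unlock()

	// Block until the manager shuts down, like other runnables.
	<-ctx.Done()
	return nil
}

// StartRunnable starts a named sub-runnable, or schedules it if the group has
// not been started yet.
func (g *runnableGroup) StartRunnable(name string, r manager.Runnable) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.parent == nil {
		g.pending[name] = r
		return
	}
	g.startLocked(name, r)
}

// StopRunnable cancels a named sub-runnable's context, if it is running.
func (g *runnableGroup) StopRunnable(name string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if cancel, ok := g.cancels[name]; ok {
		cancel()
		delete(g.cancels, name)
	}
}

// startLocked must be called with g.mu held.
func (g *runnableGroup) startLocked(name string, r manager.Runnable) {
	ctx, cancel := context.WithCancel(g.parent)
	g.cancels[name] = cancel
	go func() { _ = r.Start(ctx) }() // errors are dropped to keep the sketch short
}
```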
> At the very least I think it'd be useful to have some basic hooks for being better able to start/stop controllers in CR, especially for multi-cluster use cases, where you don't necessarily want to (or can't) degrade performance for all clusters just to deal with a change in one. It also provides a motivating minimal use case for starting/stopping caches.
Sure thing, but today we don't have any official story at all for multi-cluster without the assumption that clusters might come and go. I am not objecting to the basic idea (although I still think that if this is about being fast, as mentioned above, a metacontroller would be best); I just think that this requires a lot of groundwork first.
Force-pushed aca96ee → 778577e
Force-pushed 778577e → 052b938
Fixed the rebasing issues caused by not being able to restart controllers.
pkg/manager/internal.go
Outdated
if !ok {
	log.Error(errors.New("runnable is not a controller"), "")
}
controller.Started = false
this is not a safe operation, IIRC
Force-pushed 052b938 → 728317b
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: kevindelgado
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Here’s a major overhaul of the previous approach that is hopefully much less invasive to the existing code; it goes along with the design doc at #1192. Gonna bring this up for discussion at the next kubebuilder meeting. If you can take a look at each commit individually, hopefully it comes across as less chaotic. The second one is pretty minimal and is the only change strictly necessary to solve our problem (the second and third are optional and less fleshed out, the first is just a direct rebase and copy of #936). The hope is that we can find some part of this small enough to move forward with, without first having a great story around multi-cluster.
Force-pushed 728317b → f0919d2
I've distilled this PR into just the minimal changes to C-R necessary to enable better controller lifecycle management. The POC for conditional controllers is on this branch: https://github.com/kevindelgado/controller-runtime/tree/experimental/conditional-runnables, but it was clear that there's plenty to discuss just on allowing informer removal and the minimal hooks needed to use informer removal to start/stop/restart controllers cleanly. The design doc at #1192 has also been updated to reflect this.
This change adds support for removing individual informers from the cache using the new Remove() method. This is allowed before or after the cache has been started. Informers are stopped at the time of removal; once stopped, they will no longer deliver events to registered event handlers, and registered watches will be aborted. Also adds a non-blocking API for getting informers without waiting for their cache to sync: GetInformerNonBlocking(). Signed-off-by: Oren Shomron <shomron@gmail.com>
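(A hypothetical caller-side sketch of the API this commit describes; Remove and GetInformerNonBlocking exist only on this PR's branch, their exact parameter types below are assumptions based on the commit message, and the removableInformers interface and watchWhileNeeded helper are invented here to stand in for the extended cache.)

```go
package cacheusage

import (
	"context"

	toolscache "k8s.io/client-go/tools/cache"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// removableInformers is an assumed caller-side view of the cache as extended
// by this PR; the method signatures are guesses from the commit message.
type removableInformers interface {
	GetInformerNonBlocking(ctx context.Context, obj client.Object) (cache.Informer, error)
	Remove(obj client.Object) error
}

// watchWhileNeeded registers an event handler without waiting for the initial
// cache sync, then removes the informer when ctx is cancelled (e.g. on CRD
// uninstall).
func watchWhileNeeded(ctx context.Context, c removableInformers, obj client.Object, h toolscache.ResourceEventHandler) error {
	informer, err := c.GetInformerNonBlocking(ctx, obj)
	if err != nil {
		return err
	}
	informer.AddEventHandler(h)

	<-ctx.Done()
	// Removal stops the informer: it aborts its watches and stops delivering
	// events to registered handlers.
	return c.Remove(obj)
}
```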
Force-pushed f0919d2 → 1cc6939
Updated to reflect the latest updates to the doc in #1192. PTAL, anyone interested.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@kevindelgado: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is a proof of concept PR for the Controller Lifecycle Management design doc at #1192.
It is: