Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨Allow controllers to be started and stopped separately from the manager #863

Merged
merged 8 commits into from
Apr 22, 2020

Conversation

negz
Copy link
Contributor

@negz negz commented Mar 17, 2020

Fixes #730

Increasingly in Crossplane we find that we'd like to use one kind of custom resource to describe how another, arbitrary, kind of custom resource should be reconciled. For example a kind: Template resource may configure how a kind: Widget resource should be reconciled. This involves the controller that watches for kind: Template managing the lifecycles of another set of controllers, including the controller that watches for kind: Widget. This is not currently possible in controller-runtime because all controllers are implicitly added to and started by the controller manager, which has no facility for stopping or removing a controller.

This PR makes it possible to start, run, and stop a controller without ever adding the controller to the controller manager. My goal is not to arrive at the ideal API for this concept, but rather to enable the use case with the smallest possible change to controller-runtime. I imagine this will be niche functionality, and functionality that is slated for a broader refactoring at some point per #764.

I've tested this functionality using negz/crossplane@28e76ec, which contains an example implementation of a controller that starts and stops other controllers.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 17, 2020
@k8s-ci-robot
Copy link
Contributor

Welcome @negz!

It looks like this is your first PR to kubernetes-sigs/controller-runtime 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/controller-runtime has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @negz. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 17, 2020
@negz
Copy link
Contributor Author

negz commented Mar 17, 2020

/assign @gerred

@alexeldeib
Copy link
Contributor

/ok-to-test

love this, really elegant solution

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 17, 2020
@vincepri
Copy link
Member

This seems a breaking change for clients, unless I'm missing something?

@negz
Copy link
Contributor Author

negz commented Mar 17, 2020

@vincepri Could you elaborate on what makes it a breaking change? It's not immediately obvious to me. All existing functions, and methods continue to exist and function close to identically.

Running through the changes:

  1. We add controller.Configure as an alternative to controller.New
  2. controller.New will now explicitly configure c.SetFields, because controller.New now wraps controller.Configure. This should be a no-op, as this field was already injected by manager.Add.
  3. We add a manager.Elected() method to the manager interface.

Is it the third change you're concerned about? I typically would not consider new methods to be a breaking change, but I could see how it would be if we assume clients may be maintaining their own implementations of Manager. Eyeballing the size and nature of the Manager interface this seems really unlikely to me, but perhaps that is not the case?

@vincepri
Copy link
Member

Makes sense, I missed that last call to mgr.Add in the New function. It'd be great to provide examples around Configure for new users.

@alexeldeib
Copy link
Contributor

Could you also add a test similar to the goroutine test for starting/stopping a controller using this code?

@negz
Copy link
Contributor Author

negz commented Mar 17, 2020

Could you also add a test similar to the goroutine test for starting/stopping a controller using this code?

Happy to add a test - are you thinking of a test like the existing one that ensures goroutines are reaped when we stop a manager?

@alexeldeib
Copy link
Contributor

I was thinking of that test case, but doing that for one controller pretty much devolves to the existing manager and controller test cases. Nevermind 👍

@negz
Copy link
Contributor Author

negz commented Mar 17, 2020

It'd be great to provide examples around Configure for new users.

@vincepri Sure. Where is the best place to add examples?

Also, I wonder how far such an example should go? I imagine that few folks will ever call controller.Configure unless they're doing some fairly advanced stuff (i.e. wanting to manage their own controller lifecycles). Something like negz/crossplane@28e76ec illustrates this pattern best, but is quite verbose. Perhaps a good compromise would be something like:

func ConfigureExample(m ctrl.Manager) error {
        // A reconciler that knows how to reconcile *v1alpha1.Example
        r := &ExampleReconciler{}

        // Configure creates a new controller but does not add it to the supplied manager.
	c, err := controller.Configure("example", m, controller.Options{Reconciler: r})
	if err != nil {
		return err
	}

        // Request that our configured controller watch for *v1alpha1.Example
	if err := c.Watch(&source.Kind{Type: &v1alpha1.Example{}}, &handler.EnqueueRequestForObject{}); err != nil {
		return err
	}

        // Create a stop channel for our controller. The controller will stop when this
        // channel is closed.
        stop := make(chan struct{})

        // Start our controller in a goroutine so that we do not block.
	go func() {
                // Block until our controller manager is elected leader. We presume our entire
                // process will terminate if we lose leadership, so we don't need to handle that.
		<-m.Elected()

                // Start our controller. This will block until the stop channel is closed, or the
                // controller returns an error.
		if err := c.Start(stop); err != nil {
			log.Error(err, "cannot run experiment controller")
		}
	}()

	return nil
}

@djzager
Copy link
Contributor

djzager commented Mar 26, 2020

Where is the best place to add examples?

I think, if we decide an example is appropriate, that pkg/controller/example_test.go would be a great place for the kind of example you shared.

Copy link
Contributor

@djzager djzager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.

pkg/manager/manager.go Show resolved Hide resolved
@alexeldeib
Copy link
Contributor

alexeldeib commented Mar 26, 2020

Can you close the stop channel in the example just to make it super explicit you can individually stop this controller too? I see the comment but code always speaks louder for me

Pending the checked-in example, lgtm

pkg/manager/internal.go Outdated Show resolved Hide resolved
Copy link
Member

@alvaroaleman alvaroaleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal is not to arrive at the ideal API for this concept, but rather to enable the use case with the smallest possible change to controller-runtime

I would like to mention that this is not what we should strive for, this project is a library that needs to be easy to use and understand and be as compatible as possible. Keeping the code change for a given feature small is less important than making sure we expose easy to understand apis we feel comfortable maintaining long term.

That being said, I like how this is implemented, just a couple of nits around godocs and naming.

pkg/controller/controller.go Outdated Show resolved Hide resolved
pkg/manager/internal.go Show resolved Hide resolved
@negz
Copy link
Contributor Author

negz commented Mar 26, 2020

I would like to mention that this is not what we should strive for

@alvaroaleman That's fair. A more direct way for me to frame this would have been "my goal is to enable access to this functionality without blocking on redesigning graceful termination in controller-runtime". I had this in mind because the request for this functionality (#730) was rolled into the broader graceful termination redesign issue (#764).

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 26, 2020
@negz
Copy link
Contributor Author

negz commented Mar 27, 2020

I believe all feedback has now been addressed!

As far as I can tell the main reason we add controllers to the manager is to
ensure that we've been elected leader before we start any controllers. The
downside of this design is that it's not possible to stop individual
controllers, or remove controllers from the manager.

I believe this commit is the minimum possible change necessary to allow
controllers to be started and stopped on-demand. It allows a controller to be
created, started, and stopped without ever being added to the manager. Any
controller that is started separately from the manager must handle its own
leader election.

kubernetes-sigs#730

Signed-off-by: Nic Cope <negz@rk0n.org>
Most fields are set explicitly when we create a new controller, but this one
relied solely on the controller manager performing injection during the Add
phase.

Signed-off-by: Nic Cope <negz@rk0n.org>
Signed-off-by: Nic Cope <negz@rk0n.org>
Signed-off-by: Nic Cope <negz@rk0n.org>
It's more intuitive to close the channel when a process (i.e. election) has
completed, rather than when a state (i.e. leading) has been entered.

Signed-off-by: Nic Cope <negz@rk0n.org>
(Or when no election is desired.)

Signed-off-by: Nic Cope <negz@rk0n.org>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2020
@negz
Copy link
Contributor Author

negz commented Apr 22, 2020

@DirectXMan12 I've added a tiny test for the Elected method, but it seems the new apidiff test (which is great - I'd love to set it up for Crossplane) doesn't agree with our judgement in #863 (comment) that growing the method set of the Manager interface should not be considered a breaking API change. 😬

I agree in principle that expanding the method set of a public interface is a breaking change, as API consumers might be defining types that would satisfy the interface before the change, but would not after. However in practice I'm skeptical that there are alternative implementations of the manager.Manager interface in the wild due to its size and specificity. How do you think we should proceed?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 22, 2020
@k8s-ci-robot
Copy link
Contributor

@negz: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-controller-runtime-apidiff-master f779bdd link /test pull-controller-runtime-apidiff-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@alexeldeib
Copy link
Contributor

/hold cancel
/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 22, 2020
@k8s-ci-robot k8s-ci-robot merged commit a457e27 into kubernetes-sigs:master Apr 22, 2020
@negz
Copy link
Contributor Author

negz commented Apr 22, 2020

Thanks for helping us get this merged @alexeldeib!

negz added a commit to negz/crossplane-runtime that referenced this pull request Apr 23, 2020
kubernetes-sigs/controller-runtime#863

We'd like to use the above PR, which is not yet included in a controller-runtime
release. Updating controller-runtime requires updating a few other dependencies,
which changed the signature of client-go clientset methods. This commit removes
the only two uses of clientset from crossplane-runtime. pkg/test/integration now
uses a controller-runtime client.Client. pkg/test.Env has been removed, as it no
longer has any known users.

Signed-off-by: Nic Cope <negz@rk0n.org>
@shomron
Copy link

shomron commented Apr 28, 2020

I'm excited about this change but wanted to point out some practicalities that we might want to document:

  • An unmanaged controller still depends on the cache provided by the manager. The manager must be started or the controller will block waiting for the cache.
  • As far as I know, informers do not currently support removal of event handlers. When stopping the controller in the example, I assume its work queue will continue to grow unbounded?

Also - would it make sense to bound an unmanaged controller's lifespan to that of the manager given that the manager shutting down would also shut down dependencies of the controller (e.g. the cache)?

@negz
Copy link
Contributor Author

negz commented Apr 28, 2020

https://pkg.go.dev/github.com/crossplane/crossplane-runtime/pkg/controller

o := kcontroller.Options{Reconciler: r}
c.Start(requirement.ControllerName(p.GetName()), o,
	controller.For(rq, &handler.EnqueueRequestForObject{}),
	controller.For(cp, &EnqueueRequestForRequirement{}))

@shomron we wrote a little wrapper to help with this that couples cache lifecycles to controller lifecycles. I agree it's not ideal, but it does the trick. it seems like there's a broader refactor in the wings for controller lifecycle management which will hopefully improve on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stopping individual controllers in a manager
10 participants