[WIP] 🌱 Controller Lifecycle Management #1180
Conversation
Welcome @kevindelgado!
Hi @kevindelgado. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed d4debf9 → aca96ee
Adding WIP: the recent change to prevent controllers from starting more than once (#1139) breaks this. Need to find a way to allow conditional runnables to set the controller's started field back to false upon CRD uninstall.
pkg/manager/internal.go
Outdated
	}

	cm.startedLeader = true
}

func (cm *controllerManager) startConditionalRunnables() {
hi @kevindelgado, thanks for this PR. It seems a bit chaotic and involves a set of semi-related changes, like the removal of informers, which IIRC is not something we have agreed on and is definitely something for a distinct PR.
As for the conditionally-started runnables idea, a much simpler approach would be to just check whatever you want to check before adding them to the manager:
if clusterHasMyCRD() {
	controller.New()
}
This also has the advantage that you can check anything, not just whatever happens to be implemented for this in the manager.
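(For illustration, a minimal sketch of that suggestion, assuming a hypothetical CRD serving "widgets" under example.com/v1; clusterHasMyCRD, widgetReconciler, and setupWidgetController are made-up names, and the Reconcile/controller.New signatures follow recent controller-runtime releases, so adjust for older versions.)

```go
package setup

import (
	"context"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// widgetReconciler is a stand-in for the real reconciler.
type widgetReconciler struct{}

func (r *widgetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// Real reconciliation logic would go here.
	return reconcile.Result{}, nil
}

// clusterHasMyCRD reports whether the example.com/v1 API group serves the
// "widgets" resource, i.e. whether the CRD is installed.
func clusterHasMyCRD(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	resources, err := dc.ServerResourcesForGroupVersion("example.com/v1")
	if err != nil {
		// Treat "group/version not served" as "CRD not installed".
		return false, nil
	}
	for _, r := range resources.APIResources {
		if r.Name == "widgets" {
			return true, nil
		}
	}
	return false, nil
}

// setupWidgetController registers the controller with the manager only when
// the CRD is present at startup.
func setupWidgetController(mgr manager.Manager, cfg *rest.Config) error {
	ok, err := clusterHasMyCRD(cfg)
	if err != nil || !ok {
		return err
	}
	// controller.New adds the controller to the manager; a real setup would
	// also call Watch on the returned controller for the CRD's type.
	_, err = controller.New("widget-controller", mgr, controller.Options{
		Reconciler: &widgetReconciler{},
	})
	return err
}
```

The check runs once at startup, which is exactly the limitation the rest of this thread discusses: it does not react to the CRD being installed or removed later.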
Thanks for the feedback @alvaroaleman!
We’re definitely open to better ways of doing this, but we are currently doing what you suggest: outside of controller-runtime we conditionally start the manager based on the existence of the CRD. A couple of issues we see are:
- Increased complexity, specifically when multiple controllers are abstracted to appear as a single controller, or when the manager runs multiple controllers that watch a mixture of CRDs that are always installed on the cluster and CRDs that are only sometimes installed.
- This approach still isn’t great for fluidly handling CRD install/uninstall/reinstall (rather than just waiting for install before starting), especially when low latency matters immediately upon CRD installation.
/cc @DirectXMan12 in case they can better explain our desire to push this functionality further down the stack into c-r.
This sounds to me like you should probably build a metacontroller that watches CRDs and then starts and stops the actual controllers.
If you continue to think this requires changes in controller-runtime, the first step would be a design doc that describes the use case, why this is problematic with how controller-runtime currently works, what changes would be required, and possible alternatives, so we can properly discuss this. We have the /designs folder for this.
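(A rough sketch of the metacontroller direction suggested above, assuming the real controller runs in its own child manager so it can be started and stopped with a context; crdName, widgets.example.com, and newWidgetManager are placeholders, the manager/Runnable signatures assume a recent controller-runtime release, and none of this is code from this PR.)

```go
package metacontroller

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// crdName is the CRD the real controller depends on (placeholder name).
const crdName = "widgets.example.com"

// crdReconciler watches CustomResourceDefinitions and starts/stops a child
// manager hosting the real controller. With the default of one concurrent
// reconcile, the cancel field needs no extra locking.
type crdReconciler struct {
	client client.Client
	cfg    *rest.Config
	cancel context.CancelFunc // non-nil while the child manager is running
}

func (r *crdReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	if req.Name != crdName {
		return reconcile.Result{}, nil
	}

	var crd apiextensionsv1.CustomResourceDefinition
	err := r.client.Get(ctx, req.NamespacedName, &crd)
	switch {
	case apierrors.IsNotFound(err):
		// CRD was uninstalled: stop the child manager if it is running.
		if r.cancel != nil {
			r.cancel()
			r.cancel = nil
		}
		return reconcile.Result{}, nil
	case err != nil:
		return reconcile.Result{}, err
	}

	// CRD exists: start the child manager exactly once.
	if r.cancel == nil {
		mgr, err := newWidgetManager(r.cfg)
		if err != nil {
			return reconcile.Result{}, err
		}
		childCtx, cancel := context.WithCancel(context.Background())
		r.cancel = cancel
		go func() {
			// Runs until the CRD is removed and the context is cancelled.
			_ = mgr.Start(childCtx)
		}()
	}
	return reconcile.Result{}, nil
}

// newWidgetManager is a placeholder for building the manager that hosts the
// real controller; its details are out of scope for this sketch.
func newWidgetManager(cfg *rest.Config) (manager.Manager, error) {
	return manager.New(cfg, manager.Options{})
}
```

Running the child controller in its own manager is what makes stop/restart possible without any controller-runtime changes, at the cost of a second cache and client per gated controller.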
Sure thing. We’re discussing alternatives and working to create a more compelling story for this, and we will have a design doc up if/when that is the case.
Sorry for the lack of context here; when learning the codebase it was initially easier to express ideas through concrete code than in prose.
While the code has been helpful to steer discussions and now makes it easier to formalize into a design doc, I definitely understand that it appears chaotic and disorganized from the outside. Feel free to close if you want to reduce clutter in the repo; I can always submit a new one if necessary.
I think we can probably move a lot of this into a separate helper runnable that manages its own sub-runnables and manages the start/stop of things. We'd still need a way to start/stop caches though, and possibly some other minor tweaks.
At the very least I think it'd be useful to have some basic hooks for being better able to start/stop controllers in CR, especially for multi-cluster use cases, where you don't necessarily want to (or can't) degrade performance for all clusters just to deal with a change in one. It also provides a motivating minimal use case for starting/stopping caches.
I think one of the things to keep in mind here is that if we go with the approach of "oh, you should just build it as a meta-controller", we need to make sure the hooks are there to do it cleanly, and we need to make sure those hooks are documented as such so that they don't get removed later down the road.
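(A loose sketch of the "helper runnable" idea, under the assumption that sub-runnables can be stopped by cancelling a per-runnable context; runnableGroup and its method names are invented for illustration, and the context-based Runnable interface of recent controller-runtime releases is assumed. As the comment above notes, stopping a controller cleanly would still need cache/informer hooks; this only shows the bookkeeping shell.)

```go
package runnables

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// runnableGroup is a single Runnable added to the manager that owns a set of
// named sub-runnables and can start/stop each one by cancelling its context.
type runnableGroup struct {
	mu      sync.Mutex
	parent  context.Context               // set once the manager starts the group
	cancels map[string]context.CancelFunc // running sub-runnables
	pending map[string]manager.Runnable   // added before the group was started
}

func newRunnableGroup() *runnableGroup {
	return &runnableGroup{
		cancels: map[string]context.CancelFunc{},
		pending: map[string]manager.Runnable{},
	}
}

// Start implements manager.Runnable; the manager calls it exactly once.
func (g *runnableGroup) Start(ctx context.Context) error {
	g.mu.Lock()
	g.parent = ctx
	for name, r := range g.pending {
		g.startLocked(name, r)
	}
	g.pending = map[string]manager.Runnable{}
	g.mu.Unlock()

	// Block until the manager shuts down, like other runnables.
	<-ctx.Done()
	return nil
}

// StartRunnable starts a named sub-runnable, or schedules it if the group has
// not been started yet.
func (g *runnableGroup) StartRunnable(name string, r manager.Runnable) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.parent == nil {
		g.pending[name] = r
		return
	}
	g.startLocked(name, r)
}

// StopRunnable cancels a named sub-runnable's context, if it is running.
func (g *runnableGroup) StopRunnable(name string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if cancel, ok := g.cancels[name]; ok {
		cancel()
		delete(g.cancels, name)
	}
}

// startLocked must be called with g.mu held.
func (g *runnableGroup) startLocked(name string, r manager.Runnable) {
	ctx, cancel := context.WithCancel(g.parent)
	g.cancels[name] = cancel
	go func() { _ = r.Start(ctx) }() // errors are dropped to keep the sketch short
}
```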
> At the very least I think it'd be useful to have some basic hooks for being better able to start/stop controllers in CR, especially for multi-cluster use cases, where you don't necessarily want to (or can't) degrade performance for all clusters just to deal with a change in one. It also provides a motivating minimal use case for starting/stopping caches.
Sure thing, but today we don't have any official story at all for multi-cluster without the assumption that clusters might come and go. I am not objecting to the basic idea (although I still think that if this is about being fast, as mentioned above, a metacontroller would be best); I just think that this requires a lot of groundwork first.
Force-pushed aca96ee → 778577e
Force-pushed 778577e → 052b938
Fixed the rebasing issues caused by not being able to restart controllers.
pkg/manager/internal.go
Outdated
if !ok {
	log.Error(errors.New("runnable is not a controller"), "")
}
controller.Started = false
this is not a safe operation, IIRC
Force-pushed 052b938 → 728317b
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: kevindelgado
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Here’s a major overhaul of the previous approach that is hopefully much less invasive to the existing code; it goes along with the design doc at #1192. Gonna bring this up for discussion at the next kubebuilder meeting. If you can take a look at each commit individually, hopefully it comes across as less chaotic. The second one is pretty minimal and is the only change strictly necessary to solve our problem (the second and third are optional and less fleshed out, the first is just a direct rebase and copy of #936). The hope is that we can find some part of this small enough to move forward with, without first having a great story around multi-cluster.
Force-pushed 728317b → f0919d2
I've distilled this PR into just the minimal changes to C-R necessary to enable better controller lifecycle management. The POC for conditional controllers is on this branch: https://github.com/kevindelgado/controller-runtime/tree/experimental/conditional-runnables, but it was clear that there's plenty to discuss just on allowing informer removal and the minimal hooks needed to use informer removal to start/stop/restart controllers cleanly. The design doc at #1192 has also been updated to reflect this.
This change adds support for removing individual informers from the cache using the new Remove() method. This is allowed before or after the cache has been started. Informers are stopped at the time of removal; once stopped, they will no longer deliver events to registered event handlers, and registered watches will be aborted. Also adds a non-blocking API for getting informers without waiting for their cache to sync: GetInformerNonBlocking(). Signed-off-by: Oren Shomron <shomron@gmail.com>
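(A hypothetical caller-side sketch of the API this commit describes; Remove and GetInformerNonBlocking exist only on this PR's branch, their exact parameter types below are assumptions based on the commit message, and the removableInformers interface and watchWhileNeeded helper are invented here to stand in for the extended cache.)

```go
package cacheusage

import (
	"context"

	toolscache "k8s.io/client-go/tools/cache"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// removableInformers is an assumed caller-side view of the cache as extended
// by this PR; the method signatures are guesses from the commit message.
type removableInformers interface {
	GetInformerNonBlocking(ctx context.Context, obj client.Object) (cache.Informer, error)
	Remove(obj client.Object) error
}

// watchWhileNeeded registers an event handler without waiting for the initial
// cache sync, then removes the informer when ctx is cancelled (e.g. on CRD
// uninstall).
func watchWhileNeeded(ctx context.Context, c removableInformers, obj client.Object, h toolscache.ResourceEventHandler) error {
	informer, err := c.GetInformerNonBlocking(ctx, obj)
	if err != nil {
		return err
	}
	informer.AddEventHandler(h)

	<-ctx.Done()
	// Removal stops the informer: it aborts its watches and stops delivering
	// events to registered handlers.
	return c.Remove(obj)
}
```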
Force-pushed f0919d2 → 1cc6939
Updated to reflect the latest updates to the doc in #1192. PTAL, anyone interested.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@kevindelgado: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is a proof of concept PR for the Controller Lifecycle Management design doc at #1192.
It is: