
Add and remove watches at runtime #1884

Closed
jcanseco opened this issue May 3, 2022 · 36 comments


jcanseco commented May 3, 2022

Opening a new issue to raise visibility of my question here.

We have a use-case very similar to the one in that linked issue, and we'd like to know how one can actually remove watches at runtime.


FillZpp commented May 5, 2022

@jcanseco Could you please describe the use-case that requires removing watches (informers) dynamically?


jcanseco commented May 6, 2022

@FillZpp sure.

Suppose you have a Resource A that depends on Resource B. We'd like to immediately reconcile Resource A once Resource B is Ready.

We were hoping to do that by having the Resource A Controller create a Watch on Resource B, so that we can enqueue Resource A for immediate reconciliation once Resource B is ready. Once done, we'd like to remove the Watch since it's no longer needed.
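For reference, the static version of such a watch looks roughly like this with the builder (a sketch only, with hypothetical ResourceA/ResourceB types, an illustrative requestsForReferencingAs helper, and the pre-v0.15 MapFunc signature); the problem is that, once added, it cannot be removed:

// A permanent watch on kind B registered at controller setup time; B events are
// mapped to requests for the A objects that reference them.
ctrl.NewControllerManagedBy(mgr).
    For(&examplev1.ResourceA{}).
    Watches(
        &source.Kind{Type: &examplev1.ResourceB{}},
        handler.EnqueueRequestsFromMapFunc(func(obj client.Object) []reconcile.Request {
            // look up which A object(s) reference this B object and enqueue them
            return requestsForReferencingAs(obj) // illustrative helper
        }),
    ).
    Complete(reconciler)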

Please let us know if there is a better way to resolve this problem as well.


FillZpp commented May 6, 2022

Are A and B the resource kinds or the object names?

I suppose they are resource kinds. So you initially watch two kinds, but you only care about ONE object of kind B. Once that one object of kind B has been observed, your controller triggers a reconcile of kind A and no longer needs to watch kind B.

Do I understand correctly?


jcanseco commented May 6, 2022

Ah yes apologies for not being precise enough. That is correct.

To be particularly precise: Resource A and B are two particular objects (i.e. uniquely identified by the combination of their GVK, Namespace, and Name).

Typically in our case, the GVKs of Resource A and B are different.


FillZpp commented May 6, 2022

Understood. Another question: is there only one object of Resource B, or are there multiple objects of Resource B of which you only need to watch one particular object?

How about this:

  • If there is only one object of Resource B, I think you don't have to stop and remove its informer.
  • If there are multiple objects, you could use SelectorsByObject to make sure your cache only watches the particular object, so you don't have to stop and remove it (see the sketch below).
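Roughly like this (a sketch assuming a controller-runtime version that exposes cache.SelectorsByObject and cache.BuilderWithOptions; newer releases express the same idea via cache.Options.ByObject, and ResourceB here is a placeholder type):

// Restrict the shared cache so it only ever watches the single object of
// Resource B that you care about, identified by metadata.name.
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
    NewCache: cache.BuilderWithOptions(cache.Options{
        SelectorsByObject: cache.SelectorsByObject{
            &examplev1.ResourceB{}: {
                Field: fields.OneTermEqualSelector("metadata.name", "the-b-object"),
            },
        },
    }),
})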


jcanseco commented May 6, 2022

By Resource B, are you referring to just the GVK? If so, yes, there can be multiple objects with GVK B, but we only need to watch one particular object (i.e. a particular GVK + Namespace + Name combination).

I am not sure I understand the suggestion -- can you elaborate?

Perhaps another important piece of info: we have a controller manager with multiple controllers. Each controller manages all the objects with a particular GVK. Our controller manager manages both GVK A and GVK B. All the controllers, as I understand it, share the same cache.


FillZpp commented May 6, 2022

We have a controller manager with multiple controllers. Each controller manages all the objects with a particular GVK. Our controller manager manages both GVK A and GVK B. All the controllers, as I understand it, share the same cache.

Oh, so you have multiple controllers in your operator, and different controllers may care about different objects of Resource B. In that case, you are not going to stop or remove the whole informer of B; you just want to remove a watch for one controller?

However, there is no way to remove listeners from an informer, so you cannot remove the watch. The only thing you can do is ignore all events in the watch handler after it has seen the particular object.


jcanseco commented May 6, 2022

We have multiple controllers in our operator, and different controllers care about different Resources (GVKs). So there is one controller managing all objects of Resource A, one controller managing all objects of Resource B, etc.

We want Controller A to create a temporary Watch on Object B in order to know when to enqueue Object A for immediate reconciliation.

However, there is no way to remove listeners from an informer, so you cannot remove the watch. The only thing you can do is ignore all events in the watch handler after it has seen the particular object.

This is the same conclusion I've come to, but wouldn't the suggested solution be problematic since it would effectively result in leaked memory? Note that we'll have to create a lot of these temporary watches.

What do you think of this idea:

  • Have Controller A Watch Object B through a custom Source.
  • The custom Source works a lot like source.Kind, but it creates its own new SharedInformer, and it contains a stop channel that can be signalled from outside the Source.
  • Once Object B is Ready, the event handler signals the Source to stop. The Source then stops its SharedInformer.

Is this feasible? I am not sure if you can create a brand new SharedInformer completely separate from the ones used by the controllers, and I am not sure if it's a good idea to potentially create a lot of independent SharedInformers even if only temporarily.
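For concreteness, the skeleton I have in mind is roughly this (just a sketch; the private informer/watch is left abstract, and the Start signature matches the source.Source interface in controller-runtime v0.11/v0.12):

// stoppableSource satisfies source.Source, runs its own watch on Object B in a
// goroutine, and can be told to stop from outside (e.g. by the event handler
// once Object B is Ready).
type stoppableSource struct {
    stop chan struct{}
}

func (s *stoppableSource) Stop() { close(s.stop) }

func (s *stoppableSource) Start(ctx context.Context, h handler.EventHandler, q workqueue.RateLimitingInterface, _ ...predicate.Predicate) error {
    go func() {
        // ... start a private SharedInformer (or watch) for Object B here,
        // feeding its events through h into q ...
        select {
        case <-s.stop: // signalled once Object B is Ready
        case <-ctx.Done(): // manager is shutting down
        }
        // ... stop the private informer/watch here ...
    }()
    return nil
}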


FillZpp commented May 7, 2022

@jcanseco If so, you may just use the channel source with a generic event channel to trigger Controller A from Controller B. There is no need to create a separate informer for this.

// a global channel
var triggerChan = make(chan event.GenericEvent, 1)

// Controller A watches the channel source and enqueues Resource A in the handler.
// Note that the event will be a generic event and should be handled by the handler's Generic() method.
...Watches(&source.Channel{Source: triggerChan}, &handler)

// In Controller B, once Object B is ready, send it on the channel exactly once.
var once sync.Once
once.Do(func() { triggerChan <- event.GenericEvent{Object: ...} })


jcanseco commented May 7, 2022

Ok, let me try to make sure I understand. Are you suggesting for us to have all controllers watch one channel (triggerChan), and then use that channel to allow Controller B to notify all the other controllers (e.g. Controller A) who are waiting for Object B to be ready?

If so, how should we make sure Controller A only handles events for Object B temporarily (i.e. while Object A is waiting for Object B)? I assume we'll need to maintain a data structure for Controller A that lists the objects it's currently waiting for, and have the event handler filter on those (something like the sketch below). Would this be the approach you'd take, or is there a better way?
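Something like this is the data structure I have in mind (purely illustrative):

// waitSet tracks the dependency objects Controller A is currently waiting for;
// the Generic() handler only enqueues when the incoming object is in the set.
type waitSet struct {
    mu   sync.Mutex
    keys map[types.NamespacedName]struct{}
}

func newWaitSet() *waitSet {
    return &waitSet{keys: map[types.NamespacedName]struct{}{}}
}

func (w *waitSet) Add(k types.NamespacedName)    { w.mu.Lock(); defer w.mu.Unlock(); w.keys[k] = struct{}{} }
func (w *waitSet) Remove(k types.NamespacedName) { w.mu.Lock(); defer w.mu.Unlock(); delete(w.keys, k) }

func (w *waitSet) Contains(k types.NamespacedName) bool {
    w.mu.Lock()
    defer w.mu.Unlock()
    _, ok := w.keys[k]
    return ok
}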


jcanseco commented May 7, 2022

Also, a different question:

I know you said we should consider looking into using source.Channel instead of creating a new SharedInformer for each temporary watch -- we will definitely consider this.

As a backup option though, I am still curious whether it is possible and sensible to create new SharedInformers every time we want to create a temporary watch, as described in my previous comment? We already have a rough prototype that does this, so I am wondering if we can use it as a potential fallback option.


FillZpp commented May 7, 2022

Ok, let me try to make sure I understand. Are you suggesting for us to have all controllers watch one channel (triggerChan), and then use that channel to allow Controller B to notify all the other controllers (e.g. Controller A) who are waiting for Object B to be ready?

Emm... So is there only Controller A, or are there multiple controllers that all wait for Resource B to be Ready?

If there are multiple controllers, each controller can wait for a temporary channel to be closed, which signals that Resource B is Ready.

// There is a global channel which will be closed when Resource B is Ready.
var tempChan = make(chan struct{})

// Controller B closes tempChan exactly once when Resource B is Ready.
var once sync.Once
once.Do(func() { close(tempChan) })

// In each controller that waits for B to be Ready, use a func source and wait for tempChan to be closed.
Watches(source.Func(func(ctx context.Context, _ handler.EventHandler, q workqueue.RateLimitingInterface, _ ...predicate.Predicate) error {
    go func() {
        select {
        case <-tempChan:  // wait for channel to be closed
        case <-ctx.Done():
            // cache has been canceled or closed
            return
        }
        q.Add(...)  // enqueue Resource A or any other resources for other controllers
    }()
    return nil
}), &handler.EnqueueRequestForObject{})

I don't really recommend creating a new SharedInformer for each temporary watch, since that will put pressure on both the apiserver and the operator. But it depends on your scenario; I don't know much about your controllers and the relationships between the resources.


jcanseco commented May 7, 2022

But it depends on your scenario; I don't know much about your controllers and the relationships between the resources.

Apologies for the confusion. Perhaps it might help to make this more concrete. We are Config Connector -- an operator for Google Cloud resources. We have multiple resource kinds -- each one is managed by a controller. So for example, there is one controller for PubSubSubscription objects, another for PubSubTopic objects.

Objects can reference other objects, forming a DAG. For example, one PubSubSubscription can reference multiple PubSubTopics, and multiple PubSubSubscriptions can reference the same PubSubTopic.

Take this example: the PubSubSubscription needs to wait for both PubSubTopics to be ready. But there can be another PubSubSubscription that needs to wait for one or both of those PubSubTopics as well.


Going back to the suggested solution: I don't think this will work for us since our controllers (e.g. controller A) will need to be able to temporarily watch various objects of various resource kinds throughout the controller's lifecycle (e.g. various objects of resource kind B), so one global channel that is intended to be closed once won't work. We need a more generic solution.

Actually, thinking about it now, using channels to coordinate across controllers by itself won't work since we allow users to create multiple instances of our controller managers running in different pods, intended to manage different namespaces, so we need to be able to coordinate across process boundaries too.

I came up with a working prototype where:

  1. Every controller watches a source.Channel.
  2. When the controller encounters a non-Ready dependency, it creates a temporary Watch (by creating a new dynamic client instead of a new SharedInformer).
  3. Then, once the dependency is ready, the goroutine monitoring the Watch adds a GenericEvent to the source.Channel to trigger reconciliation of the referencing resource, then stops the Watch (rough sketch below).
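Roughly, steps 2 and 3 look like this (a sketch using a dynamic client watch; triggerChan is the source.Channel feed from step 1, and isReady() stands in for whatever readiness check applies to the dependency -- both are illustrative):

func watchUntilReady(ctx context.Context, dc dynamic.Interface, gvr schema.GroupVersionResource,
    namespace, name string, triggerChan chan<- event.GenericEvent, referencer client.Object) error {

    // Step 2: open a temporary watch scoped to the single dependency object.
    w, err := dc.Resource(gvr).Namespace(namespace).Watch(ctx, metav1.ListOptions{
        FieldSelector: fields.OneTermEqualSelector("metadata.name", name).String(),
    })
    if err != nil {
        return err
    }

    go func() {
        defer w.Stop() // step 3: tear down the temporary watch once we're done
        for {
            select {
            case <-ctx.Done():
                return
            case ev, ok := <-w.ResultChan():
                if !ok {
                    return
                }
                u, isUnstructured := ev.Object.(*unstructured.Unstructured)
                if !isUnstructured || !isReady(u) { // isReady is a placeholder
                    continue
                }
                // Step 3: trigger reconciliation of the referencing resource.
                triggerChan <- event.GenericEvent{Object: referencer}
                return
            }
        }
    }()
    return nil
}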

This works. The concern is that we'll have to create temporary watches whenever a dependency is not ready -- I think this might be OK since our watches should be short-lived and are only really needed when users create new resources. We'll also end up saving on a lot of rapid, repeated Get requests which is what we do right now every time a dependency is not ready (due to how failed reconciliations are requeued with exponential backoff).

What do you think though? Are Watches that expensive to be concerned about? Ideally we'd want to rely on using the controller manager's SharedInformer to share watches (at least within the same process), but as you said, there is no way to remove event handlers right now.

@grzesuav

Actually I have a similar need for metacontroller - it allows users to dynamically add/remove the kinds they want to watch. Right now it is using an internal implementation of SharedInformers; I would be happy to switch to something from the common libraries (client-go or controller-runtime) if this functionality is added.


FillZpp commented May 23, 2022

Actually I have a similar need for metacontroller - it allows users to dynamically add/remove the kinds they want to watch. Right now it is using an internal implementation of SharedInformers; I would be happy to switch to something from the common libraries (client-go or controller-runtime) if this functionality is added.

@grzesuav If I understand correctly, what you want is to add/remove informers of some GVKs dynamically, but @jcanseco wants to keep the informers but only remove some registered event handlers from the informers.

@grzesuav

Actually (I typed that from my phone) I want the same thing; in metacontroller, RemoveEventHandler is the only method added to the interface - https://github.com/metacontroller/metacontroller/blob/master/pkg/dynamic/informer/informer.go#L51


FillZpp commented May 24, 2022

Actually (I typed that from my phone) I want the same thing; in metacontroller, RemoveEventHandler is the only method added to the interface - https://github.com/metacontroller/metacontroller/blob/master/pkg/dynamic/informer/informer.go#L51

Okay, I understand. That's a little different: RemoveEventHandlers() in metacontroller only removes your own handlers, while other handlers in the underlying shared informer are left in place. But in controller-runtime, whose informers are shared only by controllers in the same process, one controller cannot remove handlers registered by other controllers. And how a controller would declare which of its own handlers to remove is a problem we would have to solve if we are going to support this in client-go and controller-runtime.


rnsv commented Jun 5, 2022

I have a similar requirement where Controller-A watches for certain events and dynamically creates additional controllers like this
ctrlruntime.NewControllerManagedBy(manager).
    Named("test").
    For(&Unstructured{}).
    Complete(reconciler)
How do I delete this controller so that I no longer receive reconciles for the object? I am looking for a function like
ctrlruntime.RemoveControllerManagedBy(manager, controller)

Any help is appreciated

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Sep 3, 2022
@alexzielenski

PR to allow individual event handlers to be removed from SharedIndexInformer merged here:

kubernetes/kubernetes#111122

cc @FillZpp
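For anyone following along, the shape of the new API from that PR is roughly the following (assuming client-go v0.26+ and a cache.SharedIndexInformer named informer): AddEventHandler now also returns a registration handle that can later be passed to RemoveEventHandler.

// Register a handler and keep the returned registration so it can be removed
// once the temporary watch is no longer needed.
reg, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    UpdateFunc: func(oldObj, newObj interface{}) {
        // react to updates of the object we are waiting for
    },
})
if err != nil {
    return err
}

// Later, when the dependency is Ready and the watch is no longer needed:
if err := informer.RemoveEventHandler(reg); err != nil {
    return err
}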


FillZpp commented Sep 8, 2022

@alexzielenski That's great, thanks! I'm sorry, I meant to help out with reviewing that PR, but I've been too busy these past months...

So we just have to wait for a new release tag of client-go, and then it can be used in c-r.

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Sep 8, 2022

p53 commented Oct 12, 2022

@FillZpp v0.25.2 should contain remove method

@aclevername

@FillZpp v0.25.2 should contain remove method

Hey @FillZpp 👋 thanks for helping with this issue! Now that the new client-go release is out, has the change been made to c-r? My use-case is the same as @rnsv's, where we dynamically run ctrlruntime.NewControllerManagedBy(manager). ..... Complete(reconciler) and would love to be able to then remove the reconciler from the manager when we are done with it. Thanks!


jnan806 commented Nov 9, 2022

@FillZpp

So we just have to wait for a new release tag of client-go, and then it can be used in c-r.

Yeah, I also have this problem, and I am waiting for this feature too.

aclevername added a commit to syntasso/kratix that referenced this issue Nov 15, 2022
- temporary fix until kubernetes-sigs/controller-runtime#1884 is resolved
  once resolved, delete dynamic controller rather than disable

Co-authored-by: Abby Bangser <abby@syntasso.io>
@aclevername

@FillZpp any update on this 😄 ? I'd be happy to have a go at implementing this if you can point me at the right places to work on 😄


FillZpp commented Dec 14, 2022

@aclevername I have posted #2099, but it still needs some more tests; I will add them later.

Guys @jcanseco @grzesuav @rnsv @jnan806, K8s 1.26 has been released, so we can finally achieve this. You can also help out by reviewing the PR and checking whether it satisfies your needs.


jnan806 commented Dec 14, 2022

Great !

@grzesuav

@FillZpp thanks! From the controller perspective it looks OK. For the Source case, an example of how it can be used would be useful; I am not familiar enough with the controller-runtime source code to assess it.

@odacremolbap

@FillZpp thanks for this, I am looking forward to the PR getting merged.
If you think I might be able to help with it, please reach out.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 4, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 4, 2023
@sbueringer

/remove-lifecycle rotten

/lifecycle active

Work in progress

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label and removed the lifecycle/active label on Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 18, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot closed this as not planned on Mar 19, 2024