
Add and remove watches at runtime #1884

Closed
jcanseco opened this issue May 3, 2022 · 36 comments


jcanseco commented May 3, 2022

Opening a new issue to raise visibility of my question here.

We have a use-case very similar to the one in that linked issue, and we'd like to know how one can actually remove watches at runtime.


FillZpp commented May 5, 2022

@jcanseco Could you please describe the use-case that requires removing watches (informers) dynamically?


jcanseco commented May 6, 2022

@FillZpp sure.

Suppose you have a Resource A that depends on Resource B. We'd like to immediately reconcile Resource A once Resource B is Ready.

We were hoping to do that by having the Resource A Controller create a Watch on Resource B, so that we can enqueue Resource A for immediate reconciliation once Resource B is ready. Once done, we'd like to remove the Watch since it's no longer needed.
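For reference, the static version of such a watch looks roughly like this with the builder (a sketch only, with hypothetical ResourceA/ResourceB types, an illustrative requestsForReferencingAs helper, and the pre-v0.15 MapFunc signature); the problem is that, once added, it cannot be removed:

// A permanent watch on kind B registered at controller setup time; B events are
// mapped to requests for the A objects that reference them.
ctrl.NewControllerManagedBy(mgr).
    For(&examplev1.ResourceA{}).
    Watches(
        &source.Kind{Type: &examplev1.ResourceB{}},
        handler.EnqueueRequestsFromMapFunc(func(obj client.Object) []reconcile.Request {
            // look up which A object(s) reference this B object and enqueue them
            return requestsForReferencingAs(obj) // illustrative helper
        }),
    ).
    Complete(reconciler)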

Please let us know if there is a better way to resolve this problem as well.


FillZpp commented May 6, 2022

Are A and B the resource kinds or the object names?

I suppose they are resource kinds. So you initially watch two kinds, but you only care about ONE object of kind B. Once that one object of kind B has been observed, your controller triggers a reconcile of kind A and no longer needs to watch kind B.

Do I understand correctly?


jcanseco commented May 6, 2022

Ah yes apologies for not being precise enough. That is correct.

To be particularly precise: Resource A and B are two particular objects (i.e. uniquely identified by the combination of their GVK, Namespace, and Name).

Typically in our case, the GVKs of Resource A and B are different.


FillZpp commented May 6, 2022

Understood. Another question: is there only one object of Resource B, or are there multiple objects of Resource B of which you only need to watch one particular object?

How about this:

  • If there is only one object of Resource B, I think you don't have to stop and remove its informer.
  • If there are multiple objects, you could use SelectorsByObject to make sure your cache only watches the particular object, so you don't have to stop and remove it (see the sketch below).
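Roughly like this (a sketch assuming a controller-runtime version that exposes cache.SelectorsByObject and cache.BuilderWithOptions; newer releases express the same idea via cache.Options.ByObject, and ResourceB here is a placeholder type):

// Restrict the shared cache so it only ever watches the single object of
// Resource B that you care about, identified by metadata.name.
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
    NewCache: cache.BuilderWithOptions(cache.Options{
        SelectorsByObject: cache.SelectorsByObject{
            &examplev1.ResourceB{}: {
                Field: fields.OneTermEqualSelector("metadata.name", "the-b-object"),
            },
        },
    }),
})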


jcanseco commented May 6, 2022

By Resource B, are you referring to just the GVK? If so, yes, there can be multiple objects with GVK B, but we only need to watch one particular object (i.e. a particular GVK + Namespace + Name combination).

I am not sure I understand the suggestion -- can you elaborate?

Perhaps another important piece of info: we have a controller manager with multiple controllers. Each controller manages all the objects with a particular GVK. Our controller manager manages both GVK A and GVK B. All the controllers, as I understand it, share the same cache.


FillZpp commented May 6, 2022

We have a controller manager with multiple controllers. Each controller manages all the objects with a particular GVK. Our controller manager manages both GVK A and GVK B. All the controllers, as I understand it, share the same cache.

Oh, so you have multiple controllers in your operator, and different controllers may care about different objects of Resource B. In that case, you are not going to stop or remove the whole informer of B; you just want to remove a watch for one controller?

However, there is no way to remove listeners from an informer, so you cannot remove the watch. The only thing you can do is ignore all events in the watch handler after it has seen the particular object.


jcanseco commented May 6, 2022

We have multiple controllers in our operator, and different controllers care about different Resources (GVKs). So there is one controller managing all objects of Resource A, one controller managing all objects of Resource B, etc.

We want Controller A to create a temporary Watch on Object B in order to know when to enqueue Object A for immediate reconciliation.

However, there is no way to remove listeners from an informer, so you cannot remove the watch. The only thing you can do is ignore all events in the watch handler after it has seen the particular object.

This is the same conclusion I've come to, but wouldn't the suggested solution be problematic since it would effectively result in leaked memory? Note that we'll have to create a lot of these temporary watches.

What do you think of this idea:

  • Have Controller A Watch Object B through a custom Source.
  • The custom Source works a lot like source.Kind, but it creates its own new SharedInformer, and it contains a stop channel that can be signalled from outside the Source.
  • Once Object B is Ready, the event handler signals the Source to stop. The Source then stops its SharedInformer.

Is this feasible? I am not sure if you can create a brand new SharedInformer completely separate from the ones used by the controllers, and I am not sure if it's a good idea to potentially create a lot of independent SharedInformers even if only temporarily.
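For concreteness, the skeleton I have in mind is roughly this (just a sketch; the private informer/watch is left abstract, and the Start signature matches the source.Source interface in controller-runtime v0.11/v0.12):

// stoppableSource satisfies source.Source, runs its own watch on Object B in a
// goroutine, and can be told to stop from outside (e.g. by the event handler
// once Object B is Ready).
type stoppableSource struct {
    stop chan struct{}
}

func (s *stoppableSource) Stop() { close(s.stop) }

func (s *stoppableSource) Start(ctx context.Context, h handler.EventHandler, q workqueue.RateLimitingInterface, _ ...predicate.Predicate) error {
    go func() {
        // ... start a private SharedInformer (or watch) for Object B here,
        // feeding its events through h into q ...
        select {
        case <-s.stop: // signalled once Object B is Ready
        case <-ctx.Done(): // manager is shutting down
        }
        // ... stop the private informer/watch here ...
    }()
    return nil
}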


FillZpp commented May 7, 2022

@jcanseco If so, you may just use the channel source with a generic event channel to trigger Controller A from Controller B. There is no need to create a separate informer for this.

// a global channel
var triggerChan = make(chan event.GenericEvent, 1)

// Controller A watches the channel source and enqueues Resource A in the handler.
// Note that the event will be a generic event and should be handled by the handler's Generic() method.
...Watches(&source.Channel{Source: triggerChan}, &handler)

// In Controller B, once Object B is ready, send it on the channel exactly once.
var once sync.Once
once.Do(func() { triggerChan <- event.GenericEvent{Object: ...} })


jcanseco commented May 7, 2022

Ok, let me try to make sure I understand. Are you suggesting for us to have all controllers watch one channel (triggerChan), and then use that channel to allow Controller B to notify all the other controllers (e.g. Controller A) who are waiting for Object B to be ready?

If so, how should we make sure Controller A only handles events for Object B temporarily (i.e. while Object A is waiting for Object B)? I assume we'll need to maintain a data structure for Controller A that lists the objects it's currently waiting for, and have the event handler filter on those (something like the sketch below). Would this be the approach you'd take, or is there a better way?
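Something like this is the data structure I have in mind (purely illustrative):

// waitSet tracks the dependency objects Controller A is currently waiting for;
// the Generic() handler only enqueues when the incoming object is in the set.
type waitSet struct {
    mu   sync.Mutex
    keys map[types.NamespacedName]struct{}
}

func newWaitSet() *waitSet {
    return &waitSet{keys: map[types.NamespacedName]struct{}{}}
}

func (w *waitSet) Add(k types.NamespacedName)    { w.mu.Lock(); defer w.mu.Unlock(); w.keys[k] = struct{}{} }
func (w *waitSet) Remove(k types.NamespacedName) { w.mu.Lock(); defer w.mu.Unlock(); delete(w.keys, k) }

func (w *waitSet) Contains(k types.NamespacedName) bool {
    w.mu.Lock()
    defer w.mu.Unlock()
    _, ok := w.keys[k]
    return ok
}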


jcanseco commented May 7, 2022

Also, a different question:

I know you said we should consider looking into using source.Channel instead of creating a new SharedInformer for each temporary watch -- we will definitely consider this.

As a backup option though, I am still curious whether it is possible and sensible to create new SharedInformers every time we want to create a temporary watch, as described in my previous comment? We already have a rough prototype that does this, so I am wondering if we can use it as a potential fallback option.


FillZpp commented May 7, 2022

Ok, let me try to make sure I understand. Are you suggesting for us to have all controllers watch one channel (triggerChan), and then use that channel to allow Controller B to notify all the other controllers (e.g. Controller A) who are waiting for Object B to be ready?

Emm... So is there only Controller A, or are there multiple controllers that all wait for Resource B to be Ready?

If there are multiple controllers, each controller can wait for a temporary channel to be closed, which signals that Resource B is Ready.

// There is a global channel which will be closed when Resource B is Ready.
var tempChan = make(chan struct{})

// Controller B closes tempChan exactly once when Resource B is Ready.
var once sync.Once
once.Do(func() { close(tempChan) })

// In each controller that waits for B to be Ready, use a func source and wait for tempChan to be closed.
Watches(source.Func(func(ctx context.Context, _ handler.EventHandler, q workqueue.RateLimitingInterface, _ ...predicate.Predicate) error {
    go func() {
        select {
        case <-tempChan:  // wait for channel to be closed
        case <-ctx.Done():
            // cache has been canceled or closed
            return
        }
        q.Add(...)  // enqueue Resource A or any other resources for other controllers
    }()
    return nil
}), &handler.EnqueueRequestForObject{})

I don't really recommend creating a new SharedInformer for each temporary watch, since that will put pressure on both the apiserver and the operator. But it depends on your scenario; I don't know much about your controllers and the relationships between the resources.


jcanseco commented May 7, 2022

But it depends on your scenario; I don't know much about your controllers and the relationships between the resources.

Apologies for the confusion. Perhaps it might help to make this more concrete. We are Config Connector -- an operator for Google Cloud resources. We have multiple resource kinds -- each one is managed by a controller. So for example, there is one controller for PubSubSubscription objects, another for PubSubTopic objects.

Objects can reference other objects, forming a DAG. For example, one PubSubSubscription can reference multiple PubSubTopics, and multiple PubSubSubscriptions can reference the same PubSubTopic.

Take this example: the PubSubSubscription needs to wait for both PubSubTopics to be ready. But there can be another PubSubSubscription that needs to wait for one or both of those PubSubTopics as well.


Going back to the suggested solution: I don't think this will work for us since our controllers (e.g. controller A) will need to be able to temporarily watch various objects of various resource kinds throughout the controller's lifecycle (e.g. various objects of resource kind B), so one global channel that is intended to be closed once won't work. We need a more generic solution.

Actually, thinking about it now, using channels to coordinate across controllers by itself won't work since we allow users to create multiple instances of our controller managers running in different pods, intended to manage different namespaces, so we need to be able to coordinate across process boundaries too.

I came up with a working prototype where:

  1. Every controller watches a source.Channel.
  2. When the controller encounters a non-Ready dependency, it creates a temporary Watch (by creating a new dynamic client instead of a new SharedInformer).
  3. Then, once the dependency is ready, the goroutine monitoring the Watch adds a GenericEvent to the source.Channel to trigger reconciliation of the referencing resource, then stops the Watch (rough sketch below).
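Roughly, steps 2 and 3 look like this (a sketch using a dynamic client watch; triggerChan is the source.Channel feed from step 1, and isReady() stands in for whatever readiness check applies to the dependency -- both are illustrative):

func watchUntilReady(ctx context.Context, dc dynamic.Interface, gvr schema.GroupVersionResource,
    namespace, name string, triggerChan chan<- event.GenericEvent, referencer client.Object) error {

    // Step 2: open a temporary watch scoped to the single dependency object.
    w, err := dc.Resource(gvr).Namespace(namespace).Watch(ctx, metav1.ListOptions{
        FieldSelector: fields.OneTermEqualSelector("metadata.name", name).String(),
    })
    if err != nil {
        return err
    }

    go func() {
        defer w.Stop() // step 3: tear down the temporary watch once we're done
        for {
            select {
            case <-ctx.Done():
                return
            case ev, ok := <-w.ResultChan():
                if !ok {
                    return
                }
                u, isUnstructured := ev.Object.(*unstructured.Unstructured)
                if !isUnstructured || !isReady(u) { // isReady is a placeholder
                    continue
                }
                // Step 3: trigger reconciliation of the referencing resource.
                triggerChan <- event.GenericEvent{Object: referencer}
                return
            }
        }
    }()
    return nil
}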

This works. The concern is that we'll have to create temporary watches whenever a dependency is not ready -- I think this might be OK since our watches should be short-lived and are only really needed when users create new resources. We'll also end up saving on a lot of rapid, repeated Get requests which is what we do right now every time a dependency is not ready (due to how failed reconciliations are requeued with exponential backoff).

What do you think though? Are Watches that expensive to be concerned about? Ideally we'd want to rely on using the controller manager's SharedInformer to share watches (at least within the same process), but as you said, there is no way to remove event handlers right now.

@grzesuav

Actually I have a similar need for metacontroller - it allows users to dynamically add/remove the kinds they want to watch. Right now it is using an internal implementation of SharedInformers; I would be happy to switch to something from the common libraries (client-go or controller-runtime) if this functionality is added.


FillZpp commented May 23, 2022

Actually I have a similar need for metacontroller - it allows users to dynamically add/remove the kinds they want to watch. Right now it is using an internal implementation of SharedInformers; I would be happy to switch to something from the common libraries (client-go or controller-runtime) if this functionality is added.

@grzesuav If I understand correctly, what you want is to add/remove informers of some GVKs dynamically, but @jcanseco wants to keep the informers but only remove some registered event handlers from the informers.

@grzesuav

Actually (I typed that from my phone) I want the same thing; in metacontroller, RemoveEventHandler is the only method added to the interface - https://github.com/metacontroller/metacontroller/blob/master/pkg/dynamic/informer/informer.go#L51


FillZpp commented May 24, 2022

Actually (I typed that from my phone) I want the same thing; in metacontroller, RemoveEventHandler is the only method added to the interface - https://github.com/metacontroller/metacontroller/blob/master/pkg/dynamic/informer/informer.go#L51

Okay, I understand. That's a little different: RemoveEventHandlers() in metacontroller only removes your own handlers, while other handlers in the underlying shared informer are left in place. But in controller-runtime, whose informers are shared only by controllers in the same process, one controller cannot remove handlers registered by other controllers. And how a controller would declare which of its own handlers to remove is a problem we would have to solve if we are going to support this in client-go and controller-runtime.


rnsv commented Jun 5, 2022

I have a similar requirement where Controller-A watches for certain events and dynamically creates additional controllers like this
ctrlruntime.NewControllerManagedBy(manager).
    Named("test").
    For(&Unstructured{}).
    Complete(reconciler)
How do I delete this controller so that I no longer receive reconciles for the object? I am looking for a function like
ctrlruntime.RemoveControllerManagedBy(manager, controller)

Any help is appreciated

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Sep 3, 2022
@alexzielenski

PR to allow individual event handlers to be removed from SharedIndexInformer merged here:

kubernetes/kubernetes#111122

cc @FillZpp
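For anyone following along, the shape of the new API from that PR is roughly the following (assuming client-go v0.26+ and a cache.SharedIndexInformer named informer): AddEventHandler now also returns a registration handle that can later be passed to RemoveEventHandler.

// Register a handler and keep the returned registration so it can be removed
// once the temporary watch is no longer needed.
reg, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    UpdateFunc: func(oldObj, newObj interface{}) {
        // react to updates of the object we are waiting for
    },
})
if err != nil {
    return err
}

// Later, when the dependency is Ready and the watch is no longer needed:
if err := informer.RemoveEventHandler(reg); err != nil {
    return err
}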


FillZpp commented Sep 8, 2022

@alexzielenski That's great, thanks! I'm sorry, I meant to help out with reviewing that PR, but I've been too busy these past months...

So we just have to wait for a new release tag of client-go, and then it can be used in c-r.

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Sep 8, 2022

p53 commented Oct 12, 2022

@FillZpp v0.25.2 should contain remove method

@aclevername

@FillZpp v0.25.2 should contain remove method

Hey @FillZpp 👋 thanks for helping with this issue! Now that the new client-go release is out, has the change been made to c-r? My use-case is the same as @rnsv's, where we dynamically run ctrlruntime.NewControllerManagedBy(manager). ..... Complete(reconciler) and would love to be able to then remove the reconciler from the manager when we are done with it. Thanks!


jnan806 commented Nov 9, 2022

@FillZpp

So we just have to wait for a new release tag of client-go, and then it can be used in c-r.

Yeah, I also have this problem, and I am waiting for this feature too.

aclevername added a commit to syntasso/kratix that referenced this issue Nov 15, 2022
- temporary fix until kubernetes-sigs/controller-runtime#1884 is resolved
  once resolved, delete dynamic controller rather than disable

Co-authored-by: Abby Bangser <abby@syntasso.io>
@aclevername

@FillZpp any update on this 😄 ? I'd be happy to have a go at implementing this if you can point me at the right places to work on 😄


FillZpp commented Dec 14, 2022

@aclevername I have posted #2099, but it still needs some more tests; I will add them later.

Guys @jcanseco @grzesuav @rnsv @jnan806, K8s 1.26 has been released, so we can finally achieve this. You can also help out by reviewing the PR and checking whether it satisfies your needs.


jnan806 commented Dec 14, 2022

Great !

@grzesuav

@FillZpp thanks! From the controller perspective it looks OK. For the Source case, an example of how it can be used would be useful; I am not familiar enough with the controller-runtime source code to assess it.

@odacremolbap

@FillZpp thanks for this, I am looking forward to the PR getting merged.
If you think I might be able to help with it, please reach out.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 4, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 4, 2023
@sbueringer

/remove-lifecycle rotten

/lifecycle active

Work in progress

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label and removed the lifecycle/active label on Jan 19, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Feb 18, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot closed this as not planned on Mar 19, 2024