-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ansible] be able to define watch[Dependent,ClusterScoped]Resources as CR annotations #3364
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@camilamacedo86 Let me explain how I think things work today (in case I misunderstand something myself), and perhaps make this enhancement request more clear. In short, it is not involving anything about reconcile period. I understand the watches file and the CR annotation can define a reconcile period of X seconds - this allows the playbook to be blindly run every X seconds. So, if reconcilePeriod is set to "60s" that means every 60 seconds the playbook is executed. That is NOT what I am concerned about. Now, I also know the watches file has two other settings called As I understand it, when those settings are true, they will create k8s watches on all resources that the operator creates (cluster scoped and namespace scoped). When those watches are triggered (say, by a user deleting or modifying one of those operator-created resources), the playbook is immediately executed. There is no reconcile period consulted - there is no time delay or "every X seconds" - involved here. Whenever a dependent resource (i.e. an operator-created resource) is touched, the playbook is immediately invoked. That behavior is only enabled or disabled by the operator author via the watches.yaml settings. However, I am asking if it is possible to allow the CR author to enable or disable that functionality by some new annoations, say, something like:
I understand @fabianvf concern that this might turn on or off capabilities that the operator author does not intend. However, in my use case, it would be beneficial to allow a user to enable this capability. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Hi @jmazzitelli, So, by following your suggestion we will need to implement the watch to ignore any resource that for example has the annotation The watch for the dependent ansible files shows be implemented here. And then, it would need to be implemented to have filters for the labels as described in kubernetes-sigs/controller-runtime#244. Also, note that you can indeed disable the watch for specific GVKs by using the feature Example: # ConfigMaps owned by a Memcached CR will not be watched or cached.
- version: v1alpha1
group: cache.example.com
kind: Memcached
role: /opt/ansible/roles/memcached
blacklist:
- group: ""
version: v1
kind: ConfigMap |
I do remember the blacklist feature now that you mention it, but I don't think you can specify a blacklist in the CR (i.e. it can only be declared by the operator author in the
Right. That's what I am proposing. So, following your example there, suppose I have a Memcached CR that looks like this:
This would mean the operator will not watch any dependent resources, EVEN IF the Memcached operator has Conversely, if I annotate the Memcached CR with this instead:
Then I'm telling the Memcached Operator that I want it to watch all dependent resources and re-run its playbook when a dependent resource is touched, EVEN IF the Memcached operator has Now, this gets to the issue that @fabianvf raises. What if an operator is written with the explicit requirement that
When the author sets Therefore, I propose the following: Add two new settings in
Add two new annotations that can be added to a CR:
If the annotation But if Hopefully that makes sense. From the brief discussion I had on this with @fabianvf , he seemed to indicate this kind of thing was possible, with the caveat that it might cause problems if the operator author expects a certain watch-dependent-resource behavior - but perhaps my suggestion for the two "OverrideAllowed" settings in |
Hi @jmazzitelli, I catcher the @fabianvf's point and it shows introduce a complexity not just in the implementation in order to make it maintained and works as you described but for users know how to work with. So, what are the problems and use cases that would be solved/beneficiated with these features? Could you describe what are the scenarios that would justify this introduction? |
Yes. In my particular case - the Kiali Operator - the operator works perfectly fine without any periodic reconciliation (reconcilePeriod=0) and without any dependent-resource watching (watchDependentResources=false, watchClusterScopedResources=false). In other words, the Kiali Operator does not require those features in order for the Kiali Operator to function properly. The Kiali Operator sits dormant for the vast majority of the time (taking up very little cluster resources) until someone creates or modifies a Kiali CR. I like Kiali Operator not having to consume resources when it doesn't have to. However, I had a user that wanted the operator to run a reconciliation when, say, someone accidentally/purposefully deleted the Kiali Deployment - he wanted the Deployment to get resurrected by the operator automatically. A perfect reason to turn on either periodic reconciliation or dependent-resource-watching. Because of this feature request, I wrote up this issue here. But I noticed that when I turned on periodic reconciliation in watches.yaml (e.g. "reconcilePeriod=60s"), obviously my operator's playbook is run EVERY 60 seconds, even when it doesn't need to (and remember, Kiali Operator works fine without periodic reconciliation). So my operator immediately went from using almost no cluster resources (e.g. CPU/memory) the vast majority of time to needing resources every minute; and this is bad especially considering that each run of the Kiali Operator's playbook could take 30 seconds or even a minute to complete (thus the Kiali Operator would effectively be continually running its playbook non-stop). So I did not want to enable that by default. I could tell the user to turn on periodic reconciliation via the CR annotation, but that would mean that user would be seeing the Kiali Operator take up those resources every minute and I didn't think that would be good. The alternative would be to change it to something like 5 minutes, but.. meh... OK, so then I said to myself, "why not try this dependent-resource-watch feature?" So I went back and set reconcilePeriod back to "0" but enabled watchDependentResources=true and watchClusterScopedResources=true (note: my operator does create and manage cluster-scoped resources). And that worked better. Now the operator stays dormant until either the CR is created or modified OR one of the resources that operator creates is touched. That's exactly what the user wanted. However, there is one thing that bugs me about how that feature works. I noticed that every time a new CR is created (i.e. a Kiali CR is created to indicate that the operator needs to install an operand - the Kiali Server) my playbook is invoked twice. I talked to @fabianvf about this to see if I can find out what event causes this second invocation with the hope of being able to quickly abort the playbook run if I can determine it is an unimportant event that doesn't require the playbook to re-run, but he told me the SDK does not even log what event or dependent resource triggered the playbook invocation. So I cannot tell you exactly why this immediate second invocation happens, but it happens, and the operator has no way of determining if it should circumvent the full playbook reconciliation run. And notice that my playbook takes about 30 to 60 seconds to fully complete (even when it changes nothing). So it is annoying to effectively have the Kiali Operator playbook run twice when a Kiali CR is created, rather than just the one time that is necessary in order to install the operand at CR creation time. So - the solution to this would be - I'll turn off dependent-resource watch by default. Only if a user WANTS this capability to reconcile changes to dependent resources (with the negative side effect of having the playbook triggered multiple times, thus the operator taking more time and resources before going "dormant" again), then the user merely has to annotate his CR with these annotations we are talking about: ansible.operator-sdk/watch-dependent-resources and Thus the best of both worlds - the operator can be configured, by default, to use very little cluster resources and will only run when absolutely necessary (at CR creation or modification time) but the user can configure the operator to do reconciliation of dependent resources if they want that additional capability (at the expense of having the operator use more resources). This trade-off is why I would like the flexibility of being able to turn on and off the dependent-resource-watching feature by the user. |
Hi @jmazzitelli, So, IMO we might need to address these via docs instead of that such as make clear to the users that:
Example:
In this way, users probably would not end up in this scenario and needing these features. However, if users still requiring this kind of customizations it can be addressed with the hybrid operators. See: #670 @fabianvf wdyt? |
Is the suggestion that users of Kiali build their own operator (extending the existing Kiali Operator) if they want to optionally perform the dependent resource watch? I certainly do not want to ask my users to build an operator. Just getting them to deploy and use an operator is a difficult task (do you know how many times people have asked me to just provide a helm chart for my operand? People are not enthusiastic when asked to just USE an operator - I shutter to think what the reaction would be if we tell them they have to build their own just to get an operand to install). My users only care about "I want to install Kiali." Today, they have to install the Kiali Operator and create a Kiali CR in a watch namespace to see the operand get installed somewhere. Most of my users (I contend, the majority of them) don't even know what an operator is, let alone how to build one. I think it is a non-starter if a solution includes asking operand users to build their own operators. That suggestion in the previous comment's example:
requires the user to actually build an operator. Of course, for Red Hat customers, this is a non-starter because their operator will not show up in the OperatorHub as a "Red Hat Provided" operator and obviously will not be supported as one. So for these reasons I really think this is not a viable option. |
Hi @jmazzitelli, Just to clarifies. When I say users, I mean SDK users and not users of some project built with this tool. Ansible Operator SDK developers can build their projects by persuading good practices. In this way, IMO the problem described that motivates you to raise the RFE could be solved/addressed by improving the docs to make more clear some recommendations. Otherwise, it also can be addressed with the hybrid operators. See: #670 |
So would the recommendation be something like: set reconcilePeriod to some large value (say, 15 minutes) but also turn on dependent-resource-watching? I actually don't think I need both - I just need one or the other. But each has pros and cons - and I would prefer to give the user the option/authority to pick which one they want, rather than me as the operator author to impose that choice upon the user (e.g. I don't want to force the user to use extra cluster resources if that is something the user doesn't want to compromise on). |
Doing a drive-by here (@fabianvf shoot this down if this isn't what you have in mind) This makes sense to me, but the implementation is going to look at little different than what the UX makes it seem like is happening. If a watch is setup for a particular resource type, events will come in regardless of whether the CR has this new annotation. So when the event comes in, we would lookup the owner, check its annotation, and only then allow the event to trigger a reconciliation. The naive implementation would require that Doing it the other way around (default to NOT watching dependent resources, but allow it if a CR annotation is set to |
@joelanford ah.. yeah, I didn't think about the implementation of my suggestion. Now, considering I do not know anything about the ansible operator SDK implementation details, what I say here may be naive but -- couldn't the SDK wait to decide if it needs to watch the dependent resources until after the CR is initially created? Because (a) the resources aren't created until the CR is created and (b) but at that point, the CR annotations are also known. So, the operator could have watchDependentResources set to false in watches.yaml - thus the SDK defaults to not watching. But then when the CR is created (with the annotation turning watch on) - at that point, the SDK sees the resources need to be watched - and it is at that same time the SDK triggers the playbook (so it can set a flag of some sort to say, "anything this playbook creates needs to be watched"). Ignore me if this makes no sense because, as I say, I'm oblivious to the SDK Go implementation code :) |
I think that would be problematic. Consider this scenario:
|
Yes, I see your point. Though, couldn't the SDK check, "there is no annotation on the CR that owns the Secret (CR2), and the watches.yaml value is "false" so even though I got an event for this CR2 Secret resource, the owning resource (CR2) desired behavior (based on no annotation and global watches.yaml) dictates that the playbook should not be triggered right now" |
It could. I'm just pointing out that we need to be careful with the implementation to handle these cases (and write unit tests to verify them!) |
My suggestion to address the scenario which motivates the RFE is: #3384. IHMO is that this kind of customizations would more appropriately done in the specific cases by the owner projects when/if they find required via the hybrid operators instead of provided by default. |
@camilamacedo86 I finally understand what you are suggesting to do. You are saying, "Rather than have the user add annotations to the CR, have the user create a different Kind of CR to indicate what behavior they want". So in the Memcached example, I could create this:
Or this:
and the operator would watch for the different CRDs (one to indicate dependent resource watching is on, one with it off). I'm meh on that because (a) it adds more CRDs that my operator has to be associated with (what happens when I need to bump up my CRD versioning - I now have 2 I need to bump and manage). I just don't like having more CRDs that I have to maintain than I have to. And (b) to me it is confusing/misleading to have different CRDs just to be able to alter this watch behavior (to me, a different CRD would indicate a different operand - not simply just a small change in operator implementation behavior - that seems the classic purpose for annotations). Also, what if the operator utilizes the built in variable "_$GROUP_$KIND" - now the operator has to be refactored so it can handle the different versions of CRDs (e.g. It just seems to me this triggering of the reconciliation (i.e. the calling of the playbook) on changes to a dependent resource is very analogous to the annoation "reconcile-period" - it seems to make sense to have the same mechanism (annotations) to trigger the calling of the playbook when a dependent resource is touched just as when you want some arbitrary time period expiring do the same thing. |
Hi @jmazzitelli, Just to clarifies my POV over The suggestion is NOT for developers only be able to have this watch behaviour. The suggestion is for developers following good practices. See that when a CR/CRD is created/defined means a new resource/class(representation) with these attributes. So, if we use the same class to define all the different sorts of resources, we may be hurting concepts such as encapsulation, single responsibility principle and almost for sure cohesion that may cause unexpected sides effects such difficulty to read, maintain/improve and extend, just to mention a few. Just to clarify my POV, it is the same as defining a User class which will have attributes related to other objects and do not respect its domains. For example; add to the Client Kind attributes that are part of a Shoe Kind domain. Particularly I'd recommend you persuade the DDD concept to design the solutions. |
**Description of the change:** - add info over reconcilePeriod similar to the sync period attribute of [manager.Options](https://godoc.org/sigs.k8s.io/controller-runtime/pkg/manager#Options) to make clear that users should avoid small periods - clarifies that users are able to create many CRD's and use the APIs to persuade the concepts such as single responsibility, cohesion and encapsulation which would bring more flexibility and optimization since they are able to do configurations per GKV via the watches file. **Motivation for the change:** #3364
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
I still would like this feature. However, I hacked around it myself by creating my own setting in the CR to be able to turn on and off reconciliation manually if the user sees the operator going into infinite reconciliation loops (typically this happens due to bugs in the operator). Yes, it's an ugly hack :} |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Today, a user can modify the reconcile period by annotating the CR with
ansible.operator-sdk/reconcile-period: "120s"
. The default is defined in thewatches.yaml
setting calledreconcilePeriod
.There is no analogous CR annotation to disable/enable dependent resource watching. For example, in
watches.yaml
there are settings calledwatchDependentResources
andwatchClusterScopedResources
. This enhancement request is to create annotations that can be put on the CR to set those values in the same way the reconcile-period annotation can override the watches.yaml's reconcilePeriod setting.From @fabianvf on slack:
The text was updated successfully, but these errors were encountered: