
📖 Add OLMv1 Overview doc #692

Merged

merged 1 commit into main from olmv1-overview on Mar 26, 2024

Conversation

joelanford
Member

Description

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@joelanford joelanford requested a review from a team as a code owner March 12, 2024 14:35

codecov bot commented Mar 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.01%. Comparing base (38da6fc) to head (9e66fee).
Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #692   +/-   ##
=======================================
  Coverage   64.01%   64.01%           
=======================================
  Files          22       22           
  Lines        1370     1370           
=======================================
  Hits          877      877           
  Misses        442      442           
  Partials       51       51           
| Flag | Coverage Δ |
| --- | --- |
| e2e | 47.37% <ø> (ø) |
| unit | 58.41% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


### Single-tenant control planes

One choice for customers would be to adopt low-overhead single-tenant control planes (e.g. hypershift?) in which every tenant can have full control over their APIs and controllers and be truly isolated (at the control plane layer at least) from other tenants. With this option, the things OLMv1 cannot do (listed above) are irrelevant, because the purpose of all of those features is to support multi-tenant control planes in OLM.
Contributor

The Kubernetes docs on multi-tenancy also mention the Cluster API, Kamaji, and vcluster as options for creating virtual control planes per tenant for tenant isolation. It might be worth linking to the kubernetes docs and/or some of the projects that are specifically designed for addressing this (hypershift can be included in this but shouldn't be the only example IMO)

Contributor

I think including links to some of these other projects trying to solve the multi-tenancy problem will help get the point across that using something specifically for enabling multi-tenancy is a better solution than trying to force multi-tenancy support into OLMv1

Member Author

Yep, I'll update this. Thanks! Missed the hypershift callout in my RH-specific scrub.


Using the [Operator Capability Levels](https://sdk.operatorframework.io/docs/overview/operator-capabilities/) as a rubric, operators that fall into Level 1 and some that fall into Level 2 are not making full use of the operator pattern. If content authors had the choice to ship their content without also shipping an operator that performs simple installation and upgrades, many supporting these Level 1 and Level 2 operators might make that choice to decrease their overall maintenance and support burden while losing very little in terms of value to their customers.

## What will OLM doo that a generic package manager doesn't?
Contributor

Suggested change
## What will OLM doo that a generic package manager doesn't?
## What will OLM do that a generic package manager doesn't?

everettraven
everettraven previously approved these changes Mar 12, 2024
Contributor

@everettraven everettraven left a comment

Overall looks good to me. Had a couple comments, but nothing that I think is worth really holding this PR for.


### Watched namespaces cannot be configured in a first-class API

OLMv1 will not have a first-class API for configuring the namespaces that a controller will watch.


How is "first-class API" being defined? Does this mean that OLM will not provide this capability itself, but this could be provided by someone else?

Contributor

My understanding is that not having a "first-class API" means that there will be no field on the APIs introduced by OLMv1 to set that information explicitly. There is nothing stopping the author of a controller from providing configuration options to users for this or hardcoding the namespaces it should watch.

I don't think we have really talked about this, but I could imagine the APIs introduced by OLMv1 having a kind of "pass through" field that users can use to set some arbitrary values on the manifests deployed by OLM (e.g. setting env vars on a deployment), and nothing here prevents that.

Contributor

Rather than saying "not", let's say what it does provide. e.g. "Namespace watching is provided via "


Member Author

@tmshort This is somewhat nuanced, and I want to make sure that the overall message is clear.

Namespace watching is not provided by OLMv1 at all. However, OLMv1 will support arbitrary configuration, with schemas provided by extension authors and values, supplied by extension admins, that must match those schemas. If authors decide to include knobs in their configuration schemas that control scoping, that's fine. But it isn't anything OLMv1 knows about or can build features around.

I call this out with a bit more detail in the Approach -> Don't Fight Kubernetes section further down.

There is one and only one exception. In order to maintain backward compatibility with registry+v1 bundles, the OLM maintainers will define the parameterization schema for registry+v1 bundles and will include the ability to define the watched namespaces. But again, this is particular to this bundle format and is not something that the broader OLM system will have awareness of.
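To illustrate (a hypothetical sketch, not anything OLMv1 defines): an extension author could expose a `WATCH_NAMESPACES` knob in their own configuration schema and wire it into their controller themselves. Assuming a recent controller-runtime with `cache.Options.DefaultNamespaces`:

```go
package main

import (
	"os"
	"strings"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

func main() {
	// WATCH_NAMESPACES is a hypothetical knob exposed by the extension author's
	// own configuration schema, e.g. "ns-a,ns-b". Empty means cluster-wide.
	cacheOpts := cache.Options{}
	if raw := os.Getenv("WATCH_NAMESPACES"); raw != "" {
		cacheOpts.DefaultNamespaces = map[string]cache.Config{}
		for _, ns := range strings.Split(raw, ",") {
			cacheOpts.DefaultNamespaces[strings.TrimSpace(ns)] = cache.Config{}
		}
	}

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Cache: cacheOpts})
	if err != nil {
		panic(err)
	}
	// ... register reconcilers with mgr here ...
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

OLMv1 would only pass the configured value through; it has no awareness that the knob controls scoping.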

However, Kubernetes does not assume that a controller will be successful when it reconciles an object.

The Kubernetes design assumptions are:
- CRDs and their controllers are trusted cluster extensions.


I understand that a CRD (the API) is global, but it is not clear to me why a controller cannot be namespaced. If the controller is running in a namespace, with a service account that has RBAC limited to the namespace, and only reconciles CRs within that namespace -- I'm not sure I would consider that a cluster extension?

Contributor

I could be wrong, but in this case I am interpreting "cluster extension" as referring to it literally being something that extends the functionality of the Kubernetes cluster. I don't think there is anything in Kubernetes preventing a controller from running with reconciliation scoped only to a namespace.

Contributor

I read "cluster extension" as things that extend the functionality of the cluster, further than what is shipped with a default install of Kubernetes. This has no bearing on the scope that an operator author chooses to implement within a controller. Even if the controller in question is only namespace scoped, it still extends a k8s cluster's functionality.

Contributor

A controller can be namespaced, but it requires that all controllers for a given CRD manage non-overlapping namespaces. That can be tricky. This is why we are/were considering splitting up "global components" (e.g. CRDs) from potentially "namespace-able components" (i.e. the controllers).
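As a rough illustration of that "non-overlapping namespaces" concern (hypothetical GVK and namespace list; not an OLMv1 feature), a controller author could restrict each install to the namespaces it owns with a controller-runtime event filter, so that two installs never reconcile the same CR:

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// Namespaces this particular controller install "owns". Another install of
	// the same controller would be configured with a disjoint set.
	own := map[string]bool{"namespace1": true, "namespace2": true}
	nsFilter := predicate.NewPredicateFuncs(func(obj client.Object) bool {
		return own[obj.GetNamespace()]
	})

	// Hypothetical primary API; in practice this would be the CRD's typed object.
	kafka := &unstructured.Unstructured{}
	kafka.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "kafka.strimzi.io", Version: "v1beta2", Kind: "Kafka",
	})

	err = ctrl.NewControllerManagedBy(mgr).
		For(kafka).
		WithEventFilter(nsFilter).
		Complete(reconcile.Func(func(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
			// ... reconcile only objects in the namespaces this install owns ...
			return reconcile.Result{}, nil
		}))
	if err != nil {
		panic(err)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

Keeping the sets disjoint across installs is exactly the coordination problem that makes this tricky; nothing in Kubernetes enforces it.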


The Kubernetes design assumptions are:
- CRDs and their controllers are trusted cluster extensions.
- If an object for an API exists, a controller WILL reconcile it, no matter where it is in the cluster.


If a controller is running with a service account with namespaced RBAC, would the controller even see a CR created in another namespace?

Contributor

No. My understanding is that the design assumption we are stating here is that Kubernetes assumes that there is a controller somewhere that will reconcile an object for an API no matter where it is in the cluster. As far as I am aware this doesn't necessarily mean it has to be a single instance of a controller. I do think in general it makes more sense to use a single controller over many namespace specific controllers though.


OLMv1 will make the same assumption that Kubernetes does and that users of Kubernetes APIs do. That is: If a user has RBAC to create an object in the cluster, they can expect that a controller exists that will reconcile that object. If this assumption does not hold, it will be considered a configuration issue, not an OLMv1 bug.

This means that it is a best practice to implement and configure controllers to have cluster-wide permission to read and update the status of their primary APIs. It does not mean that a controller needs cluster-wide access to read/write secondary APIs. If a controller can update the status of its primary APIs, it can tell users when it lacks permission to act on secondary APIs.
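A minimal sketch of that "report missing RBAC via status" pattern (hypothetical condition type and helper; not taken from the PR) might look like:

```go
package controller

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcileSecondary tries to create a secondary resource and, if the controller
// lacks RBAC for it, records that fact as a status condition on the primary CR
// (which the controller can always update) so the user can see why things are stuck.
// Simplified: it replaces the whole conditions list rather than merging.
func reconcileSecondary(ctx context.Context, c client.Client, primary *unstructured.Unstructured, secondary client.Object) error {
	err := c.Create(ctx, secondary)
	if err == nil || apierrors.IsAlreadyExists(err) {
		return nil
	}
	if apierrors.IsForbidden(err) {
		cond := map[string]interface{}{
			"type":               "SecondaryResourcesReady", // hypothetical condition type
			"status":             "False",
			"reason":             "InsufficientPermissions",
			"message":            err.Error(),
			"lastTransitionTime": time.Now().UTC().Format(time.RFC3339),
		}
		if setErr := unstructured.SetNestedSlice(primary.Object, []interface{}{cond}, "status", "conditions"); setErr != nil {
			return setErr
		}
		if updErr := c.Status().Update(ctx, primary); updErr != nil {
			return updErr
		}
	}
	return err
}
```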


I wonder if it would be useful to challenge this. Although APIs are inherently cluster-scoped in Kubernetes, can we achieve some level of multi-tenancy by splitting the APIs from the controllers?

Can a cluster admin make an API available -- with no controller installed with cluster-wide RBAC? Can a namespace admin then install a controller in their namespace with limited RBAC to reconcile CRs in that namespace? This may require allowing for the installation of individual components of a bundle.

Contributor

I don't think we are going to intentionally prevent any of those possible approaches to achieving multi-tenancy, but I don't think we are going to intentionally support them either. A user absolutely should be able to do all of those things by creating individual bundles that way, but going that route fights the design of Kubernetes and comes with a host of its own issues, making it difficult for OLM to effectively handle the automatic lifecycling of the controllers. With this approach, it will be entirely on the user to understand the impacts of that decision and why OLM may not behave as expected.

For example, I would not see it as a bug if automatic upgrades for a controller were to fail and require manual intervention in this scenario. I would also not see it as a bug if OLM successfully installed multiple controllers that reconcile the same resource in the same namespace in this scenario.

Member Author

The fact that the APIs are global means that there isn't true multi-tenancy of controllers either. All of the controllers for that global API MUST agree on the single API they will all use. Therefore tenants will be limited by the choices made by other tenants when it comes to lifecycling controllers.

As Bryce said, OLMv1 will not get in the way of multiple controller installations, but it also won't help de-conflict between them.


### Dependencies based on watched namespaces

Since there will be no first-class support for configuration of watched namespaces, OLMv1 cannot resolve dependencies among bundles based on where controllers are watching.


If we have the following scenario:

  • Customer goes to install Operator A at the cluster scope (CRD + controller with cluster-wide RBAC)
  • Operator A has a dependency on Operator B (A will be creating CRs from B and expects B to reconcile them)

I'm not sure why the absence of a watched namespaces concept prevents this dependency fulfillment.

OLMv1, given its limited RBAC, can likely only see that CRD B is not present. If it was present, OLMv1 couldn't see if controller B is available and reconciling.

OLMv1, however, if its RBAC was limited to a single namespace, could create a CR for B and see if controller B picks it up and sets some status field. This would give OLMv1 the information it needs about whether controller B is running and reconciling at the cluster scope (which, as presented, is the recommended -- possibly only? -- install mode).

Member Author

Customer goes to install Operator A at the cluster scope (CRD + controller with cluster-wide RBAC)

OLMv1 will be completely unaware of the scoping configuration of Operator A. It doesn't know if Operator A is watching the entire cluster or is watching just a subset.

The fact that an opaque configuration applied to the bundle results in cluster-wide RBAC for the service account tied to a deployment is maybe a good enough proxy for watch namespace. But you are correct that a dependency resolver would also need awareness of the RBAC in use by any dependents, which is not available to OLM and may not be available to a user.

OLMv1, however, if its RBAC was limited to a single namespace, could create a CR for B and see if controller B picks it up and sets some status field. This would give OLMv1 the information it needs about whether controller B is running and reconciling at the cluster scope (which, as presented, is the recommended -- possibly only? -- install mode).

This seems fragile and complex and lots could go wrong. Off the top of my head:

  • OLM would have to evaluate the schema of each CRD and be able to produce a valid object
  • This assumes that all objects have a status
  • Operators may have admission webhooks that reject creates unless arbitrary conditions are met.
  • Creating a CR may have major implications on the operations of a cluster, and would likely incur costs.
  • The fact that a CR for B can be created in a particular namespace provides no signal about whether a controller would reconcile B in another namespace.

The most likely dependency resolver implementation given the constraints is probably:

  • Client-based
  • Limited to cluster admins who can see RBAC and Extensions cluster-wide.
  • Required to use RBAC as a proxy for watch namespaces, which may result in assumptions that a controller is watching a namespace, even if it is not (i.e. it has RBAC, but for whatever reason isn't actually watching there)

1. How would a dependency resolver know which extensions were installed (let alone which extensions were watching which namespaces)? If a user is running the resolver, they would be blind to an installed extension that is watching their namespace if they don’t have permission to list extensions in the installation namespace. If a controller is running the resolver, then it might leak information to a user about installed extensions that the user is not otherwise entitled to know.
2. Even if (1) could be overcome, the lack of awareness of watched namespaces means that the resolver would have to make assumptions. If only one controller is installed, is it watching the right set of namespaces to meet the constraint? If multiple controllers are installed, are any of them watching the right set of namespaces? Without knowing the watched namespaces of the parent and child controllers, a correct dependency resolver is not possible to implement.

Note that regardless of the ability of OLMv1 to perform dependency resolution (now or in the future), OLMv1 will not automatically install a missing dependency when a user requests an operator. The primary reasoning is that OLMv1 will err on the side of predictability and cluster-administrator awareness.


Provided it was properly conveyed via the UI (or logged in a CLI), I think you can achieve administrator awareness while still fulfilling dependencies.

Member Author

Predictability is definitely the harder thing to achieve though. All sorts of inputs go into which dependency is chosen. In other package managers, there is almost always an imperative flow where an admin has a chance to review the chosen dependencies before they are installed.

If it was possible to overcome the difficulty of building a dependency resolver without awareness of controller scope (that's a big if, and not something we're pursuing), then a client-based resolver that presents the user with the chosen set of Extensions to install and lets them decide how to proceed would be the best of both worlds.

With OLMv1's focus on GitOps friendliness and security posture, we have decided not to pursue a controller-based dependency resolver/installer.

And again, this is all fairly moot because we are not pursuing a dependency resolver. However, part of the beauty of this design is that it lends itself more to extensibility. A third party should be able to implement a dependency resolver over the APIs provided by core OLMv1:

  • Catalog contents are available to clients
  • Catalog metadata is extensible, so third parties could include their own dependency metadata (e.g. about "requires", "provides", "conflicts", etc.; see the sketch after this list).
  • Extension API is available to clients
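For example, a third party could define its own property type in the extensible catalog metadata and resolve against it client-side. The property type name and value shape below are purely illustrative, not an OLM API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiresPackage is a hypothetical third-party dependency property value.
type requiresPackage struct {
	PackageName  string `json:"packageName"`
	VersionRange string `json:"versionRange"`
}

// property mirrors the generic "type"/"value" shape of catalog properties.
type property struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

func main() {
	raw := []byte(`[
		{"type": "example.com/requires-package",
		 "value": {"packageName": "cert-manager", "versionRange": ">=1.12.0"}}
	]`)

	var props []property
	if err := json.Unmarshal(raw, &props); err != nil {
		panic(err)
	}
	for _, p := range props {
		if p.Type != "example.com/requires-package" {
			continue // ignore property types this resolver doesn't understand
		}
		var req requiresPackage
		if err := json.Unmarshal(p.Value, &req); err != nil {
			panic(err)
		}
		fmt.Printf("bundle requires %s %s\n", req.PackageName, req.VersionRange)
	}
}
```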


OLMv1 will not provide dependency resolution among packages in the catalog (see [Dependencies based on watched namespaces](#dependencies-based-on-watched-namespaces)).

OLMv1 will provide constraint checking based on available cluster state. Constraint checking will be limited to checking whether the existing constraints are met. If so, install proceeds. If not, unmet constraints will be reported and the install/upgrade waits until constraints are met.


In OLMv0 this seems to exist via nativeAPIs defined in CSVs. The way this is built today, if you try to install an operator via the UI and the nativeAPI requirements are not fulfilled, the missing dependencies are not made apparent to the user; the installation just appears to be stuck pending. I'd suggest we make this more visible and easily debuggable in OLMv1.

Member Author

Yes, there are a few rough edges of OLMv0 like this. Another is minKubeVersion in the CSV. The goal is to get all of the constraint-related information into the catalog where it can be evaluated before pulling, extracting, and applying bundle contents.


TL;DR: OLMv1 cannot feasibly support multi-tenancy or any feature that assumes multi-tenancy. All multi-tenancy features end up falling over because of the global API system of Kubernetes. While this short conclusion may be unsatisfying, the reasons are complex and intertwined.

Nearly every engineer in the Operator Framework group contributed to design explorations and prototypes over an entire year. For each of these design explorations, there are complex webs of features and assumptions that are necessary to understand the context that ultimately led to a conclusion of infeasibility that led us to today’s conclusion.
Contributor

Suggested change
Nearly every engineer in the Operator Framework group contributed to design explorations and prototypes over an entire year. For each of these design explorations, there are complex webs of features and assumptions that are necessary to understand the context that ultimately led to a conclusion of infeasibility that led us to today’s conclusion.
Nearly every engineer in the Operator Framework group contributed to design explorations and prototypes over an entire year. For each of these design explorations, there are complex webs of features and assumptions that are necessary to understand the context that ultimately led to a conclusion of infeasibility.

Contributor

This whole paragraph seems odd. It's describing the process/history but not necessarily the present state of OLMv1. (I.e. the subjects of these paragraphs are not OLMv1, but people and tasks.) I think we may want to reconsider this paragraph, or put it into a historical section, even if it means just adding a ### Historical Context above it.

README.md Outdated

OLM v1 is the follow-up to OLM v0, located [here](https://github.com/operator-framework/operator-lifecycle-manager).

It consists of four different components, including this one, which are as follows:
Contributor

The last "it" referenced was "OLM v0". You may want to distinguish this as OLM v1 (even though it's repetitive, it's more precise).


OLM v1 is the follow-up to OLM v0, located [here](https://github.com/operator-framework/operator-lifecycle-manager).

It consists of four different components, including this one, which are as follows:
* operator-controller
* [deppy](https://github.com/operator-framework/deppy)
Contributor

Bye-bye deppy?

Member Author

It is still in use by the ClusterExtension controller. As is rukpak. Let's leave those in until we actually stop using them?

README.md Outdated
* operator-controller
* [deppy](https://github.com/operator-framework/deppy)
* [rukpak](https://github.com/operator-framework/rukpak)
* [catalogd](https://github.com/operator-framework/catalogd)

For a more complete overview of OLM v1 and how it will differ from OLM v0, see our [overview](./docs/olmv1_overview.md).
Contributor

Future tense ("will differ") vs present tense ("differs")?



## What will OLM do that a generic package manager doesn't?

OLM will provide multiple features that are absent in generic package managers. Some items listed below are already implemented, while others are most likely planned for the future.
Contributor

Remove "are most likely". Prefer simply "are", or "may be".


netlify bot commented Mar 13, 2024

Deploy Preview for olmv1 ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | dadeb9f |
| 🔍 Latest deploy log | https://app.netlify.com/sites/olmv1/deploys/65fadda3b5050600089a1dc0 |
| 😎 Deploy Preview | https://deploy-preview-692--olmv1.netlify.app |

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>

@durera durera left a comment


To sum up my thoughts on reviewing this:

OLM v1 feels like a wasted opportunity in the form presented. I really hoped to see more focus on the previously discussed separation of the API (cluster-scoped) from the controller (not necessarily cluster-scoped), and the impact on lifecycle that such a split would necessitate.

I thought/hoped that OLM v1 would be trying to provide a solution for that challenging problem, rather than what appears to me to be a view of "that's someone else's problem to solve, we are going to design OLM to not get in the way of whoever tries to solve that".

In a model where the CRD and the controller(s) that implement that CRD are distinct entities, I don't think something that wants to be a lifecycle manager can just "opt out" of such a key driver of lifecycle events as the relationship between the CRD and its controller(s).

I think that the discussion around tenancy in the cluster has perhaps got in the way a little here, because this is not really about multi-tenancy, and - unlike multi-tenancy - this separation of CRD from controller is a natural fit for Kubernetes, certainly a more natural fit than having them bound together as they are today IMO.

OLM v1 should be the glue that binds and manages the CRD and its controllers; that's what I'd expect of a lifecycle manager for cluster extensions. Putting the discussion about tenancy aside, this isn't about being able to isolate one namespace from changes in another (which, as has been said many times, really isn't possible due to the nature of Kubernetes), but as a cluster administrator I should be able to install an API in my cluster and then control which namespaces can use that API.

  • Manage a Kubernetes API extension (CRD)
  • Manage a single cluster-scoped controller for an installed API extension
  • Manage one or more namespace-scoped controllers for an installed API extension

I'd expect OLM to be the thing that can manage the extension as a whole including:

  • Ensuring either the use of multiple namespace-scoped controllers or a single cluster-scoped controller, and preventing a mix of the two
  • Managing the relationship between controller and CRD, e.g. preventing controller updates that would create incompatibility with the CRD version, or at least flagging them as moving into a warning state if the controller and the CRD are not compatible
  • Moving from a model where the CRD and controller are delivered in one package to offering distinct packages for CRDs and controllers, as well as a convenience bundle for both perhaps

The customers with large clusters that I have worked with in the last 2 years do not want to be using cluster-scoped controllers, and they are not seeking to address tenancy concerns through the use of multiple namespace-scoped controllers; they see namespace-scoped controllers as a way to enable the extensions on a namespace-by-namespace basis. Yes, the extension is installed to the cluster, but it's only been enabled for use in namespace1, 2, & 3.

They are seeking to limit the exposure/impact of introducing a new API and updating controllers in their large clusters rather than the ability to independently operate in namespace1 and namespace2 through some form of isolation/tenancy.

This is what I think the next evolution of OLM should be: a transition to a first-class data model and lifecycle built around the separation of CRD from controller.

Example: Strimzi and Red Hat AMQ Streams
Today, use of these two major Kafka operators is problematic: if you install both in a cluster, things go bad for you because both respond to the same API (kafka.kafka.strimzi.io).

I would expect/hope for OLM v1 to address this with first class support for something like this:

  • Install the kafka.strimzi.io API extension to say "we support Kafka in this cluster" (no controller(s) included in this action)
  • Install the Strimzi controller in namespace1 (no change to the API included in this action)
  • Install the AMQ Streams controller in namespace2 (no change to the API included in this action, so it cannot break anything in namespace1)
  • Update the kafka.strimzi.io API extension ... at this point we are performing a cluster-scoped operation that may impact all namespaces with a controller (namespace1 and namespace2)
  • Update the AMQ Streams controller in namespace2 (no need to worry about impacting namespace1)

To me, this is what I think about when I hear people talking about tenancy, because this is what the customers I work with are seeking, and it's a perfect fit for Kubernetes if we just broke apart the delivery mechanism for CRDs and controllers.

I would liken this change in approach to the difference it made when file-based catalogs came around and removed the channel graph information from the operator bundles. It never made sense that an individual operator bundle had to know "what channel will I be in", or "what's the default channel of the package I belong to" and once we were able to define that relationship in the correct place (in the catalog itself) it was a game-changer for managing operator packages. I feel that the same thing needs to happen for CRD/Controllers as part of the evolution of OLM.

However, Kubernetes does not assume that a controller will be successful when it reconciles an object.

The Kubernetes design assumptions are:
- CRDs and their controllers are trusted cluster extensions.
@pgodowski pgodowski Mar 21, 2024

A counter-example is the Ingress API, which has a reference to the ingress class name, which may or may not have a running controller on the cluster.

I am not arguing about whether APIs are global.
I am arguing that APIs could be separated from the controllers, with different lifecycles.


### "Watch namespace"-aware operator discoverability

When operators add APIs to a cluster, these APIs are globally visible. As stated before, there is an assumption in this design that a controller will reconcile an object of that API anywhere it exists in the cluster.
@pgodowski pgodowski Mar 21, 2024

Such an assumption is dangerous. Controllers might have privileges to read and modify Secrets in order to properly manage the operands. Cluster admins will not allow global access to all Secrets on the cluster; they will allow such access only in selected namespaces.

Example: let's assume we are installing a Kafka operator at the cluster scope, which will have Secrets get/update permission. Cluster admins will not let the Kafka operator access namespaces running other workloads where confidential/sensitive data is stored as Secrets.

Therefore, controllers must be provided a way to restrict their access. It could be done such that RBAC is granted only in selected namespaces and it is up to the controller to somehow know which namespaces to watch. But then, that calls for some API to express what the scope of a controller is, or at least for some thought to be given to controller developer best practices for handling scope discovery.
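One plausible (purely illustrative) approach to such scope discovery is for the controller to probe its own RBAC with a SelfSubjectAccessReview before adding a candidate namespace to its watch set; the candidate list and resource here are assumptions, not an OLM mechanism:

```go
package main

import (
	"context"
	"fmt"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	ctrl "sigs.k8s.io/controller-runtime"
)

// canWatchKafkas asks the API server whether this controller's service account
// may watch Kafka resources in the given namespace.
func canWatchKafkas(ctx context.Context, cs kubernetes.Interface, ns string) (bool, error) {
	ssar := &authorizationv1.SelfSubjectAccessReview{
		Spec: authorizationv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace: ns,
				Verb:      "watch",
				Group:     "kafka.strimzi.io",
				Resource:  "kafkas",
			},
		},
	}
	resp, err := cs.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, ssar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}

func main() {
	cs := kubernetes.NewForConfigOrDie(ctrl.GetConfigOrDie())
	// Candidate namespaces come from the extension's own configuration
	// (a hypothetical knob), since OLMv1 provides no watch-namespace API.
	for _, ns := range []string{"namespace1", "namespace2"} {
		ok, err := canWatchKafkas(context.Background(), cs, ns)
		fmt.Println(ns, ok, err)
	}
}
```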

Contributor

@grokspawn grokspawn left a comment

RH maintainers had a series of meetings last week and determined some updates/clarifications are needed here, but it would be much easier to discern the updates in a separate PR, so let's merge this and do a follow-up to capture them.

Reviewers: please interpret this as an attempt to ensure that your comment is interpreted in the updated context

@joelanford joelanford added this pull request to the merge queue Mar 26, 2024
Merged via the queue into operator-framework:main with commit 03ac22d Mar 26, 2024
15 checks passed
@joelanford joelanford deleted the olmv1-overview branch June 20, 2024 19:00