Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

📖 F28 - Provided service versions (aka 'Operand versioning') #180

Merged
merged 4 commits into from
Feb 19, 2024

Conversation

tlwu2013
Copy link
Contributor

@tlwu2013 tlwu2013 commented Apr 20, 2023

This is the feature request received in the past for OLM to expose the "provided service versions" (aka "Operand versioning") offered by the Operator/extension.

As extensions can only be installed once in a cluster as cluster-wide singletons with OLM 1.0, it is likely for an extension to oversee multiple versions of service and utilize a distinct versioning scheme. This should be a part of the OLM v1 requirement.

Related work in the past (Feb, 2021): https://hackmd.io/IEYEaOYkR-eBkMQkbVrC1A?view

@dmesser
Copy link

dmesser commented Apr 20, 2023

cc @cdjohnson @Jamstah @pgodowski

@@ -77,6 +77,8 @@ _Priority Rating: 1 highest, 2 medium, 3 lower (e.g. P2 = Medium Priority)_

**F27 - Pluggable certificate management (P2):** OLM should rely on other controllers to create and lifecycle TLS certificates required to drive the functionality of certain extensions, like webhooks that need to trust / need to be trusted by the cluster's API server. OLM should not implement any certificate handling itself. In a first implementation support should be established for cert-manager.

**F28 - Provided service versions (P2):** Extensions can offer multiple services with specific version ranges. To help cluster users make informed decisions about which extension version to install or update, OLM should provide a separate field for extension developers to specify the provided services and their respective versions. This will allow for declaring extension dependencies on specific service versions and enable flexibility in releasing extensions and their corresponding services at different cadences. In disconnected environments, cluster users should be able to deselect provided services and versions during image mirroring.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will allow for declaring extension dependencies on specific service versions and enable flexibility in releasing extensions and their corresponding services at different cadences.

Can we walk through some examples of how this would work? If an operator author declared a dependency like "I need an operator that provides Kafka v3.4.0", and two different operators (call them A and B) with two different APIs say they manage Kafka v3.4.0, OLM might pick B, the "wrong" operator. The depender may only know about A's API. It seems like the service version dependency would need to be an addition to the required GVK or operator version constraints. So you would have to say "I depend on this GVK and service version range" OR "I depend on this package, package version range, and service version range". Is that what we have in mind here?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't really thought about operand versioning and its use in operator-to-operator dependencies but it does make sense as a use case.

I feel that a service is represented by a GVK. If an operator provide v3.4.0 of Kafka then I would expect to make a Kafka CR with a spec.version: 3.4.0, or something along those lines. So the dependency would be:

  • GVK: kafkas.apigroup.provider.com/v1beta1
  • Operand version: 3.4.0
  • Specific package range: (optional)

In this instance, we wouldn't even need OLM to understand that its the spec.version field in the CR that identifies the version being used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance, we wouldn't even need OLM to understand that its the spec.version field in the CR that identifies the version being used.

Correct.

My main point here is that this can't be a standalone constraint. It has to be tied to the operator name or the provided API.

Copy link
Contributor Author

@tlwu2013 tlwu2013 Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an operator author declared a dependency like "I need an operator that provides Kafka v3.4.0", and two different operators (call them A and B) with two different APIs say they manage Kafka v3.4.0, OLM might pick B, the "wrong" operator. The depender may only know about A's API. It seems like the service version dependency would need to be an addition to the required GVK or operator version constraints. So you would have to say "I depend on this GVK and service version range" OR "I depend on this package, package version range, and service version range". Is that what we have in mind here?

Correct, this "service version dependency" should be used as an addition to the "required GVK" or "Operator version constraint" to narrow down to:

  • a specific range of Operator versions (if used with "GVK" which is shared across multiple operand versions)
  • or "a specific image of Operand/provided service" (if used "Operator version constraint" and a single Operator version provides multiple versions of operand/provided service).

In this instance, we wouldn't even need OLM to understand that its the spec.version field in the CR that identifies the version being used.

Correct.

My main point here is that this can't be a standalone constraint. It has to be tied to the operator name or the provided API.

Correct, the main point is this "service version dependency" is not a standalone thing.

But in this Kafka example, Operand version: 3.4.0 is also useful because the GVK kafkas.apigroup.provider.com/v1beta1 is the same for multiple Kafka version. So Kafka/Operand version: 3.4.0 is useful to narrow down to a tighter Operator version range and pin point the desired Operand image for Kafka 3.4.0 within the target Operator version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will allow for declaring extension dependencies as an additional constraint on specific service versions and enable flexibility in releasing extensions and their corresponding services at different cadences.

Add "as an additional constraint" to clarify it is not intended as a standalone constraint

@@ -77,6 +77,8 @@ _Priority Rating: 1 highest, 2 medium, 3 lower (e.g. P2 = Medium Priority)_

**F27 - Pluggable certificate management (P2):** OLM should rely on other controllers to create and lifecycle TLS certificates required to drive the functionality of certain extensions, like webhooks that need to trust / need to be trusted by the cluster's API server. OLM should not implement any certificate handling itself. In a first implementation support should be established for cert-manager.

**F28 - Provided service versions (P2):** Extensions can offer multiple services with specific version ranges. To help cluster users make informed decisions about which extension version to install or update, OLM should provide a separate field for extension developers to specify the provided services and their respective versions. This will allow for declaring extension dependencies on specific service versions and enable flexibility in releasing extensions and their corresponding services at different cadences. In disconnected environments, cluster users should be able to deselect provided services and versions during image mirroring.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In disconnected environments, cluster users should be able to deselect provided services and versions during image mirroring.

This seems like it will lead to a poor UX for end users of the operator. There are several scenarios that could be problematic:

  1. An operator provides services A and B, but the mirroring user mirrors only service A. The operator API for service B is still present, so an end user tries to use that API. The operator doesn't know which services have been mirrored (i.e. which operand images have been mirrored), so it happily attempts to handle the request for service B. The result is "ImagePullBackoff" on all of the service B images with no indication to the user about why. To them, it looks like a bug in the operator.
  2. Operators that support multiple versions are very likely to have complex upgrade logic between those versions. If mirroring users are able to selectively remove arbitrary versions at mirror time, an operator may not be able to perform its essential function of performing operand migrations via a series of service version upgrades. Again, the operator and end user will see ImagePullBackoff if a mirror user has inadvertently opted not to mirror an intermediate service version required for a migration.

Are these acceptable outcomes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find myself thinking whether this type of configuration (reducing the the operators behavior surface) should be OLMs concern, or rather that of some operator side library/framework which makes this kind of configuration easy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to reach some consensus on the boundary between the OLM's and the operator's responsibilities. IMHO OLM should only concern itself with installation/update/removal of operators. Configuration is something outside of that. Though, I'd be curious to see if we could have some kind of separate component within the OLM ecosystem that can handle this in a nice way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that OLM should only be concerned with installation/update/removal.

I can't see a good way for OLM (or the operator) to know if an image pull is going to succeed or fail before it is attempted. Although it would be great to be able to indicate to a user that the problem is that something wasn't mirrored vs a registry auth error, a missing image, or any other multitude of problems, I wouldn't consider that to be in scope either.

I do think though that we need the metadata available in the package to allows the user to select the service versions that they want to mirror. This does leave potential sharp edges at runtime, where if you attempt to use a version that wasn't mirrored you will get pull errors, but the alternative is that the user has to mirror everything. As we're explicitly saying that fan-out operators are going to be more common because all operators will be cluster singletons, the mirror size expands extremely fast.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The potential problem I'm trying to highlight is that if OLM has a first class API for declaring service versions AND a first class mirror API for selecting specific service versions, then OLM (or maybe the mirroring tool) will be pinned with the blame when an image pull inevitably fails because an intermediate version wasn't mirrored.

"I used the mirror API to select the service version images I cared about, and now I'm getting image pull errors. OLM and the mirror tool should mirror all necessary images based on my stated desires" is what the bug reports will say.

And that means we'll have to extend the service version API with an upgrade edge specification, we'll have to tell operator authors to keep that up-to-date with their operator code, and the mirror tool API will have to allow the mirror user to tell it which service versions have already been mirrored. With all of that, then and only then will the mirror tool be able to follow the graph to mirror all the intermediate versions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, you're talking about the case where the operator needs to pass a workload through intermediate versions for upgrade, so:

  • Operand version 1.0.0 exists
  • The user mirrors 1.0.0 and starts using it
  • Operand versions 1.1.0, 1.2.0 are released in the latest operator
  • The user adds 1.2.0 to their mirror, because that's what they want to upgrade to
  • The user changes the workload CR to 1.2.0
  • The operator starts by attempting a 1.1.0 upgrade
  • The pull fails!

Is that the scenario?

Copy link
Contributor Author

@tlwu2013 tlwu2013 Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In disconnected environments, cluster users should be able to deselect provided services and versions during image mirroring.

This is coming from a requirement of Operators that provide multiple Add-Ons in addition to the primary provided services. e.g., Upstream OCM (or RHACM) Operator that comes with VolSync and Submariner Add-ons.

Enable users to deselect mirroring images of a certain provided service that are not interested in can greatly improve the mirroring time, traffic, and storage size.

There are several scenarios that could be problematic:

  1. An operator provides services A and B, but the mirroring user mirrors only service A. The operator API for service B is still present, so an end user tries to use that API. The operator doesn't know which services have been mirrored (i.e. which operand images have been mirrored), so it happily attempts to handle the request for service B. The result is "ImagePullBackoff" on all of the service B images with no indication to the user about why. To them, it looks like a bug in the operator.

Right, so in that case, it's very clear this is the intended/expected behavior explicitly set up by the admins, so it should be on the admins to make this clear to their teams. And as you mentioned, end-users/developers should be able to see the ImagePullBackof too (so they know the image is missing in their internal registry).

  1. Operators that support multiple versions are very likely to have complex upgrade logic between those versions. If mirroring users are able to selectively remove arbitrary versions at mirror time, an operator may not be able to perform its essential function of performing operand migrations via a series of service version upgrades. Again, the operator and end user will see ImagePullBackoff if a mirror user has inadvertently opted not to mirror an intermediate service version required for a migration.

The thinking here is to set up a minVersion and maxVersion as the Operand/provided service version range for mirroring. e.g.,

A user specifies the desired Kafka versions as:

  • minVersion: 2.8.0
  • maxVersion: 3.2.0

, given Strimzi Operator declares:

  • version: 0.27.1, provides Kafka: 2.8.0, 2.8.1, 3.0.0
  • version: 0.28.0, provides Kafka: 3.0.0, 3.1.0
  • version: 0.29.0, provides Kafka: 3.0.0, 3.0.1, 3.1.0, 3.1.1, 3.2.0

, so mirroring tool can derive the target versions for mirroring Strimzi Operator are:

  • version: 0.27.1
  • version: 0.28.0
  • version: 0.29.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enable users to deselect mirroring images of a certain provided service that are not interested in can greatly improve the mirroring time, traffic, and storage size.

This thread makes me think we need to provide a more robust "relatedImages" API. Basically by definition, an operator is complex code that orchestrates $something in place of an expert human. In that context, the set of related images needed for that orchestration is also likely complex. Mirroring by service version is somewhat simplistic and may not align with the ways that operators want to expose what to mirror when.

Copy link
Member

@joelanford joelanford Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a separate thread @Jamstah said:

I don't think we can rely on operand versions being semver

I agree. Which means using minVersion and maxVersion for operand versions is likely not feasible.

@pgodowski
Copy link

cc @cdjohnson @Jamstah @pgodowski

Will review early next week, thanks for sharing!

@Jamstah
Copy link

Jamstah commented Apr 25, 2023

This PR focuses mainly on operator-to-operator dependencies, which is a great use case for being able to determine the service versions and keep the operator dependency graph complete.

The other high level use case for this is user selection of service versions, which I feel we should include more strongly. For example, if the user has a kafkas.apigroup.provider.com CR that is at operand version 3.4.0 then we shouldn't allow (or should at least warn) if the operator providing that API is upgraded past the point where it supports 3.4.0 any more. This would require that OLM have an idea of which field in the CR indicates the service version that is currently in use for that API.

Potentially, even if the user doesn't have a CR using that operand version right now, if that's what they asked for at install time then they should be warned if that service version is going away, although I'm not sure how/where that dependency would be recorded, or if we need it. Perhaps in the subscription? I guess this would be an issue for job style CRs that are created temporarily. Certainly a less urgent use case.

@joelanford
Copy link
Member

if the operator providing that API is upgraded past the point where it supports 3.4.0 any more.

That's a breaking change and that would warrant a major version bump of the operator. And I would expect:

  • operator authors to disclose this breakage in release notes
  • users not to upgrade to a version that contains a breaking change without understanding the ramifications.

Beyond that, the implications here seems to be that:

  1. APIs for services have a spec field somewhere that contain a service version string
  2. OLM knows how to find that field for each service API.

IMO, that is too opinionated. What if an API wants to expose the desired version as a range or as a channel containing multiple versions? IMO, going this far is beyond the scope of what I think OLM should have awareness of.

@Jamstah
Copy link

Jamstah commented Apr 27, 2023

APIs for services have a spec field somewhere that contain a service version string

I would have said that it would be a status field not a spec field, where the operator has resolved the constraints and reconciled a specific version, that then needs to be supported through any operator change.

OLM knows how to find that field for each service API.

Yes, OLM would need to know which field to look for, and that would need to be specific in some way (in the CSV?).

IMO, that is too opinionated. What if an API wants to expose the desired version as a range or as a channel containing multiple versions? IMO, going this far is beyond the scope of what I think OLM should have awareness of.

I'm in two minds here. Part of me thinks that its a good way to stop users shooting themselves in the foot, but I do agree that its a lot of additional complexity just for that use case. Removal of supported operand versions being a breaking change might be the right call, which would mean the user can rely on being in the right operator channel instead.

I was kind of equating it to the function in OpenShift that checks for deprecated API usage in the cluster and says "are you sure you want to upgrade? You're still using these API versions which won't be in the next OpenShift". I guess that's easier to track.

@joelanford
Copy link
Member

I guess that's easier to track.

That's the case because the thing that removes the deprecated API is the same thing that is doing this tracking, right? The analogy holds if the operator (which owns this API) is the thing doing this checking.

We're planning to have an OperatorHealth sort of API where the operator can raise these sorts of things and potentially prevent upgrades.

@Jamstah
Copy link

Jamstah commented Apr 27, 2023

Right, but the currently installed operator doesn't know the capabilities of the next one, so it doesn't know to prevent the upgrade.

Maybe that's why we need a pre-install hook of some kind. I'm definitely all for the operator making the call, because I don't think we can rely on operand versions being semver, which means only the operator can safely parse them.

@tlwu2013 tlwu2013 force-pushed the f28 branch 2 times, most recently from eea388b to 0778bfd Compare June 7, 2023 23:30
@joelanford
Copy link
Member

My overall perspective I think remains the same.

  1. There are two separate use cases here (dependency resolution and disconnected mirroring) that, IMO, are sufficiently different that the suggested API (declaration of supported service versions per bundle) can't solve them both in a way that supports the various approaches operators take to orchestrate their operands.
  2. I think adding this specific API, even if optional, will lead to us enshrining undesirable incentives. Use of this API will be perceived as a maturity thing: "Immature operators don't specify it, mature operators do." But that's an invalid assumption. Some operators don't have service versions at all. Others might have complex logic around choosing an operand version without actually exposing a direct choice by the end user. If we move forward with this API, I worry that we will pre-dispose operator authors to build their operators in a way that requires them to accept operand versions in their CRDs, and that might be a worse end-user experience than the operator managing the operand version.
  3. This opens a door that makes it more likely for operator authors to remove support for operand versions in a non-major-version-bump of an operator. It gives them a way to say, "Yeah, I removed support for an operand version, but I told you I did that, so it's not my responsibility to prevent upgrades if existing operands are at versions unsupported by the upgrade". This is not something OLM would be able to know (easily at least). If operator v1.28.0 supports operand versions v3.0.0 and v3.1.0 and lets an end user state their desire for v3.0.0 in an API, then all future operator v1 versions need to continue supporting v3.0.0. Otherwise the existing CR that says "use v3.0.0" will break.

TL;DR: I think we're trying to make OLM "all things for all people", and in doing so, I think we're wrapping ourselves into knots. I'm worried this is another knot. It solves one set of use cases, but adds friction to other -- equally valid -- use cases.

@tlwu2013
Copy link
Contributor Author

There are two separate use cases here (dependency resolution and disconnected mirroring) that, IMO, are sufficiently different that the suggested API (declaration of supported service versions per bundle) can’t solve them both in a way that supports the various approaches operators take to orchestrate their operands.

This API being optional is just so that the API does not need to support all kinds of approaches Operators take to orchestrate their operands, but the majority of the use cases (e.g., operand versioning following semver, etc).

I think adding this specific API, even if optional, will lead to us enshrining undesirable incentives. Use of this API will be perceived as a maturity thing: “Immature operators don’t specify it, mature operators do.” But that’s an invalid assumption. Some operators don’t have service versions at all. Others might have complex logic around choosing an operand version without actually exposing a direct choice by the end user. If we move forward with this API, I worry that we will pre-dispose operator authors to build their operators in a way that requires them to accept operand versions in their CRDs, and that might be a worse end-user experience than the operator managing the operand version.

This API does incentivize Operator authors to denote provided service versions clearly to their users following the guidance if their Operators do provide managed service(s) to showcase their maturity and I’d argue it’s in a good way. There are Operators out there that don’t ship any CRDs at all, but they are not viewed as inferior simply because of that. But if they do provide managed service(s), and they don’t clearly denote the supported version, then that’s a valid maturity issue.

This opens a door that makes it more likely for operator authors to remove support for operand versions in a non-major-version-bump of an operator. It gives them a way to say, “Yeah, I removed support for an operand version, but I told you I did that, so it’s not my responsibility to prevent upgrades if existing operands are at versions unsupported by the upgrade”. This is not something OLM would be able to know (easily at least). If operator v1.28.0 supports operand versions v3.0.0 and v3.1.0 and lets an end user state their desire for v3.0.0 in an API, then all future operator v1 versions need to continue supporting v3.0.0. Otherwise the existing CR that says “use v3.0.0” will break.

I don’t see how adding this will exacerbate “operator authors to remove support for operand versions in a non-major-version-bump of an operator” but rather with this Operand versioning thing, it helps greatly to expose this to the end-users, so users can either better prepared for that, or simply walked away toward another Operator does better in honoring semver. Just as you mentioned, without this Operand versioning, even OLM is having a hard time telling “if existing operands are at versions unsupported by the upgrade”. This is exactly the reason this API is critical to improving the visibility of this to both OLM and the end-users.

@dmesser
Copy link

dmesser commented Jun 22, 2023

Late to the party, it seems like we are discussing a couple of things in parallel so here is my attempt to bring into one larger picture:

A) users, relying on workload-managing operators, generally have a great desire to know upfront (before install or upgrade) what versions of the workload the desired operator version can manage

To your point @joelanford - not all operators are workload-managing operators. This kind of data will not make sense for something like Tekton or Isitio, which generally represent control planes and event-based automation facilities instead of managing a workload that is the actual thing that is consumed. Also automation-focussed operators like a namespace-configuration operator or cert-manager will not take value from this.
But the workload-managing operators category is real and is providing a lot of value to users. By hiding this critical piece of information from them, that value will be hard to get to. One would imagine how successful managed cloud services like AWS RDS would be, if they simply would never tell users what version of Postgres the service is able to manage.

That said, I don't think it incentives breaking changes in minor release versions anymore than OLM currently incentivises breaking changes in the operator APIs due to OLMs enforcement of semantic version strings for bundles. OLM doesn't bark if you remove an entire CRD in a minor version. Still, we see value in enshrining the semver concept in operator versioning to convey the recommendations associated with that. That value isn't lowered just because we aren't always able to enforce it at runtime in all cases. Rather the presence of this data allows test systems to check for this behavior. Equally, with operand information we would allow test systems to call our operators that are removing operand versions within a minor or patch release.
I also argue that this leading to a notion of more vs. less mature operators is a good thing. An operator that provides more metadata and supports multiple versions of a managed workload and also calls this out publicly with a well-maintained list of versions is more mature IMHO, because it gives users what they need. But this can only be a comparison between workload operators. I think we can expect people to be aware enough of the intent of this, to not call out cert-manager as immature because it doesn't provide operand versions.

On the other hand, I would be curious to learn more about the example:

Others might have complex logic around choosing an operand version without actually exposing a direct choice by the end user.

Do we know of an example? I almost seems like a missed opportunity by the operator to provide more controls and nuance. And even when an operator does that at runtime, isn't the logic around that caused by variable input from the user? I.e. a version of another component. Regardless, if an operator has a good reason not to expose a choice her, they can opt out of providing the operand version data as well.

B) OLM resolving based on operand information

I think it's important to remind ourselves that OLM is not a workload manager. It is generally not aware of what the operators are exactly doing, nor should it be.
So I think it would be out of scope for OLM to try to predict what happens to operator-managed things when the operator updates. There are a couple of obvious scenarios, like removing CRDs that will delete the CRs and therefore most likely also the workloads they represent, that we catch. But I don't see us going beyond that. So OLM would not make assumptions about how the user specifies the versions for a desired workload, e.g. on which CR or in which field. This knowledge is in the operator and IMHO that is the best place for it. It can still communicate with OLM based on this knowledge:
for instance, an operator updating to a newer version that drops support for an operand version that is found on cluster could leverage the preflight check facility in F6 to block the update from going through. There is no need for OLM to try to solve that scenario generically.

What is in OLMs realm is using this information as additional constraints when resolving dependencies. If operator A says it needs operator B and operand B.v1.1, OLM shouldn't update B to a version that only provides operand B.v2.0 and not v1.1 anymore (of course an admin would always be able to force it through). That's where this feature provides additional value in the form of safety. The scenario I see is: popular operators used by multiple products and open source projects as a shared dependency but serving different requirements in regards to the version of the managed workload provided.

I would go as far saying that without operand versioning information here, the concept of shared dependencies of operator will not be something the community can adopt, ultimately hampering reuseability and growth of open-source projects.

C) Mirroring based on operand information

There seem to be two issues here:

  1. we can't expect all operands to follow well-known versioning schemas, so useful things like version ranges or sorting cannot be applied.

We could tackle this in a couple of ways: we could either say that like operators, also operands should follow semver. And by looking at the ecosystem, I think we stand a good chance to capture the majority of users / projects / software here.
We could also say we fall back to simple lists, which are harder to maintain than ranges, but not insurmountably hard. We will likely never find a solution that works for everyone and everything out there, so I'd encourage us to have an opinion that enables a majority of the software and incentives a useful pattern.

  1. we are afraid that if people don't mirror the right operands, the operator will inevitably run into image pull errors and we think that this will be on OLM to blame

IMHO, with our current mirroring concept, would likely end up doing the equivalent for operand versioning of what we are already doing for channels and packages in catalogs today: we mingle with the list of stuff in the mirrored catalog. That is, if a users filters out an operand version during mirroring, the resulting catalog would also not contain that operand version reference. This way, we convey to the user correctly what's available in the mirrored environment.
I know we are also thinking about different approaches to mirrored catalog, where we would have other ways of denoting if an operator channel, package or individual version was mirrored or not. We could do the same with the operands.

@dmesser
Copy link

dmesser commented Jul 24, 2023

@Jamstah @perdasilva @joelanford @tlwu2013 What do we think, how do we move forward here?

@joelanford
Copy link
Member

joelanford commented Jul 24, 2023

I'm seeing four requirements:

  1. Operator admins should be able to see which operand versions each workload-based operator supports, by operator version.
  2. Operator admins have guarantees that they can install an operator that supports their desired operand version(s).
  3. Operator authors have guarantees that they can list operator dependencies that support the necessary dependency operand version(s).
  4. When mirroring a catalog, mirroring users can select a subset of the related images to mirror, based on desired operand version(s).

Maybe I'm getting hung up on the prescriptiveness of the implementation in the PR (i.e. there shall be a new field for a list of service versions, operand version shall be constrainable). Perhaps we could reword it to look more like the above?

@Jamstah
Copy link

Jamstah commented Jul 27, 2023

I think that's a good list, I might add:

  • Operator admins have guarantees that the operator won't be upgraded to the point where it no longer supports the operands.

The easy bit here is an operator knowing itself what it supports. The hard bit is knowing what something that isn't yet installed supports to know its going to be suitable for the job. To write that down in a way that is flexible enough may be impossible, we're expecting to be able to communicate knowledge to OLM that it really shouldn't need to have, like:

  • Which field on a CR is an operand version?
  • How do we match that to images?
  • Are there fields for specific features that may have been deprecated?

If we do build that model, I'm 90% sure that there will immediately be requests to extend it with additional checks for more specific cases.

I feel like we may need a way for an uninstalled operator to look at a cluster and say if it can or cannot reconcile what is already existing on it. Perhaps a CSV could include a pre-install/pre-update Job of some kind for that purpose.... it wouldn't solve all the use cases, but I feel many of them could be solved with documentation if we take the sharp edges out of install.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2023
@tlwu2013
Copy link
Contributor Author

tlwu2013 commented Jul 31, 2023

I'm seeing four requirements:

1. Operator admins should be able to see which operand versions each workload-based operator supports, by operator version.

2. Operator admins have guarantees that they can install an operator that supports their desired operand version(s).

3. Operator authors have guarantees that they can list operator dependencies that support the necessary dependency operand version(s).

4. When mirroring a catalog, mirroring users can select a subset of the related images to mirror, based on desired operand version(s).

Maybe I'm getting hung up on the prescriptiveness of the implementation in the PR (i.e. there shall be a new field for a list of service versions, operand version shall be constrainable). Perhaps we could reword it to look more like the above?

Done with the editing.

@Jamstah @perdasilva @joelanford @dmesser
Can we start another round of review? thanks!

@codecov
Copy link

codecov bot commented Jul 31, 2023

Codecov Report

Merging #180 (bb501bf) into main (949c06a) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #180   +/-   ##
=======================================
  Coverage   80.69%   80.69%           
=======================================
  Files          18       18           
  Lines         803      803           
=======================================
  Hits          648      648           
  Misses        112      112           
  Partials       43       43           
Flag Coverage Δ
e2e 55.79% <ø> (ø)
unit 77.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Signed-off-by: Tony Wu <tlwu2013@gmail.comexport>
@tlwu2013
Copy link
Contributor Author

I think that's a good list, I might add:

  • Operator admins have guarantees that the operator won't be upgraded to the point where it no longer supports the operands.

@Jamstah, I've also added the "upgrade" piece to the latest change:

Extension admins are guaranteed to install or upgrade an extension that supports their desired operand version(s).

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 1, 2023
@dmesser
Copy link

dmesser commented Oct 26, 2023

This looks good to me.

@ncdc ncdc changed the title F28 - Provided service versions (aka 'Operand versioning') 📖 F28 - Provided service versions (aka 'Operand versioning') Nov 6, 2023
Copy link
Contributor

@perdasilva perdasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ^^ thank you!!!

@joelanford
Copy link
Member

High-level, I'm comfortable with some sort of optional API operator authors can use to declare which versions of an operand are supported by a particular operator version. This seems to dovetail with some recent conversations we've had about making operators more multi-tenant capable.

However, I think there is still MUCH to discuss about the mechanics of this and how it would work. E.g. what if an operator supports upgrading an existing workload from a particular operand version, but not installing or upgrading to that version?

I still think this feature description seems too presumptive about how operators would (or wouldn't) expose operand version selection to users.

So my stance is that I'm okay to merge if the goal here is to capture a high-level idea of supporting some sort of feature about operand version support awareness for admins. But I am not ready to say that what is described can or should be implemented without more design discussion.

@joelanford joelanford added this pull request to the merge queue Feb 19, 2024
Merged via the queue into operator-framework:main with commit 87f2428 Feb 19, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants