Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA-1029: Add a CVO Log Level API #1492

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

DavidHurta
Copy link
Contributor

@DavidHurta DavidHurta commented Oct 9, 2023

This enhancement describes the API changes needed to provide a simple way of dynamically changing the verbosity level of Cluster Version Operator's logs.

This pull request references https://issues.redhat.com/browse/OTA-1029

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 9, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 9, 2023

@Davoska: This pull request references OTA-1029 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

This enhancement describes the API changes needed to provide a simple way of dynamically changing the Cluster Version Operator's log level.

This pull request references https://issues.redhat.com/browse/OTA-1029

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 9, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 9, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@DavidHurta
Copy link
Contributor Author

/test all

@DavidHurta
Copy link
Contributor Author

/test all

@petr-muller
Copy link
Member

/cc

@DavidHurta DavidHurta force-pushed the cvo-logging branch 5 times, most recently from 1056f05 to 8380ff6 Compare October 12, 2023 14:51
@DavidHurta DavidHurta requested a review from wking October 12, 2023 14:54
@DavidHurta DavidHurta marked this pull request as ready for review October 17, 2023 11:06
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 17, 2023
@DavidHurta
Copy link
Contributor Author

PTAL, reviewers @LalatenduMohanty @wking @petr-muller

@DavidHurta
Copy link
Contributor Author

PTAL, API approver @deads2k

Copy link
Member

@petr-muller petr-muller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comment inline

enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
Copy link
Member

@petr-muller petr-muller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 1, 2024
Copy link
Contributor

openshift-ci bot commented Oct 1, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: petr-muller
Once this PR has been reviewed and has the lgtm label, please assign sdodson for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Show resolved Hide resolved
Comment on lines 118 to 120
A hosted CVO is located in the management cluster and accesses the hosted API
server. As the new CR will be part of the OCP payload, it will be applied to the
hosted cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that seems dissonant to me. If CVO acts within the management cluster, why is the configuration for the CVO not also at the management cluster level. Have you spoken to anyone from HCP about this EP yet?

Copy link
Contributor Author

@DavidHurta DavidHurta Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the current solution for a bunch of hosted operators placed in the management cluster (cluster-image-registry-operator, dns-operator, cluster-network-operator,...). These operators are running in the management cluster; however, they access the hosted API server. However, some of them can have potentially access even to the management cluster API server, and access any configuration resources there. Let me reach out to the HyperShift team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue with putting CRDs in the management cluster is that we have to handle multiple versions of OCP on the same management cluster.

We could add this to the HostedCluster API and have that be the source of truth, while simply reconciling (and preventing changes to) the CR within the hosted cluster

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other alternative to consider is making this CRD namespaced.

Copy link
Contributor Author

@DavidHurta DavidHurta Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add this to the HostedCluster API and have that be the source of truth, while simply reconciling (and preventing changes to) the CR within the hosted cluster

I really like the idea of this design. No compatibility issues regarding the management cluster. Utilizing an existing pattern of flow of the information from the HostedCluster API to the hosted cluster. The hosted CVO is not the wiser and does not need to even access the management cluster API server. The complexity seems much lower. We only need to come up with the HyperShift API change.

The ClusterConfiguration API seems like a nice place; however, its purpose seems to be oriented around the configuration API (github.com/openshift/api/config/v1), meaning configuration for OCP components rather than just operators.
I don't see any applicable existing APIs that we could use to reference the new proposed ClusterVersionOperator API. We could introduce a new OperatorConfiguration API where various APIs from the github.com/openshift/api/operator/v1 package could be referenced, including the newly proposed ClusterVersionOperator API. What do you think?

Something like:

// OperatorConfiguration specifies configuration for individual OCP operators in the
// cluster, represented as embedded resources that correspond to the openshift
// operator API.
type OperatorConfiguration struct {
	// ClusterVersionOperator specifies the configuration for the Cluster Version Operator in the HostedCluster.
	// +optional
	ClusterVersionOperator *operatorv1.ClusterVersionOperatorSpec `json:"clusterVersionOperator,omitempty"`
}
type HostedClusterSpec struct {
        ...
	// OperatorConfiguration specifies configuration for individual OCP operators in the
	// cluster, represented as embedded resources that correspond to the openshift
	// operator API.
	//
	// +kubebuilder:validation:Optional
	// +optional
	OperatorConfiguration *OperatorConfiguration `json:"operatorConfiguration,omitempty"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csrwng, @enxebre, let me know what do you think, please 🙌 I can also follow-up on Slack or in the office-hours.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other alternative to consider is making this CRD namespaced.

That doesn't make sense from an OCP standpoint, since you'll likely need different fields for a namespaced version of this.

We could add this to the HostedCluster API and have that be the source of truth, while simply reconciling (and preventing changes to) the CR within the hosted cluster

If CVO is running in the management cluster, why would we want to have this CRD in the workload cluster at all? Would it not be better to have a separate mode of operation so that in an HCP, the CVO gets is config from CLI flags instead of the CRD, where hypershift operators can configure it directly based on some hypershift specific API (be that a new version of this CRD or part of an existing HCP API)

Copy link
Contributor Author

@DavidHurta DavidHurta Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I am a fan of the CRD being in the hosted cluster is:

  • We simplify and streamline the flow of configuration.
    • As a maintainer of the CVO, I have to simply support the CVO to reconcile a new one configuration CR in the cluster. It does not matter whether the CVO is in the hosted cluster or in a standalone cluster. This means only one logic, less development, less prone to bugs, easier to maintain, etc.
    • The HyperShift does not have to parse a new API into flags/environment variables/config file (and the CVO parsing the flags back to the internal representation of the configuration). It simply propagates one API into another verbatim. Minimal risk of breaking things, simpler logic. More future-proof. Adding new fields into the ClusterVersionOperator CRD requires minimal effort on the HyperShift side and the CVO parsing side.
    • We do not introduce a new CRD in the management cluster.
    • No disruptions to a running hosted CVO.

It does have its cons. A new validating admission policy and a new CRD and a CR in the hosted cluster. Although the configuration itself does not seem to be heavily utilized in the future (it is just a configuration CR to the CVO), the operational impact seems minimal (a new one CRD, one CR, and an admission policy in the hosted cluster). So a one-time constant cost to a hosted cluster that will result in more simple development and testing.

However, you have a great point, and I do truly appreciate you pushing for not creating new resources unnecessarily and just because it seems easier (my words). It resulted in me thinking of more alternatives, which is a great thing.

the CVO gets is config from CLI flags instead of the CRD

It's easy to parse things such as "Normal" to --v=2 or "Debug" to --v4 in the hosted CVO deployment. I think we can do that for the currently proposed ClusterVersionOperator CRD, and I will probably propose to do that this week. But I do worry that in a distant potential future we could have this discussion again when new complex fields are introduced to the ClusterVersionOperator CRD. However, the CVO can also just start supporting configuration files/env vars.

@JoelSpeed, may I ask, out of curiosity, what do you see as the main downside of the CR being the hosted cluster? At the end of the day, I see it as a trade-off between development effort and the operational costs. However, you are a more experienced engineer, so I am probably missing some aspects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thread is converging with another, so you should review my comment here as well.

But, what I'm trying to avoid is a dissonance in UX. What I mean by this is when we expose APIs to the workload cluster admin that they cannot use. You are intending to write an API that you will have to guard against users updating, because "this isn't the right way to update this field", they have to "go over there and eventually it will get updated here"

It would be better not to expose that API at all to the end user, it is useless to them as an API, and becomes implementation for us.

Another consideration is whether there are differing requirements for CVOs in HCP vs standalone. Often our projects don't work quite the same way in HCP as they do on a standalone cluster, and, I can imagine a future where you want a feature either only on HCP, or only on standalone.

If the API to control CVO in an HCP is baked into an HCP specific type, we choose exactly what subset of the configuration we want to expose. If you directly embed the ClusterVersionOperator Spec, every time HCP updates the o/api dependency, you end up with potentially new fields being added to the HCP API, potentially, unintentionally. Which then might lead to awkward situations again where you're exposing API fields that the user cannot use.

This is one of the reasons that HyperShift uses the NodePool abstraction, to prevent users from having to worry about lots of fields in underlying APIs that we do not considered supported on HCP, it's a similar story here IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the insights, I appreciate them. I was missing the dissonance in UX. I now see the benefits of your suggestions more clearly.

The enhancement now proposes to use a configuration file for a hosted CVO. No new CRDs in a management cluster. No new CRDs in a hosted cluster.

enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Show resolved Hide resolved
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 15, 2024
Copy link
Contributor

openshift-ci bot commented Oct 15, 2024

New changes are detected. LGTM label has been removed.

enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved
enhancements/update/cvo-log-level-api.md Outdated Show resolved Hide resolved

This enhancement proposes to create a new `CustomResourceDefinition` (CRD)
called `clusterversionoperators.operator.openshift.io`. The new type will be part
of the [`github.com/openshift/api/operator/v1alpha1`][github.com/openshift/api/operator/v1alpha1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this link is broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. The link works for me. At least after the hyperlink is rendered. It utilizes reference style links [1].

[1] https://www.markdownguide.org/basic-syntax/#reference-style-links

`github.com/openshift/api/operator/v1` package can be referenced, including a
hosted CVO configuration API. The following changes are proposed for the
[HyperShift API][api/hypershift].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, yes, thanks for noticing. The ClusterVersionOperatorConfiguration feature gate is now proposed to be used here as well. I am not 100% sure whether a different new feature gate should be created just for the OperatorConfiguration field.

Copy link
Contributor

openshift-ci bot commented Nov 4, 2024

@DavidHurta: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants