OTA-1029: Add a CVO Log Level API #1492
base: master
Conversation
@Davoska: This pull request references OTA-1029 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Skipping CI for Draft Pull Request.
/test all
923db57 to 9ad307f
/test all
/cc
1056f05 to 8380ff6
PTAL, reviewers @LalatenduMohanty @wking @petr-muller
PTAL, API approver @deads2k
8380ff6 to 3e5a661
3e5a661 to a99c2f5
LGTM, minor comment inline
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: petr-muller. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
A hosted CVO is located in the management cluster and accesses the hosted API server. As the new CR will be part of the OCP payload, it will be applied to the hosted cluster.
Hmm, that seems dissonant to me. If the CVO acts within the management cluster, why is the configuration for the CVO not also at the management cluster level? Have you spoken to anyone from HCP about this EP yet?
This seems like the current solution for a bunch of hosted operators placed in the management cluster (cluster-image-registry-operator, dns-operator, cluster-network-operator, ...). These operators run in the management cluster but access the hosted API server. However, some of them potentially have access even to the management cluster API server, and can access any configuration resources there. Let me reach out to the HyperShift team.
The main issue with putting CRDs in the management cluster is that we have to handle multiple versions of OCP on the same management cluster.
We could add this to the HostedCluster API and have that be the source of truth, while simply reconciling (and preventing changes to) the CR within the hosted cluster.
The other alternative to consider is making this CRD namespaced.
We could add this to the HostedCluster API and have that be the source of truth, while simply reconciling (and preventing changes to) the CR within the hosted cluster
I really like the idea of this design. There are no compatibility issues regarding the management cluster, and it utilizes an existing pattern of flowing information from the HostedCluster API to the hosted cluster. The hosted CVO is none the wiser and does not even need to access the management cluster API server. The complexity seems much lower. We only need to come up with the HyperShift API change.
The ClusterConfiguration API seems like a nice place; however, its purpose seems to be oriented around the configuration API (github.com/openshift/api/config/v1), meaning configuration for OCP components rather than just operators.
I don't see any applicable existing APIs that we could use to reference the new proposed ClusterVersionOperator API. We could introduce a new OperatorConfiguration API where various APIs from the github.com/openshift/api/operator/v1 package could be referenced, including the newly proposed ClusterVersionOperator API. What do you think?
Something like:
// OperatorConfiguration specifies configuration for individual OCP operators in the
// cluster, represented as embedded resources that correspond to the openshift
// operator API.
type OperatorConfiguration struct {
	// ClusterVersionOperator specifies the configuration for the Cluster Version Operator in the HostedCluster.
	// +optional
	ClusterVersionOperator *operatorv1.ClusterVersionOperatorSpec `json:"clusterVersionOperator,omitempty"`
}

type HostedClusterSpec struct {
	...

	// OperatorConfiguration specifies configuration for individual OCP operators in the
	// cluster, represented as embedded resources that correspond to the openshift
	// operator API.
	//
	// +kubebuilder:validation:Optional
	// +optional
	OperatorConfiguration *OperatorConfiguration `json:"operatorConfiguration,omitempty"`
}
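As a purely illustrative aside, here is a minimal sketch of the propagation step this design would imply, assuming a controller on the management side copies the embedded spec verbatim into a CR inside the hosted cluster. The function name, the reconcile wiring, and the package holding the proposed ClusterVersionOperator type are assumptions for illustration, not part of the proposal text:

package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	operatorv1 "github.com/openshift/api/operator/v1"
	hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
)

// reconcileCVOConfiguration mirrors the ClusterVersionOperator configuration embedded in
// the HostedCluster spec into the hosted cluster, where the hosted CVO reads it.
// hostedClient is a client for the hosted cluster's API server.
func reconcileCVOConfiguration(ctx context.Context, hc *hyperv1.HostedCluster, hostedClient client.Client) error {
	if hc.Spec.OperatorConfiguration == nil || hc.Spec.OperatorConfiguration.ClusterVersionOperator == nil {
		return nil // nothing to propagate
	}
	cvo := &operatorv1.ClusterVersionOperator{ObjectMeta: metav1.ObjectMeta{Name: "cluster"}}
	_, err := controllerutil.CreateOrUpdate(ctx, hostedClient, cvo, func() error {
		// Copy the management-side configuration verbatim; a validating admission policy in
		// the hosted cluster would prevent direct edits to this CR.
		cvo.Spec = *hc.Spec.OperatorConfiguration.ClusterVersionOperator
		return nil
	})
	return err
}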
The other alternative to consider is making this CRD namespaced.
That doesn't make sense from an OCP standpoint, since you'll likely need different fields for a namespaced version of this.
We could add this to the HostedCluster API and have that be the source of truth, while simply reconciling (and preventing changes to) the CR within the hosted cluster
If the CVO is running in the management cluster, why would we want to have this CRD in the workload cluster at all? Would it not be better to have a separate mode of operation so that in an HCP, the CVO gets its config from CLI flags instead of the CRD, where HyperShift operators can configure it directly based on some HyperShift-specific API (be that a new version of this CRD or part of an existing HCP API)?
The reason why I am a fan of the CRD being in the hosted cluster is:
- We simplify and streamline the flow of configuration.
- As a maintainer of the CVO, I simply have to support the CVO reconciling one new configuration CR in the cluster. It does not matter whether the CVO is in the hosted cluster or in a standalone cluster. This means a single piece of logic: less development, less prone to bugs, easier to maintain, etc.
- HyperShift does not have to parse a new API into flags/environment variables/a config file (and the CVO does not have to parse the flags back into its internal representation of the configuration). It simply propagates one API into another verbatim. Minimal risk of breaking things, simpler logic, more future-proof. Adding new fields to the ClusterVersionOperator CRD requires minimal effort on the HyperShift side and the CVO parsing side.
- We do not introduce a new CRD in the management cluster.
- No disruptions to a running hosted CVO.
It does have its cons: a new validating admission policy, a new CRD, and a CR in the hosted cluster. Although the configuration itself does not seem likely to be heavily utilized in the future (it is just a configuration CR for the CVO), the operational impact seems minimal (one new CRD, one CR, and an admission policy in the hosted cluster). So it is a one-time constant cost to a hosted cluster that results in simpler development and testing.
However, you have a great point, and I do truly appreciate you pushing for not creating new resources unnecessarily and just because it seems easier (my words). It resulted in me thinking of more alternatives, which is a great thing.
the CVO gets its config from CLI flags instead of the CRD
It's easy to parse things such as "Normal" to --v=2 or "Debug" to --v=4 in the hosted CVO deployment. I think we can do that for the currently proposed ClusterVersionOperator CRD, and I will probably propose doing that this week. But I do worry that in a distant potential future we could have this discussion again when new, more complex fields are introduced to the ClusterVersionOperator CRD. However, the CVO can also just start supporting configuration files/env vars.
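For concreteness, here is a sketch of that mapping, assuming the shared LogLevel type and constants from github.com/openshift/api/operator/v1 and the verbosity numbers other OpenShift operators conventionally use; the function name is made up, and the enhancement may settle on different values:

package sketch

import operatorv1 "github.com/openshift/api/operator/v1"

// verbosityArgFor translates an operator log level into the klog verbosity flag that the
// hosted CVO deployment would be started with.
func verbosityArgFor(level operatorv1.LogLevel) string {
	switch level {
	case operatorv1.Debug:
		return "--v=4"
	case operatorv1.Trace:
		return "--v=6"
	case operatorv1.TraceAll:
		return "--v=8"
	default:
		// Normal, or an unset level, maps to the default verbosity.
		return "--v=2"
	}
}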
@JoelSpeed, may I ask, out of curiosity, what do you see as the main downside of the CR being in the hosted cluster? At the end of the day, I see it as a trade-off between development effort and operational costs. However, you are a more experienced engineer, so I am probably missing some aspects.
This thread is converging with another, so you should review my comment here as well.
But what I'm trying to avoid is a dissonance in UX. What I mean by this is exposing APIs to the workload cluster admin that they cannot use. You are intending to write an API that you will have to guard against users updating, because "this isn't the right way to update this field"; they have to "go over there and eventually it will get updated here".
It would be better not to expose that API to the end user at all; it is useless to them as an API and becomes an implementation detail for us.
Another consideration is whether there are differing requirements for CVOs in HCP vs standalone. Often our projects don't work quite the same way in HCP as they do on a standalone cluster, and I can imagine a future where you want a feature either only on HCP, or only on standalone.
If the API to control the CVO in an HCP is baked into an HCP-specific type, we choose exactly what subset of the configuration we want to expose. If you directly embed the ClusterVersionOperatorSpec, every time HCP updates the o/api dependency, you could end up with new fields being added to the HCP API, potentially unintentionally. That might then lead to awkward situations again where you're exposing API fields that the user cannot use.
This is one of the reasons that HyperShift uses the NodePool abstraction: to prevent users from having to worry about lots of fields in underlying APIs that we do not consider supported on HCP. It's a similar story here IMO.
Thank you so much for the insights, I appreciate them. I was missing the dissonance in UX. I now see the benefits of your suggestions more clearly.
The enhancement now proposes to use a configuration file for a hosted CVO. No new CRDs in a management cluster. No new CRDs in a hosted cluster.
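Purely as an illustration of that direction (the file format, field names, and helper below are hypothetical and not taken from the enhancement), the hosted CVO could load such a file along these lines:

package sketch

import (
	"os"

	operatorv1 "github.com/openshift/api/operator/v1"
	"sigs.k8s.io/yaml"
)

// cvoConfiguration is a hypothetical shape for a hosted CVO configuration file.
type cvoConfiguration struct {
	// OperatorLogLevel reuses the shared LogLevel type ("Normal", "Debug", "Trace", "TraceAll").
	OperatorLogLevel operatorv1.LogLevel `json:"operatorLogLevel,omitempty"`
}

// loadCVOConfiguration reads and parses the configuration file mounted into the CVO pod,
// for example from a ConfigMap managed by HyperShift.
func loadCVOConfiguration(path string) (*cvoConfiguration, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	cfg := &cvoConfiguration{}
	if err := yaml.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}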
New changes are detected. LGTM label has been removed.
195f97f to 3bd86f4
This enhancement proposes to create a new `CustomResourceDefinition` (CRD) called `clusterversionoperators.operator.openshift.io`. The new type will be part of the [`github.com/openshift/api/operator/v1alpha1`][github.com/openshift/api/operator/v1alpha1]
this link is broken.
Weird. The link works for me, at least after the hyperlink is rendered. It utilizes reference-style links [1].
[1] https://www.markdownguide.org/basic-syntax/#reference-style-links
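For readers without the rendered enhancement at hand, here is a rough sketch of the ClusterVersionOperator type the quoted enhancement text proposes. The field names are illustrative and inferred from this discussion; the authoritative definition is the enhancement and the corresponding openshift/api change:

package v1alpha1 // sketch of the proposed operator.openshift.io/v1alpha1 addition

import (
	operatorv1 "github.com/openshift/api/operator/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ClusterVersionOperator holds cluster-wide configuration for the CVO; the proposal is a
// cluster-scoped singleton named "cluster".
type ClusterVersionOperator struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ClusterVersionOperatorSpec   `json:"spec"`
	Status ClusterVersionOperatorStatus `json:"status,omitempty"`
}

// ClusterVersionOperatorSpec is the configuration the CVO reconciles. OperatorLogLevel
// reuses the shared LogLevel type ("Normal", "Debug", "Trace", "TraceAll") so the log
// verbosity can be changed dynamically.
type ClusterVersionOperatorSpec struct {
	// +optional
	OperatorLogLevel operatorv1.LogLevel `json:"operatorLogLevel,omitempty"`
}

// ClusterVersionOperatorStatus is left empty in this sketch.
type ClusterVersionOperatorStatus struct{}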
`github.com/openshift/api/operator/v1` package can be referenced, including a hosted CVO configuration API. The following changes are proposed for the [HyperShift API][api/hypershift].
This API change needs to be behind a feature gate initially.
See https://github.com/openshift/hypershift/blob/main/api/hypershift/v1beta1/nodepool_types.go#L438-L441 for an example
Yes, yes, thanks for noticing. The ClusterVersionOperatorConfiguration feature gate is now proposed to be used here as well. I am not 100% sure whether a different new feature gate should be created just for the OperatorConfiguration field.
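For illustration, gating the new HostedCluster field could look roughly like the following, using the feature-gate marker style from openshift/api; HyperShift's exact mechanism may differ, so see the nodepool_types.go example linked above for the authoritative pattern:

// Sketch only: the ClusterVersionOperatorConfiguration feature gate guards the new field.
type OperatorConfiguration struct {
	// ClusterVersionOperator specifies the configuration for the Cluster Version Operator
	// in the HostedCluster.
	//
	// +openshift:enable:FeatureGate=ClusterVersionOperatorConfiguration
	// +optional
	ClusterVersionOperator *operatorv1.ClusterVersionOperatorSpec `json:"clusterVersionOperator,omitempty"`
}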
@DavidHurta: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
This enhancement describes the API changes needed to provide a simple way of dynamically changing the verbosity level of the Cluster Version Operator's logs.
This pull request references https://issues.redhat.com/browse/OTA-1029