Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MON-3902: add initial Monitoring CRD api #1929

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

marioferh
Copy link
Contributor

@marioferh marioferh commented Jun 12, 2024

Initial Implementation of Configuration CRD for the Cluster Monitoring Operator

Related: [link]

As we have a FeatureGate, the idea is to implement component by component to verify the correct behavior and facilitate the reviews.

The API comes from https://github.com/openshift/cluster-monitoring-operator/blob/7015893ece5ed0a6f888fffe09b6d10a807d6901/pkg/manifests/config.go#L51

@marioferh marioferh marked this pull request as ready for review June 12, 2024 12:03
@marioferh marioferh changed the title WIP; add initial ClusterMonitoring CRD api WIP: add initial ClusterMonitoring CRD api Jun 12, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2024
Copy link
Contributor

openshift-ci bot commented Jun 12, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Jun 12, 2024

Hello @marioferh! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 12, 2024
config/v1/types_cmo.go Outdated Show resolved Hide resolved
@marioferh marioferh changed the title WIP: add initial ClusterMonitoring CRD api MON-3902: add initial ClusterMonitoring CRD api Jun 12, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 12, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jun 12, 2024

@marioferh: This pull request references MON-3902 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.17.0" version, but no target version was set.

In response to this:

Initial Implementation of Configuration CRD for the Cluster Monitoring Operator

Related: [link]

As we have a FeatureGate, the idea is to implement component by component to verify the correct behavior and facilitate the reviews.

The API comes from https://github.com/openshift/cluster-monitoring-operator/blob/7015893ece5ed0a6f888fffe09b6d10a807d6901/pkg/manifests/config.go#L51

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 12, 2024
@marioferh marioferh force-pushed the initial_cmo_api branch 3 times, most recently from 9603c29 to 9c70852 Compare June 12, 2024 17:14
config/v1/types_cmo.go Outdated Show resolved Hide resolved
@marioferh marioferh changed the title MON-3902: add initial ClusterMonitoring CRD api MON-3902: add initial Monitoring CRD api Jun 18, 2024
config/v1/types_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_monitoring.go Outdated Show resolved Hide resolved
@marioferh
Copy link
Contributor Author

/retest-required

@JoelSpeed
Copy link
Contributor

/test e2e-upgrade-minor

@marioferh
Copy link
Contributor Author

/retest-required

Copy link
Contributor

@jan--f jan--f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes are needed here. I added a couple of suggestions as well. While we will go over this when things move to v1 once more in detail, it's worth getting details right from the get go too I think.

config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 23, 2024
Copy link
Contributor

openshift-ci bot commented Jul 23, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jan--f, marioferh
Once this PR has been reviewed and has the lgtm label, please assign knobunc for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marioferh
Copy link
Contributor Author

@JoelSpeed

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// MonitoringOperatorSpec defines the desired state of Cluster Monitoring Operator
type ClusterMonitoringSpec struct {
// `UserWorkload` set the deployment mode for user-defined monitoring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this actually doing? What is user defined monitoring?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, User Workload Monitoring is monitoring for user-defined Projects. This is also mentioned on the OpenShift documentation but IMHO it'd be a bit overkill to link that within the api comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but we need some sort of description, at the moment someone doing an oc explain on this would have no cluster what this field controls

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see, then we could use something along the lines of

Extension to the existing cluster monitoring stack to enable observability of user namespaces

config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Show resolved Hide resolved
const (
// UserDefinedDisabled disables monitoring for user-defined projects.
UserDefinedDisabled UserDefinedMode = "Disabled"
// UserDefinedNamespace enables monitoring for user-defined projects with namespace scoped tenancy.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"namespace scoped tenancy" is too vague for users. We'd need to go in greater details (not sure if the reference docs are doing a better job).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a recent discussion you mentioned isolation in the context of this. I thought that might lead to a better name. Perhaps UserDefinedNamespaceIsolation UserDefinedMode = "NamespaceIsolation"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please flesh out and update the godoc here, need to explain what this actually does

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 1, 2024
Copy link
Contributor

openshift-ci bot commented Aug 1, 2024

New changes are detected. LGTM label has been removed.

config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
Comment on lines 76 to 78
// mode defines the different configurations of UserDefinedMontiring
// +kubebuilder:validation:Enum:="Disabled";"SingleNamespace"
Mode UserDefinedMode `json:"mode,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to expand the godoc to explain what the different available options are, and what they achieve

config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
const (
// UserDefinedDisabled disables monitoring for user-defined projects.
UserDefinedDisabled UserDefinedMode = "Disabled"
// UserDefinedNamespace enables monitoring for user-defined projects with namespace scoped tenancy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please flesh out and update the godoc here, need to explain what this actually does

@marioferh marioferh force-pushed the initial_cmo_api branch 3 times, most recently from f8befdc to 0f7468a Compare September 23, 2024 16:09
@marioferh
Copy link
Contributor Author

/retest-required

config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Outdated Show resolved Hide resolved
@marioferh
Copy link
Contributor Author

/test e2e-aws-ovn-hypershift

// +openshift:api-approved.openshift.io=https://github.com/openshift/api/pull/1929
// +openshift:file-pattern=cvoRunLevel=0000_10,operatorName=config-operator,operatorOrdering=01
// +kubebuilder:object:root=true
// +kubebuilder:resource:path=clustermonitoring,scope=Cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to be used in HyperShift, is this considered to be a part of the guest cluster, and therefore not visible from the management layer? If so, Cluster still makes sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in HyperShift

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know on which side it is used? Is it on the management or guest cluster side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about it. @simonpasquier do you have this info?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iiuc, this should be used on the management cluster side, as this will eventually contain the settings for cluster admins. The guest clusters would rather use the UserDefinedMonitoring struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have a discussion with the HyperShift folks about this API then.

If this is cluster scoped, then it makes sense in the OCP world.

If it is namespace scoped, the it makes sense in HyperShift.

The converse of both of my previous statements are not true.

To be correct with our APIs, it would be preferable to have a Hosted<CRD> that is namespaced, and exists only in the management cluster, and then have no semblance of the feature exist within the guest cluster.

This pattern is coming up a lot at the moment with new designs for CRDs, will need to come up with a common path forward

CC @enxebre

config/v1/types_cluster_monitoring.go Show resolved Hide resolved

// ClusterMonitoringSpec defines the desired state of Cluster Monitoring Operator
type ClusterMonitoringSpec struct {
// userDefined set the deployment mode for user-defined monitoring in addition to the default platform monitoring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not present, what does it mean?

Copy link
Contributor Author

@marioferh marioferh Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the begging was optional and default disabled, but there were some issues with mandatory fields, at this point I think we can move it to mandatory and select between disabled and NamespaceIsolation.
In ConfigMap default is Disabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you, you could default it if you prefer, though we tend to avoid defaulting configuration APIs. What we typically do if there is a default is allow the value to be omitted and write something like

// Valid values are Disabled, NamespaceIsolation and omitted.
// When omitted this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
// The current default is Disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We declared it as mandatory, so the user has to add the field.

config/v1/types_cluster_monitoring.go Show resolved Hide resolved
config/v1/types_cluster_monitoring.go Show resolved Hide resolved
Comment on lines 82 to 85
// UserDefinedDisabled disables monitoring for user-defined projects. This restrics the default monitoring stack, installed in the openshift-monitoring project, to monitor only platform namespaces, which prevents any custom monitoring configurations or resources from being applied to user-defined namespaces.
UserDefinedDisabled UserDefinedMode = "Disabled"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this supersede the presence of the user monitoring CRs that you plan to add? Could we validate somehow that disabling this disallows the creation of those CRs? And this cannot be disabled if user monitoring CRs are present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is disabled the user-defined projects are not enabled. The CR could be there just its not taking the config to create user-defined projects. As in the current ConfigMap we can switch the mode and activate user-defined projects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CR could be there just its not taking the config to create user-defined projects.

Would this be surprising to a user who has configured their monitoring, how do they understand that the monitoring is not currently enabled, is something going to tell them?

Copy link
Contributor

@jan--f jan--f Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we add the user defined monitoring CRD, we'll communicate in its status if user defined monitoring is enabled or not.
I think this is better then preventing its creation. The prevention path would raise the question of what to do with existing user defined objects when user defined is disabled. We wouldn't want to delete them (disabling them might be temporary) so preventing addition (and editing) seems like not the best solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for being able to disable?

Could you prevent disabling, unless there were no user defined CRDs?

marioferh and others added 2 commits September 24, 2024 18:38
Add UserDefinedMonitoring
Add tests
Add FeatureGate
generate files

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Co-authored-by: Eliska Romanova <eromanov@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
@jan--f
Copy link
Contributor

jan--f commented Oct 1, 2024

@JoelSpeed Just fyi I'll take this over in @marioferh's absence.

@jan--f
Copy link
Contributor

jan--f commented Oct 14, 2024

/retest

@jan--f
Copy link
Contributor

jan--f commented Oct 14, 2024

@JoelSpeed PTAL?

// UserDefinedMonitoring config for user-defined projects.
type UserDefinedMonitoring struct {
// mode defines the different configurations of UserDefinedMonitoring
// Valid values are UserDefinedDisabled and UserDefinedNamespaceIsolation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The valid values should be the yaml, not Go versions

Suggested change
// Valid values are UserDefinedDisabled and UserDefinedNamespaceIsolation
// Valid values are Disabled and NamespaceIsolation.

// UserDefinedDisabled disables monitoring for user-defined projects. This restricts the default monitoring stack, installed in the openshift-monitoring project, to monitor only platform namespaces, which prevents any custom monitoring configurations or resources from being applied to user-defined namespaces.
UserDefinedDisabled UserDefinedMode = "Disabled"
// UserDefinedNamespaceIsolation enables monitoring for user-defined projects with namespace-scoped tenancy. This ensures that metrics, alerts, and monitoring data are isolated at the namespace level.
UserDefinedNamespaceIsolation UserDefinedMode = "NamespaceIsolation"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this really be NamespaceIsolated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about this but I'd argue the current NamespaceIsolation is better. Its not the namespace that is isloated but the metrics access that is isolated by namespace values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not the namespace that is isloated but the metrics access that is isolated by namespace values.

You've used the word isolated to describe how the metrics are accessed here, and I think to you're point, if it were IsolatedNamespace, I'd agree that this would imply the namespace is isolated, but my suggestions was NamespaceIsolated

I'm fine with either, but my thought was that NamespaceIsolated felt more declarative, and inline with Disabled. Isolation is a process, Isolated is a state, and it's the state that declarative APIs are supposed to represent

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Copy link
Contributor

openshift-ci bot commented Oct 22, 2024

@marioferh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify 439f7fd link true /test verify
ci/prow/minor-e2e-upgrade-minor 439f7fd link true /test minor-e2e-upgrade-minor

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants