
WIP: MON-3547: Configuration CRD for the Cluster Monitoring Operator #1627

Open
wants to merge 6 commits into master

Conversation

marioferh

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 15, 2024
Contributor

openshift-ci bot commented May 15, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jan--f for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@simonpasquier
Contributor

/cc @jan--f

@openshift-ci openshift-ci bot requested a review from jan--f May 23, 2024 07:58
@marioferh force-pushed the crd_based_cmo branch 2 times, most recently from f9b89e6 to 0705ade on May 23, 2024 14:41
@marioferh changed the title from "WIP: CRD based CMO" to "WIP: MON-3547: CRD based CMO" on May 23, 2024
@openshift-ci-robot

openshift-ci-robot commented May 23, 2024

@marioferh: This pull request references MON-3547 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 23, 2024
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
enhancements/monitoring/crd-based-cmo.md — outdated review threads (resolved)

### Transition to the user

- How the user could adopt CR instead of configmap.
Contributor

IMHO the upgrade experience brings many questions (not limited to):

  • Which tools/mechanisms do we provide to cluster admins to minimize the migration burden? Should CMO automatically create the custom resource from the current configmap when the CRD becomes GA? Or do we want a semi-automated approach with explicit approval from the cluster admins?
  • Should we provide an escape hatch (via some annotation?) telling CMO to ignore the custom resource and only use the configmap in case something goes bad?
  • How do we tell cluster admins that they need to migrate? CMO conditions could be an option as well as alerting and Insights?
  • How do we know how many clusters have migrated to the new CRD? Telemetry I guess.
  • How long should we keep the support for the configmap?
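One possible shape for the escape-hatch idea floated above — an annotation telling CMO to ignore the custom resource and keep honoring the ConfigMap — could look like the following. This is purely hypothetical: neither the annotation key nor its semantics exist today, and the `apiVersion`/`kind` assume the `ClusterMonitoring` type from openshift/api#1929.

```yaml
# Hypothetical escape hatch: an annotation on the custom resource telling
# CMO to fall back to the ConfigMap. No such annotation exists yet.
apiVersion: config.openshift.io/v1
kind: ClusterMonitoring
metadata:
  name: cluster
  annotations:
    monitoring.openshift.io/ignore-custom-resource: "true"
```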

Author

Thinking about it

Contributor

Given how CMO works I think we might want to prioritize a low friction experience. Ideally admins won't have to do anything even after we enable the feature gate by default...if possible.
It seems that either the configmaps and CRs coexisting, or a translation from configmap to CRD, will have to be part of this.

Currently I lean towards the coexisting approach. I think this allows more flexibility when adding CRD fields. For example we might want to add stricter validations to a field in the CRD. With the translation approach we might not be able to pull something in without user intervention.
Otoh both config artifacts coexisting might raise issues with how a feature is configured. Perhaps that could be clarified through status conditions and additional logging?
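The status-condition idea mentioned above could take a shape like the following. The condition type, reason, and field name are all hypothetical illustrations, not an existing CMO API:

```yaml
# Hypothetical status condition surfacing which config artifact is in effect
# when both the ConfigMap and the CR configure the same feature.
status:
  conditions:
  - type: ConfigSourceDegraded      # hypothetical condition type
    status: "True"
    reason: ConflictingConfiguration
    message: >-
      prometheusK8s.retention is set in both the cluster-monitoring-config
      ConfigMap and the ClusterMonitoring resource; the ConfigMap value is
      in effect.
```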

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
@marioferh changed the title from "WIP: MON-3547: CRD based CMO" to "WIP: MON-3547: Configuration CRD for the Cluster Monitoring Operator" on Jun 12, 2024
Co-authored-by: Jan Fajerski <jan--f@users.noreply.github.com>
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 30, 2024
Contributor @JoelSpeed left a comment

At the moment, it seems there are a lot of open questions and things that have not yet been thought through. It would be good to get some of those open questions discussed before we get too far down the road of implementing an API

Comment on lines 99 to 100
ClusterMonitoringConfiguration
UserWorkloadConfiguration
Contributor

Why 2 CRDs instead of one?

Contributor

Because they can have different personas managing each. For instance with managed OpenShift clusters (OSD/ROSA), the platform SREs own the cluster (platform) monitoring stack while platform users (e.g. customers) own the user-defined monitoring stack.

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Contributor @simonpasquier left a comment

It looks like the proposal lacks a few sections from the template (like "Risks and mitigations" for instance). Can you add them?

enhancements/monitoring/monitoring-config.md — outdated review threads (resolved)

### Overview

- Replace confimgaps with CRD:
Contributor

Can you add a bit more details on the current implementation and the overall context for readers not familiar with the product?

  • Describe the 2 stacks (platform & UWM)
  • How does an admin configure each stack? Relationship between the 2?
  • Explain that there can be 2 personas responsible for each stack (the managed OpenShift use case).

A link to https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/multi-tenant-alerting.md would be useful.

Author

Add improvements

UserWorkloadConfiguration *UserWorkloadConfiguration `json:"-"`
}
```

Contributor

Maybe add here the general direction that we're going to keep more or less the structure of ClusterMonitoringConfiguration and UserWorkloadConfiguration but align them to the OpenShift API standards.

Author

Add improvements

marioferh and others added 2 commits August 7, 2024 15:17
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Contributor

openshift-ci bot commented Aug 9, 2024

@marioferh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | 1ed7227 | link | true | /test markdownlint |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

## Motivation

* The specification is well known and to a degree self-documenting.
* The Kubernetes API Server validate custom resources based on their API specification, so users get immediate feedback on errors instead of checking the ClusterOperator object and the cluster monitoring operator's logs. Custom expression language (CEL) even allows to write complex validation logic, including cross-field tests.
Contributor

Nit: Common expressions language, not Custom expressions language
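To illustrate the CEL point in the motivation above: a CRD schema can embed validation rules that the API server enforces at admission time, including cross-field checks. This is a generic sketch using `x-kubernetes-validations`; the field names are illustrative, not the proposed ClusterMonitoring schema.

```yaml
# Generic sketch of a CEL cross-field rule in a CRD's OpenAPI schema.
# Field names (retention, retentionSize) are illustrative only.
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      x-kubernetes-validations:
      - rule: "!has(self.retentionSize) || has(self.retention)"
        message: "retention must be set whenever retentionSize is set"
      properties:
        retention:
          type: string
        retentionSize:
          type: string
```

With this in place, an invalid object is rejected at `kubectl apply` time instead of surfacing later in operator logs.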

- Replace confimgaps with CRD:

```
type Config struct {
Contributor

Is this the existing config?



```
type ClusterMonitoringConfiguration struct {
Contributor

Is this the existing state? Or desired?



- Switch mecanishm
- Both ConfigMap and CRD will coexist for a while until the transition is complete.
Contributor

How will clusters that do not currently have a CR configured, but do have the configmap, continue to function without manual intervention as you go through upgrades?

- Extensive testing will be required to validate the new behavior.
- If there are different fields in CRD and ConfigMap, the field in the ConfigMap will take precedence.
- Some fields in CRD and ConfigMap diverge due to API requirements. Helper functions will be needed to resolve this.
- The end user will be able to choose to use ConfigMap or CRD at any time, keeping in mind that the fields in the CRD will gradually expand.
Contributor

Will this be a conscious choice? Is there some explicit switch for this or is it just a case of if one or the other exists?


Can users still choose between the ConfigMap and the CRD after the feature goes GA?
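The precedence rule quoted above (the ConfigMap field wins over the CRD field while both coexist) could be sketched as a small resolution helper. All names here are hypothetical illustrations, not CMO's actual code.

```go
package main

import "fmt"

// resolveRetention picks the effective value for a single field during the
// transition window: the ConfigMap value wins whenever both sources set it.
// Function and parameter names are illustrative, not CMO's real API.
func resolveRetention(fromConfigMap, fromCRD string) string {
	if fromConfigMap != "" {
		return fromConfigMap // ConfigMap takes precedence while both coexist
	}
	return fromCRD // fall back to the CRD value (empty if neither is set)
}

func main() {
	fmt.Println(resolveRetention("15d", "30d")) // both set: prints 15d
	fmt.Println(resolveRetention("", "30d"))    // only CRD set: prints 30d
}
```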

[More info](https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/multi-tenant-alerting.md)


- Replace confimgaps with CRD:

Suggested change
- Replace confimgaps with CRD:
- Replace configmaps with CRD:

Add example CustomResource: config/examples/clustermonitoringoperator.yaml
- Client codegen
- Reconcile logic: https://github.com/openshift/cluster-monitoring-operator/pull/2350
- Add decoupling Confimgap / CustomResource:

Suggested change
- Add decoupling Confimgap / CustomResource:
- Add decoupling Configmap / CustomResource:

## Motivation

* The specification is well known and to a degree self-documenting.
* The Kubernetes API Server validate custom resources based on their API specification, so users get immediate feedback on errors instead of checking the ClusterOperator object and the cluster monitoring operator's logs. Custom expression language (CEL) even allows to write complex validation logic, including cross-field tests.

Suggested change
* The Kubernetes API Server validate custom resources based on their API specification, so users get immediate feedback on errors instead of checking the ClusterOperator object and the cluster monitoring operator's logs. Custom expression language (CEL) even allows to write complex validation logic, including cross-field tests.
* The Kubernetes API Server validates custom resources based on their API specification, so users get immediate feedback on errors instead of checking the ClusterOperator object and the cluster monitoring operator's logs. Custom expression language (CEL) even allows to write complex validation logic, including cross-field tests.

### Migration path


- Switch mecanishm

Suggested change
- Switch mecanishm
- Switch mechanism


## Design Details

To initiate the process, let's establish a feature gate that will serve as the entry point for implementing a CRD configuration approach. This strategy enables us to make incremental advancements without the immediate burden of achieving complete feature equivalence with the config map. We can commence with a the basics and progressively incorporate additional functionalities as they develop.

Suggested change
To initiate the process, let's establish a feature gate that will serve as the entry point for implementing a CRD configuration approach. This strategy enables us to make incremental advancements without the immediate burden of achieving complete feature equivalence with the config map. We can commence with a the basics and progressively incorporate additional functionalities as they develop.
To initiate the process, let's establish a feature gate that will serve as the entry point for implementing a CRD configuration approach. This strategy enables us to make incremental advancements without the immediate burden of achieving complete feature equivalence with the config map. We can commence with the basics and progressively incorporate additional functionalities as they develop.
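The feature-gate approach described in the design details could gate the reconcile path roughly as below. The gate name `ClusterMonitoringConfig` and all function names are hypothetical stand-ins, not the actual openshift/api gate.

```go
package main

import "fmt"

// featureGateEnabled stands in for a lookup against the cluster's
// FeatureGate resource; the gate name used here is hypothetical.
func featureGateEnabled(gates map[string]bool, name string) bool {
	return gates[name] // missing key yields false, i.e. gate off
}

// configSource decides which configuration path the operator reconciles
// from: the new custom resource when the gate is on, else the configmap.
func configSource(gates map[string]bool) string {
	if featureGateEnabled(gates, "ClusterMonitoringConfig") {
		return "crd"
	}
	return "configmap"
}

func main() {
	fmt.Println(configSource(map[string]bool{"ClusterMonitoringConfig": true})) // crd
	fmt.Println(configSource(nil))                                              // configmap
}
```

This keeps the default behavior untouched for clusters without the gate, which is what allows incremental progress without full feature parity up front.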

- "@danielmellado"
- "@jan--f"
- "@moadz"
- "simonpasquier"

Suggested change
- "simonpasquier"
- "@simonpasquier"

reviewers:
- "@jan--f"
- "@moadz"
- "simonpasquier"

Suggested change
- "simonpasquier"
- "@simonpasquier"

approvers:
- "@jan--f"
- "@moadz"
- "simonpasquier"

Suggested change
- "simonpasquier"
- "@simonpasquier"


### Overview

Currently in CMO a config map provides a way to inject configuration data into pods. There are two configmaps for the different stacks:

Suggested change
Currently in CMO a config map provides a way to inject configuration data into pods. There are two configmaps for the different stacks:
Currently in CMO, a config map provides a way to inject configuration data into pods. There are two configmaps for the different stacks:


Currently in CMO a config map provides a way to inject configuration data into pods. There are two configmaps for the different stacks:

cluster-monitoring-configmap: Default platform monitoring components. A set of platform monitoring components are installed in the openshift-monitoring project by default during an OpenShift Container Platform installation. This provides monitoring for core cluster components including Kubernetes services. The default monitoring stack also enables remote health monitoring for clusters.

Suggested change
cluster-monitoring-configmap: Default platform monitoring components. A set of platform monitoring components are installed in the openshift-monitoring project by default during an OpenShift Container Platform installation. This provides monitoring for core cluster components including Kubernetes services. The default monitoring stack also enables remote health monitoring for clusters.
cluster-monitoring-config: Default platform monitoring components. A set of platform monitoring components are installed in the openshift-monitoring project by default during an OpenShift Container Platform installation. This provides monitoring for core cluster components including Kubernetes services. The default monitoring stack also enables remote health monitoring for clusters.
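For readers unfamiliar with the current mechanism, the ConfigMap described above carries the whole stack configuration as a YAML blob under a single `config.yaml` key, which is what the proposal wants to replace with a typed CRD. A typical example (field values illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 20Gi
```

Because everything lives inside an opaque string, the API server cannot validate any of it — errors only surface in CMO's status and logs.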


One proposal for a minimal DoD was:
- Feature gate in openshift/api
- Api types moved to openshift/api

Suggested change
- Api types moved to openshift/api
- API types moved to openshift/api

To initiate the process, let's establish a feature gate that will serve as the entry point for implementing a CRD configuration approach. This strategy enables us to make incremental advancements without the immediate burden of achieving complete feature equivalence with the config map. We can commence with a the basics and progressively incorporate additional functionalities as they develop.

One proposal for a minimal DoD was:
- Feature gate in openshift/api

Suggested change
- Feature gate in openshift/api
- Feature gate in openshift/api

it seems you want to get a list, but the format is

One proposal for a minimal DoD was: - Feature gate in openshift/api - Api types moved to openshift/api - CRD Initial dev https://github.com/openshift/api/pull/1929 Add controller-gen logic to makefile Add API to api/config/v1 Add Generated CRD: config/v1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitoring.crd.yaml Add example CustomResource: config/examples/clustermonitoringoperator.yaml - Client codegen - Reconcile logic: https://github.com/openshift/cluster-monitoring-operator/pull/2350 - Add decoupling Confimgap / CustomResource: Controller logic is strongly dependant of *manifests.Config struct.

resources:
  requests:
    storage: 20Gi


could remove the empty line

spec:
  resources:
    requests:
      storage: 20G`
@juzhao Aug 12, 2024

Suggested change
storage: 20G`
storage: 20G

or

Suggested change
storage: 20G`
storage: 20Gi

### Test Plan

- Unit tests for the feature
- e2e tests covering the feature
@juzhao Aug 12, 2024

I think upgrade testing should also be covered

@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 20, 2024