Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
--- | ||
title: CRD Based CMO | ||
authors: | ||
- "@marioferh" | ||
- "@danielmellado" | ||
reviewers: | ||
approvers: | ||
creation-date: 2024-04-26 | ||
last-updated: 2024-04-26 | ||
status: provisional | ||
--- | ||
|
||
# CRD Based CMO | ||
|
||
## Release Signoff Checklist | ||
|
||
- [ ] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||
|
||
## Summary | ||
|
||
* Currently, the monitoring stack is configured using configmaps. In OpenShift, though, the best practice is to configure operators using custom resources. | ||
|
||
|
||
## Motivation | ||
|
||
* The specification is well known and, to a degree, self-documenting. | ||
* We can specify validation and legal values right in the CRD. | ||
* The API server validates user resources against our specification, so users get immediate feedback on errors instead of having to check whether their config was applied and scan the logs. | ||
* Many users expect to interact with operators through a CRD. | ||
* Compatible with GitOps workflows. | ||
* We can add validation and cross-field validation rules to CRD fields to avoid misconfigurations. | ||
* Organizational users (such as ACM) can manage a single resource and observe its status. | ||
|
||
|
||
### Goals | ||
|
||
- Replace configmaps with CRD | ||
- Smooth transition for users | ||
|
||
## Proposal | ||
|
||
## Design Details | ||
|
||
To start, we establish a feature gate that serves as the entry point for the CRD-based configuration. This lets us make incremental progress without immediately having to reach full feature parity with the configmap. We can start with the basics and add functionality progressively. | ||
|
||
One proposal for a minimal definition of done (DoD) was: | ||
- Feature gate | ||
- Initial CRD development: https://github.com/openshift/cluster-monitoring-operator/pull/2347/ | ||
  - Add controller-gen logic to the Makefile | ||
  - Add the API to pkg/apis/cmo/v1 | ||
  - Add the generated CRD: config/crd/bases/example.com_clustermonitoringoperators.yaml | ||
  - Add an example custom resource: config/examples/clustermonitoringoperator.yaml | ||
- Client codegen: https://github.com/openshift/cluster-monitoring-operator/pull/2369 | ||
- Reconcile logic: https://github.com/openshift/cluster-monitoring-operator/pull/2350 | ||
- Decouple the configmap from the custom resource: | ||
  - The controller logic depends strongly on the *manifests.Config struct. | ||
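As a rough illustration of the "Add the API to pkg/apis/cmo/v1" step, the sketch below shows how Go types can carry validation through kubebuilder marker comments, which controller-gen reads when it generates the CRD schema. The type names, the two fields (borrowed from the example custom resource in this document), and the cross-field rule are all assumptions made for illustration; they are not taken from the linked pull requests. A real API type would also embed the Kubernetes type and object metadata, omitted here to keep the sketch free of external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterMonitoringOperatorSpec is a hypothetical, trimmed-down spec; the real
// API in pkg/apis/cmo/v1 may be shaped differently.
type ClusterMonitoringOperatorSpec struct {
	// +optional
	TelemeterClient *TelemeterClientSpec `json:"telemeterClient,omitempty"`
}

// TelemeterClientSpec mirrors two fields from the example custom resource.
// The XValidation rule is an illustrative cross-field check, not one defined
// by the proposal.
//
// +kubebuilder:validation:XValidation:rule="!has(self.nodeSelector) || !has(self.enabled) || self.enabled",message="nodeSelector requires telemeterClient to be enabled"
type TelemeterClientSpec struct {
	// +kubebuilder:default=true
	// +optional
	Enabled *bool `json:"enabled,omitempty"`

	// +kubebuilder:validation:MaxProperties=10
	// +optional
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
}

// parseSpec decodes a JSON-encoded spec into the typed struct. It only checks
// that the JSON tags line up; schema validation itself happens in the API server.
func parseSpec(data []byte) (ClusterMonitoringOperatorSpec, error) {
	var spec ClusterMonitoringOperatorSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

func main() {
	raw := []byte(`{"telemeterClient":{"enabled":true,"nodeSelector":{"kubernetes.io/os":"linux"}}}`)
	spec, err := parseSpec(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("telemeter enabled:", *spec.TelemeterClient.Enabled)
}
```

The markers are plain comments to the Go compiler, so the types stay usable without any generator installed; the generated schema is what the API server would enforce.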
|
||
|
||
|
||
### Overview | ||
|
||
- Replace the configmaps with a CRD: | ||
  - cluster-monitoring-config | ||
  - user-workload-monitoring-config | ||
|
||
|
||
### Migration path | ||
|
||
- Feature gate. | ||
- Is a switch mechanism needed? | ||
- What should happen if both a CRD and a configmap exist? | ||
- Should the CRD take precedence over the configmap? | ||
- Should we compare the CRD and the configmap to detect differences? | ||
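The questions above are still open. Purely as a sketch of one possible answer, assuming the custom resource takes precedence whenever the feature gate is on, the selection logic could look like the following. Every name here (`Inputs`, `ConfigSource`, `selectSource`) is hypothetical and does not exist in CMO.

```go
package main

import "fmt"

// ConfigSource names where the effective configuration comes from.
type ConfigSource string

const (
	SourceDefaults       ConfigSource = "defaults"
	SourceConfigMap      ConfigSource = "configmap"
	SourceCustomResource ConfigSource = "customresource"
)

// Inputs captures what the operator observed on the cluster (hypothetical).
type Inputs struct {
	FeatureGateEnabled bool
	HasCustomResource  bool
	HasConfigMap       bool
}

// selectSource assumes the custom resource wins when the feature gate is on.
// The second return value reports that both sources exist, so the caller can
// surface a warning or compare the two.
func selectSource(in Inputs) (ConfigSource, bool) {
	if in.FeatureGateEnabled && in.HasCustomResource {
		return SourceCustomResource, in.HasConfigMap
	}
	if in.HasConfigMap {
		return SourceConfigMap, false
	}
	return SourceDefaults, false
}

func main() {
	src, both := selectSource(Inputs{FeatureGateEnabled: true, HasCustomResource: true, HasConfigMap: true})
	fmt.Println(src, both)
}
```

With the gate off, the custom resource is ignored entirely, which keeps current behavior unchanged for clusters that have not opted in.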
|
||
### Issues | ||
|
||
- Bump Go to 1.22: code generation was not possible with 1.21 because of restrictions in the generation scripts. | ||
  We are waiting for the Go 1.22 bump in CMO. | ||
|
||
- Decoupling the configmap from the custom resource: | ||
  The controller logic depends strongly on the *manifests.Config struct. | ||
  Should we translate the CR into a configmap? | ||
|
||
- What is the correct name for the apiVersion? monitoring.coreos.com/v1 holds all the Prometheus Operator components; should we create a new one? | ||
|
||
- The reconcile logic is not SDK compliant. Phase 2 of the feature could rely on the SDK reconcile mechanism. | ||
|
||
- Operator best practice is one CRD per controller. How can we follow this approach given that we have separate configmaps for CMO and UWM? | ||
|
||
- Refactor and clean up operator.go, client.go, and manifests.config | ||
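For the decoupling item in the list above, one option to weigh is translating the custom resource into the existing in-memory config rather than back into a configmap, so the controller keeps a single input type. The sketch below assumes that approach. `Config` is a local stand-in for `*manifests.Config` with made-up fields, and `configFromSpec` is a hypothetical helper; neither reflects the actual CMO structs.

```go
package main

import "fmt"

// TelemeterClientSpec and Spec stand in for the custom resource types (hypothetical).
type TelemeterClientSpec struct {
	Enabled      *bool
	NodeSelector map[string]string
}

type Spec struct {
	TelemeterClient *TelemeterClientSpec
}

// TelemeterClientConfig and Config stand in for the struct the controller
// already consumes. They are NOT the real manifests.Config definition.
type TelemeterClientConfig struct {
	Enabled      *bool
	NodeSelector map[string]string
}

type Config struct {
	TelemeterClient *TelemeterClientConfig
}

// configFromSpec copies the custom resource fields into a fresh Config so the
// controller logic can stay unaware of where the configuration came from.
func configFromSpec(spec Spec) *Config {
	cfg := &Config{}
	if spec.TelemeterClient == nil {
		return cfg
	}
	tc := &TelemeterClientConfig{}
	if spec.TelemeterClient.Enabled != nil {
		enabled := *spec.TelemeterClient.Enabled // copy the value, not the pointer
		tc.Enabled = &enabled
	}
	if spec.TelemeterClient.NodeSelector != nil {
		tc.NodeSelector = make(map[string]string, len(spec.TelemeterClient.NodeSelector))
		for k, v := range spec.TelemeterClient.NodeSelector {
			tc.NodeSelector[k] = v
		}
	}
	cfg.TelemeterClient = tc
	return cfg
}

func main() {
	enabled := true
	cfg := configFromSpec(Spec{TelemeterClient: &TelemeterClientSpec{
		Enabled:      &enabled,
		NodeSelector: map[string]string{"kubernetes.io/os": "linux"},
	}})
	fmt.Println(*cfg.TelemeterClient.Enabled, cfg.TelemeterClient.NodeSelector)
}
```

Copying values instead of sharing pointers and maps means later mutations of the config cannot leak back into the cached custom resource.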
|
||
|
||
### Transition to the user | ||
|
||
- How users can adopt the CR instead of the configmap. | ||
|
||
### Example configuration | ||
|
||
|
||
#### CRD | ||
|
||
    apiVersion: cmo.example.com/v1 | ||
    kind: ClusterMonitoringOperator | ||
    metadata: | ||
      name: clustermonitoringoperator | ||
      namespace: openshift-monitoring | ||
    spec: | ||
      telemeterClient: | ||
        enabled: true | ||
        nodeSelector: | ||
          kubernetes.io/os: linux | ||
        tolerations: | ||
        - operator: Exists | ||
      prometheusK8s: | ||
        volumeClaimTemplate: | ||
          metadata: | ||
            name: prometheus-data | ||
            annotations: | ||
              openshift.io/cluster-monitoring-drop-pvc: "yes" | ||
          spec: | ||
            resources: | ||
              requests: | ||
                storage: 20Gi | ||
|
||
|
||
### Test Plan | ||
|
||
- Unit tests for the feature | ||
- e2e tests covering the feature | ||
|
||
### Graduation Criteria | ||
|
||
From Tech Preview to GA | ||
|
||
#### Tech Preview -> GA | ||
|
||
- Ensure feature parity with the existing configmap-based configuration | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
N/A | ||
|
||
### Version Skew Strategy | ||
|
||
N/A | ||
|
||
## Implementation History |