Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metrics 2/x] Configure Prometheus Operator #687

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

zeeke
Copy link
Member

@zeeke zeeke commented Apr 23, 2024

Deploy the needed configuration to make the prometheus
operator to find and scrape the sriov-network-metrics-exporter
endpoints, including the ServiceMonitor, Role and RoleBinding

depends on:

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke changed the title [metrics 2/3] Configure Prometheus Operator [metrics 2/x] Configure Prometheus Operator Apr 23, 2024
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 06e880c to 75e8305 Compare April 24, 2024 15:51
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 75e8305 to 06c86e1 Compare April 24, 2024 16:51
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@coveralls
Copy link

coveralls commented Apr 24, 2024

Pull Request Test Coverage Report for Build 9746872108

Details

  • 14 of 41 (34.15%) changed or added relevant lines in 3 files are covered.
  • 7 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.08%) to 40.546%

Changes Missing Coverage Covered Lines Changed/Added Lines %
controllers/sriovoperatorconfig_controller.go 9 12 75.0%
controllers/helper.go 5 10 50.0%
pkg/utils/cluster.go 0 19 0.0%
Files with Coverage Reduction New Missed Lines %
controllers/drain_controller.go 1 68.06%
controllers/sriovoperatorconfig_controller.go 1 63.95%
controllers/generic_network_controller.go 5 74.53%
Totals Coverage Status
Change from base Build 9745388570: -0.08%
Covered Lines: 5520
Relevant Lines: 13614

💛 - Coveralls

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 06c86e1 to 37fbdf4 Compare April 24, 2024 17:06
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 37fbdf4 to 7999f7e Compare May 17, 2024 16:31
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 7999f7e to 58b9fb3 Compare May 17, 2024 16:53
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@@ -18,6 +18,7 @@ spec:
name: sriov-network-metrics
port: {{ .MetricsExporterPort }}
targetPort: {{ .MetricsExporterPort }}
{{ if .IsOpenshift }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we can also support k8s

lets check if the ServiceMonitor CRD exist in the cluster and deploy it instead of checking only for openshift WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, working on that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I used the unstructured client to check if the ServiceMonitore resource definition is available in the cluster. Does it sound good?

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 9b5e4cf to 2bddf42 Compare May 28, 2024 08:10
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 2bddf42 to cec70bd Compare May 28, 2024 08:12
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from cec70bd to 6da499a Compare May 28, 2024 08:16
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 6da499a to e87a82d Compare May 28, 2024 09:29
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

deploy/role.yaml Outdated
@@ -32,6 +32,7 @@ rules:
verbs:
- get
- create
- list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need this one also under the config folder so it will be generated for the bundle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the contents of the config/ folder is currently out of sync with openshit/sriov-network-operator (e.g. see config/rbac/role.yaml here and here).

I would align that in a separated PR. Does it sound good?


func isPrometheusOperatorInstalled(ctx context.Context, client k8sclient.Reader) bool {
u := &uns.UnstructuredList{}
u.SetGroupVersionKind(schema.GroupVersionKind{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking (maybe that is not the right way) to do a kubectl get crd servicemonitor not to search if there is any server monitor object in the cluster :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can try getting the CRD (e.g. see this).

The drawback is that we have to add the permission (ClusterRole,ClusterRoleBinding,...) to make the operator read that CustomResourceDefinition resource, but it might end up cleaner.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be ok to add get for CRD that should not expose the operator to any security issues :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add the permission to the ClusterRole (instead of Role), as the CustomResourceDefinition is not namespaced.
For the same reason, I had to add a non-namespace client to the SriovOperatorConfigReconicler.
please, take a look

}

if r.PlatformHelper.IsOpenshiftCluster() {
err = utils.AddLabelToNamespace(ctx, vars.Namespace, "openshift.io/cluster-monitoring", "true", r.Client)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to put this in the namespace creation template and not let the operator have permission to upgrade namespace that sounds a security risk

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Maybe we can leverage the operatorframework.io/cluster-monitoring annotation in the CSV, in the openshift fork

https://github.com/openshift/enhancements/blob/master/enhancements/olm/olm-managed-operator-metrics.md#fulfilling-namespace-and-rbac-requirements

@adrianchiris
Copy link
Collaborator

@zeeke can you rebase this one ?

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from e87a82d to 17dabd0 Compare June 27, 2024 09:33
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke
Copy link
Member Author

zeeke commented Jul 1, 2024

/hold

@github-actions github-actions bot added the hold label Jul 1, 2024
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 391a8ca to 686a48c Compare July 2, 2024 13:16
Copy link

github-actions bot commented Jul 2, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 686a48c to 63f4fe3 Compare July 3, 2024 09:15
Copy link

github-actions bot commented Jul 3, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Package `github.com/prometheus-operator/prometheus-operator/pkg/client`
can be used for testing purpose.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 63f4fe3 to ed8f86f Compare July 3, 2024 09:20
Copy link

github-actions bot commented Jul 3, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from ed8f86f to de99a66 Compare July 3, 2024 11:40
Copy link

github-actions bot commented Jul 3, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from de99a66 to 7aacaae Compare July 3, 2024 15:15
Copy link

github-actions bot commented Jul 3, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 7aacaae to 2faab1d Compare July 3, 2024 15:34
Copy link

github-actions bot commented Jul 3, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 2faab1d to 19e5be7 Compare July 3, 2024 16:04
Copy link

github-actions bot commented Jul 3, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke zeeke removed the hold label Jul 5, 2024
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from 19e5be7 to 41106a5 Compare July 5, 2024 08:47
Copy link

github-actions bot commented Jul 5, 2024

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke
Copy link
Member Author

zeeke commented Jul 5, 2024

@adrianchiris , @SchSeba please take another look

Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some small comments :)

subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: openshift-monitoring
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is good for openshift on vanilla k8s please add a variable to the helmchart something like
https://github.com/metallb/metallb/blob/21dd75560f3b8614c14b1bb55a79dbcc231e36a7/charts/metallb/templates/servicemonitor.yaml#L192

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added environment variables and Helm stuff to make the subject configurable.

@@ -36,6 +36,9 @@ rules:
- apiGroups: ["config.openshift.io"]
resources: ["infrastructures"]
verbs: ["get", "list", "watch"]
- apiGroups: ["apiextensions.k8s.io"]
resources: ["customresourcedefinitions"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this one also to the config folder this way when we generate the csv file for OLM it should be also in place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, though I think we still need some cleanup in that folder.

@@ -161,3 +161,28 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli

return AnnotateObject(ctx, node, key, value, c)
}

func AddLabelToNamespace(ctx context.Context, namespaceName, key, value string, c client.Client) error {
ns := &corev1.Namespace{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not able to find where we use this one.

in general I think we should document the need to add a label for monitoring on namespace creation and not add a rbac to allow the operator to update namespace object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the function. I will re-add it in the e2e test PR.

About permissions, the operator already had the RBAC to write on namespaces (see deploy/clusterrole.yaml and openshift CSV). No permission has been added in this PR for namespaces.

I'll take care of documenting the namespace configuration in OpenShift

Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

Deploy the needed configuration to make the prometheus
operator to find and scrape the sriov-network-metrics-exporter
endpoints, including the ServiceMonitor, Role and RoleBinding.

Resources are installed only if the Prometheus operator is installed.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
When useing `ServiceMonitors`, Prometheus Operator needs permissions
to read Services,Endpoint and Pods in the monitored namespace (i.e. the SRIOV
operator ns).

Make the ServiceAccount subject configurable via environment variables.

Signed-off-by: Andrea Panattoni <apanatto@redhat.com>
@zeeke zeeke force-pushed the metrics-exporter-prometheus branch from a39a440 to 108ef15 Compare July 12, 2024 06:55
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@zeeke
Copy link
Member Author

zeeke commented Jul 16, 2024

@adrianchiris , @SchSeba please take another look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants