Enable operator metrics collection #273

Merged 2 commits into netobserv:main on Mar 14, 2023

Conversation

OlivierCazade
Contributor

Enable collection for operator metrics.

@@ -12,7 +12,6 @@ spec:
    - path: /metrics
      port: https
      scheme: https
Member

What is missing to have TLS without skipping verification?
It's basically the same reason why we have this bug: https://issues.redhat.com/browse/NETOBSERV-765

Contributor Author

Similar to what we did for FLP, we need to add the labels to the metrics services to generate certificates, and mount those certificates in the kube-rbac-proxy container so they can be used.
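
One common way to do this on OpenShift is the service-ca operator's serving-certificate annotation (an annotation rather than a label). The sketch below is an assumption about the general approach, not the manifests from this PR: the Secret name `manager-metrics-tls`, the port numbers, the upstream address, and the selector label are hypothetical, while the Service name and namespace are taken from the `serverName` that appears later in the review.

```yaml
# Service exposing the operator metrics. The service-ca operator writes
# tls.crt/tls.key into the Secret named by the annotation.
apiVersion: v1
kind: Service
metadata:
  name: netobserv-metrics-service
  namespace: netobserv
  annotations:
    service.beta.openshift.io/serving-cert-secret-name: manager-metrics-tls  # hypothetical Secret name
spec:
  selector:
    control-plane: controller-manager  # assumed pod label
  ports:
    - name: https
      port: 8443                       # assumed port
      targetPort: https
---
# Fragment of the operator Deployment's pod template (not a complete manifest):
# kube-rbac-proxy serves the metrics endpoint with the generated certificate.
spec:
  template:
    spec:
      containers:
        - name: kube-rbac-proxy
          args:
            - --secure-listen-address=0.0.0.0:8443
            - --upstream=http://127.0.0.1:8080/   # assumed local metrics address
            - --tls-cert-file=/etc/tls/private/tls.crt
            - --tls-private-key-file=/etc/tls/private/tls.key
          ports:
            - name: https
              containerPort: 8443
          volumeMounts:
            - name: metrics-tls
              mountPath: /etc/tls/private
              readOnly: true
      volumes:
        - name: metrics-tls
          secret:
            secretName: manager-metrics-tls       # must match the annotation above
```

With the certificate served by kube-rbac-proxy, the scraper can verify the endpoint against the service CA instead of skipping verification.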

@OlivierCazade
Contributor Author

Added SSL and auth configuration for downstream collection; this made the operator metrics endpoint incompatible with user workload monitoring.

@memodi apart from the review comments, the PR is ready to be tested.
After deploying this branch and setting DOWNSTREAM_DEPLOYMENT to true, you should see operator metrics without enabling user workload.

image

@codecov

codecov bot commented Mar 9, 2023

Codecov Report

Merging #273 (434ced9) into main (8a4db3c) will decrease coverage by 0.04%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #273      +/-   ##
==========================================
- Coverage   47.61%   47.57%   -0.04%     
==========================================
  Files          43       43              
  Lines        5015     5019       +4     
==========================================
  Hits         2388     2388              
- Misses       2416     2420       +4     
  Partials      211      211              
Flag        Coverage Δ
unittests   47.57% <0.00%> (-0.04%) ⬇️

Impacted Files                             Coverage Δ
controllers/flowcollector_controller.go    52.73% <0.00%> (ø)
controllers/flowcollector_objects.go       17.77% <0.00%> (-1.74%) ⬇️

@memodi
Contributor

memodi commented Mar 9, 2023

Added SSL and auth configuration for downstream collection; this made the operator metrics endpoint incompatible with user workload monitoring.

@OlivierCazade - is the idea that they would be compatible with the SSL and auth configuration once the operator is deployed with DOWNSTREAM_DEPLOYMENT set to true?

@OlivierCazade
Contributor Author

Yes, SSL and authentication are now enabled for the metrics endpoint, and the Prometheus deployment should pick them up automatically from the ServiceMonitor configuration.
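
To make that concrete, here is a minimal sketch of what such a ServiceMonitor could look like. The monitor name, path, port, scheme, `caFile`, and `serverName` come from snippets elsewhere in this conversation; the `bearerTokenFile` path and the selector label are assumptions, not values confirmed by the PR.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: netobserv-metrics-monitor
  namespace: netobserv
spec:
  endpoints:
    - path: /metrics
      port: https
      scheme: https
      # Assumed: authenticate to kube-rbac-proxy with the Prometheus
      # pod's service account token.
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        # CA bundle mounted in the cluster Prometheus pods.
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
        serverName: netobserv-metrics-service.netobserv.svc
  selector:
    matchLabels:
      control-plane: controller-manager  # assumed Service label
```

Both `bearerTokenFile` and `caFile` point at files inside the Prometheus pod. As far as I understand, user workload monitoring rejects ServiceMonitors that read from the Prometheus filesystem, which would explain the incompatibility mentioned earlier.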

@memodi
Contributor

memodi commented Mar 9, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 9, 2023
@github-actions

github-actions bot commented Mar 9, 2023

New image: ["quay.io/netobserv/network-observability-operator:1b5e2fb"]. It will expire after two weeks.

@memodi
Contributor

memodi commented Mar 9, 2023

@OlivierCazade - I tried the image from this PR, but I am not able to see those metrics:

$ oc get pod/netobserv-controller-manager-7794cf659b-vcnhn -o yaml | egrep -A 1 "DOWN"
    - --downstream-deployment=$(DOWNSTREAM_DEPLOYMENT)
    command:
--
    - name: DOWNSTREAM_DEPLOYMENT
      value: "true"

$ oc get servicemonitor
NAME                        AGE
netobserv-metrics-monitor   5m19s

image

am I missing something?

@OlivierCazade
Contributor Author

Most of the changes in this PR are at the bundle level. Did you also use the bundle from this branch?

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 13, 2023
@memodi
Contributor

memodi commented Mar 13, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 13, 2023
@github-actions

New images:

  • quay.io/netobserv/network-observability-operator:a5309c1
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-a5309c1
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-a5309c1

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-a5309c1
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 13, 2023
@memodi
Contributor

memodi commented Mar 13, 2023

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 13, 2023
@github-actions

New images:

  • quay.io/netobserv/network-observability-operator:7a7c5bf
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-7a7c5bf
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-7a7c5bf

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-7a7c5bf
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
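
The catalog source only makes the development bundle available; the operator still has to be installed from it for bundle-level changes to apply. A minimal sketch of a matching Subscription follows. The Subscription name, package name, and channel are hypothetical (check the catalog's package manifest for the real values), and the `openshift-operators` namespace is the installation namespace mentioned in the review.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator           # hypothetical name
  namespace: openshift-operators
spec:
  name: netobserv-operator           # hypothetical package name
  channel: alpha                     # hypothetical channel
  source: netobserv-dev              # CatalogSource name from the bot comment
  sourceNamespace: openshift-marketplace
  config:
    env:
      # OLM injects this variable into the operator's containers.
      - name: DOWNSTREAM_DEPLOYMENT
        value: "true"
```

Setting the variable through `spec.config.env` is one option; editing the deployed manifest directly is another, and the thread does not say which was used.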

@memodi
Contributor

memodi commented Mar 13, 2023

/label qe-approved

Verified that NOO metrics and FLP metrics are being scraped by the cluster Prometheus when DOWNSTREAM_DEPLOYMENT == true

Screenshot 2023-03-13 at 12 24 38 PM

image

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Mar 13, 2023
monitor.yaml Outdated
@@ -0,0 +1,8 @@
apiVersion: v1
Member

I don't think we want this file, do we? Is it included in the bundles, or is it just there as a helper (e.g. for make targets)? AFAICT it should not be included in bundles.

Contributor Author

No, sorry, this was one of my test files.

tlsConfig:
  insecureSkipVerify: true
  caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
  serverName: netobserv-metrics-service.netobserv.svc
Member

so, my understanding is that we should hardcode here the installation namespace, openshift-operators, right?
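
For context: certificates issued by the OpenShift service-ca operator carry the Service's DNS names, `<service>.<namespace>.svc` and `<service>.<namespace>.svc.cluster.local`, so `serverName` has to name the namespace the Service actually runs in. A sketch of the endpoint's TLS block, assuming the operator is installed in `openshift-operators` as suggested in this comment:

```yaml
# Fragment of a ServiceMonitor endpoint (not a complete manifest).
tlsConfig:
  caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
  # Pattern: <service>.<namespace>.svc
  # The namespace is assumed to be openshift-operators here.
  serverName: netobserv-metrics-service.openshift-operators.svc
```

If the name does not match a subject alternative name in the serving certificate, verification fails unless `insecureSkipVerify` is set.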

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 14, 2023
@OlivierCazade
Contributor Author

/approve

@openshift-ci

openshift-ci bot commented Mar 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: OlivierCazade

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 4e15d42 into netobserv:main Mar 14, 2023
@memodi
Contributor

memodi commented Mar 22, 2023

adding NETOBSERV-900 as JIRA reference

Labels
approved lgtm qe-approved QE has approved this pull request
4 participants