Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-565 added service monitors for flp prometheus and console plugin #188

Merged
merged 21 commits into from
Dec 22, 2022

Conversation

KalmanMeth
Copy link
Contributor

To view the metrics in the console, apply the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true

@KalmanMeth KalmanMeth changed the title added flp prometheus service monitor added service monitors for flp prometheus and console plugin Nov 1, 2022
return &monitoringv1.ServiceMonitor{
ObjectMeta: metav1.ObjectMeta{
Name: constants.PluginServiceMonitorName,
Namespace: constants.DefaultOperatorNamespace,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Namespace: constants.DefaultOperatorNamespace,
Namespace: b.namespace,

Namespace should be taken from CRD

Copy link
Contributor

@jpinsonneau jpinsonneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in terms of code, not tested yet. Thanks @KalmanMeth !

@openshift-ci openshift-ci bot added the lgtm label Nov 8, 2022
@jotak
Copy link
Member

jotak commented Nov 9, 2022

I'm going to test it
thanks for the PR @KalmanMeth

@jotak jotak self-assigned this Nov 9, 2022
@jotak
Copy link
Member

jotak commented Nov 9, 2022

@KalmanMeth it doesn't work for me, there is a permission issue when trying to read the servicemonitor CRD to determine if they can be installed.
In fact, we use a discovery mechanism to check when external APIs are available (we already do this to check whether ConsolePlugin API exists, for instance) : look at this file: https://github.com/netobserv/network-observability-operator/blob/main/pkg/discover/apis.go

So to check ServiceMonitor, you can edit this file to add HasServiceMonitor, similar to the existing HasConsole / HasCNO. Doing this way, we don't have to add a new read permission on CRD kind itself.

Copy link
Member

@jotak jotak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's something more to do: manage namespace changes. It is possible to update the netobserv installation to change its namespace, so what the operator does in that case is to remove all the created resources and recreate them in the other namespace.

You don't have to implement all of this, you just need to declare the servicemonitor resource when the reconcilers are created, see here: https://github.com/netobserv/network-observability-operator/blob/main/controllers/consoleplugin/consoleplugin_reconciler.go#L32-L56

  • you need to add a reference in ownedObjects
  • you need to call nobjMngr.AddManagedObject

once it's done, upon namespace change, the controller will take care of deleting these resources and re-creating them.

Same thing to do for FLP monolith, FLP ingester and FLP transformer

@openshift-ci
Copy link

openshift-ci bot commented Nov 14, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak by writing /assign @jotak in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@KalmanMeth
Copy link
Contributor Author

@KalmanMeth it doesn't work for me, there is a permission issue when trying to read the servicemonitor CRD to determine if they can be installed. In fact, we use a discovery mechanism to check when external APIs are available (we already do this to check whether ConsolePlugin API exists, for instance) : look at this file: https://github.com/netobserv/network-observability-operator/blob/main/pkg/discover/apis.go

So to check ServiceMonitor, you can edit this file to add HasServiceMonitor, similar to the existing HasConsole / HasCNO. Doing this way, we don't have to add a new read permission on CRD kind itself.

@jotak I added code to check for HasServiceMonitor. I'm not sure what the environment was in which you got a failure, so I could not check it.

@jotak
Copy link
Member

jotak commented Nov 22, 2022

@KalmanMeth sorry for the delay, I'll take another look today

@jotak
Copy link
Member

jotak commented Nov 22, 2022

So I still get the same error (plus also an error saying missing permissions to get/create servicemonitors)
I get this error with the nominal test deployment, which is deploying the operator on a running cluster + the default FlowCollector resource. Basically I just build & push the image, then run IMG=<that image> make deploy deploy-sample-cr

The exact error I get (in logs) is:

W1122 08:20:08.309526       1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:netobserv:netobserv-controller-manager" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope
E1122 08:20:08.309548       1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.CustomResourceDefinition: failed to list *v1.CustomResourceDefinition: customresourcedefinitions.apiextensions.k8s.io is forbidden: User "system:serviceaccount:netobserv:netobserv-controller-manager" cannot list resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope

This is logical: we must tell that we need extra permissions, to read/write ServiceMonitors. This is done via kubebuilder annotations, cf in flowcollector_controller.go L58-69 : you need to add a new line

//+kubebuilder:rbac:groups=monitoring.coreos.com,resources=servicemonitors,verbs=get;create;delete;update;patch;list

Then run:

make generate

It will amend automatically config/rbac/role.yaml to add new permissions. This will then be used when deploying the operator to a running cluster.

The other thing is, my previous comment to add the discovery mechanism was to avoid having to read the servicemonitor CRD as you do in consoleplugin_reconciler.go (L202-210) and flp_service_monitor.go (L29-37). So we should remove these lines, as it saves us from requiring an extra permission to read customresourcedefinitions.apiextensions.k8s.io.

flag.StringVar(&config.EBPFAgentImage, "ebpf-agent-image", "", "The image of the eBPF agent")
flag.StringVar(&config.FlowlogsPipelineImage, "flowlogs-pipeline-image", "", "The image of Flowlogs Pipeline")
flag.StringVar(&config.ConsolePluginImage, "console-plugin-image", "", "The image of the Console Plugin")
flag.StringVar(&config.EBPFAgentImage, "ebpf-agent-image", "quay.io/netobserv/netobserv-ebpf-agent:main", "The image of the eBPF agent")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KalmanMeth this will not be needed once we merge #203

@KalmanMeth
Copy link
Contributor Author

So I still get the same error (plus also an error saying missing permissions to get/create servicemonitors)
I get this error with the nominal test deployment, which is deploying the operator on a running cluster + the default FlowCollector resource. Basically I just build & push the image, then run IMG=<that image> make deploy deploy-sample-cr

@jotak I made updates according to your comments. I hope it is now satisfactory.

@jotak
Copy link
Member

jotak commented Dec 22, 2022

Hi Kalman,
Thanks for the update, it now deploys correctly without an error. I get the servicemonitors created and FLP metrics feeding prometheus 👍
But I see 2 issues:

  1. I don't get the console plugin metrics. It seems to be a certificate issue. Not sure what exactly the problem is.. Setting InsecureSkipVerify: true would work, but that's not a good fix.
  2. When I deploy with Kafka, we loose the FLP metrics. At first glance it seems because the FLP ServiceMonitor is not correctly adapted for this case (when using Kafka, the FLP deployment model changes from monolithic to publisher/subscriber, that's the reason why we have this distinction of flp-ingester / flp-transformer in code, and we need to adapt the service monitor to point to these workloads.

Let me know if you have an idea for point 1. For point 2, I can try doing a fix and adding it to your PR (I know this part of the code is tricky to apprehend) ; If we can't get that fixed quickly we can go ahead & merge, and address them in follow-ups

- ServiceMonitor in FLP uses the builder's label selector for targets
- Refactor a bit FLP reconcilers to move some data in a common struct
- Add tests on generated names/labels in pub/sub/monolith scenarios
- More consistent naming in builders
@jotak jotak changed the title added service monitors for flp prometheus and console plugin NETOBSERV-565 added service monitors for flp prometheus and console plugin Dec 22, 2022
Fix servicemonitors for pub/sub FLP
Copy link
Member

@jotak jotak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
thanks!

Follow-up JIRA for the console plugin issue: https://issues.redhat.com/browse/NETOBSERV-765

@openshift-ci openshift-ci bot added the lgtm label Dec 22, 2022
@jotak jotak merged commit d39049e into netobserv:main Dec 22, 2022
@KalmanMeth KalmanMeth deleted the service-monitor branch February 13, 2023 14:56
KalmanMeth pushed a commit to KalmanMeth/network-observability-operator that referenced this pull request Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants