[Bug] Prometheus datasource is not imported until the controller-manager is restarted #652

Closed
samuelvl opened this issue Dec 14, 2021 · 2 comments
Labels: bug (Something isn't working), triage/accepted (Indicates an issue or PR is ready to be actively worked on)

Describe the bug
Prometheus datasource is not imported until the controller-manager is restarted.

Version
The Grafana operator version is 4.1.0 and the OpenShift version is 4.9.

$ oc get csv 
NAME                                     DISPLAY                       VERSION   REPLACES                            PHASE
grafana-operator.v4.1.0                  Grafana Operator              4.1.0     grafana-operator.v4.0.2             Succeeded

$ oc version
Server Version: 4.9.8
Kubernetes Version: v1.22.2+c8538fc

To Reproduce
Install the operator in the opendatahub namespace (or any other) using the following Subscription and OperatorGroup:

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: grafana-operator
  namespace: opendatahub
spec:
  channel: v4
  installPlanApproval: Automatic
  name: grafana-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
  startingCSV: grafana-operator.v4.1.0
  
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: opendatahub
  labels:
    component.opendatahub.io/name: odh-common
    opendatahub.io/component: "true"
spec:
  targetNamespaces:
  - opendatahub

Wait until the operator is installed:

$ oc get csv -n opendatahub    
NAME                                     DISPLAY                       VERSION   REPLACES                            PHASE
grafana-operator.v4.1.0                  Grafana Operator              4.1.0     grafana-operator.v4.0.2             Succeeded

$ oc get pods                                       
NAME                                                   READY   STATUS    RESTARTS   AGE
grafana-operator-controller-manager-7845f878f8-hxppr   2/2     Running   0          70s

Create a Grafana instance and a GrafanaDatasource object with the following information:

---
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: odh-grafana
  labels:
    opendatahub.io/component: "true"
    component.opendatahub.io/name: grafana
spec:
  ingress:
    enabled: True
  client:
    preferService: true
  config:
    log:
      mode: "console"
      level: "warn"
    security:
      admin_user: "root"
      admin_password: "secret"
    auth:
      disable_login_form: False
      disable_signout_menu: True
    auth.basic:
      enabled: True
    auth.anonymous:
      enabled: True
  dashboardLabelSelector:
    - matchExpressions:
        - {key: app, operator: In, values: [grafana]}

---
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: odh-datasources
  labels:
    opendatahub.io/component: "true"
    component.opendatahub.io/name: grafana
spec:
  name: odh-prometheus.yaml
  datasources:
    - name: opendatahub
      type: prometheus
      access: proxy
      url: http://prometheus-operated:9090
      isDefault: true
      version: 1
      editable: true
      jsonData:
        tlsSkipVerify: true
        timeInterval: "5s"

Wait some time (I waited more than 5 minutes), then list the available datasources to confirm that the datasource has not been imported:

$ grafana=$(oc get route grafana-route -o jsonpath="{$.status.ingress[0].host}")
$ curl -sk -u root:secret https://${grafana}/api/datasources | jq '.'
[]

If I go to the GUI, it says "There are no data sources defined yet", so the issue is not specific to the API.
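Rather than waiting a fixed time, the check above can be repeated in a loop. This is only a minimal sketch: `wait_for_datasource` is a hypothetical helper (not part of the operator), the `root:secret` credentials are the ones from the Grafana spec in this report, and the attempt count and delay are arbitrary defaults.

```shell
# Sketch: poll Grafana's /api/datasources until a data source with the given
# name shows up, or give up after a fixed number of attempts.
# wait_for_datasource is a hypothetical helper name, not an operator feature.
wait_for_datasource() {
  local host="$1" name="$2" attempts="${3:-30}" delay="${4:-10}"
  local i=1 found
  while [ "$i" -le "$attempts" ]; do
    # Count entries whose "name" matches; any curl or jq error yields an
    # empty result, which is treated as "not found yet".
    found=$(curl -sk -u root:secret "https://${host}/api/datasources" \
      | jq --arg n "$name" '[.[] | select(.name == $n)] | length' 2>/dev/null)
    if [ "${found:-0}" -gt 0 ]; then
      echo "data source '${name}' found after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "data source '${name}' not found after ${attempts} attempt(s)" >&2
  return 1
}
```

With the `grafana` variable set as above, `wait_for_datasource "${grafana}" opendatahub 30 10` would poll for roughly five minutes before reporting failure.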

The operator logs do not show anything relevant:

$ oc logs -f grafana-operator-controller-manager-7845f878f8-hxppr -c manager
I1214 12:41:40.148585       1 request.go:655] Throttling request took 1.038261433s, request: GET:https://172.30.0.1:443/apis/apiserver.openshift.io/v1?timeout=32s

Workaround
Restart the controller-manager:

$ oc rollout restart deploy/grafana-operator-controller-manager

Wait until the new version is deployed:

$ oc get pods
NAME                                                   READY   STATUS    RESTARTS   AGE
grafana-deployment-68944479bb-qvrn8                    1/1     Running   0          17s
grafana-operator-controller-manager-7845f878f8-sh4vx   2/2     Running   0          29s

Verify the datasource is successfully imported:

$ curl -sk -u root:secret https://${grafana}/api/datasources | jq '.'
[
  {
    "id": 1,
    "uid": "7ucRHe2nk",
    "orgId": 1,
    "name": "opendatahub",
    "type": "prometheus",
    "typeName": "Prometheus",
    "typeLogoUrl": "public/app/plugins/datasource/prometheus/img/prometheus_logo.svg",
    "access": "proxy",
    "url": "http://prometheus-operated:9090",
    "password": "",
    "user": "",
    "database": "",
    "basicAuth": false,
    "isDefault": true,
    "jsonData": {
      "timeInterval": "5s",
      "tlsSkipVerify": true,
      "tracesToLogs": {}
    },
    "readOnly": false
  }
]

Expected behavior
The datasource should be automatically imported without restarting the operator manually.

samuelvl added the bug and needs triage labels on Dec 14, 2021
samuelvl changed the title from "Prometheus datasource is not imported until the controller-manager is restarted" to "[Bug] Prometheus datasource is not imported until the controller-manager is restarted" on Dec 14, 2021
pb82 (Collaborator) commented Dec 14, 2021

I was able to reproduce this. The problem is that both resources are created at the same time and the Grafana pod is started before the operator updates the config map for data sources.

Ideally, we should start using the HTTP API to interact with data sources. I'll see if there is another fix for this that we can apply before switching to the data source API.
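For readers unfamiliar with that API, here is a rough sketch of what creating the data source from this report through Grafana's `POST /api/datasources` endpoint could look like. It is not the operator's implementation: `datasource_payload` and `create_datasource` are hypothetical helper names, and the credentials and field values are copied from the manifests in the report.

```shell
# Sketch only: create a Prometheus data source through Grafana's HTTP API.
# Both helper names are hypothetical; values mirror the GrafanaDataSource
# manifest from the report.

# Build the JSON body for POST /api/datasources.
datasource_payload() {
  local name="$1" url="$2"
  cat <<EOF
{"name": "${name}", "type": "prometheus", "access": "proxy", "url": "${url}", "isDefault": true, "jsonData": {"timeInterval": "5s", "tlsSkipVerify": true}}
EOF
}

# Send the payload to the Grafana instance reachable at the given host.
create_datasource() {
  local host="$1" name="$2" url="$3"
  curl -sk -u root:secret -X POST \
    -H 'Content-Type: application/json' \
    -d "$(datasource_payload "$name" "$url")" \
    "https://${host}/api/datasources"
}
```

For example, `create_datasource "${grafana}" opendatahub http://prometheus-operated:9090` would register the data source directly in a running Grafana instance, without depending on when the config map is mounted.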

NissesSenap added the triage/accepted label and removed the needs triage label on Dec 14, 2021
ufateh commented Jan 7, 2022

I am facing the same issue. Here are two scenarios I tested to reproduce it:

  1. If I create the GrafanaDataSource first and then the Grafana instance, I can see this issue.
  2. If the Grafana instance is created first and the GrafanaDataSource later, it works fine. But if I delete this Grafana instance and create another one, it does not pick up the datasources unless I recreate the GrafanaDataSource. I have not tried restarting the controller manager, though.

I also do not see any datasource reconcile logs in the controller manager.

Operator version: 4.1.1
OpenShift version: 4.7.23
K8S version: v1.20.0+558d959
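
The ordering from scenario 2 (Grafana first, GrafanaDataSource later) can be scripted as a stopgap. This is a sketch, not a fix for the underlying race: `apply_grafana_then_datasource` is a hypothetical helper, the manifest file names are placeholders, and the deployment name `grafana-deployment` is taken from the pod listing in the original report.

```shell
# Sketch: apply the Grafana CR, wait for its deployment to roll out, and only
# then apply the GrafanaDataSource CR.
# apply_grafana_then_datasource is a hypothetical helper name.
apply_grafana_then_datasource() {
  local grafana_manifest="$1" datasource_manifest="$2"
  local attempts="${3:-60}" delay="${4:-5}"
  local i=1

  oc apply -f "$grafana_manifest" || return 1

  # The operator creates the deployment asynchronously, so wait until it
  # exists before asking for its rollout status.
  while [ "$i" -le "$attempts" ]; do
    if oc get deploy/grafana-deployment >/dev/null 2>&1; then
      break
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  # Fails (and skips the data source) if the deployment never became ready.
  oc rollout status deploy/grafana-deployment --timeout=300s || return 1

  oc apply -f "$datasource_manifest"
}
```

Usage would be something like `apply_grafana_then_datasource grafana.yaml datasource.yaml`, with both files holding the manifests shown earlier in this issue.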
