
Dashboard sidecar stops working after 10 minutes #18

Closed
PeterGerrard opened this issue Sep 14, 2020 · 23 comments · Fixed by #2057

@PeterGerrard

Describe the bug
kiwigrid/k8s-sidecar#85
After 10 minutes the dashboard sidecar stops receiving watch notifications

Originally posted on old repo: helm/charts#23565

Version of Helm and Kubernetes:
Helm version N/A
Kubernetes 1.16.10

Which chart:
stable/grafana

What happened:
After 10 minutes the dashboard sidecar stops receiving watch notifications

What you expected to happen:
The sidecar to always pick up new dashboards

How to reproduce it (as minimally and precisely as possible):
See kiwigrid/k8s-sidecar#85

@qaiserali

+1

I have the exact same issue: after a couple of minutes the sidecar stops receiving notifications on ConfigMap changes. It also stops detecting and importing new dashboards provided via ConfigMaps.

All dashboard changes and new dashboards only appear after restarting the Grafana pod or the sidecar container.

unguiculus added a commit to unguiculus/grafana-helm-charts that referenced this issue Dec 9, 2020
Signed-off-by: Reinhard Nägele <unguiculus@gmail.com>
unguiculus added a commit to unguiculus/grafana-helm-charts that referenced this issue Dec 10, 2020
Signed-off-by: Reinhard Nägele <unguiculus@gmail.com>

dry4ng commented Dec 23, 2020

Same issue.


bergerx commented Jan 14, 2021

using https://github.com/OmegaVVeapon/kopf-k8s-sidecar instead of kiwigrid/k8s-sidecar may address the issue, here is a comment from @djsly indicating success with that one: kiwigrid/k8s-sidecar#85 (comment)

@qaiserali

Any updates on this?


bergerx commented Feb 8, 2021

I tried using OmegaVVeapon/kopf-k8s-sidecar as a replacement for kiwigrid/k8s-sidecar but just hit this issue: OmegaVVeapon/kopf-k8s-sidecar#14


bergerx commented Feb 9, 2021

Since OmegaVVeapon/kopf-k8s-sidecar#14 turned out to be a dead end for using it with this chart, I came up with this very ugly workaround. I'm not proud of it, but it's what I could put together in two hours. We still need to craft a PR to address the underlying issue in kiwigrid/k8s-sidecar#85.

Here is the values file snippet I used:

rbac:
  # The extra rules below are required to run the `kubectl exec ...`
  # command in the force-grafana-dashboard-reload container below.
  extraRoleRules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get"] 
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
# This is a workaround for an upstream bug;
# it forces Grafana dashboards to be reloaded every 15 minutes.
# It's 15 minutes because the grafana-sc-dashboard pod would regularly end up
# in CrashLoopBackOff if we didn't let it run for at least 10 minutes;
# the CrashLoopBackOff timer is reset after 10 minutes without any problems.
# See these issues for details:
# https://github.com/kiwigrid/k8s-sidecar/issues/85
# https://github.com/grafana/helm-charts/issues/18
extraContainers: |
  - name: force-grafana-dashboard-reload
    image: bitnami/kubectl:1.19
    env:    
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    command: ["bash", "-c"]
    args:   
    - while sleep 900; do 
        echo Forcing grafana sidecar to restart ...;
        kubectl exec -n $POD_NAMESPACE $POD_NAME -c grafana-sc-dashboard -- kill 1; 
      done    
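(For context on why this works: `kill 1` terminates PID 1 of the grafana-sc-dashboard container, the kubelet then restarts just that container per the pod's restart policy, and on startup the sidecar re-lists all matching ConfigMaps, which forces the dashboards to be written out fresh. At least, that's my understanding of the mechanism.)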

And the status of the grafana pod:
[screenshot: grafana pod status showing the force-grafana-dashboard-reload container running alongside the others]


OmegaVVeapon commented Feb 11, 2021

The original issue on k8s-sidecar has been open for 176 days at the time of writing so I think an update on the situation is warranted.

Having the process constantly hang is what prompted me to do an entire rewrite in kopf-k8s-sidecar, and it's what made @bergerx spawn a container with exec RBAC permissions to periodically kill the original sidecar. I think it's fair to say we're reaching a point of desperation here. 😄

I've been giving this some thought and these are the possible solutions I see for this problem atm, from best to worst:

1. Fix the original kiwigrid/k8s-sidecar

I'm not hopeful this is going to happen...
After going through all the comments on the original issue my feeling is that, if it were that easy, it wouldn't have taken nearly half a year (and counting) to do. I went through the codebase several times before deciding to rewrite it...

2. Add the functionality to kopf-k8s-sidecar to run once only (issue)

I've inquired about how to accomplish such a behaviour in this question and it looks like it won't be trivial... I could give it a try regardless.

3. Avoid initContainers!

This is the solution I'm currently running, since it can be adopted with minimal tweaking.

After inspecting this Helm chart, I discovered that it uses the k8s-sidecar as an initContainer ONLY IF either of these two values is true:
a) .Values.sidecar.datasources.enabled
b) .Values.sidecar.notifiers.enabled

If one of them is, the sidecar is started as an initContainer to load up the files and die. There are no dynamic updates for Grafana datasource or notifier files: whatever got loaded at the start is what you keep for the remainder of that pod's life.

So, given that this is a one-off operation, I decided to simply disable .Values.sidecar.datasources.enabled and use this chart's ability to set the datasources.
[screenshot: sidecar.datasources disabled, datasources set via the chart's values]

and use kopf-k8s-sidecar for the image
[screenshot: sidecar image set to kopf-k8s-sidecar]
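For reference, the values changes in the two screenshots above amount to something like this (a sketch only: the kopf-k8s-sidecar image reference is my assumption about how the image is published, and the exact value keys can differ between chart versions):

sidecar:
  # Assumed image reference; check the OmegaVVeapon/kopf-k8s-sidecar repo
  # for the published name and a pinned tag.
  image: omegavveapon/kopf-k8s-sidecar:latest
  dashboards:
    enabled: true    # keep the long-running dashboard watcher
  datasources:
    enabled: false   # skip the one-shot datasource initContainer entirely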

I currently have a fully-functional Grafana from which I'm able to add/modify/delete dashboards without issues via configmaps/secrets.

The only downside I see here is if for whatever reason you can't provide your datasources/notifiers via the values.yaml and you HAVE to read them from configmaps/secrets... which brings me to solution 4.

4. Allow users to use different sidecar images depending on the use case

So, if you've followed along so far, the tl;dr is that the original sidecar isn't very good at staying alive, and my sidecar isn't very good at dying. So why not let each sidecar do what it does best?

Rather than using the same .Values.sidecar.image for the datasources, notifiers and dashboard containers, maybe it'd be beneficial to give the user more flexibility to override each image if desired.
For example, we could set .Values.sidecar.datasources.image to use the kiwigrid sidecar and .Values.sidecar.dashboards.image to use my sidecar, and everything would run as intended.

I could submit a small PR for this if deemed useful.
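To make it concrete, the values could end up looking something like this (the per-container image keys are hypothetical; they don't exist in the chart today, which is exactly what the PR would add):

sidecar:
  datasources:
    enabled: true
    # Hypothetical per-container override: the kiwigrid sidecar is fine
    # for the run-once initContainer case.
    image: kiwigrid/k8s-sidecar:1.10.0   # illustrative tag
  dashboards:
    enabled: true
    # Hypothetical per-container override: the kopf rewrite for the
    # long-running watch case.
    image: omegavveapon/kopf-k8s-sidecar:latest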


Anyways, those are my current thoughts on the state of things. Feedback is welcome.


bergerx commented Feb 11, 2021

Solution 3 seems reasonable to me, but I don't get this part:

The only downside I see here is if for whatever reason you can't provide your datasources/notifiers via the values.yaml and you HAVE to read them from configmaps/secrets... which brings me to solution 4.

This exact design has already been a problem for us: in some clusters we want to be able to define datasources on the go as we create new ConfigMaps/Secrets. The current design prevents that, and it also somewhat conflicts with the idea of declaring datasources as ConfigMaps.


OmegaVVeapon commented Feb 11, 2021

This exact design has already been a problem for us: in some clusters we want to be able to define datasources on the go as we create new ConfigMaps/Secrets. The current design prevents that, and it also somewhat conflicts with the idea of declaring datasources as ConfigMaps.

Yeah... I found this to be odd as well.
It looks like Grafana is perfectly capable of loading/modifying datasources on the fly according to this documentation,
so I'm not sure why the decision was made to inject them only via an initContainer.

Maybe one of the Grafana maintainers can elaborate a bit more on that.

That being said, I think I know how we can add new datasources on-the-fly but I'll have to test it tomorrow.


OmegaVVeapon commented Feb 12, 2021

@bergerx I have good and bad news.

So I tested loading up datasources dynamically (as part of the normal long-running sidecar, not the init one) and it worked... sorta.

t=2021-02-12T22:18:37+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Prometheus uid=
t=2021-02-12T22:18:37+0000 lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=Something uid=
t=2021-02-12T22:18:37+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=

The Prometheus and Something datasources came from ConfigMaps that were injected by the normal sidecar, and they worked as expected.

However... once you reach the "HTTP Server Listen" part, Grafana no longer cares what YAMLs the sidecar throws into /etc/grafana/provisioning/datasources/... you can delete, modify, or add YAMLs and they won't be picked up.

So it's a no-go unless the Grafana devs add the same reloading logic for datasources as they have for dashboards, I'm afraid...

This also sucks because, if Grafana picked up datasources at any point in time, we could ABSOLUTELY get rid of both sidecar initContainers (who cares whether the files get loaded before or after Grafana starts?). That would vastly decrease the complexity of the chart and the Deployment: one container instead of potentially three.

In my case, the datasource loading worked even without initContainers, but I can't guarantee my sidecar (or kiwigrid's) will always beat Grafana to initializing; you'd be rolling the dice.

So the only real choices remaining are to:
a) Use the chart's .Values.datasources and .Values.notifiers (see the sketch below)
b) Keep using initContainers

Either way, you won't get "on the go" updates.
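For option (a), a minimal sketch of what static provisioning through the chart looks like, using Grafana's standard datasource provisioning format (the Prometheus entry and URL are illustrative):

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus                # illustrative entry
      type: prometheus
      access: proxy
      url: http://prometheus-server   # assumed in-cluster service name
      isDefault: true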

@zanhsieh (Collaborator)

FYI, sidecar upgrade merged.

#211

@qaiserali

@zanhsieh Has the issue with the sidecar been fixed by the just-merged #211?

@zanhsieh (Collaborator)

@qaiserali
Could you give it a try yourself, please?


djsly commented Feb 16, 2021

I looked at all the diffs between 1.1.0 and 1.10.0, and the only related fix would be

https://github.com/kiwigrid/k8s-sidecar/releases/tag/1.3.0

While this does not fix the kubernetes watch issues in #98 it
at least removes the unnecessary requests.

Doesn't look too promising.

@OmegaVVeapon

It looks like no one is jumping at the chance to test 1.10.0, so I'll share what I'm seeing. 😅
[screenshot: sidecar logs showing both test dashboards being picked up]

As you can see, it picked up my initial testing-dashboard-2.json; a few seconds later I created another ConfigMap for testing-dashboard.json, and that got picked up too. Great.

Then I left it alone while I worked on other stuff for a couple of hours or so...

I came back and deleted the ConfigMap that held testing-dashboard.json. No change.
Re-added the ConfigMap: still no change...
I did it a few more times while exec'ed into the container (maybe it was still working but no longer logging?), and the dashboard file wasn't being touched at all.
It had hung again.

So I went back to my sidecar image...

@qaiserali

Are there any updates on this issue?

@zanhsieh (Collaborator)

@OmegaVVeapon
Would you PR your sidecar image so we can go from there?

@OmegaVVeapon

@OmegaVVeapon
Would you PR your sidecar image so we can go from there?

Not sure, will need to sleep on it.

My image is basically a rewrite of the k8s-sidecar so the PR would be quite massive.

I'd need confirmation from the original maintainers of the k8s-sidecar that this is something they would even be interested in accepting before trying to change their entire codebase.

Perhaps a better starting point would be for people struggling with this issue to switch to my image and see if it resolves it for them?

I documented the very simple instructions for using it with this Helm chart here.

If it helps enough people, then PR'ing it back to k8s-sidecar becomes a much easier sell.

Otherwise, I'll just leave it as an alternative.


djsly commented May 19, 2021

@OmegaVVeapon I think he was asking you to PR it into the helm-charts repo. Maybe we could add a toggle to switch between the original sidecar and yours.

@OmegaVVeapon

Ah, I see.

If that's the case, I think I'll still leave it as an opt-in via the values.yaml for now.

There are a few more things I want to add before PR'ing it as a default.

@gowrisankar22 (Contributor)

The issue is reproducible with the 1.14.2 sidecar as well.

[2021-11-10 03:47:48] ProtocolError when calling kubernetes: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))


prein commented Feb 7, 2022

Reading the comments on the two issues linked above leaves me unsure about the current status of this. Is it fixed in kiwigrid v1.15.4? Is there a PR coming for the chart to allow setting .Values.sidecar.datasources.image to one sidecar and .Values.sidecar.dashboards.image to another?

Arabus (Contributor) commented Dec 8, 2022

The issue seems to persist with the 1.19.2 sidecar. Upgrading to 1.20 made it disappear.

Arabus added a commit to syseleven/helm-charts that referenced this issue Dec 8, 2022
Fixes grafana#18

Signed-off-by: Bernd May <b.may@syseleven.de>