Prometheus config reload livenessprobe #18391

aweiteka · 2018-02-01T16:50:32Z

Use case: the config is updated via a configmap. When the change lands in the container (after a delay of about 60 seconds), the hash of the config directory changes, the liveness probe fails, and the Prometheus process is killed. This does not kill the pod; the container restarts and silently reloads the config, which is reflected in the metric prometheus_config_last_reload_success_timestamp_seconds.

Signed-off-by: Aaron Weitekamp <aweiteka@redhat.com>
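The hash comparison described in the use case can be sketched roughly as follows. This is a minimal illustration, not the probe from this PR: the function names `config_hash` and `check_config`, the choice of `md5sum`, and the state-file argument are all assumptions.

```shell
# Hypothetical helper: print a single digest covering every regular file
# under a directory. -L follows symlinks, which mounted configmaps use.
config_hash() {
  find -L "$1" -type f -exec md5sum {} + | sort | md5sum | cut -d' ' -f1
}

# Hypothetical check: succeed (exit 0) while the digest matches the stored
# baseline and fail (exit 1) once it differs. The first call only records
# the baseline and reports success.
check_config() {
  dir=$1
  state=$2
  current=$(config_hash "$dir")
  if [ ! -f "$state" ]; then
    printf '%s\n' "$current" > "$state"
    return 0
  fi
  [ "$current" = "$(cat "$state")" ]
}
```

A probe built on this would call something like `check_config /etc/prometheus /tmp/config.hash`, where the state-file path is hypothetical. If the state file lives on the container's own filesystem, a restarted container would start without it and record a fresh baseline.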

aweiteka · 2018-02-01T16:51:28Z

/cc @smarterclayton @simonpasquier

openshift-ci-robot · 2018-02-01T16:51:29Z

@aweiteka: GitHub didn't allow me to request PR reviews from the following users: simonpasquier.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @smarterclayton @simonpasquier

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

smarterclayton · 2018-02-02T05:53:59Z

examples/prometheus/prometheus.yaml

+            exec:
+              command:
+              - /bin/sh
+              - -i


Our convention for this is:

    command:
    - /bin/bash
    - -c
    args:
    - |
      #!/bin/bash
      set -euo pipefail
      ....

aweiteka · 2018-02-06

It appears the exec liveness probe only supports 'command', not 'args'.
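Since the probe takes only a command list, one option is to pass the whole script as the single argument after `-c`. The sketch below is an illustration under that assumption, with a made-up script body; it also shows what the `set -euo pipefail` line from the convention buys in a probe: a failure anywhere in a pipeline turns into a non-zero exit status.

```shell
# Hypothetical inline script, passed as one argument after -c, the way it
# would be as one extra element of the probe's command list.
probe_script='set -euo pipefail; false | true; echo "not reached"'

# With -e and pipefail, the failing pipeline aborts the script.
rc_strict=0
bash -c "$probe_script" >/dev/null 2>&1 || rc_strict=$?

# Without them, the pipeline's status is that of its last command.
rc_loose=0
bash -c 'false | true' || rc_loose=$?

echo "strict=$rc_strict loose=$rc_loose"   # prints "strict=1 loose=0"
```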

smarterclayton · 2018-02-02T05:54:26Z

examples/prometheus/prometheus.yaml

+            failureThreshold: 3
+            initialDelaySeconds: 60
+            periodSeconds: 60
+            successThreshold: 1


Remove any of these which are defaults.
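For reference, a small sketch that checks the four values from the diff against the Kubernetes probe defaults (failureThreshold 3, initialDelaySeconds 0, periodSeconds 10, successThreshold 1, timeoutSeconds 1). The function name `is_probe_default` is made up for this illustration.

```shell
# Hypothetical helper: succeed when field=value matches a Kubernetes
# probe default.
is_probe_default() {
  case "$1=$2" in
    failureThreshold=3|initialDelaySeconds=0|periodSeconds=10|successThreshold=1|timeoutSeconds=1)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# The four settings from the diff under review.
for kv in failureThreshold=3 initialDelaySeconds=60 periodSeconds=60 successThreshold=1; do
  if is_probe_default "${kv%%=*}" "${kv#*=}"; then
    echo "$kv: default, can be removed"
  else
    echo "$kv: not a default, keep"
  fi
done
```

Under these defaults, `failureThreshold: 3` and `successThreshold: 1` are redundant, while the two 60-second settings are not.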

smarterclayton · 2018-02-02T05:54:47Z

You have to call hack/update-generated-bindata.sh when you change these configs.

zgalor · 2018-02-06T15:02:59Z

@aweiteka could you please PR here https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_prometheus/templates/prometheus.j2#L83 once this PR gets merged to keep the installation aligned with this template?

aweiteka · 2018-02-06T16:07:00Z

@zgalor yes, that is the plan. We want to keep these in sync. My understanding is we try stuff out in examples, port to openshift-ansible. Not clear if that's the best approach.

aweiteka · 2018-02-06T16:20:22Z

FYI, log output of prometheus container:

configmap edited
+/- 60s

level=warn ts=2018-02-06T16:15:46.891236728Z caller=main.go:377 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2018-02-06T16:15:46.891404166Z caller=main.go:384 msg="See you next time!"
level=info ts=2018-02-06T16:15:46.893127633Z caller=targetmanager.go:87 component="target manager" msg="Stopping target manager..."
level=info ts=2018-02-06T16:15:46.893882636Z caller=targetmanager.go:99 component="target manager" msg="Target manager stopped"
level=info ts=2018-02-06T16:15:46.893959561Z caller=manager.go:455 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-02-06T16:15:46.894014579Z caller=manager.go:461 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-02-06T16:15:46.894045356Z caller=notifier.go:483 component=notifier msg="Stopping notification handler..."

brief (~5s) 'service unavailable'
service start

level=info ts=2018-02-06T16:15:48.999812989Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2018-02-06T16:15:49.000012873Z caller=main.go:216 build_context="(go=go1.8.3, user=root@prometheus-binary-7-build, date=20171108-16:47:03)"
level=info ts=2018-02-06T16:15:49.000055345Z caller=main.go:217 host_details="(Linux 3.10.0-693.15.2.el7.x86_64 #1 SMP Thu Jan 4 15:00:51 EST 2018 x86_64 prometheus-0 (none))"
level=info ts=2018-02-06T16:15:49.003728455Z caller=web.go:380 component=web msg="Start listening for connections" address=localhost:9090
level=info ts=2018-02-06T16:15:49.004166188Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2018-02-06T16:15:49.006140679Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2018-02-06T16:15:53.197838185Z caller=main.go:326 msg="TSDB started"
level=info ts=2018-02-06T16:15:53.19800798Z caller=main.go:394 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-02-06T16:15:53.201375345Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.203472216Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.205671447Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.208001974Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.210036112Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.212384308Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.215212678Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.220644156Z caller=main.go:371 msg="Server is ready to receive requests."

Signed-off-by: Aaron Weitekamp <aweiteka@redhat.com>
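The length of the restart window can be estimated from the `ts` fields in the log above. The sketch below uses the timestamps of the "Received SIGTERM" and "Server is ready to receive requests." lines; the helper names `ts_seconds` and `elapsed` are made up for this illustration, and the arithmetic assumes both timestamps fall on the same day.

```shell
# Hypothetical helper: convert the time-of-day part of an RFC 3339
# timestamp (e.g. 2018-02-06T16:15:46.891236728Z) to seconds.
ts_seconds() {
  printf '%s\n' "$1" |
    LC_ALL=C awk -F'[T:Z]' '{ printf "%.9f\n", $2 * 3600 + $3 * 60 + $4 }'
}

# Hypothetical helper: seconds elapsed between two same-day timestamps.
elapsed() {
  LC_ALL=C awk -v a="$(ts_seconds "$1")" -v b="$(ts_seconds "$2")" \
    'BEGIN { printf "%.2f\n", b - a }'
}

stopped=2018-02-06T16:15:46.891236728Z   # "Received SIGTERM" line
ready=2018-02-06T16:15:53.220644156Z     # "Server is ready" line

elapsed "$stopped" "$ready"   # prints 6.33
```

For this particular log the gap between SIGTERM and readiness works out to roughly 6.3 seconds.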

aweiteka · 2018-02-08T02:18:45Z

@spadgett can you take a look? This is the formatting I found I needed to make this work. Tough to debug. :(

spadgett · 2018-02-09T15:00:35Z

> This is the formatting I found I needed to make this work. Tough to debug. :(

I don't know of another way :/

aweiteka · 2018-02-12T21:48:06Z

/retest

aweiteka · 2018-02-13T21:07:28Z

@smarterclayton ready for review/merge

pgier · 2018-03-05T16:58:08Z

/lgtm

smarterclayton · 2018-03-07T21:19:17Z

/approve

openshift-ci-robot · 2018-03-07T21:19:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aweiteka, pgier, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~examples/prometheus/OWNERS~~ [smarterclayton]
~~pkg/oc/bootstrap/OWNERS~~ [smarterclayton]
~~test/extended/OWNERS~~ [smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-robot · 2018-03-08T04:44:31Z

Automatic merge from submit-queue (batch tested with PRs 18780, 18802, 18391, 18832, 18808).

openshift-ci-robot requested review from mfojtik and smarterclayton February 1, 2018 16:50

openshift-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) Feb 1, 2018

smarterclayton reviewed Feb 2, 2018

openshift-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) Feb 6, 2018

aweiteka mentioned this pull request Feb 6, 2018

Restart console container when config changes #18411

Merged

aweiteka force-pushed the prometheus-reload branch from 7aad1a2 to ef682b5 Compare February 6, 2018 16:57

Prometheus config reload livenessprobe

058c982

Signed-off-by: Aaron Weitekamp <aweiteka@redhat.com>

aweiteka force-pushed the prometheus-reload branch from ef682b5 to 058c982 Compare February 8, 2018 02:16

aweiteka mentioned this pull request Feb 8, 2018

Prometheus config updates openshift/openshift-ansible#7062

Merged

openshift-ci-robot assigned pgier Mar 5, 2018

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Mar 5, 2018

openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Mar 7, 2018

openshift-merge-robot merged commit 5940853 into openshift:master Mar 8, 2018
