Prometheus config reload livenessprobe #18391

Merged Mar 8, 2018 (1 commit)

@aweiteka aweiteka commented Feb 1, 2018

Use case: the config is updated via a configmap. When the change lands in the container (after a delay of ~60 seconds), the hash of the config directory changes and the process is killed. This does not kill the pod, but it results in a silent reload of the config, recorded in the corresponding metric timestamp, prometheus_config_last_reload_success_timestamp_seconds.

Signed-off-by: Aaron Weitekamp <aweiteka@redhat.com>
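
The hash comparison described above could be sketched roughly as follows. This is not the script from the PR, which is not shown in this conversation; `config_hash`, `config_unchanged`, and the state-file argument are hypothetical names, and `sha256sum` is assumed to be available in the container image. What happens once a change is detected is left to the caller.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print one digest covering every file under a directory.
config_hash() {
  local dir="$1"
  # Hash each file (following symlinks), sort for a stable order,
  # then hash the combined listing of digests and paths.
  find -L "$dir" -type f -exec sha256sum {} + | sort | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: return 0 if the directory matches the recorded digest,
# 1 if it differs. The first call only records a baseline in the state file.
# The state file must live outside the hashed directory.
config_unchanged() {
  local dir="$1" state="$2" current
  current="$(config_hash "$dir")"
  if [ ! -f "$state" ]; then
    echo "$current" > "$state"
    return 0
  fi
  if [ "$current" = "$(cat "$state")" ]; then
    return 0
  fi
  # Record the new digest so the next call compares against it.
  echo "$current" > "$state"
  return 1
}
```

A caller would invoke something like `config_unchanged /etc/prometheus /tmp/config.hash` on each probe run and react to a non-zero status; both paths here are illustrative.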

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 1, 2018
@aweiteka
Contributor Author

aweiteka commented Feb 1, 2018

/cc @smarterclayton @simonpasquier

@openshift-ci-robot

@aweiteka: GitHub didn't allow me to request PR reviews from the following users: simonpasquier.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


exec:
  command:
  - /bin/sh
  - -i
Contributor

@smarterclayton smarterclayton Feb 2, 2018

Our convention for this is:

command:
- /bin/bash
- -c
args:
- |
  #!/bin/bash
  set -euo pipefail
  ....

Contributor Author

It appears the exec liveness probe only supports 'command', not 'args'.
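
Because an exec probe takes a single `command` list, one way to keep the strict-mode convention suggested above is to pass the whole script as the one string that follows `-c`. The sketch below shows that shape in plain shell rather than in the template; `probe_script`, `run_probe`, and the `CONFIG_FILE` variable are hypothetical names, and the readability check is only a placeholder for the real probe logic.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical probe body. In a pod spec this whole string would be the
# last element of `command`, directly after `/bin/bash` and `-c`.
probe_script='
set -euo pipefail
# Placeholder check: succeed only if the config file is readable.
test -r "$CONFIG_FILE"
'

# Hypothetical wrapper: run the probe body as a single argument to
# `bash -c`, the same way an exec probe would invoke it.
run_probe() {
  CONFIG_FILE="$1" /bin/bash -c "$probe_script"
}
```

An exit status of 0 counts as success for an exec probe; any other status counts as a failure.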

failureThreshold: 3
initialDelaySeconds: 60
periodSeconds: 60
successThreshold: 1
Contributor

Remove any of these which are defaults.

@smarterclayton
Contributor

You have to call hack/update-generated-bindata.sh when you change these configs.

@zgalor
Contributor

zgalor commented Feb 6, 2018

@aweiteka could you please open a PR against https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_prometheus/templates/prometheus.j2#L83 once this PR is merged, to keep the installation aligned with this template?

@aweiteka
Contributor Author

aweiteka commented Feb 6, 2018

@zgalor yes, that is the plan. We want to keep these in sync. My understanding is that we try things out in examples, then port them to openshift-ansible. It's not clear whether that's the best approach.

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 6, 2018
@aweiteka
Contributor Author

aweiteka commented Feb 6, 2018

FYI, log output of prometheus container:

  1. configmap edited
  2. +/- 60s
level=warn ts=2018-02-06T16:15:46.891236728Z caller=main.go:377 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2018-02-06T16:15:46.891404166Z caller=main.go:384 msg="See you next time!"
level=info ts=2018-02-06T16:15:46.893127633Z caller=targetmanager.go:87 component="target manager" msg="Stopping target manager..."
level=info ts=2018-02-06T16:15:46.893882636Z caller=targetmanager.go:99 component="target manager" msg="Target manager stopped"
level=info ts=2018-02-06T16:15:46.893959561Z caller=manager.go:455 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-02-06T16:15:46.894014579Z caller=manager.go:461 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-02-06T16:15:46.894045356Z caller=notifier.go:483 component=notifier msg="Stopping notification handler..."
  3. brief (~5s) 'service unavailable'
  4. service start
level=info ts=2018-02-06T16:15:48.999812989Z caller=main.go:215 msg="Starting Prometheus" version="(version=2.0.0, branch=HEAD, revision=0a74f98628a0463dddc90528220c94de5032d1a0)"
level=info ts=2018-02-06T16:15:49.000012873Z caller=main.go:216 build_context="(go=go1.8.3, user=root@prometheus-binary-7-build, date=20171108-16:47:03)"
level=info ts=2018-02-06T16:15:49.000055345Z caller=main.go:217 host_details="(Linux 3.10.0-693.15.2.el7.x86_64 #1 SMP Thu Jan 4 15:00:51 EST 2018 x86_64 prometheus-0 (none))"
level=info ts=2018-02-06T16:15:49.003728455Z caller=web.go:380 component=web msg="Start listening for connections" address=localhost:9090
level=info ts=2018-02-06T16:15:49.004166188Z caller=main.go:314 msg="Starting TSDB"
level=info ts=2018-02-06T16:15:49.006140679Z caller=targetmanager.go:71 component="target manager" msg="Starting target manager..."
level=info ts=2018-02-06T16:15:53.197838185Z caller=main.go:326 msg="TSDB started"
level=info ts=2018-02-06T16:15:53.19800798Z caller=main.go:394 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-02-06T16:15:53.201375345Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.203472216Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.205671447Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.208001974Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.210036112Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.212384308Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.215212678Z caller=kubernetes.go:100 component="target manager" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2018-02-06T16:15:53.220644156Z caller=main.go:371 msg="Server is ready to receive requests."
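
To confirm a reload like the one logged above, the metric named in the PR description can be read from the server's metrics endpoint. The helper below is a sketch, not part of the PR; `last_reload_timestamp` is a hypothetical name, and the usage line assumes the listen address shown in the log (localhost:9090) and that `curl` is available.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: read Prometheus metrics text on stdin and print the
# value of the last-successful-reload timestamp gauge. Matching on the first
# field skips the "# HELP" and "# TYPE" comment lines.
last_reload_timestamp() {
  awk '$1 == "prometheus_config_last_reload_success_timestamp_seconds" { print $2 }'
}

# Usage against a running server:
#   curl -s http://localhost:9090/metrics | last_reload_timestamp
```

Comparing the value before and after editing the configmap should show whether the new config was picked up.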

Signed-off-by: Aaron Weitekamp <aweiteka@redhat.com>
@aweiteka
Contributor Author

aweiteka commented Feb 8, 2018

@spadgett can you take a look? This is the formatting I found I needed to make this work. Tough to debug. :(

@spadgett
Member

spadgett commented Feb 9, 2018

This is the formatting I found I needed to make this work. Tough to debug. :(

I don't know of another way :/

@aweiteka
Contributor Author

/retest

@aweiteka
Contributor Author

@smarterclayton ready for review/merge

@pgier
Contributor

pgier commented Mar 5, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2018
@smarterclayton
Contributor

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aweiteka, pgier, smarterclayton

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 7, 2018
@openshift-merge-robot
Contributor

Automatic merge from submit-queue (batch tested with PRs 18780, 18802, 18391, 18832, 18808).

@openshift-merge-robot openshift-merge-robot merged commit 5940853 into openshift:master Mar 8, 2018