From f74d79dad41dc6dea349f8fb89ccac93c496d027 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebastian=20R=C3=B6bke?=
Date: Thu, 14 Nov 2019 10:32:19 +0100
Subject: [PATCH 1/2] Update alerting information in README

---
 README.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/README.md b/README.md
index 1a340a9..d386434 100644
--- a/README.md
+++ b/README.md
@@ -81,6 +81,21 @@ Run this controller on Kubernetes with the following commands:
 
 ## Alerting on OOM killed pods
 
+There are many different ways to send alerts when an OOM occurs. We just want to
+mention two of them here.
+
+### Forwarding OOM events to Graylog
+
+Graylog is a popular log management solution, and it includes an alerting feature.
+See the [Graylog docs] for more details.
+
+At XING we forward all Kubernetes cluster events to Graylog using our
+[kubernetes-event-forwarder-gelf]. This allows us to configure alerts whenever a
+`PreviousContainerWasOOMKilled` event generated by the `kubernetes-oom-event-generator`
+occurs.
+
+### Using kube-state-metrics and Prometheus alerts
+
 When [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) is deployed
 in the cluster and a [prometheus](https://prometheus.io) installation is scraping the metrics
 you can alert on OOM-killed pods using the prometheus alert manager.
@@ -96,6 +111,10 @@ Example alert:
     annotations:
       description: Critical Pod {{$labels.namespace}}/{{$labels.pod}} was OOMKilled.
 
+The downside is that `kube_pod_container_status_terminated_reason` always returns to 0 once
+a container starts back up. See the introduction of
+[kube_pod_container_status_last_terminated_reason] for more details.
+
 # Developing
 
 You will need a working Go installation (1.11+) and the `make` program. You will also
@@ -111,6 +130,9 @@ Make sure to run `go mod tidy` before you check in after changing dependencies i
 
 [Go module system]: https://github.com/golang/go/wiki/Modules
 [`xingse/kubernetes-oom-event-generator`]: https://hub.docker.com/r/xingse/kubernetes-oom-event-generator
+[kubernetes-event-forwarder-gelf]: https://github.com/xing/kubernetes-event-forwarder-gelf
+[Graylog docs]: https://docs.graylog.org/
+[kube_pod_container_status_last_terminated_reason]: https://github.com/kubernetes/kube-state-metrics/pull/535
 
 ## Releases
 
From f58271835e90802a66dc117fdb3935aa8b8e36a7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebastian=20R=C3=B6bke?=
Date: Thu, 14 Nov 2019 10:59:42 +0100
Subject: [PATCH 2/2] Remove inline links in README

---
 README.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index d386434..ec50626 100644
--- a/README.md
+++ b/README.md
@@ -96,9 +96,8 @@ occurs.
 
 ### Using kube-state-metrics and Prometheus alerts
 
-When [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) is deployed
-in the cluster and a [prometheus](https://prometheus.io) installation is scraping the metrics
-you can alert on OOM-killed pods using the prometheus alert manager.
+When [kube-state-metrics] is deployed in the cluster and a [Prometheus] installation
+is scraping the metrics, you can alert on OOM-killed pods using the prometheus alert manager.
 
 Example alert:
 
@@ -113,7 +112,7 @@ Example alert:
 
 The downside is that `kube_pod_container_status_terminated_reason` always returns to 0 once
 a container starts back up. See the introduction of
-[kube_pod_container_status_last_terminated_reason] for more details.
+[`kube_pod_container_status_last_terminated_reason`] for more details.
 
 # Developing
 
@@ -130,9 +129,11 @@ Make sure to run `go mod tidy` before you check in after changing dependencies i
 
 [Go module system]: https://github.com/golang/go/wiki/Modules
 [`xingse/kubernetes-oom-event-generator`]: https://hub.docker.com/r/xingse/kubernetes-oom-event-generator
-[kubernetes-event-forwarder-gelf]: https://github.com/xing/kubernetes-event-forwarder-gelf
 [Graylog docs]: https://docs.graylog.org/
-[kube_pod_container_status_last_terminated_reason]: https://github.com/kubernetes/kube-state-metrics/pull/535
+[kubernetes-event-forwarder-gelf]: https://github.com/xing/kubernetes-event-forwarder-gelf
+[kube-state-metrics]: https://github.com/kubernetes/kube-state-metrics
+[Prometheus]: https://prometheus.io
+[`kube_pod_container_status_last_terminated_reason`]: https://github.com/kubernetes/kube-state-metrics/pull/535
 
 ## Releases
 
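
The README text in this patch points out that `kube_pod_container_status_terminated_reason` drops back to 0 once the container restarts, and it links to `kube_pod_container_status_last_terminated_reason` as the relevant alternative. The rule below is a minimal sketch of how that metric could be used. It is not part of the patch. It assumes a kube-state-metrics release that exposes both `kube_pod_container_status_last_terminated_reason` and `kube_pod_container_status_restarts_total`. The group name, alert name, 10-minute window and `severity` label are illustrative assumptions.

```yaml
# Hypothetical Prometheus rule group; names, window and labels are assumptions.
groups:
  - name: oom-killed
    rules:
      - alert: ContainerOOMKilled
        # Fires when a container restarted within the last 10 minutes
        # and its most recent termination reason was OOMKilled.
        expr: |
          increase(kube_pod_container_status_restarts_total[10m]) > 0
          and ignoring (reason)
          kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: critical
        annotations:
          description: Container {{$labels.container}} in Pod {{$labels.namespace}}/{{$labels.pod}} was OOMKilled.
```

The restart-counter condition limits the alert to a recent window. `kube_pod_container_status_last_terminated_reason` stays at 1 until the container terminates again for some other reason, so an alert on that series alone would keep firing long after the OOM kill. `ignoring (reason)` lets the two series match, because only the second one carries a `reason` label.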