diff --git a/README.md b/README.md
index 1a340a9..ec50626 100644
--- a/README.md
+++ b/README.md
@@ -81,9 +81,23 @@ Run this controller on Kubernetes with the following commands:
 
 ## Alerting on OOM killed pods
 
-When [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) is deployed
-in the cluster and a [prometheus](https://prometheus.io) installation is scraping the metrics
-you can alert on OOM-killed pods using the prometheus alert manager.
+There are many different ways to send alerts when an OOM occurs. We just want to
+mention two of them here.
+
+### Forwarding OOM events to Graylog
+
+Graylog is a popular log management solution, and it includes an alerting feature.
+See the [Graylog docs] for more details.
+
+At XING we forward all Kubernetes cluster events to Graylog using our
+[kubernetes-event-forwarder-gelf]. This allows us to configure alerts whenever a
+`PreviousContainerWasOOMKilled` event generated by the `kubernetes-oom-event-generator`
+occurs.
+
+### Using kube-state-metrics and Prometheus alerts
+
+When [kube-state-metrics] is deployed in the cluster and a [Prometheus] installation
+is scraping the metrics, you can alert on OOM-killed pods using the prometheus alert manager.
 
 Example alert:
 
@@ -96,6 +110,10 @@ Example alert:
     annotations:
       description: Critical Pod {{$labels.namespace}}/{{$labels.pod}} was OOMKilled.
 
+The downside is that `kube_pod_container_status_terminated_reason` always returns to 0 once
+a container starts back up. See the introduction of
+[`kube_pod_container_status_last_terminated_reason`] for more details.
+
 # Developing
 
 You will need a working Go installation (1.11+) and the `make` program. You will also
@@ -111,6 +129,11 @@ Make sure to run `go mod tidy` before you check in after changing dependencies i
 
 [Go module system]: https://github.com/golang/go/wiki/Modules
 [`xingse/kubernetes-oom-event-generator`]: https://hub.docker.com/r/xingse/kubernetes-oom-event-generator
+[Graylog docs]: https://docs.graylog.org/
+[kubernetes-event-forwarder-gelf]: https://github.com/xing/kubernetes-event-forwarder-gelf
+[kube-state-metrics]: https://github.com/kubernetes/kube-state-metrics
+[Prometheus]: https://prometheus.io
+[`kube_pod_container_status_last_terminated_reason`]: https://github.com/kubernetes/kube-state-metrics/pull/535
 
 ## Releases
 