Merge pull request #8 from xing/update-alerting-info
Update alerting information in README
boosty authored Nov 14, 2019
2 parents 33d359c + f582718 commit dc0650e
Showing 1 changed file with 26 additions and 3 deletions.
29 changes: 26 additions & 3 deletions README.md
@@ -81,9 +81,23 @@ Run this controller on Kubernetes with the following commands:

## Alerting on OOM killed pods

When [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) is deployed
in the cluster and a [Prometheus](https://prometheus.io) installation is scraping its metrics,
you can alert on OOM-killed pods using Prometheus Alertmanager.
There are many ways to send alerts when an OOM kill occurs. Two of them are
described here.

### Forwarding OOM events to Graylog

Graylog is a popular log management solution, and it includes an alerting feature.
See the [Graylog docs] for more details.

At XING we forward all Kubernetes cluster events to Graylog using our
[kubernetes-event-forwarder-gelf]. This allows us to configure alerts whenever a
`PreviousContainerWasOOMKilled` event generated by the `kubernetes-oom-event-generator`
occurs.

### Using kube-state-metrics and Prometheus alerts

When [kube-state-metrics] is deployed in the cluster and a [Prometheus] installation
is scraping its metrics, you can alert on OOM-killed pods using Prometheus Alertmanager.

Example alert:

@@ -96,6 +110,10 @@ Example alert:
    annotations:
      description: Critical Pod {{$labels.namespace}}/{{$labels.pod}} was OOMKilled.

The downside of this approach is that `kube_pod_container_status_terminated_reason` returns
to 0 as soon as the container starts back up, so an OOM kill followed by a quick restart can
be missed. See the introduction of
[`kube_pod_container_status_last_terminated_reason`] for more details.
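
As a rough sketch of how the newer metric could be used instead, the rule below combines a recent restart with a last termination reason of `OOMKilled`. It assumes a kube-state-metrics version that exposes `kube_pod_container_status_last_terminated_reason`. The group name, alert name, 10 minute window and severity are illustrative assumptions, not values taken from this project.

```yaml
groups:
  - name: oom-alerts                 # hypothetical group name
    rules:
      - alert: ContainerOOMKilled    # hypothetical alert name
        # Fire when a container restarted within the last 10 minutes
        # and its most recent termination reason was OOMKilled.
        expr: |
          (kube_pod_container_status_restarts_total
            - kube_pod_container_status_restarts_total offset 10m >= 1)
          and ignoring (reason)
          min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1
        labels:
          severity: warning          # assumed severity, adjust to your routing
        annotations:
          description: Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} was OOMKilled.
```

Because the last terminated reason persists after the container is running again, the restart counter is what ties the alert to a recent event. Without it the rule would keep firing for as long as the last termination reason stays `OOMKilled`.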

# Developing

You will need a working Go installation (1.11+) and the `make` program. You will also
@@ -111,6 +129,11 @@ Make sure to run `go mod tidy` before you check in after changing dependencies i

[Go module system]: https://github.com/golang/go/wiki/Modules
[`xingse/kubernetes-oom-event-generator`]: https://hub.docker.com/r/xingse/kubernetes-oom-event-generator
[Graylog docs]: https://docs.graylog.org/
[kubernetes-event-forwarder-gelf]: https://github.com/xing/kubernetes-event-forwarder-gelf
[kube-state-metrics]: https://github.com/kubernetes/kube-state-metrics
[Prometheus]: https://prometheus.io
[`kube_pod_container_status_last_terminated_reason`]: https://github.com/kubernetes/kube-state-metrics/pull/535

## Releases
