Amnon Heiman edited this page Feb 8, 2018 · 4 revisions

Alerts with Grafana and Prometheus

Grafana and Prometheus are mainly used to display time series as graphs. In some cases that is not enough, for example, when a node is down. A text alert stating that a node went down, with a timestamp of when it happened, is clearer.

We are going to explore the node-down example to explain the alert mechanism.

Components in the monitoring stack for alarms

We use Prometheus, Grafana, and the Alertmanager to report the alarms.

In the context of alarm reporting, each plays a different role:

Prometheus

The Prometheus server stores the data and creates the alarms. The rules on this page use the Prometheus 1.8 alerting rule syntax.

Prometheus alarms are stored in a file; you can find it in prometheus/prometheus.rules

Alert Rule

The basic structure of an alert is:

ALERT <alert name>
  IF <expression>
  [ FOR <duration> ]
  [ LABELS <label set> ]
  [ ANNOTATIONS <label set> ]

For the node-down example:

ALERT InstanceDown
   IF up == 0
   FOR 30s
   LABELS { severity = "1" }
   ANNOTATIONS {
     summary = "Instance {{ $labels.instance }} down",
     description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 seconds.",
   }

This means that if a node is unresponsive for 30 seconds, Prometheus triggers a down alert.

Because of the way alerts are implemented in Prometheus, the server continues to generate alerts for as long as the condition holds. This is hard to read in the Grafana dashboard, so we add an Alertmanager.
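To make the effect of the FOR clause concrete, here is a minimal Python sketch of how such a rule could move between states. It is an illustration only, not Prometheus code: the ForTracker class, the state names returned by it, and the 15-second evaluation interval in the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForTracker:
    """Hypothetical model of a single alert's FOR clause (not Prometheus code)."""

    for_seconds: float
    active_since: Optional[float] = None  # time the condition first became true

    def evaluate(self, now: float, condition: bool) -> str:
        # The condition stopped holding: forget the start time and reset.
        if not condition:
            self.active_since = None
            return "inactive"
        # First evaluation where the condition holds: remember when it started.
        if self.active_since is None:
            self.active_since = now
        # The alert fires only once the condition has held for the full duration.
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


tracker = ForTracker(for_seconds=30)
# (evaluation time in seconds, value of the `up` metric); assumed sample data
samples = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 1)]
states = [tracker.evaluate(t, up == 0) for t, up in samples]
print(states)
# ['inactive', 'pending', 'pending', 'firing', 'inactive']
```

In this sketch the node goes down at 15 seconds, but the alert only fires at 45 seconds, once `up == 0` has held for the full 30 seconds. A short outage that recovers before the duration elapses never produces an alert.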

Alertmanager

The Alertmanager comes from prometheus.io. We use it to limit the number of alerts we get in Grafana, but it can do much more, so it is worth learning about its other capabilities.

The Alertmanager can send notifications via various channels, such as email or Slack.

It can apply an extra layer of logic on alerts to group and silence alerts.
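The idea behind grouping and silencing can be sketched in a few lines of Python. This is a simplified illustration, not the Alertmanager's implementation: the group_alerts function, the dictionary layout of an alert, and silencing by instance name are assumptions made for this example (real silences are configured in the Alertmanager itself).

```python
from collections import defaultdict


def group_alerts(alerts, group_by, silenced_instances=frozenset()):
    """Hypothetical helper: drop silenced alerts, then bucket the rest by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # Silencing: skip alerts for instances the operator has muted.
        if labels.get("instance") in silenced_instances:
            continue
        # Grouping: alerts with the same values for the group_by labels share a bucket.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Assumed sample data: three nodes of the same job are down.
alerts = [
    {"labels": {"alertname": "InstanceDown", "job": "scylla", "instance": "node1"}},
    {"labels": {"alertname": "InstanceDown", "job": "scylla", "instance": "node2"}},
    {"labels": {"alertname": "InstanceDown", "job": "scylla", "instance": "node3"}},
]

grouped = group_alerts(alerts, group_by=("alertname", "job"), silenced_instances={"node3"})
for key, members in grouped.items():
    print(key, [a["labels"]["instance"] for a in members])
# ('InstanceDown', 'scylla') ['node1', 'node2']
```

Three separate InstanceDown alerts collapse into a single group, and the silenced node does not appear at all. That is the kind of reduction that keeps the list of alerts readable.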

For our use case, it serves as a data source for Grafana. A default configuration for the Alertmanager is included, but you should check the Alertmanager configuration guide to learn how to use it for alert reporting.

Grafana

Grafana versions 3.0 and up support the Prometheus Alertmanager data source plugin. If you are using the Scylla container stack, the start-all command will load the plugin for you.

To see the alerts, add a table panel to your dashboard.