Skip to content
This repository has been archived by the owner on Mar 27, 2020. It is now read-only.

Latest commit

 

History

History
70 lines (61 loc) · 4.26 KB

README.md

File metadata and controls

70 lines (61 loc) · 4.26 KB

Kafka Lag Monitor with Kamon and InfluxDB

GitHub version Docker Pulls

Example Kafka lag monitoring of consumer groups using Kafka AdminClient, Kamon and InfluxDB.

NOTE: It does not require access to Zookeeper.

This application is dockerized and the latest image can be found on Docker Hub mwizner/kafka-kamon-lag-monitor.

Configuration

The application can be configured using the following environment variables:

  • KAFKA_BOOTSTRAP_SERVERS - List of Kafka servers used to bootstrap connections to Kafka.
  • KAFKA_SECURITY_PROTOCOL - Protocol used to communicate with brokers. PLAINTEXT is the default value, use SSL to communicate with brokers via SSL.
  • MONITOR_CONSUMER_GROUP - Consumer group the monitor belongs to.
  • MONITOR_CLIENT_ID - Client id of the monitor.
  • CONSUMER_GROUPS - Consumer groups which offset will be monitored, e.g. group1 or group1,group2,group3.
  • POLL_INTERVAL - (optional) Time interval for polling Kafka, defaults to 5 seconds.
  • GROUP_LAG_HISTOGRAM_NAME - (optional) Name of the consumer group lag histogram, defaults to kafka-consumer-groups-lag.
  • GROUP_STATE_HISTOGRAM_NAME - (optional) Name of the consumer group state histogram, defaults to kafka-consumer-groups-state.
  • TICK_INTERVAL - (optional) Time interval for collecting the metrics, defaults to 10 seconds.
  • INFLUXDB_HOSTNAME - Hostname on which InfluxDB is running.
  • INFLUXDB_PORT - Port on which InfluxDB is listening.
  • EXIT_ON_ERROR - (optional) Exit with code 1 on any error thrown in the app, defaults to false. If you're running this in docker, you can use the --restart unless-stopped flag to restart the app when it exits.

The following settings are optional and apply only if KAFKA_SECURITY_PROTOCOL is set to SSL.

  • KAFKA_SSL_PROTOCOL - The SSL protocol used to generate the SSLContext, e.g. TLS.
  • KAFKA_SSL_KEY_PASSWORD - The password of the private key in the key store file.
  • KAFKA_SSL_KEYSTORE_TYPE - The file format of the key store file, e.g. PKCS12.
  • KAFKA_SSL_KEYSTORE_PATH - The location of the key store file.
  • KAFKA_SSL_KEYSTORE_PASSWORD - The password for the key store file.
  • KAFKA_SSL_TRUSTSTORE_TYPE - The file format of the trust store file, e.g. JKS.
  • KAFKA_SSL_TRUSTSTORE_PATH - The location of the trust store file.
  • KAFKA_SSL_TRUSTSTORE_PASSWORD - The password for the trust store file.

An example config:

KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_SECURITY_PROTOCOL=SSL
KAFKA_SSL_PROTOCOL=TLS
KAFKA_SSL_KEY_PASSWORD=password
KAFKA_SSL_KEYSTORE_PATH=/etc/opt/kafka-kamon-lag-monitor/keystore.p12
KAFKA_SSL_KEYSTORE_PASSWORD=password
KAFKA_SSL_KEYSTORE_TYPE=PKCS12
KAFKA_SSL_TRUSTSTORE_PATH=/etc/opt/kafka-kamon-lag-monitor/truststore.jks
KAFKA_SSL_TRUSTSTORE_PASSWORD=password
KAFKA_SSL_TRUSTSTORE_TYPE=JKS
MONITOR_CONSUMER_GROUP=lag-monitor
MONITOR_CLIENT_ID=lag-monitor
CONSUMER_GROUPS=service1,service2
INFLUXDB_HOSTNAME=localhost
INFLUXDB_PORT=8189
EXIT_ON_ERROR=true

Metrics

The app records Kafka metrics using Kamon histogram instruments - the measurements are exposed in InfluxDB as kafka-lag-monitor-timers and tagged with kafka-consumer-groups-lag and kafka-consumer-groups-state entities. Kafka consumer group states get translated into the following values in the histogram:

  • 0 - Dead or Empty
  • 1 - Stable
  • 2 - PreparingRebalance or CompletingRebalance
  • 3 - any other states

Changelog

  • 1.3.0 - Introduced a new config option to exit the application on error.
  • 1.2.0 - Fixed InfluxDB parsing issue & changed the consumer group state metric value to 3 for any unrecognised consumer group states.
  • 1.1.0 - Added metrics for consumer group state.
  • 1.0.1 - Made KAFKA_SECURITY_PROTOCOL environment variable optional.
  • 1.0.0 - Forked from bfil/kafka-kamon-lag-monitor and dockerized.