#Consul Alerting
A set of Python scripts for Consul that handle checks, watches, and notifications. consulalerting reads the tags on services and checks and notifies the corresponding groups through whichever plugins are also in the tags list. For example, a redis service tagged "hipchat", "devops", and "dev" has the hipchat plugin enabled and routes notifications via hipchat to "devops" and "dev". These routes are defined in the Consul KV under alerting/notify/ and can be set up using ConsulAlertingKVBoostrap.py, the Consul KV interface, or programmatically.
#High Availability
consulalerting is not a separate daemon: each time a watch is triggered, Consul itself invokes the WatchCheckHandler. Install consulalerting on each Consul server and register the watch there, so that notifications are still sent when the local Consul server instance is down. To avoid duplicate notifications, the instances coordinate through Consul's session locking feature: the first consulalerting instance to acquire a lock for the current catalog md5sum hash processes the corresponding notifications. The lock's session has a TTL, so if that consulalerting instance fails, the same hash of the catalog will still be reported on later. As long as the Consul servers themselves are not in a failed state, consulalerting will continue to notify.
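
The locking idea can be illustrated with a minimal sketch. It is not taken from the consulalerting source: the `alerting/lock/` prefix, the `InMemoryKV` class, and the function names are hypothetical. Against a real cluster, the session would be created with `PUT /v1/session/create` (with a `TTL`) and the lock taken with `PUT /v1/kv/<key>?acquire=<session>`, which returns `true` only to the session that obtains the lock.

```python
import hashlib
import json


def catalog_hash(checks):
    """md5 hex digest of the check states, stable across key and list order."""
    ordered = sorted(checks, key=lambda c: (c.get("Node", ""), c.get("CheckID", "")))
    canonical = json.dumps(ordered, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


class InMemoryKV:
    """Hypothetical stand-in for Consul's KV lock semantics (illustration only)."""

    def __init__(self):
        self._owners = {}

    def acquire(self, key, session_id):
        # Like PUT /v1/kv/<key>?acquire=<session>: the first session to take
        # the key wins; every other session is refused.
        owner = self._owners.setdefault(key, session_id)
        return owner == session_id


def should_notify(kv, session_id, checks, prefix="alerting/lock/"):
    """True only for the instance that locks the current catalog hash."""
    return kv.acquire(prefix + catalog_hash(checks), session_id)
```

With two instances seeing the same catalog state, only the first caller gets `True`; once the state changes, the hash changes and a new lock is contended for.
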

Example service definition for redis with routing tags:

    {
      "service": {
        "name": "redis",
        "tags": ["devops", "master", "hipchat", "dev"],
        "port": 8000,
        "checks": [{
          "script": "/usr/local/bin/check_redis.py",
          "interval": "10s"
        }]
      }
    }

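
The tag-based routing described in the introduction can be sketched as follows, using the tags of the redis service definition above. This is an illustration under assumptions, not the project's code: `route_tags` and the shape of `notify_settings` are hypothetical, and the room ID for "dev" is made up.

```python
def route_tags(tags, notify_settings):
    """Map each enabled plugin to the destinations selected by the tags.

    notify_settings is assumed to look like {"hipchat": {"devops": 3}, ...},
    i.e. plugin name -> {group tag -> destination}.
    """
    routes = {}
    for plugin, destinations in notify_settings.items():
        if plugin not in tags:
            continue  # plugin is not enabled for this service or check
        matched = {tag: destinations[tag] for tag in tags if tag in destinations}
        if matched:
            routes[plugin] = matched
    return routes


redis_tags = ["devops", "master", "hipchat", "dev"]
settings = {
    "hipchat": {"devops": 3, "dev": 5},   # room ID 5 is a made-up value
    "slack": {"techops": "#techops"},
}
print(route_tags(redis_tags, settings))
# prints {'hipchat': {'devops': 3, 'dev': 5}}
```

Tags that match neither a plugin nor a group, such as "master", are simply ignored in this sketch.
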

To install, clone the repository and copy the package to a directory of your choosing:

    git clone https://github.com/jrxFive/consulalerting
    cp -r consulalerting/consulalerting <DESTINATION_FOLDER>
    # edit ConsulAlertingKVBoostrap.py, or set the keys up manually through the Consul KV interface
    python consulalerting/ConsulAlertingKVBoostrap.py

After installing consulalerting in a directory of your choosing, use/edit ConsulAlertingKVBootstrap.py to ensure the scripts can obtain the necessary KV information from Consul.

    blacklist_nodes = ["fqdn", "other_fqdn"]
    blacklist_services = ["redis"]
    blacklist_checks = ["service:redis"]
    health_check_tags = ["devops", "hipchat", "techops"]

    notify_hipchat = {"api_token": "",
                      "url": "https://api.hipchat.com/v1/rooms/message",
                      "rooms": {"devops": 3,
                                "techops": 4}
                      }

    notify_slack = {"api_token": "",
                    "rooms": {"techops": "#techops"}}

    notify_mailgun = {"api_token": "",
                      "mailgun_domain": "",
                      "from": "consul@domain.com",
                      "teams": {"devops": ["guy@example.com", "girl@example.com"],
                                "qa": "lonelyqa@example.com"}
                      }

    notify_email = {"mail_domain_address": "email.domain.com",
                    "username": "",
                    "password": "",
                    "from": "consul@domain.com",
                    "teams": {"devops": ["guy@example.com", "girl@example.com"],
                              "qa": "lonelyqa@example.com"}
                    }

    notify_pagerduty = {"teams": {"devops": "<SERVICE_KEY>"}}

    notify_influxdb = {"url": "http://localhost:8086/write",
                       "series": "test",
                       "databases": {"db": "mydb"}
                       }

    notify_elasticsearchlog = {"logpaths": ["/path/to/log1"]}

    notify_cachet = {"api_token": "tokenFromCachetUserProfile",
                     "site_url": "http://status.company.com",
                     "notify_subscribers": False
                     }


Variable Name | Type | Description |
---|---|---|
blacklist_nodes | List | Nodes whose state changes should not trigger notifications, matched by "Node" name in /v1/health/node/ |
blacklist_services | List | Services that should not trigger notifications, matched by "ServiceName" in /v1/health/node/ |
blacklist_checks | List | Checks that should not trigger notifications, matched by "CheckID" in /v1/health/node/ |
health_check_tags | List | Tags used to determine whom to alert, and with what type of alert, for non-application checks |

If you wish to disable all notifications for a certain blacklist type, simply use ["*"] as the blacklist array value.
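
A minimal sketch of how the three blacklists and the ["*"] wildcard could be applied to a single check entry from /v1/health/node/. The function name is hypothetical, and the choice to let a wildcard match only entries that actually carry the corresponding field is an assumption, not documented behavior.

```python
def is_blacklisted(check, blacklist_nodes, blacklist_services, blacklist_checks):
    """Return True if a health check entry should not trigger a notification."""
    pairs = (
        (blacklist_nodes, check.get("Node")),
        (blacklist_services, check.get("ServiceName")),
        (blacklist_checks, check.get("CheckID")),
    )
    # Assumption: "*" blacklists every entry that has a value for that field.
    return any(
        bool(value) and ("*" in blacklist or value in blacklist)
        for blacklist, value in pairs
    )


check = {"Node": "fqdn", "ServiceName": "redis", "CheckID": "service:redis"}
print(is_blacklisted(check, [], ["redis"], []))   # prints True
print(is_blacklisted(check, [], [], []))          # prints False
```
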
After the script has run, you can always change these settings within the Consul UI.
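
For setting the values up programmatically, the following sketch shows one way the settings could be turned into KV entries and written with Consul's `PUT /v1/kv/<key>` endpoint. Only the alerting/notify/ prefix comes from this document; the exact key names, the JSON encoding, and both function names are assumptions, so check ConsulAlertingKVBootstrap.py for the keys the scripts actually read.

```python
import json
import urllib.request


def build_kv_entries(settings, prefix="alerting/"):
    """Turn a settings dict into {KV key: JSON string} (key layout is assumed)."""
    entries = {}
    for name, value in settings.items():
        if name.startswith("notify_"):
            key = prefix + "notify/" + name[len("notify_"):]
        else:
            key = prefix + name
        entries[key] = json.dumps(value)
    return entries


def put_entries(entries, consul_url="http://localhost:8500"):
    """Write each entry to the Consul KV store over HTTP."""
    for key, body in entries.items():
        request = urllib.request.Request(
            f"{consul_url}/v1/kv/{key}", data=body.encode("utf-8"), method="PUT"
        )
        with urllib.request.urlopen(request) as response:
            response.read()


entries = build_kv_entries({
    "blacklist_services": ["redis"],
    "notify_slack": {"api_token": "", "rooms": {"techops": "#techops"}},
})
print(sorted(entries))
# prints ['alerting/blacklist_services', 'alerting/notify/slack']
```

`put_entries(entries)` would then send the values to a local agent; it is not called here because it needs a running Consul.
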

To register the watch, add the handler to the Consul configuration on each Consul server:

    {
      "watches": [
        {
          "type": "checks",
          "handler": "<DESTINATION_FOLDER>/consulalerting/WatchCheckHandler.py >> <LOG_FILE_LOCATION>"
        }
      ]
    }

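
When a checks watch fires, Consul passes the current list of checks to the handler as JSON on standard input. The sketch below is not taken from WatchCheckHandler.py; it only illustrates what such a payload looks like and how a handler could pick out the checks that are not passing. The function name and the sample values are hypothetical.

```python
import io
import json


def non_passing_checks(stream):
    """Parse a checks-watch payload and keep the checks that are not passing."""
    checks = json.load(stream) or []
    return [check for check in checks if check.get("Status") != "passing"]


# Sample payload in the shape Consul uses for a "checks" watch.
sample = io.StringIO(json.dumps([
    {"Node": "web1", "CheckID": "service:redis", "Name": "Service 'redis' check",
     "Status": "critical", "ServiceName": "redis"},
    {"Node": "web1", "CheckID": "serfHealth", "Name": "Serf Health Status",
     "Status": "passing", "ServiceName": ""},
]))
for check in non_passing_checks(sample):
    print(check["Node"], check["CheckID"], check["Status"])
# prints web1 service:redis critical
```

In a real handler the stream would be `sys.stdin` instead of the in-memory sample.
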

Keys for notify_hipchat:

Keyname | Type | Description |
---|---|---|
api_token | string | Hipchat requires an auth_token |
url | string | URL of the API endpoint that the token gives access to |
rooms | dict | Add an entry within 'rooms' for each tag, mapping it to the corresponding hipchat room |

Keys for notify_slack:

Keyname | Type | Description |
---|---|---|
api_token | string | Slack requires an auth_token |
rooms | dict | Add an entry within 'rooms' for each tag, mapping it to the corresponding slack channel |

Keys for notify_mailgun:

Keyname | Type | Description |
---|---|---|
api_token | string | Mailgun requires an auth_token |
mailgun_domain | string | Mailgun domain address |
from | string | From address shown on the alert email |
teams | dict | Add an entry within 'teams' for each tag, mapping it to the email addresses of the corresponding team or individual |

Keys for notify_email:

Keyname | Type | Description |
---|---|---|
mail_domain_address | string | SMTP server used to send the alert |
username | string | Username, if the SMTP server requires authentication |
password | string | Password, if the SMTP server requires authentication |
from | string | From address shown on the alert email |
teams | dict | Add an entry within 'teams' for each tag, mapping it to the email addresses of the corresponding team or individual |
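
As a rough illustration of how the notify_email keys fit together, the sketch below resolves tags to recipients and builds a message with Python's standard `email` and `smtplib` modules. It is not the plugin's code: the function names, the subject, and the body text are hypothetical, and it assumes a 'teams' value may be either a list of addresses or a single string, as in the example settings.

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(settings, tags, subject, body):
    """Build an email to every address whose team tag appears in tags."""
    recipients = []
    for tag in tags:
        members = settings["teams"].get(tag, [])
        if isinstance(members, str):
            members = [members]  # a single address given as a plain string
        for address in members:
            if address not in recipients:
                recipients.append(address)
    if not recipients:
        return None
    message = EmailMessage()
    message["From"] = settings["from"]
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    message.set_content(body)
    return message


def send(settings, message):
    """Send the message through the configured SMTP server."""
    with smtplib.SMTP(settings["mail_domain_address"]) as server:
        if settings.get("username"):
            server.login(settings["username"], settings["password"])
        server.send_message(message)
```

Calling `build_alert_email(notify_email, ["devops", "qa"], ...)` with the example settings addresses all three example recipients; `send` is only needed once a reachable SMTP server is configured.
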

Keys for notify_pagerduty:

Keyname | Type | Description |
---|---|---|
teams | dict | Add an entry within 'teams' for each tag, mapping it to the service_key of the corresponding pagerduty team |

Keys for notify_influxdb:

Keyname | Type | Description |
---|---|---|
url | string | URL of the database's write API |
series | string | Name of the database series that will contain the data |
databases | dict | Add an entry within 'databases' for each tag, mapping it to the corresponding influxdb database |
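
For InfluxDB 0.9, a point is sent to the write URL (for example with `?db=mydb` appended) as a line of text in the line protocol. The sketch below formats one such line. Which tags and fields consulalerting actually writes is not stated here, so the tag and field names in the example are hypothetical, as is the function name.

```python
def _escape(value, chars):
    """Backslash-escape the given characters, as the line protocol requires."""
    value = str(value)
    for char in chars:
        value = value.replace(char, "\\" + char)
    return value


def _field(value):
    """Format one field value: booleans, integers (i suffix), floats, strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(series, tags, fields, timestamp_ns=None):
    """Format one point as: series,tag=value field=value [timestamp]."""
    tag_part = "".join(
        f",{_escape(key, ', =')}={_escape(value, ', =')}"
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(key, ', =')}={_field(value)}" for key, value in fields.items()
    )
    line = f"{_escape(series, ', ')}{tag_part} {field_part}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


print(to_line("test", {"node": "web 1", "status": "critical"}, {"value": 2}))
# prints test,node=web\ 1,status=critical value=2i
```
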

Keys for notify_elasticsearchlog:

Keyname | Type | Description |
---|---|---|
logpaths | array of strings | Absolute path(s) of the logfile(s) to write in elasticsearch format |

Keys for notify_cachet:

Keyname | Type | Description |
---|---|---|
api_token | string | The API token provided by Cachet via a user profile page |
site_url | string | The URL of the Cachet instance |
notify_subscribers | boolean | Whether or not subscribers should be notified of the incident |

NOTE: For this plugin to report Cachet incidents against specific components, you are expected to provide a "service nice name" as a tag in addition to the cachet tag. For example, if your component is called "Data Import Service" in Cachet, provide that same string as a tag in your service definition. If a matching tag is not found, incidents will be reported with a generic name of "Consul State Change."
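
The matching rule in the note can be sketched as a small lookup. This is an illustration only: the function name, the `components` mapping of Cachet component names to IDs, and the ID value are hypothetical, and the generic name is written without the trailing period on the assumption that the period belongs to the sentence rather than to the name.

```python
GENERIC_NAME = "Consul State Change"  # assumed spelling of the generic name


def incident_target(tags, components):
    """Pick the Cachet component whose name matches one of the tags.

    components maps component names to component IDs. Returns
    (incident name, component ID), or the generic name and None when
    no tag matches a component.
    """
    for tag in tags:
        if tag in components:
            return tag, components[tag]
    return GENERIC_NAME, None


components = {"Data Import Service": 7}  # ID 7 is a made-up value
print(incident_target(["cachet", "Data Import Service"], components))
# prints ('Data Import Service', 7)
print(incident_target(["cachet", "redis"], components))
# prints ('Consul State Change', None)
```
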

- HA, install per leader, using locks and md5sums of state
- Plugin Separation
- Settings as an import instead of inherited
- Cleanup method documentation
- Influxdb 0.8/0.9 and logstash protocol plugins
- Wildcard blacklist
- Improve KVBootstrap.py
- Integration tests
- Improve code coverage
- Use STDIN of catalog instead of lookup