v2.4.3 (2019-06-24)
Closed issues:
- Tag v2.4.2 release #396
- Tag v2.4.0 release #393
- Predict disk checks tmpfs #391
- Tag v2.3.1 release #388
- Tag v2.3.1 release #385
- Tag v2.3.0 release #382
- Tag v2.3.0 release #379
- Tag v2.3.0 release #376
- Prometheus startup script calls a consul kv value that doesn't exist #374
- Fix metadata error on boot #372
- Tag v2.3.0 release #369
- Tag v2.3.0 release #366
- Tag v2.3.0 release #362
- Tag v2.3.0 release #359
- Scrape Redis CloudWatch metrics #357
- Support gracefully reloading configuration for services that support it #351
- Categorize other alerts #349
- [notifications] Categorize cron alerts as non-critical #346
- Slack notification to all alert routes #344
- Upgrade Traefik to 1.5.4 #342
- Upgrade blackbox_exporter to 0.12.0 #341
- Upgrade AlertManager to 0.14.0 #340
- Upgrade to Prometheus 2.2.1 #339
- Tag v2.2.0 release #336
- [TF] Cleanup for 0.11.x #334
- Rename 'sink' alert route to something more descriptive #330
- Tag project with platform tag #328
- [VarnishCacheHitRateTooLow] Don't alert if overall traffic is minimal #324
- Cleanup old pagerduty integration #322
- [monitoring] TimeSanity: 15 minutes too short of a time for NTP state to settle #320
- [pagerduty] Add more pagerduty integration #318
- Tag v2.1.0 release #314
- Fixing prometheus rules #313
- Tag v2.1.0 release #310
- Use base's swap support in favor of our own solution #308
- Tag v2.1.0 release #305
- [pagerduty] handle <UNSET> as magical value #303
- Tag v2.1.0 release #300
- Improve Monitoring Coverage #298
- [traefik] Upgrade to 1.4.6 #294
Merged pull requests:
- Update CHANGELOG for v2.4.2 release [skip ci] #398 (gozer)
- Update CHANGELOG for v2.4.2 release [skip ci] #397 (gozer)
- Update CHANGELOG for v2.4.0 release [skip ci] #395 (nubis-automation)
- Update CHANGELOG for v2.4.0 release [skip ci] #394 (nubis-automation)
- We don't need to check tmpfs for predict disk #392 (limed)
- Update CHANGELOG for v2.3.1 release [skip ci] #390 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #389 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #387 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #386 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #384 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #383 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #381 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #380 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #378 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #377 (nubis-automation)
- Fixing prometheus startup script #375 (limed)
- Fix metadata error on boot #373 (tinnightcap)
- Update CHANGELOG for v2.3.0 release [skip ci] #371 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #370 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #368 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #367 (nubis-automation)
- Updated graph to split with project #365 (limed)
- Update CHANGELOG for v2.3.0 release [skip ci] #364 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #363 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #361 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #360 (nubis-automation)
- Add Redis CloudWatch metrics #358 (gozer)
- fix tyop #356 (gozer)
- Include additional metrics in exposition format #355 (tinnightcap)
- Reload configuration for services that support it, restart otherwise #352 (gozer)
- Categorize some other alerts as non-critical #350 (limed)
- Cron no longer categorized as critical #348 (limed)
- Make sure slack notifications show up on all alerts #347 (limed)
- Upgrade Prometheus, AlertManager, Traefik & blackbox_exporter #343 (gozer)
- Update CHANGELOG for v2.2.0 release [skip ci] #338 (nubis-automation)
- Update CHANGELOG for v2.2.0 release [skip ci] #337 (nubis-automation)
- Fix #334 #335 (gozer)
- Cleanup alert routing #333 (limed)
- All these alerts are for fluentd-elasticsearch so we just tag them as nubis #332 (limed)
- Fixing up all rules and fixing up alert routes #331 (limed)
- Tag project as a platform component #329 (limed)
- Fixing broken rule #327 (limed)
- Limit alerting to Varnish if it's seeing at least a non-trivial amount of overall traffic #325 (gozer)
- Remove old pagerduty integration key #323 (limed)
- Give NTPd 30 minutes to stabilize #321 (gozer)
- Added support for pagerduty integration #319 (limed)
- Update nubis-travis #317 (tinnightcap)
- Update CHANGELOG for v2.1.0 release [skip ci] #316 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #315 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #312 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #311 (nubis-automation)
- Use base's support for swap now #309 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #307 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #306 (nubis-automation)
- [pagerduty] Handle <UNSET> as magical disabled value #304 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #302 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #301 (nubis-automation)
- Improve our monitoring coverage #299 (gozer)
- Initial sample documentation for Alerts #297 (gozer)
- Create .directory fake files to keep our empty directories from vanishing in S3 #293 (gozer)
v2.4.2 (2019-06-20)
Closed issues:
- Tag v2.4.0 release #393
- Predict disk checks tmpfs #391
- Tag v2.3.1 release #388
- Tag v2.3.1 release #385
- Tag v2.3.0 release #382
- Tag v2.3.0 release #379
- Tag v2.3.0 release #376
- Prometheus startup script calls a consul kv value that doesn't exist #374
- Fix metadata error on boot #372
- Tag v2.3.0 release #369
- Tag v2.3.0 release #366
- Tag v2.3.0 release #362
- Tag v2.3.0 release #359
- Scrape Redis CloudWatch metrics #357
- Support gracefully reloading configuration for services that support it #351
- Categorize other alerts #349
- [notifications] Categorize cron alerts as non-critical #346
- Slack notification to all alert routes #344
- Upgrade Traefik to 1.5.4 #342
- Upgrade blackbox_exporter to 0.12.0 #341
- Upgrade AlertManager to 0.14.0 #340
- Upgrade to Prometheus 2.2.1 #339
- Tag v2.2.0 release #336
- [TF] Cleanup for 0.11.x #334
- Rename 'sink' alert route to something more descriptive #330
- Tag project with platform tag #328
- [VarnishCacheHitRateTooLow] Don't alert if overall traffic is minimal #324
- Cleanup old pagerduty integration #322
- [monitoring] TimeSanity: 15 minutes too short of a time for NTP state to settle #320
- [pagerduty] Add more pagerduty integration #318
- Tag v2.1.0 release #314
- Fixing prometheus rules #313
- Tag v2.1.0 release #310
- Use base's swap support in favor of our own solution #308
- Tag v2.1.0 release #305
- [pagerduty] handle <UNSET> as magical value #303
- Tag v2.1.0 release #300
- Improve Monitoring Coverage #298
- [traefik] Upgrade to 1.4.6 #294
- [backup] S3 sync doesn't support empty files #292
- Tag v2.0.4 release #290
Merged pull requests:
- Update CHANGELOG for v2.4.0 release [skip ci] #395 (nubis-automation)
- Update CHANGELOG for v2.4.0 release [skip ci] #394 (nubis-automation)
- We don't need to check tmpfs for predict disk #392 (limed)
- Update CHANGELOG for v2.3.1 release [skip ci] #390 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #389 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #387 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #386 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #384 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #383 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #381 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #380 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #378 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #377 (nubis-automation)
- Fixing prometheus startup script #375 (limed)
- Fix metadata error on boot #373 (tinnightcap)
- Update CHANGELOG for v2.3.0 release [skip ci] #371 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #370 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #368 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #367 (nubis-automation)
- Updated graph to split with project #365 (limed)
- Update CHANGELOG for v2.3.0 release [skip ci] #364 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #363 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #361 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #360 (nubis-automation)
- Add Redis CloudWatch metrics #358 (gozer)
- fix tyop #356 (gozer)
- Include additional metrics in exposition format #355 (tinnightcap)
- Reload configuration for services that support it, restart otherwise #352 (gozer)
- Categorize some other alerts as non-critical #350 (limed)
- Cron no longer categorized as critical #348 (limed)
- Make sure slack notifications show up on all alerts #347 (limed)
- Upgrade Prometheus, AlertManager, Traefik & blackbox_exporter #343 (gozer)
- Update CHANGELOG for v2.2.0 release [skip ci] #338 (nubis-automation)
- Update CHANGELOG for v2.2.0 release [skip ci] #337 (nubis-automation)
- Fix #334 #335 (gozer)
- Cleanup alert routing #333 (limed)
- All these alerts are for fluentd-elasticsearch so we just tag them as nubis #332 (limed)
- Fixing up all rules and fixing up alert routes #331 (limed)
- Tag project as a platform component #329 (limed)
- Fixing broken rule #327 (limed)
- Limit alerting to Varnish if it's seeing at least a non-trivial amount of overall traffic #325 (gozer)
- Remove old pagerduty integration key #323 (limed)
- Give NTPd 30 minutes to stabilize #321 (gozer)
- Added support for pagerduty integration #319 (limed)
- Update nubis-travis #317 (tinnightcap)
- Update CHANGELOG for v2.1.0 release [skip ci] #316 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #315 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #312 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #311 (nubis-automation)
- Use base's support for swap now #309 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #307 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #306 (nubis-automation)
- [pagerduty] Handle <UNSET> as magical disabled value #304 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #302 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #301 (nubis-automation)
- Improve our monitoring coverage #299 (gozer)
- Initial sample documentation for Alerts #297 (gozer)
- Create .directory fake files to keep our empty directories from vanishing in S3 #293 (gozer)
- add missing license #291 (gozer)
v2.4.0 (2019-03-06)
Closed issues:
- Predict disk checks tmpfs #391
- Tag v2.3.1 release #388
- Tag v2.3.1 release #385
- Tag v2.3.0 release #382
- Tag v2.3.0 release #379
- Tag v2.3.0 release #376
- Prometheus startup script calls a consul kv value that doesn't exist #374
- Fix metadata error on boot #372
- Tag v2.3.0 release #369
- Tag v2.3.0 release #366
- Tag v2.3.0 release #362
- Tag v2.3.0 release #359
- Scrape Redis CloudWatch metrics #357
- Support gracefully reloading configuration for services that support it #351
- Categorize other alerts #349
- [notifications] Categorize cron alerts as non-critical #346
- Slack notification to all alert routes #344
- Upgrade Traefik to 1.5.4 #342
- Upgrade blackbox_exporter to 0.12.0 #341
- Upgrade AlertManager to 0.14.0 #340
- Upgrade to Prometheus 2.2.1 #339
- Tag v2.2.0 release #336
- [TF] Cleanup for 0.11.x #334
- Rename 'sink' alert route to something more descriptive #330
- Tag project with platform tag #328
- [VarnishCacheHitRateTooLow] Don't alert if overall traffic is minimal #324
- Cleanup old pagerduty integration #322
- [monitoring] TimeSanity: 15 minutes too short of a time for NTP state to settle #320
- [pagerduty] Add more pagerduty integration #318
- Tag v2.1.0 release #314
- Fixing prometheus rules #313
- Tag v2.1.0 release #310
- Use base's swap support in favor of our own solution #308
- Tag v2.1.0 release #305
- [pagerduty] handle <UNSET> as magical value #303
- Tag v2.1.0 release #300
- Improve Monitoring Coverage #298
- [traefik] Upgrade to 1.4.6 #294
- [backup] S3 sync doesn't support empty files #292
- Tag v2.0.4 release #290
- [alerts] Group by project #287
Merged pull requests:
- We don't need to check tmpfs for predict disk #392 (limed)
- Update CHANGELOG for v2.3.1 release [skip ci] #390 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #389 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #387 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #386 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #384 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #383 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #381 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #380 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #378 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #377 (nubis-automation)
- Fixing prometheus startup script #375 (limed)
- Fix metadata error on boot #373 (tinnightcap)
- Update CHANGELOG for v2.3.0 release [skip ci] #371 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #370 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #368 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #367 (nubis-automation)
- Updated graph to split with project #365 (limed)
- Update CHANGELOG for v2.3.0 release [skip ci] #364 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #363 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #361 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #360 (nubis-automation)
- Add Redis CloudWatch metrics #358 (gozer)
- fix tyop #356 (gozer)
- Include additional metrics in exposition format #355 (tinnightcap)
- Reload configuration for services that support it, restart otherwise #352 (gozer)
- Categorize some other alerts as non-critical #350 (limed)
- Cron no longer categorized as critical #348 (limed)
- Make sure slack notifications show up on all alerts #347 (limed)
- Upgrade Prometheus, AlertManager, Traefik & blackbox_exporter #343 (gozer)
- Update CHANGELOG for v2.2.0 release [skip ci] #338 (nubis-automation)
- Update CHANGELOG for v2.2.0 release [skip ci] #337 (nubis-automation)
- Fix #334 #335 (gozer)
- Cleanup alert routing #333 (limed)
- All these alerts are for fluentd-elasticsearch so we just tag them as nubis #332 (limed)
- Fixing up all rules and fixing up alert routes #331 (limed)
- Tag project as a platform component #329 (limed)
- Fixing broken rule #327 (limed)
- Limit alerting to Varnish if it's seeing at least a non-trivial amount of overall traffic #325 (gozer)
- Remove old pagerduty integration key #323 (limed)
- Give NTPd 30 minutes to stabilize #321 (gozer)
- Added support for pagerduty integration #319 (limed)
- Update nubis-travis #317 (tinnightcap)
- Update CHANGELOG for v2.1.0 release [skip ci] #316 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #315 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #312 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #311 (nubis-automation)
- Use base's support for swap now #309 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #307 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #306 (nubis-automation)
- [pagerduty] Handle <UNSET> as magical disabled value #304 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #302 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #301 (nubis-automation)
- Improve our monitoring coverage #299 (gozer)
- Initial sample documentation for Alerts #297 (gozer)
- Create .directory fake files to keep our empty directories from vanishing in S3 #293 (gozer)
- add missing license #291 (gozer)
- Fixing graph again #289 (limed)
- Add project to alert grouping #288 (gozer)
v2.3.1 (2018-08-21)
Merged pull requests:
- Update CHANGELOG for v2.3.1 release [skip ci] #387 (nubis-automation)
- Update CHANGELOG for v2.3.1 release [skip ci] #386 (nubis-automation)
v2.3.0 (2018-08-01)
Closed issues:
- Prometheus startup script calls a consul kv value that doesn't exist #374
- Fix metadata error on boot #372
- Scrape Redis CloudWatch metrics #357
- Support gracefully reloading configuration for services that support it #351
- Categorize other alerts #349
- [notifications] Categorize cron alerts as non-critical #346
- Slack notification to all alert routes #344
- Upgrade Traefik to 1.5.4 #342
- Upgrade blackbox_exporter to 0.12.0 #341
- Upgrade AlertManager to 0.14.0 #340
- Upgrade to Prometheus 2.2.1 #339
- Tag v2.2.0 release #336
- Tag v2.3.0 release #379
- Tag v2.3.0 release #376
- Tag v2.3.0 release #369
- Tag v2.3.0 release #366
- Tag v2.3.0 release #362
- Tag v2.3.0 release #359
Merged pull requests:
- Update CHANGELOG for v2.3.0 release [skip ci] #381 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #380 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #378 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #377 (nubis-automation)
- Fixing prometheus startup script #375 (limed)
- Fix metadata error on boot #373 (tinnightcap)
- Update CHANGELOG for v2.3.0 release [skip ci] #371 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #370 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #368 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #367 (nubis-automation)
- Updated graph to split with project #365 (limed)
- Update CHANGELOG for v2.3.0 release [skip ci] #364 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #363 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #361 (nubis-automation)
- Update CHANGELOG for v2.3.0 release [skip ci] #360 (nubis-automation)
- Add Redis CloudWatch metrics #358 (gozer)
- fix tyop #356 (gozer)
- Include additional metrics in exposition format #355 (tinnightcap)
- Reload configuration for services that support it, restart otherwise #352 (gozer)
- Categorize some other alerts as non-critical #350 (limed)
- Cron no longer categorized as critical #348 (limed)
- Make sure slack notifications show up on all alerts #347 (limed)
- Upgrade Prometheus, AlertManager, Traefik & blackbox_exporter #343 (gozer)
v2.2.0 (2018-04-06)
Closed issues:
- [TF] Cleanup for 0.11.x #334
- Rename 'sink' alert route to something more descriptive #330
- Tag project with platform tag #328
- [VarnishCacheHitRateTooLow] Don't alert if overall traffic is minimal #324
- Cleanup old pagerduty integration #322
- [monitoring] TimeSanity: 15 minutes too short of a time for NTP state to settle #320
- [pagerduty] Add more pagerduty integration #318
- Fixing prometheus rules #313
Merged pull requests:
- Fix #334 #335 (gozer)
- Cleanup alert routing #333 (limed)
- All these alerts are for fluentd-elasticsearch so we just tag them as nubis #332 (limed)
- Fixing up all rules and fixing up alert routes #331 (limed)
- Tag project as a platform component #329 (limed)
- Fixing broken rule #327 (limed)
- Limit alerting to Varnish if it's seeing at least a non-trivial amount of overall traffic #325 (gozer)
- Remove old pagerduty integration key #323 (limed)
- Give NTPd 30 minutes to stabilize #321 (gozer)
- Added support for pagerduty integration #319 (limed)
- Update nubis-travis #317 (tinnightcap)
v2.1.0 (2018-02-23)
Closed issues:
- Use base's swap support in favor of our own solution #308
- [pagerduty] handle <UNSET> as magical value #303
- Improve Monitoring Coverage #298
- [traefik] Upgrade to 1.4.6 #294
- [backup] S3 sync doesn't support empty files #292
- [rds] Scrape rds metrics from cloudwatch #284
- Tag v2.1.0 release #314
- Tag v2.1.0 release #310
- Tag v2.1.0 release #305
- Tag v2.1.0 release #300
- [cloudwatch] Add AWS/Lambda Throttles metric #163
- [dashboard] Update rules for consul_exporter 0.3.0 #161
- [cloudwatch] Scrape EFS metrics #148
Merged pull requests:
- Update CHANGELOG for v2.1.0 release [skip ci] #316 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #315 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #312 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #311 (nubis-automation)
- Use base's support for swap now #309 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #307 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #306 (nubis-automation)
- [pagerduty] Handle <UNSET> as magical disabled value #304 (gozer)
- Update CHANGELOG for v2.1.0 release [skip ci] #302 (nubis-automation)
- Update CHANGELOG for v2.1.0 release [skip ci] #301 (nubis-automation)
- Improve our monitoring coverage #299 (gozer)
- Initial sample documentation for Alerts #297 (gozer)
- Create .directory fake files to keep our empty directories from vanishing in S3 #293 (gozer)
- add missing license #291 (gozer)
v2.0.4 (2017-12-08)
Implemented enhancements:
- [cloudwatch] Evaluate data we are scraping from cloudwatch #260
- [cloudwatch] Cloudwatch scraping not working for ASG #258
Closed issues:
- [alerts] Group by project #287
- [consul] consul_catalog_service_node_healthy service label is now service_id #280
- Remove wildcard *.mon. DNS entry #277
- [varnish] Add Varnish Dashboard #272
- [prometheus] Probe /-/healthy for liveness #264
- Make instance_type tunable #252
- Detect when EFS mount does not come up on boot #207
- Tag v2.0.4 release #290
Merged pull requests:
- Fixing graph again #289 (limed)
- Add project to alert grouping #288 (gozer)
- Fix squid graph #286 (limed)
- Some dashboard updates #285 (limed)
- fix path #283 (gozer)
- Use Prometheus's own health check #282 (gozer)
- consul_catalog_service_node_healthy label is service_id now #281 (gozer)
- [needs-review] Graphs update #279 (limed)
- Remove useless wildcard dns for *.mon. #278 (gozer)
- Graph updates #276 (limed)
- [Centennial] #275 (gozer)
- Added trusted ip #274 (limed)
- Fix blackbox exporter to start up #273 (limed)
- Fixing configuration error, causing traefik to not startup #270 (limed)
- Remove dimension regex for ELB #268 (limed)
- Make instance type tunable #253 (gozer)
v2.0.3 (2017-11-06)
Closed issues:
- [traefik] Upgrade traefik to 1.4.1 #249
- [rules] Get rid of IpForwardingEnabledNonNAT #248
- [memory] Create some swap on startup #246
- Tag v2.0.3 release #265
- Tag v2.0.3 release #261
Merged pull requests:
- Merge v2.0.3 release into develop. [skip ci] #267 (tinnightcap)
- Update CHANGELOG for v2.0.3 release [skip ci] #266 (tinnightcap)
- Merge v2.0.3 release into develop. [skip ci] #263 (tinnightcap)
- Update CHANGELOG for v2.0.3 release [skip ci] #262 (tinnightcap)
- Fix cloudwatch scraping for ASG #259 (limed)
- Fixing consul prometheus alert rules #257 (limed)
- Patch v2.0.2 #256 (gozer)
- Upgrade to Traefik v1.4.1 #255 (gozer)
- Make swapfile a tunable #254 (gozer)
- Remove IpForwardingEnabledNonNAT alert #251 (limed)
- Efs scrape #250 (limed)
- Merge my changes back in #247 (limed)
v2.0.2 (2017-10-25)
Fixed bugs:
- Fix broken squid alerts #230
Closed issues:
- Scrape squid exporter metrics #228
- Tag v2.0.2 release #241
- Tag v2.0.2 release #237
- Tag v2.0.2 release #234
Merged pull requests:
- Merge v2.0.2 release into develop. [skip ci] #245 (tinnightcap)
- Update CHANGELOG for v2.0.2 release [skip ci] #244 (tinnightcap)
- Update CHANGELOG for v2.0.2 release [skip ci] #243 (tinnightcap)
- Merge v2.0.2 release into develop. [skip ci] #240 (tinnightcap)
- Update CHANGELOG for v2.0.2 release [skip ci] #239 (tinnightcap)
- Update CHANGELOG for v2.0.2 release [skip ci] #238 (tinnightcap)
- Merge v2.0.2 release into develop. [skip ci] #236 (tinnightcap)
- Update CHANGELOG for v2.0.2 release [skip ci] #235 (tinnightcap)
- Disable service-discovery from the inside #233 (gozer)
- Scrape AWS/Lambda throttles metric #232 (limed)
- Fixing broken prometheus alerts #231 (limed)
- Scrape squid exporter metrics #229 (limed)
v2.0.1 (2017-10-18)
Closed issues:
- Limit heap size to 75% of available RAM #219
- Cronjob alerts are not very specific #217
- Upgrade to Prometheus 1.8.0 #215
- Disable backups in favor of snapshots #142
- [backups] Enable some swap #110
- [backups] Make the in-progress page expose a backup metric of some sort ? #106
- [duplicity] Cleanup orphaned lockfiles #66
- Tag v2.0.1 release #225
- Tag v2.0.1 release #221
Merged pull requests:
- Merge v2.0.1 release into develop. [skip ci] #227 (tinnightcap)
- Update CHANGELOG for v2.0.1 release [skip ci] #226 (tinnightcap)
- Fix byte math #224 (gozer)
- Merge v2.0.1 release into develop. [skip ci] #223 (tinnightcap)
- Update CHANGELOG for v2.0.1 release [skip ci] #222 (tinnightcap)
- Keep Heap Size under 75% of available RAM #220 (gozer)
- Report specific cron jobs that are failing #218 (gozer)
- Upgrade to Prometheus 1.8.0 #216 (gozer)
v2.0.0 (2017-10-06)
Closed issues:
- [unicreds] Cleanup resources on destruction #190
- Use persistent storage #184
- [dashboard] Add ES grafana dashboard #182
- [traefik] Move traefik port to 9100 range #176
- [traefik] Configure traefik to expose metrics endpoint as well #172
- Scrape traefik metrics #171
- Update packages #168
- Upgrade traefik to v1.3.8 #166
- [grafana] Upgrade grafana to stable #160
- Switch from atlas to using terraform image search #158
- Add IAM permission #157
- [dashboard] Update grafana json file #150
- [dashboard] Add EFS dashboard #147
- Tag v2.0.0 release #212
- Tag v2.0.0 release #208
- Tag v2.0.0 release #203
- Tag v2.0.0 release #199
- Tag v2.0.0 release #195
- Tag v2.0.0 release #192
- Tag v2.0.0 release #186
Merged pull requests:
- Merge v2.0.0 release into develop. [skip ci] #214 (tinnightcap)
- Update CHANGELOG for v2.0.0 release [skip ci] #213 (tinnightcap)
- Merge v2.0.0 release into develop. [skip ci] #211 (tinnightcap)
- Update CHANGELOG for v2.0.0 release [skip ci] #210 (tinnightcap)
- fix small delete tyop #209 (gozer)
- Move EFS mount earlier in boot #206 (gozer)
- Merge v2.0.0 release into develop. [skip ci] #205 (tinnightcap)
- Update CHANGELOG for v2.0.0 release [skip ci] #204 (tinnightcap)
- fix tyop #202 (gozer)
- Merge v2.0.0 release into develop. [skip ci] #201 (tinnightcap)
- Update CHANGELOG for v2.0.0 release [skip ci] #200 (tinnightcap)
- Merge v2.0.0 release into develop. [skip ci] #198 (tinnightcap)
- Update CHANGELOG for v2.0.0 release [skip ci] #197 (tinnightcap)
- Fix count logic #196 (gozer)
- Merge v2.0.0 release into develop. [skip ci] #194 (tinnightcap)
- Update CHANGELOG for v2.0.0 release [skip ci] #193 (tinnightcap)
- Cleanup unicreds secrets #191 (gozer)
- Merge v2.0.0 release into develop. [skip ci] #189 (tinnightcap)
- Update CHANGELOG for v2.0.0 release [skip ci] #188 (tinnightcap)
- element fix #187 (gozer)
- Run Prometheus off persistent storage #185 (gozer)
- Add elasticsearch dashboard #183 (limed)
- Arena support #181 (gozer)
- Add platform status dashboard #180 (limed)
- Fixing autoscaling grafana graph #179 (limed)
- Added graphs #178 (limed)
- Switch traefik to port 9109 #177 (limed)
- Update nubis-travis to v1.4.2 #175 (tinnightcap)
- Expose prometheus metrics #174 (limed)
- Add traefik metric to scrape using prometheus #173 (limed)
- Update apache2-util and duplicity #170 (limed)
- Update grafana to version 4.5.1 #169 (limed)
- Update traefik to v1.3.8 #167 (limed)
- AMI search #165 (limed)
- Update nubis-travis to v1.4.0 #164 (tinnightcap)
- Migrate to mozilla slack #162 (tinnightcap)
- Add ec2 describe instance policy #159 (limed)
- Updating autoscaling dashboard to v3 #156 (limed)
- Add efs dashboard #155 (limed)
- Update apache2-utils package version for release #152 (tinnightcap)
v1.5.1 (2017-08-18)
Closed issues:
- [security] Close up access to unneeded ports #145
- Disable external services routing #143
- [traefik] Upgrade to 1.3.4 #140
- Tag v1.5.1 release #151
Merged pull requests:
- Merge v1.5.1 release into develop. [skip ci] #154 (tinnightcap)
- Update CHANGELOG for v1.5.1 release [skip ci] #153 (tinnightcap)
- Close down unnecessary open tcp ports #146 (gozer)
- Don't route for *.mon... anymore, these services are exposed via SSO now #144 (gozer)
- Upgrade to traefik v1.3.4 #141 (gozer)
v1.5.0 (2017-06-24)
Closed issues:
- [grafana] Enable ProxyAuth #135
- Upgrade Prometheus to 1.7.1 and Alertmanager to 0.7.1 #131
- Allow discovery of custom scraping targets #130
- [datadog] Remove support #128
- [blackbox] Upgrade to 0.5.0 #122
- [alertmanager] Upgrade to 0.6.2 #121
- [prometheus] Upgrade to 1.6.2 #120
- [traefik] Upgrade to v1.2.3 #119
- Tag v1.5.0 release #137
Merged pull requests:
- Merge v1.5.0 release into develop. [skip ci] #139 (tinnightcap)
- Update CHANGELOG for v1.5.0 release [skip ci] #138 (tinnightcap)
- Use OIDC_CLAIM_email as logged-in user in Grafana #136 (gozer)
- Increasing disk space for prometheus federators - Bug 1367263 #134 (kfferrando)
- Version upgrades #133 (gozer)
- Implement custom scrape target discovery via Consul service tags #132 (gozer)
- Remove support for DataDog #129 (gozer)
- Enable SSO for Prometheus #127 (gozer)
- Upgrade blackbox exporter to 0.5.0 #126 (gozer)
- Upgrade alertmanager to 0.6.2 #125 (gozer)
- Upgrade prometheus to v1.6.2 #124 (gozer)
- Upgrade Traefik to 1.2.3 #123 (gozer)
v1.4.2 (2017-05-05)
Closed issues:
- Add nubis/builder/artifacts/AMIs.json to .gitignore #111
- Tag v1.4.2 release #116
- Tag v1.4.2 release #113
Merged pull requests:
- Merge v1.4.2 release into develop. [skip ci] #118 (tinnightcap)
- Update CHANGELOG for v1.4.2 release [skip ci] #117 (tinnightcap)
- Update CHANGELOG for v1.4.2 release [skip ci] #114 (tinnightcap)
- Add nubis/builder/artifacts/AMIs.json to .gitignore #112 (gozer)
v1.4.1 (2017-04-11)
Closed issues:
- [backups] Make sure in progress page is parseable by Prometheus #104
- [typo] curl -retry instead of --retry in prometheus-onboot #102
- Tag v1.4.1 release #107
Merged pull requests:
- Merge v1.4.1 release into develop. [skip ci] #109 (tinnightcap)
- Update CHANGELOG for v1.4.1 release [skip ci] #108 (tinnightcap)
- Make sure the backup in progress landing page is parseable by prometheus #105 (gozer)
- Fix curl -retry tyop #103 (gozer)
v1.4.0 (2017-03-31)
Closed issues:
- Upgrade Traefik to v1.2.0 #85
- Add a configurable live_app label #83
- Allow sink alerts (apps) destination to be configured #81
- Add backup in progress landing page #79
- [labels] Add technical_owner and account_number labels #77
- Disable detailed monitoring #75
- [mysql] Discover mysqld-exporter #73
- Apache alerts are application alerts, don't consider them platform alerts #71
- Alert only for platform alerts, leave application alerting up to upstream federators #69
- [upgrade] Prometheus 1.5.2 #67
- [cron] Setup random delay on intensive jobs #61
- [cloudwatch] Filter metrics on VPCs #59
- [billing] Currently reporting in triplicate #58
- [bug] Can't scrape metrics less frequently than every 5 minutes #56
- [cloudwatch] Billing only exposed in us-east-1 #54
- Add support for ingesting cloudwatch metrics #53
- Upgrade blackbox exporter to 0.4.0 #50
- Convert storage type to gp2 #47
- Upgrade Prometheus to 1.5.0 #46
- Tag v1.4.0 release #99
- Tag v1.4.0 release #95
- Tag v1.4.0 release #91
- Tag v1.4.0 release #45
Merged pull requests:
- Merge v1.4.0 release into develop. [skip ci] #101 (tinnightcap)
- Update CHANGELOG for v1.4.0 release [skip ci] #100 (tinnightcap)
- Don't expose Consul to the internet, because #98 (gozer)
- Merge v1.4.0 release into develop. [skip ci] #97 (tinnightcap)
- Update CHANGELOG for v1.4.0 release [skip ci] #96 (tinnightcap)
- Fixups to pass Travis lint checks #94 (tinnightcap)
- Merge v1.4.0 release into develop. [skip ci] #93 (tinnightcap)
- Update CHANGELOG for v1.4.0 release [skip ci] #92 (tinnightcap)
- Fix typo, missing $ #90 (gozer)
- Merge v1.4.0 release into develop. [skip ci] #89 (tinnightcap)
- Update CHANGELOG for v1.4.0 release [skip ci] #88 (tinnightcap)
- Upgrade duplicity to 0.7.12-0ubuntu0ppa1276~ubuntu14.04.1 #87 (gozer)
- Upgrade Traefik to v1.2.0 #86 (gozer)
- Add a configurable live_app label #84 (gozer)
- Allow notification configuration of alert sink (app alerts) #82 (gozer)
- Add a "Backup in progress..." landing page during backup runs #80 (gozer)
- Show technical_owner and account_id in federated metrics #78 (gozer)
- Disable detailed EC2 monitoring #76 (gozer)
- Detect and scrape mysqld-exporter instances #74 (gozer)
- Remove the platform=nubis tag from Apache alerts #72 (gozer)
- Ignore non-platform alerts #70 (gozer)
- Upgrade to Prometheus 1.5.2 #68 (gozer)
- Update builder artifacts for v1.4.0 release [skip ci] #65 (tinnightcap)
- Terraform 0.8 Upgrade #64 (gozer)
- Add 10 minute jitter to the backup jobs #63 (gozer)
- Only scrape our CloudWatch Billing metrics from the admin VPC #62 (gozer)
- Filter all CloudWatch resources with an environment filter: #60 (gozer)
- Set our billing scrape interval to the maximum allowed of 5 minutes #57 (gozer)
- Create a separate cloudwatch_exporter_billing for just AWS/Billing metrics #55 (gozer)
- Upgrade Blackbox Exporter to 0.4.0 #52 (gozer)
- Upgrade Prometheus to 1.5.0 #51 (gozer)
- Fix alert description comment for accuracy #49 (gozer)
- Switch root storage to gp2(SSD) #48 (gozer)
v1.3.0 (2017-01-18)
Closed issues:
- Randomize cron::hourly, to avoid concurrent backup runs everywhere #43
- Apache Dashboard alert overlay is wrong #41
- Backup to S3 with duply/duplicity #38
- Lower default metrics retention #36
- Upgrade to Traefik 1.1.2 #29
- Provision the monitoring password from TF #20
- Expose nubis_sudo_groups and nubis_user_groups userdata #18
- Move API secrets/keys to nubis-secret #15
- [alertmanager] Add PagerDuty support #14
- [alertmanager] Send notification of resolved alerts #12
- [squid] Pull telemetry from snmp_exporter #10
- Fix small upstart typo #7
- Increase instance size, t2.nano is probably too small #6
- [cleanup] Productionize Prometheus #3
- Backup regularly to S3 #2
- Set the Consul environments/<env>/global/node_exporter/config/enabled boolean on startup #1
- Tag v1.3.0 release #31
Merged pull requests:
- Little improvements for Prometheus Backups #44 (gozer)
- Limit displayed alerts to the firing ones #42 (gozer)
- Update builder artifacts for v1.4.0-dev release #40 (tinnightcap)
- Use duply & duplicity to drive backups to S3 #39 (gozer)
- lower metrics retention to 14 days #37 (gozer)
- Update CHANGELOG for v1.3.0 release #35 (tinnightcap)
- Update CHANGELOG for v1.3.0 release #34 (tinnightcap)
- Update builder artifacts for v1.3.0 release #33 (tinnightcap)
- Update CHANGELOG for v1.3.0 release #32 (tinnightcap)
- Upgrade to Traefik 1.1.2 (includes our reported fix) #30 (gozer)
- Fix Links #28 (tinnightcap)
- Add Documentation #27 (tinnightcap)
- update to nubis-travis v0.1.3 #26 (gozer)
- use nubis-cron wrapper #25 (gozer)
- Scrape ES exporters if present #24 (gozer)
- enable ES in Grafana #23 (gozer)
- tell Traefik about the admin password #22 (gozer)
- Massive Prometheus merge of current state #21 (gozer)
- Exposing ldap group userdata #19 (limed)
- fix typo #17 (gozer)
- Add PagerDuty notification support #16 (gozer)
- Send resolved notification to both email and slack notifiers #13 (gozer)
- Add snmp target for proxies #11 (gozer)
- Bump size to t2.small #9 (gozer)
- Fix typo #8 (gozer)
- Don't alert on Consul services down if consul itself is not healthy #5 (gozer)
- Refactor most of everything to ready for deployment with nubis-deploy #4 (gozer)
* This Change Log was automatically generated by github_changelog_generator