monitoring: make some prometheus alert threshold configurable via env.local #187
….local Default values are the previously hardcoded values. Organizations with different policies and hardware can now adapt the alert thresholds to their specific needs, reducing false-positive alerts. Too many false positives diminish the importance and usefulness of each alert. Alerts should not feel like spam. Fixes #66.
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/593/
Result : success
BIRDHOUSE_DEPLOY_BRANCH : configurable-alerting-threshold
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/520/
NOTEBOOK TEST RESULTS
bbe042c to 190cea0
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/601/
Result : success
BIRDHOUSE_DEPLOY_BRANCH : configurable-alerting-threshold
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-36.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/534/
NOTEBOOK TEST RESULTS

E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/602/
Result : success
BIRDHOUSE_DEPLOY_BRANCH : configurable-alerting-threshold
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-36.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/535/
NOTEBOOK TEST RESULTS

E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/603/
Result : success
BIRDHOUSE_DEPLOY_BRANCH : configurable-alerting-threshold
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-36.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/536/
NOTEBOOK TEST RESULTS
export PROMETHEUS_HostUnusualDiskReadLatency_ALERT=100  # milli seconds
export PROMETHEUS_HostUnusualDiskWriteLatency_ALERT=100  # milli seconds
typo : milliseconds
Hawk eye! Will fix.
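As a rough illustration of how a deployment might use these variables, the sketch below sets the defaults shown in the diff and then applies a site-specific override of the kind that would live in env.local. The 250 ms value is made up for illustration, and putting the defaults and the override in one script is a simplification: the project's actual load order for its default and local env files is assumed here, not confirmed.

```shell
#!/bin/sh
# Defaults as shown in the diff (values in milliseconds).
export PROMETHEUS_HostUnusualDiskReadLatency_ALERT=100
export PROMETHEUS_HostUnusualDiskWriteLatency_ALERT=100

# Hypothetical override, as a site with slower disks might add to env.local.
# The value 250 is an illustrative assumption, not a recommendation.
export PROMETHEUS_HostUnusualDiskReadLatency_ALERT=250

# The later assignment wins; the untouched variable keeps its default.
echo "read latency threshold:  ${PROMETHEUS_HostUnusualDiskReadLatency_ALERT} ms"
echo "write latency threshold: ${PROMETHEUS_HostUnusualDiskWriteLatency_ALERT} ms"
```

Only the variables a site actually overrides need to appear in its local file; everything else keeps the previously hardcoded behaviour.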
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Host swap is filling up (instance {{ $labels.instance }})"
-       description: "Swap is filling up (>80%)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
+       description: "Swap is filling up (> $PROMETHEUS_HostSwapIsFillingUp_ALERT %)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
unnecessary space before the %
I inserted the space on purpose so I do not have to surround PROMETHEUS_HostSwapIsFillingUp_ALERT with {}. Same reason for all the extra spaces inserted elsewhere.
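The trade-off between a separating space and braces can be sketched with plain shell expansion. This is only a stand-in: whichever templating step the project uses to fill in the rule files is not shown in this thread and may delimit names differently. The threshold values come from the diff; the `ms` suffix case is a hypothetical example of text that would glue onto the variable name.

```shell
#!/bin/sh
# Threshold values taken from the diff.
PROMETHEUS_HostSwapIsFillingUp_ALERT=80
PROMETHEUS_HostUnusualDiskReadLatency_ALERT=100

# Make sure the accidental name used below really is unset.
unset PROMETHEUS_HostUnusualDiskReadLatency_ALERTms

# A space ends the variable name, so no braces are needed,
# at the cost of an extra space in the rendered text.
with_space="(> $PROMETHEUS_HostSwapIsFillingUp_ALERT %)"

# Braces delimit the name explicitly, so the unit can follow directly.
with_braces="(> ${PROMETHEUS_HostSwapIsFillingUp_ALERT}%)"

# Without braces or a space, following letters become part of the name:
# the shell looks up the unset variable ..._ALERTms and expands it to nothing.
glued="(> $PROMETHEUS_HostUnusualDiskReadLatency_ALERTms)"

printf '%s\n' "$with_space" "$with_braces" "$glued"
```

The first two lines print `(> 80 %)` and `(> 80%)`; the third prints `(> )`, which is the silent failure that either the space or the braces avoids.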
Good to see that these thresholds are now configurable. Good job.
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/604/
Result : success
BIRDHOUSE_DEPLOY_BRANCH : configurable-alerting-threshold
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/537/
NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/605/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : configurable-alerting-threshold
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL :
NOTEBOOK TEST RESULTS
Default values are the previously hardcoded values, so this is fully backward compatible.
Organizations with different policies and hardware can now
adapt the alert thresholds to their specific needs, reducing
false-positive alerts.
Too many false positives diminish the importance and
usefulness of each alert. Alerts should not feel like spam.
Not all alerts were made configurable. I've only changed those that I think are most likely to need customization or that logically should be configurable.
Fixes #66.