Metric alertmanager_alerts report incorrect number of alerts #2619
gotjosh added a commit to gotjosh/alertmanager that referenced this issue Jun 15, 2021
The garbage collection process within the store is in charge of determining if an alert is resolved, deleting it, and then communicating this back to the callback set. When an alert was explicitly deleted, these were not being communicated back to the callback and caused the metric to report incorrect results. Fixes prometheus#2619
gotjosh added a commit to gotjosh/alertmanager that referenced this issue Jun 15, 2021
The garbage collection process within the store is in charge of determining if an alert is resolved, deleting it, and then communicating this back to the callback set. When an alert was explicitly deleted, these were not being communicated back to the callback and caused the metric to report incorrect results. Fixes prometheus#2619 Signed-off-by: gotjosh <josue@grafana.com>
gotjosh added a commit to gotjosh/alertmanager that referenced this issue Jun 15, 2022
Fixes prometheus#1439 and prometheus#2619. The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI.
gotjosh added a commit to gotjosh/alertmanager that referenced this issue Jun 15, 2022
Fixes prometheus#1439 and prometheus#2619. The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI. Signed-off-by: gotjosh <josue.abreu@gmail.com>
gotjosh added a commit to gotjosh/alertmanager that referenced this issue Jun 16, 2022
Fixes prometheus#1439 and prometheus#2619. The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI. Signed-off-by: gotjosh <josue.abreu@gmail.com>
roidelapluie pushed a commit that referenced this issue Jun 16, 2022
…2943) * Alert metric reports different results to what the user sees via API Fixes #1439 and #2619. The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI. Signed-off-by: gotjosh <josue.abreu@gmail.com>
Fixed by #2943
qinxx108 pushed a commit to qinxx108/alertmanager that referenced this issue Dec 13, 2022
…rometheus#2943) * Alert metric reports different results to what the user sees via API Fixes prometheus#1439 and prometheus#2619. The previous metric is not _technically_ reporting incorrect results as the alerts _are_ still around and will be re-used if that same alert (equal fingerprint) is received before it is GCed. Therefore, I have kept the old metric under a new name `alertmanager_marked_alerts` and repurpose the current metric to match what the user sees in the UI. Signed-off-by: gotjosh <josue.abreu@gmail.com> Signed-off-by: Yijie Qin <qinyijie@amazon.com>
What did you do?
Set up Alertmanager with clustering and sent alerts to all Alertmanager instances.
What did you expect to see?
The `alertmanager_alerts` metric always converges across replicas.
What did you see instead? Under which circumstances?
As time passes, the `alertmanager_alerts` metric drifts apart across replicas, even though the API results show no difference: they are consistent.
As you can see in the screenshot, as time progresses the `alertmanager_alerts` metric drifts apart in the range of 10s per replica.
Verifying the API responses across replicas, I can see that no inconsistent results are being provided:
Looking at the metric implementation, we can see that the `Marker` is responsible for giving us the information about the current alerts, but the in-memory alert store coming from memAlerts is responsible for setting the callback that keeps the marker synced with the current alerts held in the Store. However, it is apparent that the Store executes the callback ONLY when we garbage collect and NOT when we delete an alert directly.
We believe the fix is to also execute the callback when an alert is directly deleted.
alertmanager/store/store.go
Lines 73 to 116 in 58169c1