Bosun sending notifications for closed and inactive alerts #1690
Can you share the configuration of the notification declaration? (No need for real URLs or emails.) I just want to see whether there are chains, etc.
Thanks for the quick reply, @kylebrandt. Here's the complete config:
Which notification(s) keep triggering after acknowledgement?
For "foo.adaptor.write.rate.too.low", it's foo-service3-pagerduty (every 30 minutes). The original incident was a
Oh, and which Bosun version?
Here's the complete log from the beginning of the alert instance, including the repeated triggers (a real log, masked with sed and grep):
I can confirm this. I'm seeing the same problem on one of our Bosun instances that we upgraded to 0.5.0-rc2 yesterday: emails are sent for alarms that no longer exist. Our OpenTSDB version is 2.2. @kylebrandt, if I can help track this down by supplying logs or configurations, please let me know.
As a workaround, we are resetting Bosun's state (i.e., clearing the Redis DB and the state file) every 60 minutes from a cron job.
I'm having a hard time reproducing this. Using a trimmed-down config based on what you provided:
The notifications print continuously at the given intervals until I ack the alert, and then they all stop. I can verify directly in Redis that they are purged from the pending-notifications bucket as soon as I ack them. I check that with
@captncraig, I think I have some alerts in this state right now. They're acked, but Bosun still keeps sending emails. Is there anything I can check to help you track down the problem?
After upgrading several of our Bosun installations, including migrating the old state file to ledis, we observed the same behavior every time: for a few days we saw repeated notifications for alarms that were already acked or even closed, and then the behavior faded away on its own.
@vitorboschi, I pushed a small patch to a "stopNotifications" branch. Would you mind running it to see whether it clears the problem up? It essentially detects this scenario at the moment notifications are sent and skips the notification if the alert is closed or acked at that point. It's not an optimal fix, but I hope it will clear up the issue.
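For readers following along, here is a minimal Go sketch of the send-time check described in the comment above. It is not the code from the "stopNotifications" branch: the `alertState` type, its `Open`/`Acked` fields, and the `shouldSend` helper are all hypothetical names chosen for illustration. The idea is simply to re-read the alert's current state right before sending and drop the notification if the alert is gone, closed, or acked.

```go
package main

import "fmt"

// alertState is a hypothetical, simplified view of an alert instance's state.
// Field names are illustrative and not taken from Bosun's source.
type alertState struct {
	Open  bool // false once the alert has been closed
	Acked bool // true once someone has acknowledged the alert
}

// shouldSend re-checks the alert's current state at send time and reports
// whether a scheduled notification should still go out. It is skipped if
// the alert no longer has state, has been closed, or has been acked.
func shouldSend(st *alertState) bool {
	if st == nil {
		return false // no state at all: treat the alert as gone
	}
	return st.Open && !st.Acked
}

func main() {
	// Hypothetical alerts in different states, each with a pending notification.
	states := map[string]*alertState{
		"open, unacked": {Open: true, Acked: false},
		"open, acked":   {Open: true, Acked: true},
		"closed":        {Open: false, Acked: false},
	}
	// "missing" has no entry, so states["missing"] is nil.
	for _, name := range []string{"open, unacked", "open, acked", "closed", "missing"} {
		fmt.Printf("%-14s send=%v\n", name, shouldSend(states[name]))
	}
}
```

Only the open, unacked alert is sent. As later comments point out, a guard like this hides the symptom at send time rather than fixing whatever left the stale notification scheduled.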
I'll try it and report back with my results.
I saw this when all my Bosun hosts were pointed at localhost for redisHost. So I changed that: I reconfigured Redis as master/slave (Redis Sentinel) and pointed the Bosun hosts at a VIP instead of localhost. So far, so good.
@erwindwight, we weren't even using Redis when we hit this issue; we were just using the default ledis DB.
@captncraig, I'm trying out a build from the stopNotifications branch. But isn't that a post-facto check? The underlying problem might be causing other similar (possibly concurrency-related) issues.
@captncraig, the patch works and prevents notifications for old alarms from being resent. Do you have an idea why this happens in the first place?
I've been using the patch for some time and everything looks fine. I haven't spotted any regressions either.
Found something promising. It looks like this occurs when you close an alert without acking it first: the old notification chain does not get cleared. Patch incoming.
Clearing notifications on all actions, not just ack. Fixes #1690
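The root cause and fix described above (only ack cleared the pending notification chain, so closing an unacked alert left it running) can be illustrated with a small, self-contained Go sketch. This is not Bosun's implementation: the `notifier` type, its in-memory `pending` map, the `actionType` constants, and the `buggyAction`/`fixedAction` methods are hypothetical stand-ins, and the example alert key is made up. Bosun itself keeps this state in Redis/ledis.

```go
package main

import "fmt"

// actionType enumerates hypothetical alert actions (illustrative names only).
type actionType int

const (
	actionAck actionType = iota
	actionClose
)

// notifier is a hypothetical in-memory scheduler: alert key -> names of
// notifications still scheduled to fire for that alert.
type notifier struct {
	pending map[string][]string
}

// clearPending drops every scheduled notification for the alert.
func (n *notifier) clearPending(alertKey string) {
	delete(n.pending, alertKey)
}

// buggyAction models the reported behavior: only an ack clears pending
// notifications, so closing an unacked alert leaves its chain scheduled.
func (n *notifier) buggyAction(alertKey string, a actionType) {
	if a == actionAck {
		n.clearPending(alertKey)
	}
}

// fixedAction clears pending notifications for every action, in the spirit
// of "Clearing notifications on all actions, not just ack". The
// action-specific state change is omitted from this sketch.
func (n *notifier) fixedAction(alertKey string, _ actionType) {
	n.clearPending(alertKey)
}

func main() {
	// Hypothetical alert key and notification name, for illustration only.
	const key = "foo.adaptor.write.rate.too.low{host=example}"

	buggy := &notifier{pending: map[string][]string{key: {"foo-service3-pagerduty"}}}
	buggy.buggyAction(key, actionClose)
	fmt.Println("buggy, after close:", len(buggy.pending[key]), "pending")

	fixed := &notifier{pending: map[string][]string{key: {"foo-service3-pagerduty"}}}
	fixed.fixedAction(key, actionClose)
	fmt.Println("fixed, after close:", len(fixed.pending[key]), "pending")
}
```

With the buggy handler, one notification is still pending after the close, so it keeps firing; with the fixed handler, none remain. Clearing at action time addresses the cause, whereas the earlier send-time check only filters the symptom.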
We have a very simple rule file, with 3 notifications (HTTP POST to PagerDuty and Slack, and email) and a bunch of alert rules that trigger them. We are facing a weird issue wherein the following happens:
1. The alert triggers and sends notifications.
2. We ack the alert.
3. The alert becomes inactive.
4. We close the alert.
To explain it through the logs, this is quite literally what we're seeing: