[Metrics Alerts] Send a recovery alert when an alert returns to an OK state #64351

Closed
Zacqary opened this issue Apr 23, 2020 · 4 comments · Fixed by #83687
Labels
enhancement, Feature:Alerting, Feature:Metrics UI, Team:Infra Monitoring UI - DEPRECATED

Comments

@Zacqary
Contributor

Zacqary commented Apr 23, 2020

⚠️ Depends on #49405

As brought up in #64080

Send a recovery alert like this:

My Alert - elasticsearch-master-0 is in a state of OK [Recovered]

Because system.load.1 is no longer greater than a threshold of 1.0 
(current value is 0.95); system.mem.usage is no longer greater than
a threshold of 85 (current value is 76)

Send it only once, when the alert goes green. Don't renotify after that.
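
A minimal sketch of the behavior requested above, assuming the alert state from the previous evaluation is available. The type and function names (`AlertState`, `Condition`, `formatRecovery`, `recoveryNotification`) are hypothetical and are not part of the Kibana alerting plugin's API; the point is only that a message is produced on the ALERT to OK transition and never on later OK evaluations.

```typescript
// Hypothetical types for illustration only; not the alerting plugin's API.
type AlertState = "ALERT" | "OK";

interface Condition {
  metric: string; // e.g. "system.load.1"
  comparator: string; // e.g. "greater than"
  threshold: number;
  currentValue: number;
}

// Build the recovery text in the shape proposed in this issue.
function formatRecovery(
  alertName: string,
  group: string,
  conditions: Condition[]
): string {
  const reasons = conditions
    .map(
      (c) =>
        `${c.metric} is no longer ${c.comparator} a threshold of ` +
        `${c.threshold} (current value is ${c.currentValue})`
    )
    .join("; ");
  return `${alertName} - ${group} is in a state of OK [Recovered]\n\nBecause ${reasons}`;
}

// Return a message only when the state moves from ALERT to OK.
// Repeated OK evaluations return null, so there is no renotification.
function recoveryNotification(
  previous: AlertState,
  current: AlertState,
  alertName: string,
  group: string,
  conditions: Condition[]
): string | null {
  if (previous === "ALERT" && current === "OK") {
    return formatRecovery(alertName, group, conditions);
  }
  return null;
}
```

Calling `recoveryNotification("ALERT", "OK", ...)` yields the recovery text, while `recoveryNotification("OK", "OK", ...)` yields `null`, which is the "only once" part of the request.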

@Zacqary added the enhancement, Feature:Alerting, Feature:Metrics UI, v8.0.0, Team:Infra Monitoring UI - DEPRECATED, and v7.8.0 labels on Apr 23, 2020
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@Zacqary
Contributor Author

Zacqary commented May 4, 2020

Related: #49405

@Zacqary
Contributor Author

Zacqary commented May 5, 2020

I'm not sure this will work well with renotify periods until the alerting plugin adds full support for recovery action groups. For now we have to consolidate all of our alert fired/alert recovered/etc. messages into a single default action group, and the renotify period suppresses messages based on the action group ID, not on the alert state or any other metadata. This means a recovery message may get lost if the state changes before the renotify period has elapsed.
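
To illustrate the concern, here is a rough sketch of a throttle keyed only by action group ID. `ActionGroupThrottle` and the `"recovered"` group ID are hypothetical names used for illustration, not the alerting plugin's actual implementation; the sketch only assumes what is described above, namely that suppression looks at the action group ID and nothing else.

```typescript
// Hypothetical throttle: suppression depends only on the action group ID,
// never on the alert state carried by the message.
class ActionGroupThrottle {
  private readonly throttleMs: number;
  private readonly lastSent = new Map<string, number>();

  constructor(throttleMs: number) {
    this.throttleMs = throttleMs;
  }

  // Returns true if a message for this action group may be sent at `now` (ms).
  shouldSend(actionGroupId: string, now: number): boolean {
    const last = this.lastSent.get(actionGroupId);
    if (last !== undefined && now - last < this.throttleMs) {
      return false; // still inside the renotify period for this group
    }
    this.lastSent.set(actionGroupId, now);
    return true;
  }
}

const FIVE_MINUTES = 5 * 60 * 1000;
const throttle = new ActionGroupThrottle(60 * 60 * 1000); // 1 hour renotify period

// The "fired" message goes out through the default action group at t = 0.
const firedSent = throttle.shouldSend("default", 0);

// Five minutes later the alert recovers. The recovery message shares the
// default group, so the throttle suppresses it.
const recoveredSent = throttle.shouldSend("default", FIVE_MINUTES);

// With a separate (hypothetical) recovery group, the same message would pass.
const separateGroupSent = throttle.shouldSend("recovered", FIVE_MINUTES);
```

Here `firedSent` is `true`, `recoveredSent` is `false`, and `separateGroupSent` is `true`: the recovery message is only lost when it has to share an action group with the fired message.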

@Zacqary
Contributor Author

Zacqary commented May 11, 2020

Marking as blocked. We're going to wait for the alerting plugin to implement a native solution before proceeding (#49405).

@sgrodzicki added this to the Metrics UI 7.9 milestone on May 25, 2020
@sgrodzicki removed this from the Metrics UI 7.9 milestone on Jul 14, 2020

3 participants