[Alerting] Evaluate how we can reduce the number of scenarios in which rules are disabled due to an unrecoverable failure #116919

gmmorris · 2021-11-01T09:52:49Z

We currently have several scenarios in which rules stop running or are auto-disabled by the framework.

We should document every case in which this happens, ensure we have telemetry to track how often it happens, and then evaluate what work would be needed (whether in our team or in other teams) to reduce the frequency of these cases.

Document every known scenario in which a rule might stop running for any reason other than being disabled by the end user
Ensure all the cases documented above have sufficient telemetry - we want to know how often this happens and where
Evaluate each scenario for a feasible remediation that could reduce its likelihood, and file a follow-up issue (even if this means that upstream teams, such as Core and Elasticsearch, or downstream rule type implementors need to do work on their end).

elasticmachine · 2021-11-01T09:52:59Z

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

mikecote · 2021-11-24T18:09:57Z

Closing in favour of #119650.

gmmorris added the Feature:Alerting and Team:ResponseOps labels Nov 1, 2021

mikecote closed this as completed Nov 24, 2021

kobelb added the needs-team label Jan 31, 2022

botelastic bot removed the needs-team label Jan 31, 2022
