[Alerting] Evaluate how we can reduce the number of scenarios in which rules are disabled due to an unrecoverable failure #116919

Closed
3 tasks
gmmorris opened this issue Nov 1, 2021 · 2 comments
Labels
Feature:Alerting, Team:ResponseOps (label for the ResponseOps team, formerly the Cases and Alerting teams)

Comments

gmmorris commented Nov 1, 2021

We currently have several scenarios in which rules stop running or are auto-disabled by the framework.

We should document every case in which this happens, ensure we have telemetry to track how often it happens, and then evaluate what work would be needed (whether in our team or in other teams) to reduce the frequency of these cases.

  • Document every known scenario in which a rule might stop running for any reason other than an action taken by the end user
  • Ensure all the cases documented above have sufficient telemetry - we want to know how often this happens and where
  • Evaluate each scenario for some kind of feasible remediation that could reduce the likelihood of such a scenario and file a follow-up issue (even if this means we need upstream teams, such as Core and Elasticsearch, or downstream rule type implementors, to do work on their end).
gmmorris added the Feature:Alerting and Team:ResponseOps labels Nov 1, 2021
@elasticmachine

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@mikecote

Closing in favour of #119650.

kobelb added the needs-team label Jan 31, 2022
botelastic bot removed the needs-team label Jan 31, 2022