[Alerting] Evaluate how we can reduce the number of scenarios in which rules are disabled due to an unrecoverable failure #116919
Labels
Feature:Alerting
Team:ResponseOps
Label for the ResponseOps team (formerly the Cases and Alerting teams)
We currently have several scenarios in which rules stop running or are auto-disabled by the framework.
We should document every case in which this happens, ensure we have telemetry to track how often each one occurs, and then evaluate what work would be needed (whether in our team or in other teams) to reduce the frequency of these cases.