[Alerting] Smarter retry interval for ES Connectivity errors #122390
Labels
estimate:small
Small Estimated Level of Effort
Feature:Alerting/RulesFramework
Issues related to the Alerting Rules Framework
Team:ResponseOps
Label for the ResponseOps team (formerly the Cases and Alerting teams)
When an alerting execution encounters an error, it reschedules itself for the next schedule interval. With this PR Kibana Core introduced a specific error type
EsUnavailableError
for transient ES connectivity errors. We should look for these errors in the alerting task runner and consider handling them slightly differently than other errors based on the configured schedule interval of a rule. If a rule is scheduled to run every 24 hours and it encounters a transient connectivity issue, it would make more sense to schedule its retry for a shorter interval, like 5 minutes instead of waiting another 24 hours for execution.Towards #119650
The text was updated successfully, but these errors were encountered: