[Monitoring] Re-evaluate how `alertInstanceId` is used in rules #109151

chrisronline · 2021-08-18T18:01:47Z

When we first created Kibana rules in stack monitoring, we assumed that we could create unique alertInstanceIds to maintain separate throttle periods for a single alert firing under separate circumstances (such as a disk usage alert firing on unique throttle periods based on the list of nodes affected). This conflicts with the fact that alertInstanceIds that do not schedule actions are forgotten.

Recent work changed how these work and actually helped address this by ensuring we always create the same alertInstanceIds each time the rule runs, but it looks like we might still have issues if we aren't always scheduling actions for them (which appears to be the case).
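
To make the concern concrete, here is a minimal sketch of the behavior as I understand it. It is a toy model, not Kibana code: `ToyRuleRunner`, `InstanceState`, the `run` signature, and the instance id format are all hypothetical names made up for illustration. The only assumption taken from above is that an instance id that does not schedule actions in a run is forgotten, along with its throttle state.

```typescript
// Hypothetical, simplified model of per-instance throttle state.
// None of these names are Kibana APIs; they only illustrate the behavior described above.
type InstanceState = { lastFiredAt: number };

class ToyRuleRunner {
  // State carried from one run to the next, keyed by alertInstanceId.
  private persisted = new Map<string, InstanceState>();

  constructor(private throttleMs: number) {}

  // `scheduled` lists the instance ids that schedule actions in this run.
  // Returns the ids whose actions actually fire (i.e. are not throttled).
  run(now: number, scheduled: string[]): string[] {
    const next = new Map<string, InstanceState>();
    const fired: string[] = [];
    for (const id of scheduled) {
      const prev = this.persisted.get(id);
      if (prev !== undefined && now - prev.lastFiredAt < this.throttleMs) {
        // Still inside the throttle window: keep the old state, do not fire.
        next.set(id, prev);
      } else {
        next.set(id, { lastFiredAt: now });
        fired.push(id);
      }
    }
    // Ids that did not schedule actions are dropped here, with their throttle state.
    this.persisted = next;
    return fired;
  }
}

const runner = new ToyRuleRunner(60000);
const id = "disk_usage:node-1,node-2"; // assumed id format, for illustration only

const first = runner.run(0, [id]);       // fires, throttle window starts
const second = runner.run(10000, [id]);  // throttled
const third = runner.run(20000, []);     // no actions scheduled, state is forgotten
const fourth = runner.run(30000, [id]);  // fires again, well inside the 60s window
```

In this model the fourth run fires even though only 30 seconds have passed since the first, because the run that scheduled nothing wiped the throttle state. If the real framework behaves this way, stable ids alone are not enough; the rule would also need to schedule actions for each id on every run.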

I'm opening this issue because this wasn't understood when these rules were first created and folks on the @elastic/stack-monitoring team might want to reconsider how these rules are designed as a result. Feel free to close if this is now understood and handled appropriately.

chrisronline added the `bug` and `Team:Monitoring` labels on Aug 18, 2021
