[RAC] Alert ILM policy shouldn't delete old indices after rollover #111029

Closed
Tracked by #101016
mgiota opened this issue Sep 2, 2021 · 4 comments · Fixed by #111139
Labels: bug, Team:Infra Monitoring UI - DEPRECATED, Theme: rac, v7.15.0, v8.0.0

Comments

@mgiota
Contributor

mgiota commented Sep 2, 2021

This bug can be reproduced only after #110788 is merged.

📝 Summary

While testing #110519, we figured out that the default alert ILM policy deletes the old indices after rollover.

│ info [o.e.x.i.IndexLifecycleTransition] [Panagiotas-MBP] moving index [.internal.alerts-observability.logs.alerts-default-000001] from [{"phase":"hot","action":"rollover","name":"set-indexing-complete"}] to [{"phase":"hot","action":"complete","name":"complete"}] in policy [.alerts-ilm-policy]
│ info [o.e.x.i.IndexLifecycleTransition] [Panagiotas-MBP] moving index [.internal.alerts-observability.logs.alerts-default-000001] from [{"phase":"hot","action":"complete","name":"complete"}] to [{"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}] in policy [.alerts-ilm-policy]
│ info [o.e.x.i.IndexLifecycleTransition] [Panagiotas-MBP] moving index [.internal.alerts-observability.logs.alerts-default-000001] from [{"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}] to [{"phase":"delete","action":"delete","name":"cleanup-snapshot"}] in policy [.alerts-ilm-policy]
│ info [o.e.x.i.IndexLifecycleTransition] [Panagiotas-MBP] moving index [.internal.alerts-observability.logs.alerts-default-000001] from [{"phase":"delete","action":"delete","name":"cleanup-snapshot"}] to [{"phase":"delete","action":"delete","name":"delete"}] in policy [.alerts-ilm-policy]
│ info [o.e.c.m.MetadataDeleteIndexService] [Panagiotas-MBP] [.internal.alerts-observability.logs.alerts-default-000001/caiZO6w2QI-qibbcddHCKQ] deleting index

Steps to reproduce

  • Create a new rule and generate some data that should trigger an alert
  • Verify that one new alert is written to the correct index .internal.alerts-observability.logs.alerts-default-000001
  • Wait for the next trigger of the alert and verify that the alert is updated and no new alert is created
  • In Dev Tools, trigger a rollover: POST .alerts-observability.logs.alerts-default/_rollover
  • Verify that you can see two indices: GET .alerts-observability.logs.alerts-default
  • Create a new rule and wait for the new alert to trigger
  • The new alert is written to the new index .internal.alerts-observability.logs.alerts-default-000002
  • The old alert is also written to the new index .internal.alerts-observability.logs.alerts-default-000002 -> it should be written to the old index
  • The old index is deleted (check with GET .alerts-observability.logs.alerts-default) -> it shouldn't be
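
The last check can be done by eye in Dev Tools, but a small helper makes the expectation explicit. This is only a sketch: it assumes the `GET .alerts-observability.logs.alerts-default` response is an object keyed by concrete index name, and `missingIndices` is a hypothetical name, not something from the Kibana code base.

```typescript
// Assumed shape of a `GET <alias>` response: one key per concrete index.
type GetIndexResponse = Record<string, unknown>;

// Hypothetical helper: returns the expected indices that are absent
// from the response, i.e. the ones that no longer exist.
function missingIndices(
  response: GetIndexResponse,
  expected: string[]
): string[] {
  return expected.filter(
    (name) => !Object.prototype.hasOwnProperty.call(response, name)
  );
}

// Example response after the bug has struck: only -000002 is left.
const afterRollover: GetIndexResponse = {
  ".internal.alerts-observability.logs.alerts-default-000002": {},
};

const missing = missingIndices(afterRollover, [
  ".internal.alerts-observability.logs.alerts-default-000001",
  ".internal.alerts-observability.logs.alerts-default-000002",
]);

// A non-empty result means an old index was deleted.
console.log(missing);
```

With the buggy policy, the result lists the -000001 index once ILM has evaluated the policy; with a fixed policy it should be empty.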

🤔 Some thoughts

I am putting some thoughts here. I am still wondering what triggered ILM to delete old indices. Was it actually the rollover or the fact that I created a new rule type after rollover? In Scenario 1 of this ticket I also did a rollover, but the old index was not deleted. It is worth reproducing both scenarios to exclude the possibility of something going wrong while testing.

Update

It turns out the above scenario is not always reproducible. The ILM policy deletes old indices, but ES does not evaluate the policy immediately (ILM checks policies periodically, every 10 minutes by default, controlled by indices.lifecycle.poll_interval), which can make the error hard to reproduce. This doesn't change the fact that the policy deletes old indices unconditionally.

Solution

We should update the default ILM policy https://github.com/elastic/kibana/blob/master/x-pack/plugins/rule_registry/common/assets/lifecycle_policies/default_lifecycle_policy.ts and either:

  • completely remove the delete phase and keep old indices forever, or
  • specify a minimum age after which a rolled-over index enters the delete phase
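
To make the two options concrete, here is a minimal sketch of what each policy body could look like. The type names, the rollover thresholds and the `30d` value are assumptions chosen for illustration; they are not taken from `default_lifecycle_policy.ts`, and this is not necessarily what the eventual fix looks like.

```typescript
// Hypothetical types for illustration; not the rule_registry plugin's own.
interface HotPhase {
  actions: { rollover: { max_age?: string; max_size?: string } };
}
interface DeletePhase {
  min_age: string;
  actions: { delete: Record<string, unknown> };
}
interface LifecyclePolicy {
  policy: { phases: { hot: HotPhase; delete?: DeletePhase } };
}

// Placeholder rollover thresholds (assumed values).
const hot: HotPhase = {
  actions: { rollover: { max_age: "90d", max_size: "50gb" } },
};

// Option 1: no delete phase at all, so rolled-over indices are kept forever.
const keepForeverPolicy: LifecyclePolicy = {
  policy: { phases: { hot } },
};

// Option 2: keep the delete phase, but only enter it after `minAge`.
function withDeleteAfter(minAge: string): LifecyclePolicy {
  return {
    policy: {
      phases: {
        hot,
        delete: { min_age: minAge, actions: { delete: {} } },
      },
    },
  };
}

// Example: delete rolled-over indices after 30 days (assumed retention).
const deleteAfter30d = withDeleteAfter("30d");
console.log(JSON.stringify(deleteAfter30d, null, 2));
```

For an index that has already rolled over, ILM measures `min_age` from the rollover time, so option 2 keeps each old index around for the given period after it stops receiving new writes.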
@mgiota added the bug label Sep 2, 2021
@botelastic (bot) added the needs-team label Sep 2, 2021
@mgiota added the Team:Infra Monitoring UI - DEPRECATED label Sep 2, 2021
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@botelastic (bot) removed the needs-team label Sep 2, 2021
@mgiota added the Theme: rac label Sep 2, 2021
@mgiota changed the title from "[RAC] Alert ILM policy shouldn't delete the old indices after rollover" to "[RAC] Alert ILM policy shouldn't delete old indices after rollover" Sep 2, 2021
@miltonhultgren
Contributor

So that I understand, wasn't the goal of #110788 to make it so that the old alert is written to the old index?
And now the problem that remains is the deletion of the old index due to the default ILM policy?

@weltenwort
Member

@miltonhultgren indeed, we discovered that the ILM policy unconditionally deletes everything beyond the most recent index

@mgiota
Contributor Author

mgiota commented Sep 3, 2021

@miltonhultgren yep exactly!

4 participants