Prevent ILM from spuriously rolling over (many) empty indices #86203
Comments
Pinging @elastic/es-data-management (Team:Data Management)
Original scenario
A writer (metricbeat) runs at normal load from 5/1 to 5/5, then cuts out partway through 5/6. After that, because of daily rollovers, empty indices begin to accumulate (one per day).

Solution 1: Don't roll over empty indices (#46161, #85054)
Pros:
Cons:
Solution 2: Delete empty indices after they've been rolled over (#73349)
Pros:
Cons:
Solution 3: Lazy rollover for datastreams
If datastreams rolled over lazily (on the next write), then no additional indices would be created backing the datastream in the above scenario, because there wouldn't be any writes to it.

Pros:
Cons:
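To make the trade-off between the three solutions concrete, here is a toy Python simulation of the original scenario. It is a sketch under stated assumptions, not Elasticsearch code: time advances in whole days, a `max_age` of one day is evaluated once at the end of each day, and the `simulate` function and its strategy names (`eager`, `skip_empty`, `lazy`) are hypothetical labels for the current behavior, Solution 1 and Solution 3.

```python
def simulate(writes_per_day, strategy):
    """Toy model of one data stream under a daily max_age rollover.

    Hypothetical simulation, not Elasticsearch code. Returns the doc count of
    every backing index; the last element is the current write index.

    strategy:
      "eager"      -> roll over whenever max_age is reached (current behavior)
      "skip_empty" -> roll over only if the write index has documents (Solution 1)
      "lazy"       -> only mark a rollover as pending; perform it on the next write (Solution 3)
    """
    indices = [0]    # the data stream starts with one empty write index
    pending = False  # only used by the "lazy" strategy
    for docs in writes_per_day:
        if docs:
            if pending:
                indices.append(0)  # lazy rollover: create the new index right before the write
                pending = False
            indices[-1] += docs
        # End of the day: in this toy model max_age (1 day) is always reached here.
        if strategy == "eager":
            indices.append(0)
        elif strategy == "skip_empty":
            if indices[-1] > 0:
                indices.append(0)
        elif strategy == "lazy":
            pending = True
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return indices


# 5/1-5/5 at normal load, a partial day on 5/6, then ten quiet days.
writes = [100] * 5 + [40] + [0] * 10
for strategy in ("eager", "skip_empty", "lazy"):
    indices = simulate(writes, strategy)
    print(strategy, len(indices), indices.count(0))
# eager 17 11
# skip_empty 7 1
# lazy 6 0
```

In this toy model, Solution 2 amounts to taking the eager result and deleting every empty index except the write index, which also leaves 7 indices; the difference is that the empty indices are created first and cleaned up afterwards.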
In the same neighborhood as this, there's a problem with policies that are only size-based and have no time component (or in a […]). The infinite empty rollover problem and the never-rolled-over "tailed off" data problem are not opposite sides of the same coin, but they are related.

edit: Note, the workaround/solution for a never-rolled-over "tailed off" index is quite straightforward: manually hit the […]
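The two problems can be seen side by side in a simplified model of how rollover conditions are evaluated: rollover happens as soon as any configured condition is met, and nothing in the check prevents rolling over an index that holds zero documents. The `WriteIndex` class and `should_roll_over` function are hypothetical illustrations, not Elasticsearch internals; only the condition names `max_age` and `max_primary_shard_size` come from the ILM rollover action.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteIndex:
    """Hypothetical snapshot of a data stream's current write index."""
    age_days: float
    primary_shard_size_gb: float
    docs: int


def should_roll_over(index: WriteIndex,
                     max_age_days: Optional[float] = None,
                     max_primary_shard_size_gb: Optional[float] = None) -> bool:
    """Roll over if ANY configured condition is met (simplified model).

    index.docs is never consulted, so an empty index is treated
    exactly like a non-empty one.
    """
    if max_age_days is not None and index.age_days >= max_age_days:
        return True
    if (max_primary_shard_size_gb is not None
            and index.primary_shard_size_gb >= max_primary_shard_size_gb):
        return True
    return False


# Time-based policy: an empty write index still rolls over once it is old enough.
empty = WriteIndex(age_days=31, primary_shard_size_gb=0.0, docs=0)
print(should_roll_over(empty, max_age_days=30))  # True

# Size-only policy: a "tailed off" index that stopped growing never rolls over.
tailed_off = WriteIndex(age_days=400, primary_shard_size_gb=3.0, docs=2_000_000)
print(should_roll_over(tailed_off, max_primary_shard_size_gb=50))  # False
```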
Just wanted to open an issue about this; really great to have this fixed! Is there any magic available to get rid of all the empty indices that have already been created?

Update: I wrote my own quick Python script and just deleted 1200 empty indices ... 🥳
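For anyone in the same situation, here is a minimal sketch of the filtering step such a cleanup script might perform; it is not the commenter's script. It assumes you have already fetched `GET _cat/indices?format=json` (where `docs.count` comes back as a string and may be null, for example for closed indices) and determined each data stream's current write index, for example as the last entry of the `indices` list returned by `GET _data_stream`. The `empty_backing_indices` function name and the sample index names are hypothetical.

```python
import json


def empty_backing_indices(cat_indices, write_indices):
    """Return the names of empty data stream backing indices that could be deleted.

    cat_indices:   parsed JSON rows from GET _cat/indices?format=json
    write_indices: names of the current write indices, which must be kept
    """
    empty = []
    for row in cat_indices:
        name = row["index"]
        count = row.get("docs.count")
        if count is None:  # count unknown (e.g. a closed index): skip it
            continue
        if name.startswith(".ds-") and int(count) == 0 and name not in write_indices:
            empty.append(name)
    return sorted(empty)


# Hypothetical sample response; the index names are made up for illustration.
sample = json.loads("""[
  {"index": ".ds-metricbeat-8.1.2-2022.04.27-000001", "docs.count": "523"},
  {"index": ".ds-metricbeat-8.1.2-2022.04.27-000002", "docs.count": "0"},
  {"index": ".ds-metricbeat-8.1.2-2022.04.27-000003", "docs.count": "0"}
]""")
print(empty_backing_indices(sample, {".ds-metricbeat-8.1.2-2022.04.27-000003"}))
# ['.ds-metricbeat-8.1.2-2022.04.27-000002']
```

Each returned name could then be removed with `DELETE /<index>`. Elasticsearch does not allow deleting a data stream's current write index, which is why the sketch excludes write indices up front.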
Description
This has been brought up a few times in different forms, see #46161, #73349, #83039, #85054.
Any ILM policy with a `max_age` associated with the `rollover` action could trigger this scenario, but in order to talk about something concrete, I'll use metricbeat as an example (double emphasizing, though: this isn't unique to metricbeat, it's just the nature of the way `rollover` currently works with a `max_age`).

With a test 8.1.3 Elasticsearch cluster, I ran `metricbeat-8.1.2` for a few seconds and then stopped it, and then `metricbeat-8.1.3` for a bit longer. The default `metricbeat` policy has rollover with `"max_age" : "30d"` (30 days), but in order to illustrate this problem better, I've set that to `"1m"` (1 minute) instead:

After a few minutes, my cluster looks like this:
That is, for a little while, the first writer (metricbeat version 8.1.2) wrote documents, and then it stopped and was upgraded and replaced by the second writer (metricbeat version 8.1.3). Each of those writers uses a versioned datastream (`metricbeat-8.1.2` and `metricbeat-8.1.3`, respectively).

The problem is easy to see: notice that we're getting a new empty (0-document) `.ds-metricbeat-8.1.2-[...]` index every minute, and that we'll keep accumulating them forever. ILM doesn't have any special logic around empty indices like this; that is, empty indices are treated the same as non-empty indices as far as ILM is concerned.

In this simple scenario, we know that the `metricbeat-8.1.2` datastream is done now and can be retired. However, there's no particular point in time where Elasticsearch itself or some individual metricbeat process could know that. I'm using just one metricbeat writer, but I could be running one on each of N hosts. No one writer process in this scenario knows that it is special and should "turn off the lights when it's done".

To further complicate matters, maybe I have a weekly batch process which will run on Sunday evening and write some logs after a long quiet period (and its logs are still being monitored by metricbeat version 8.1.2). When it does so, we could end up with more data flowing into the current `metricbeat-8.1.2` write index. Let's call that the "sporadic writer" case. In that case, we'd end up with periods of no data flowing in and the accumulation of empty indices, followed by one or more non-empty indices, and then back to accumulating empty indices again.

ILM doesn't know whether there's a sporadic writer out there or not, and, ignorant of whether more documents will be coming one day, it dutifully executes the policy, rolling over the now-defunct `metricbeat-8.1.2` datastream every minute and leaving a trail of empty `.ds-metricbeat-8.1.2-[...]` indices in its wake.

An additional note: my illustration here is datastream-specific, but in broad strokes this issue could also exist in a pre-datastream indexing strategy built around aliases. It would be most excellent if we were able to solve both the datastream- and alias-based versions of this empty index problem (but, reserving a degree of freedom, I don't think the solution must necessarily be precisely the same in both cases).