Compact index when closing #42445

Closed
dnhatn opened this issue May 23, 2019 · 10 comments
Labels
:Distributed Indexing/Distributed

Comments

@dnhatn
Member

dnhatn commented May 23, 2019

A spin-off from #33888.

Should we trim/clean translog and force-merge when closing an index? These actions can be done via the verifying-before-close step. Another option is to integrate these actions with ILM.

Should we also enforce a single commit when closing? This property does not always hold for follower indices and primaries with ongoing peer recoveries.

Relates #33888

dnhatn added the :Distributed Indexing/Distributed and :Data Management/ILM+SLM labels May 23, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@elasticmachine
Collaborator

Pinging @elastic/es-core-features

@dakrone
Member

dakrone commented May 23, 2019

> Should we trim/clean translog and force-merge when closing an index? These actions can be done via the verifying-before-close step.

I definitely don't think we should do this by default. Imagine a cluster under load so heavy that you need to close some indices to bring it back from the edge. If we force-merged whenever an index was closed, that would add more load to the cluster (not to mention a potential wait, depending on the number of segments we force-merge down to) while it is in a precarious state.

@DaveCTurner
Contributor

DaveCTurner commented May 29, 2019

We discussed this today as a team and agreed with @dakrone about not force-merging while closing an index because this would make closing an index far too heavyweight an operation. The discussion also touched again on the idea of being able to force-merge a read-only index (#41624):

  • one might want to force-merge a closed index by re-opening it, force-merging it, and closing it again, but today we can't guarantee that nothing else is indexed into it during that process.

  • ILM may also want to be able to block writes to an index before force-merging it.

However, we concluded that trimming the translog on close is a reasonable thing to do:

  • we'd like a closed index to consume as few resources as possible, and a translog can consume considerable disk space

  • trimming the translog is a fairly lightweight operation

  • the stats on a closed index make it look like there is no translog anyway, even if it is present and consuming disk space

  • if a closed shard copy is moved elsewhere then the resulting copy has no translog

  • keeping the translog around might occasionally help recover an out-of-sync replica with an operations-based recovery when the index is re-opened, but this is a pretty rare situation and not one we felt to be important

We noted that it may not be trivial to trim the translog at close, because there may be something still holding onto the generations that we want to trim (e.g. an ongoing peer recovery).
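
To make the last point concrete, here is a minimal toy model of why a trim may have to leave files behind. It is only a sketch: `ToyTranslog`, `acquire`, `release` and `trim` are hypothetical names invented for illustration and are not Elasticsearch's translog API. The assumption is that a holder of generation `g` (for example an ongoing peer recovery) needs `g` and every later generation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyTranslog:
    """Toy model of a translog made of numbered generation files (hypothetical)."""

    generations: list[int]
    # generation -> number of outstanding holders, e.g. an ongoing peer recovery
    _locks: Counter = field(default_factory=Counter)

    def acquire(self, generation: int) -> None:
        """Register a holder that needs `generation` and everything after it."""
        self._locks[generation] += 1

    def release(self, generation: int) -> None:
        """Drop one holder; forget the generation once nobody holds it."""
        self._locks[generation] -= 1
        if self._locks[generation] <= 0:
            del self._locks[generation]

    def trim(self, min_required: int) -> list[int]:
        """Delete generations below min_required unless a holder still needs them."""
        floor = min([min_required, *self._locks.keys()])
        removed = [g for g in self.generations if g < floor]
        self.generations = [g for g in self.generations if g >= floor]
        return removed
```

In this model, a trim issued while a holder exists removes only the generations below the oldest held one; the rest can be removed only by a later trim, after the holder has released.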

@dakrone
Member

dakrone commented May 31, 2019

> ILM may also want to be able to block writes to an index before force-merging it.

ILM does this automatically currently.

@DaveCTurner I think after the discussions this can drop the ILM tag, is that right? Since we would automatically trim the translog on close?

DaveCTurner removed the :Data Management/ILM+SLM label May 31, 2019
@DaveCTurner
Contributor

Yes, I think so. ILM's force-merge action sets index.blocks.write first (but not index.blocks.read_only).
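
For anyone doing the same thing by hand, the ordering described here (write block first, then force-merge) could be sketched as the following pair of REST calls. This is an illustration of the manual equivalent, not ILM's internal implementation. The helper name `force_merge_with_write_block`, the index name, and the `max_num_segments` default are assumptions; the helper only builds the requests and does not send them.

```python
import json


def force_merge_with_write_block(index: str, max_num_segments: int = 1):
    """Return the REST calls, in order, as (method, path, body) tuples.

    Hypothetical helper: it describes the calls and does not contact a cluster.
    """
    return [
        # 1. Block writes so nothing is indexed while the merge runs.
        ("PUT", f"/{index}/_settings", json.dumps({"index.blocks.write": True})),
        # 2. Force-merge down to the requested number of segments.
        ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}", None),
    ]


for method, path, body in force_merge_with_write_block("logs-2019.05"):
    print(method, path, body or "")
```

Note that only `index.blocks.write` is set, matching the comment above; `index.blocks.read_only` is left alone.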

dnhatn removed their assignment Jun 11, 2019
@s1monw
Contributor

s1monw commented Jun 11, 2019

> keeping the translog around might occasionally help recover an out-of-sync replica with an operations-based recovery when the index is re-opened, but this is a pretty rare situation and not one we felt to be important

Just for clarification, we don't use the translog anymore for primary-replica sync, right? Do you mean something else?

@DaveCTurner
Contributor

Today we read operations from the translog on the primary during peer recovery. For a while we had moved to reading them from Lucene but we reverted that in #38904.

tlrx added a commit that referenced this issue Jun 28, 2019
Today when an index is closed all its shards are force-flushed
but the translog files are left around. As explained in #42445
we'd like to trim the translog for closed indices in order to
consume less disk space. This commit reuses the existing
AsyncTrimTranslogTask task and re-enables it for closed indices.

At the time the task is executed, we should have the guarantee
that nothing holds the translog files that are going to be removed.
It also leaves a short period of time (10 min) during which translog
files of a recently closed index are still present on disk. This could
also help in some cases where the closed index is reopened
shortly after being closed (in order to update an index setting
for example).

Relates to #42445
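
The timing behaviour described in the commit message can be illustrated with a small simulation. This is a sketch under assumptions, not the actual `AsyncTrimTranslogTask`: `ToyShard` and `PeriodicTrimTask` are hypothetical names, the clock is simulated, and the 10-minute window is modelled as the interval of a periodic task.

```python
from dataclasses import dataclass

TEN_MINUTES = 10 * 60  # seconds; the "10 min" window from the commit message


@dataclass
class ToyShard:
    """Hypothetical stand-in for a shard; not an Elasticsearch class."""

    closed: bool
    translog_files: int

    def trim_translog(self) -> None:
        # Simplification: only closed shards are trimmed in this toy model.
        # They were flushed on close, so no translog file is still needed.
        if self.closed:
            self.translog_files = 0


class PeriodicTrimTask:
    """Runs trim_translog once per interval on a simulated clock."""

    def __init__(self, shard: ToyShard, interval: int = TEN_MINUTES) -> None:
        self.shard = shard
        self.interval = interval
        self.next_run = interval  # first run is one interval after start

    def advance_to(self, now: int) -> int:
        """Move the clock to `now` (seconds) and run every tick that is due."""
        runs = 0
        while now >= self.next_run:
            self.shard.trim_translog()
            self.next_run += self.interval
            runs += 1
        return runs
```

In this model the translog files of a freshly closed shard survive until the next tick, which is the short window in which a quick reopen would still find them on disk.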
@ywelsch
Contributor

ywelsch commented Jul 1, 2019

@tlrx can this be closed now?

@tlrx
Member

tlrx commented Jul 3, 2019

Translog files are now trimmed for closed indices (#43156) and translog stats are now correctly exposed (#43752, #43825).

tlrx closed this as completed Jul 3, 2019