Flush inactive shards #31965
Labels
>bug
:Distributed Indexing/Engine
Anything around managing Lucene and the Translog in an open shard.
:Distributed Indexing/Recovery
Anything around constructing a new shard, either from a local or a remote source.
We currently have logic that triggers a sync flush when a primary shard becomes inactive (after 5 minutes of no write activity on the primary shard). The goal is to ensure that sync flush markers are in place after a period of inactivity, so that a full cluster restart or a rolling restart of nodes results in quick peer recoveries when there is no write activity on the respective shard. With operation-based recoveries, we also provide fast recoveries when there is write activity during node restarts.
Operation-based recovery can, however, more frequently lead to situations where a replica shard becomes inactive while not all of its searchable segments have been flushed to disk. This happens because flushing is only triggered when a primary becomes inactive, and is not triggered by subsequent recoveries of replicas. The result is unnecessary extra storage (more translog generations and more Lucene segments), and possibly slower future store- and peer-based recoveries. /cc: @jpountz
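To make the sequence of events above concrete, here is a toy model of it. This is only a sketch and not Elasticsearch code: `ShardCopy`, `checkIdle`, and the `flushAnyInactiveCopy` switch are hypothetical names, and flushing every copy that becomes inactive is just one possible remedy, assumed from the issue title.

```java
import java.util.List;

public class InactiveFlushSketch {
    // The 5 minute inactivity threshold mentioned in the description.
    static final long INACTIVE_AFTER_MILLIS = 5 * 60 * 1000L;

    // Hypothetical stand-in for one shard copy; not an Elasticsearch class.
    static final class ShardCopy {
        final boolean primary;
        long lastActivityMillis;
        boolean active;
        int unflushedOps;

        ShardCopy(boolean primary) {
            this.primary = primary;
        }

        // Indexing or replaying operations marks the copy active again.
        void applyOps(int count, long nowMillis) {
            unflushedOps += count;
            lastActivityMillis = nowMillis;
            active = true;
        }

        void flush() {
            unflushedOps = 0;
        }
    }

    // Periodic idle check. With flushAnyInactiveCopy == false, only a primary
    // going inactive triggers a flush (modelled as flushing all copies).
    static void checkIdle(List<ShardCopy> copies, long nowMillis, boolean flushAnyInactiveCopy) {
        for (ShardCopy copy : copies) {
            boolean becameInactive = copy.active
                && nowMillis - copy.lastActivityMillis >= INACTIVE_AFTER_MILLIS;
            if (!becameInactive) {
                continue;
            }
            copy.active = false;
            if (copy.primary) {
                for (ShardCopy c : copies) {
                    c.flush(); // stand-in for a sync flush across copies
                }
            } else if (flushAnyInactiveCopy) {
                copy.flush(); // assumed remedy: flush the inactive replica too
            }
        }
    }

    // Returns the replica's unflushed operation count at the end of the scenario.
    static int replicaUnflushedOpsAfterRecovery(boolean flushAnyInactiveCopy) {
        ShardCopy primary = new ShardCopy(true);
        ShardCopy replica = new ShardCopy(false);
        List<ShardCopy> copies = List.of(primary, replica);
        long minute = 60_000L;

        primary.applyOps(10, 0);
        replica.applyOps(10, 0);
        checkIdle(copies, 5 * minute, flushAnyInactiveCopy); // primary idle: sync flush

        // Operation-based recovery replays operations on the replica only;
        // the primary sees no new writes and stays inactive.
        replica.applyOps(3, 6 * minute);
        checkIdle(copies, 11 * minute, flushAnyInactiveCopy);
        return replica.unflushedOps;
    }

    public static void main(String[] args) {
        System.out.println("primary-only trigger: " + replicaUnflushedOpsAfterRecovery(false));
        System.out.println("per-copy trigger:     " + replicaUnflushedOpsAfterRecovery(true));
    }
}
```

In this model the replica keeps its replayed operations unflushed under the primary-only trigger, because nothing ever makes the already inactive primary fire again.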
The following test illustrates the issue: