Flush inactive shards #31965

ywelsch · 2018-07-11T13:01:09Z

We currently have logic that triggers a sync flush when a primary shard becomes inactive (after 5 minutes of no write activity on the primary shard). The goal is to ensure that sync flush markers are in place after a period of inactivity, so that a full cluster restart or a rolling restart of nodes results in quick peer recoveries when there is no write activity on the respective shard. With operation-based recoveries, we also provide fast recoveries when there is write activity during node restarts. Operation-based recovery can, however, more frequently lead to situations where a replica shard becomes inactive but not all of its searchable segments are flushed to disk, because flushing is triggered only when a primary becomes inactive and not by subsequent recoveries of replicas. This results in unnecessary extra storage (more translog generations and more Lucene segments) and possibly slows down future store- and peer-based recoveries. /cc: @jpountz

The following test illustrates the issue:

package org.elasticsearch.indices.flush;

import org.elasticsearch.action.admin.indices.segments.IndexShardSegments;
import org.elasticsearch.action.admin.indices.segments.ShardSegments;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.indices.IndexingMemoryController;
import org.elasticsearch.test.ESIntegTestCase;
import org.elasticsearch.test.InternalTestCluster;

import java.util.List;

import static org.hamcrest.Matchers.equalTo;

@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0)
public class FlushOnInactivityIT extends ESIntegTestCase {

    public void testFlushOnInactivity() throws Exception {
        List<String> nodes = internalCluster().startNodes(2,
            Settings.builder().put(IndexingMemoryController.SHARD_INACTIVE_TIME_SETTING.getKey(), "3s").build());

        client().admin().indices().prepareCreate("test").get();

        ensureGreen("test");

        index("test", "_doc", "1");
        refresh("test"); // create segment
        index("test", "_doc", "2");
        refresh("test"); // create segment

        internalCluster().restartNode(nodes.get(0), new InternalTestCluster.RestartCallback() {

            @Override
            public Settings onNodeStopped(String nodeName) throws Exception {
                assertBusySegmentsFlushed(client(nodes.get(1)), "test");
                return super.onNodeStopped(nodeName);
            }

        });

        ensureGreen("test");

        assertBusySegmentsFlushed(client(), "test");
    }

    private void assertBusySegmentsFlushed(Client client, String index) throws Exception {
        assertBusy(() -> {
            for (IndexShardSegments indexShardSegments : client.admin().indices().prepareSegments(index).get().getIndices().get(index)
                .getShards().values()) {
                for (ShardSegments shardSegments : indexShardSegments) {
                    assertThat(shardSegments.getNumberOfCommitted(), equalTo(shardSegments.getNumberOfSearch()));
                }
            }
        });
    }

}


elasticmachine · 2018-07-11T13:01:10Z

Pinging @elastic/es-distributed

bleskes · 2018-08-01T13:22:25Z

We discussed it and decided that what we can do here depends on whether we can fully rely on ops-based recovery as a replacement for synced flush for fast full cluster restarts and for nodes that drop off the cluster and join back. Ops-based recovery should do the job, but since it's relatively new, we want to gather feedback on how well it performs. @zuketo has offered to collect that 🙏

With peer recovery retention leases and sequence-number-based replica allocation, a regular flush can speed up recovery as well as a synced flush does. With this change, we will flush instead of synced-flush when a shard becomes inactive. Closes #31965
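
To make the intended behavior concrete, here is a minimal toy model of "flush when a shard copy becomes inactive". It is not Elasticsearch code: the class `InactiveShardFlushSketch`, the nested `ShardCopy` type, and all of its methods are hypothetical names invented for this sketch, and it assumes the inactivity check runs on every shard copy independently, so a replica that goes idle commits its own searchable segments without waiting for the primary.

```java
/** Toy model only; every name in this file is hypothetical, not an Elasticsearch API. */
final class InactiveShardFlushSketch {

    /** One shard copy (primary or replica) with searchable and committed segment counts. */
    static final class ShardCopy {
        final String name;
        private int searchableSegments = 0;
        private int committedSegments = 0;
        private long lastWriteMillis = 0L;
        private boolean active = false;

        ShardCopy(String name) {
            this.name = name;
        }

        /** Models index + refresh: adds a segment that is searchable but not yet committed. */
        void indexAndRefresh(long nowMillis) {
            searchableSegments++;
            lastWriteMillis = nowMillis;
            active = true;
        }

        /** Models a regular, local flush: every searchable segment becomes committed. */
        void flush() {
            committedSegments = searchableSegments;
        }

        /**
         * Marks the copy inactive and flushes it once it has seen no writes for
         * at least inactiveMillis. Returns true only when a flush was performed.
         */
        boolean checkIdle(long nowMillis, long inactiveMillis) {
            if (active && nowMillis - lastWriteMillis >= inactiveMillis) {
                active = false;
                flush();
                return true;
            }
            return false;
        }

        /** Same condition the test above asserts: committed count equals searchable count. */
        boolean fullyFlushed() {
            return committedSegments == searchableSegments;
        }
    }

    public static void main(String[] args) {
        long inactiveMillis = 3_000L; // mirrors the "3s" setting used in the test above
        ShardCopy replica = new ShardCopy("replica");
        replica.indexAndRefresh(0L);
        replica.indexAndRefresh(1_000L);

        // Only 1s since the last write: the copy is still active, so nothing is flushed.
        System.out.println("flushed at 2s: " + replica.checkIdle(2_000L, inactiveMillis));
        // 3s since the last write: the copy becomes inactive and flushes itself.
        System.out.println("flushed at 4s: " + replica.checkIdle(4_000L, inactiveMillis));
        System.out.println("fully flushed: " + replica.fullyFlushed());
    }
}
```

The sketch uses explicit timestamps instead of a real clock so the behavior is deterministic; a real implementation would also have to handle concurrent writes and flush failures, which this model ignores.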

ywelsch added the >bug, :Distributed Indexing/Recovery, :Distributed Indexing/Engine, and team-discuss labels Jul 11, 2018

ywelsch removed the team-discuss label Oct 2, 2019

dnhatn self-assigned this Nov 13, 2019

dnhatn mentioned this issue Nov 14, 2019

Flush instead of synced-flush inactive shards #49126

Merged

dnhatn closed this as completed in #49126 Nov 22, 2019

codebrain mentioned this issue Apr 1, 2020

7.7.0 meta ticket (Part 2) elastic/elasticsearch-net#4533

Closed
