-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure ILM policies run safely on leader indices #38140
Ensure ILM policies run safely on leader indices #38140
Conversation
Pinging @elastic/es-core-features |
This will also need to be updated after #37951 is merged, as that PR contains changes which will cause this one to fail to build. The changes should be pretty minor, though. |
@@ -368,6 +326,70 @@ public void testUnfollowInjectedBeforeShrink() throws Exception { | |||
} | |||
} | |||
|
|||
public void testCannotShrinkLeaderIndex() throws Exception { | |||
String indexName = "shrink-leader-test"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chatted about this test @gwbrown. The stopping of ilm in this test can be problematic for other tests that run after it. First all tests are executed as part of the leader cluster test task and then as part of the follower cluster test task (this while leader test cluster is running). The other which the test methods are executed is randomized by the test framework.
In order to avoid this problem the entire test can be ran just against the follow cluster test task, like is done below.
Also what also is possible is to remove the stop and start ilm calls and in instead setup the leader index first in leader cluster, execute put follow in follower cluster and then create the leader ilm policy.
if ("leader".equals(targetCluster)) {
return;
} else if ("follow".equals(targetCluster) == false) {
throw new AssertionError("unexpected target cluster [" + targetCluster + "]");
}
String indexName = "shrink-leader-test";
String shrunkenIndexName = "shrink-" + indexName;
String policyName = "shrink-leader-test-policy";
try (RestClient leaderClient = buildLeaderClient()) {
putShrinkOnlyPolicy(leaderClient, policyName);
// Stop ILM to let us set up the follower before the policy swings into action and shrinks the index
stopILM(client());
assertBusy(() -> assertEquals("STOPPED", ilmStatus(client())));
Settings indexSettings = Settings.builder()
.put("index.soft_deletes.enabled", true)
.put("index.number_of_shards", 2)
.put("index.number_of_replicas", 0)
.put("index.lifecycle.name", policyName)
.build();
createIndex(indexName, indexSettings, "", "", leaderClient); // <-- method does not yet exist
ensureGreen(indexName, leaderClient); // <-- method does not yet exist
// Policy with the same name must exist in follower cluster too:
putUnfollowOnlyPolicy(client(), policyName);
followIndex(indexName, indexName);
ensureGreen(indexName);
startILM(leaderClient);
index(leaderClient, indexName, "1");
assertDocumentExists(leaderClient, indexName, "1");
assertBusy(() -> {
assertDocumentExists(client(), indexName, "1");
// Sanity check that following_index setting has been set, so that we can verify later that this setting has been unset:
assertThat(getIndexSetting(client(), indexName, "index.xpack.ccr.following_index"), equalTo("true"));
// We should get into a state with these policies where both leader and followers are waiting on each other
assertILMPolicy(leaderClient, indexName, policyName, "warm", "shrink", "wait-for-shard-history-leases");
assertILMPolicy(client(), indexName, policyName, "hot", "unfollow", "wait-for-indexing-complete");
});
// Manually set this to kick the process
updateIndexSettings(leaderClient, indexName, Settings.builder()
.put("index.lifecycle.indexing_complete", true)
.build()
);
assertBusy(() -> {
// The shrunken index should now be created on the leader...
Response shrunkenIndexExistsResponse = leaderClient.performRequest(new Request("HEAD", "/" + shrunkenIndexName));
assertEquals(RestStatus.OK.getStatus(), shrunkenIndexExistsResponse.getStatusLine().getStatusCode());
// And both of these should now finish their policies
assertILMPolicy(leaderClient, shrunkenIndexName, policyName, "completed");
assertILMPolicy(client(), indexName, policyName, "completed");
});
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left three really minor comments, otherwise LGTM
import java.util.Arrays; | ||
import java.util.Objects; | ||
|
||
public class WaitForNoFollowersStep extends AsyncWaitStep { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add javadoc please?
private final String message; | ||
|
||
Info() { | ||
this.message = "this index is a leader index; waiting for all following indices to cease following before proceeding"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can probably be a static final field since it's hardcoded?
public boolean equals(Object o) { | ||
if (this == o) return true; | ||
if (o == null || getClass() != o.getClass()) return false; | ||
Info info = (Info) o; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be return true;
since the messages are never different?
@elasticmachine run elasticsearch-ci/default-distro |
* elastic/master: (54 commits) Introduce retention leases versioning (elastic#37951) Correctly disable tests for FIPS JVMs (elastic#38214) AwaitsFix testAbortedSnapshotDuringInitDoesNotStart (elastic#38227) Preserve ILM operation mode when creating new lifecycles (elastic#38134) Enable trace log in FollowerFailOverIT (elastic#38148) SnapshotShardsService Simplifications (elastic#38025) Default include_type_name to false in the yml test harness. (elastic#38058) Disable bwc preparing to backport of#37977, elastic#37857 and elastic#37872 (elastic#38126) Adding ml_settings entry to HLRC and Docs for deprecation_info (elastic#38118) Replace awaitBusy with assertBusy in atLeastDocsIndexed (elastic#38190) Adjust SearchRequest version checks (elastic#38181) AwaitsFix testClientSucceedsWithVerificationDisabled (elastic#38213) Zen2ify RareClusterStateIT (elastic#38184) ML: Fix error race condition on stop _all datafeeds and close _all jobs (elastic#38113) AwaitsFix PUT mapping with _doc on an index that has types (elastic#38204) Allow built-in monitoring_user role to call GET _xpack API (elastic#38060) Update geo_shape docs to include unsupported features (elastic#38138) [ML] Remove "8" prefixes from file structure finder timestamp formats (elastic#38016) Disable bwc tests while backporting elastic#38104 (elastic#38182) Enable TLSv1.3 by default for JDKs with support (elastic#38103) ...
Thanks for the reviews and thanks for handling the merge conflict @jasontedor! |
Adds a Step to the Shrink and Delete actions which prevents those actions from running on a leader index - all follower indices must first unfollow the leader index before these actions can run. This prevents the loss of history before follower indices are ready, which might otherwise result in the loss of data.
Adds a Step to the Shrink and Delete actions which prevents those actions from running on a leader index - all follower indices must first unfollow the leader index before these actions can run. This prevents the loss of history before follower indices are ready, which might otherwise result in the loss of data.
…round-sync-6.x * elastic/6.x: Fix testRestoreIncreasesPrimaryTerms on 6.x (elastic#38314) SQL: Remove exceptions from Analyzer (elastic#38260) (elastic#38287) SQL: Move metrics tracking inside PlanExecutor (elastic#38259) (elastic#38288) Backport of elastic#38311: Move TokenService to seqno powered cas Handle scheduler exceptions (elastic#38183) Mute MlMigrationFullClusterRestartIT#testMigration (elastic#38316) 6.x Backport of elastic#38278: Move ML Optimistic Concurrency Control to Seq No Cleanup construction of interceptors (elastic#38296) Throw if two inner_hits have the same name (elastic#37645) (elastic#38194) AsyncTwoPhaseIndexerTests race condition fixed elastic#38195 Backport#37830 Enable SSL in reindex with security QA tests (elastic#38293) Ensure ILM policies run safely on leader indices (elastic#38140) Introduce ssl settings to reindex from remote (elastic#38292) Fix ordering problem in add or renew lease test (elastic#38281) Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (elastic#38276) Fix NPE in Logfile Audit Filter (elastic#38120) (elastic#38271) Enable trace log in FollowerFailOverIT (elastic#38148) SQL: Generate relevant error message when grouping functions are not used in GROUP BY (elastic#38017)
Adds a Step to the Shrink and Delete actions which prevents those
actions from running on a leader index - all follower indices must first
unfollow the leader index before these actions can run. This prevents
the loss of history before follower indices are ready, which might
otherwise result in the loss of data.
Relates to #34648
There will likely be a change to the integration test, as I'm still seeing occasional failures in it, which don't appear to be "real" failures (just hitting timeouts sometimes), but the code is otherwise ready to review.