-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use peer recovery retention leases for indices without soft-deletes #50351
Conversation
This reverts commit 781a4b825a4d429cb07aeb8b51975e6b8573a8a9.
Pinging @elastic/es-distributed (:Distributed/Recovery) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. I've left one question about strengthening the assertions in the tests.
continue; | ||
} | ||
assertNotNull(retentionLeases); | ||
for (Map<String, ?> retentionLease : retentionLeases) { | ||
if (((String) retentionLease.get("id")).startsWith("peer_recovery/")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not require that there is always a peer recovery retention lease. Should we require finding such a lease, and for the right node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++. Adjusted in 6071abd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -1278,7 +1278,7 @@ public void testOperationBasedRecovery() throws Exception { | |||
} | |||
} | |||
flush(index, true); | |||
ensurePeerRecoveryRetentionLeasesRenewedAndSynced(index); | |||
ensurePeerRecoveryRetentionLeasesRenewedAndSynced(index, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we set alwaysExists
to true
if minimumNodeVersion()
is on or after 7.6 (after backport) (here as well as in other places)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
Thanks Yannick! |
…50351) Today, the replica allocator uses peer recovery retention leases to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy). The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries. We can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire. Relates #45136 Relates #46959
…lastic#50351) Today, the replica allocator uses peer recovery retention leases to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy). The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries. We can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire. Relates elastic#45136 Relates elastic#46959
testCancelRecoveryDuringPhase1 uses a mock of IndexShard, which can't create retention leases. We need to stub method createRetentionLease. Relates elastic#50351 Closes elastic#50424
…astic#50486) We forgot to establish peer recovery retention leases for relocating primaries without soft-deletes. Relates elastic#50351
Today, the replica allocator uses peer recovery retention leases to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy). The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries. We can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire.
A nice side-effect of this is that we can turn off the translog retention once all shards started. However, I prefer leaving translog disconnect to PRRLs.
Relates #45136
Relates #46959