Add wait in replica recovery for allocation id to propagate on source node #15558
❌ Gradle check result for fe22335: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
fe22335 to dc1afb5
❌ Gradle check result for dc1afb5: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
server/src/main/java/org/opensearch/indices/recovery/RecoverySourceHandler.java
.../src/test/java/org/opensearch/indices/recovery/LocalStorePeerRecoverySourceHandlerTests.java
Left some comments, please address them.
❌ Gradle check result for 221243c: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15558      +/-   ##
============================================
- Coverage     71.95%   71.94%    -0.02%
- Complexity    64192    64197       +5
============================================
  Files          5270     5271       +1
  Lines        300052   300181     +129
  Branches      43368    43384      +16
============================================
+ Hits         215917   215963      +46
- Misses        66442    66493      +51
- Partials      17693    17725      +32
… node (#15558)
* Add wait for target allocation id to appear
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
* making waitForAssignment same
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
* Add more test
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
---------
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
(cherry picked from commit 3c6019d)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
When remote cluster state is enabled, cluster state propagation can be delayed. This causes replica recoveries to fail with the error "source node does not have the shard listed in its state as allocated on the node". This PR adds retry with backoff, giving the cluster state time to propagate to the source node so that shards no longer fail for this reason.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
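
For readers who want a concrete picture of the retry-and-backoff idea in the Description, here is a minimal, self-contained sketch. It is not the code from `RecoverySourceHandler`: the class `AllocationIdWait`, the method `waitForAllocationId`, and all parameters are hypothetical names chosen for illustration, and the `Supplier<Set<String>>` simply stands in for "the allocation ids the source node currently sees in its cluster state". The actual change may use different retry limits, delays, and APIs.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration only; not the OpenSearch implementation.
class AllocationIdWait {

    /**
     * Polls the supplied view of known allocation ids until it contains
     * targetAllocationId, sleeping with exponential backoff between polls.
     * Returns false if the id never appears within maxAttempts polls
     * or if the waiting thread is interrupted.
     */
    static boolean waitForAllocationId(
        Supplier<Set<String>> knownAllocationIds,
        String targetAllocationId,
        int maxAttempts,
        long initialDelayMillis,
        long maxDelayMillis
    ) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (knownAllocationIds.get().contains(targetAllocationId)) {
                return true; // the cluster state has caught up
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the final poll
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
            delay = Math.min(delay * 2, maxDelayMillis); // capped exponential backoff
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate delayed propagation: the id becomes visible on the third poll.
        AtomicInteger polls = new AtomicInteger();
        Supplier<Set<String>> view =
            () -> polls.incrementAndGet() >= 3 ? Set.of("alloc-1") : Set.<String>of();

        boolean found = waitForAllocationId(view, "alloc-1", 5, 10, 80);
        System.out.println("found=" + found + " after " + polls.get() + " polls");
    }
}
```

The sketch blocks the calling thread for simplicity. A production recovery path would more likely schedule retries asynchronously and fail the recovery with the original error once the attempts are exhausted.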