[Backport 2.x] Fix bug where retries within RemoteStoreRefreshListener cause infos/checkpoint mismatch #10760
…heckpoint mismatch (#10655)

* Fix bug where retries within RemoteStoreRefreshListener cause mismatch between ReplicationCheckpoint and uploaded SegmentInfos.

Retries within RemoteStoreRefreshListener run outside of the refresh thread. This means that concurrent refreshes may occur during syncSegments execution updating the on-reader SegmentInfos. A shard's latest ReplicationCheckpoint is computed and set in a refresh listener, but it is not guaranteed the listener has run before the retry fetches the infos or checkpoint independently. This fix ensures the listener recomputes the checkpoint while fetching the SegmentInfos. This change also ensures that we only recompute the checkpoint when necessary because it comes with an IO cost to compute StoreFileMetadata.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Update refresh listener to recompute checkpoint from latest infos snapshot.

Signed-off-by: Marc Handalian <handalm@amazon.com>

Fix broken test case by comparing segments gen

Signed-off-by: Marc Handalian <handalm@amazon.com>

spotless

Signed-off-by: Marc Handalian <handalm@amazon.com>

Fix RemoteStoreRefreshListener tests

Signed-off-by: Marc Handalian <handalm@amazon.com>

* add extra log

Signed-off-by: Marc Handalian <handalm@amazon.com>

---------

Signed-off-by: Marc Handalian <handalm@amazon.com>
(cherry picked from commit e389a09)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
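The race described in the commit message can be illustrated with a small standalone sketch. It does not use the OpenSearch classes: `CheckpointSnapshotSketch`, `Infos`, `Checkpoint`, `Snapshot`, and every method name below are hypothetical stand-ins, and comparing generations to decide whether to recompute is an assumption made for illustration, not necessarily how the actual change decides. The sketch only shows why reading the infos and the checkpoint independently can yield a mismatched pair, and how deriving the checkpoint from the same infos snapshot avoids it.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative model only; these are NOT the OpenSearch types. */
public class CheckpointSnapshotSketch {

    /** Stand-in for a SegmentInfos snapshot; only the generation matters here. */
    public static final class Infos {
        public final long generation;
        public Infos(long generation) { this.generation = generation; }
    }

    /** Stand-in for a ReplicationCheckpoint derived from some Infos. */
    public static final class Checkpoint {
        public final long segmentsGen;
        public Checkpoint(long segmentsGen) { this.segmentsGen = segmentsGen; }
    }

    /** An infos snapshot paired with the checkpoint that will be published with it. */
    public static final class Snapshot {
        public final Infos infos;
        public final Checkpoint checkpoint;
        public Snapshot(Infos infos, Checkpoint checkpoint) {
            this.infos = infos;
            this.checkpoint = checkpoint;
        }
    }

    private final AtomicLong readerGeneration = new AtomicLong(1);
    private volatile Checkpoint latestCheckpoint = new Checkpoint(1);

    /** A refresh advances the reader's infos; the checkpoint listener may not have run yet. */
    public void refreshInfosOnly() {
        readerGeneration.incrementAndGet();
    }

    /** The (separate) listener that stores the shard's latest checkpoint. */
    public void runCheckpointListener() {
        latestCheckpoint = new Checkpoint(readerGeneration.get());
    }

    /** Racy variant: infos and checkpoint are read independently. */
    public Snapshot fetchIndependently() {
        return new Snapshot(new Infos(readerGeneration.get()), latestCheckpoint);
    }

    /**
     * Consistent variant: take the infos once, reuse the stored checkpoint only
     * if it matches that snapshot, otherwise recompute it from the snapshot.
     * Recomputing is modelled as cheap here; the commit message notes that the
     * real computation has an IO cost, hence the reuse check.
     */
    public Snapshot fetchConsistently() {
        Infos infos = new Infos(readerGeneration.get());
        Checkpoint cached = latestCheckpoint;
        Checkpoint checkpoint = cached.segmentsGen == infos.generation
            ? cached
            : new Checkpoint(infos.generation);
        return new Snapshot(infos, checkpoint);
    }

    public static void main(String[] args) {
        CheckpointSnapshotSketch shard = new CheckpointSnapshotSketch();
        shard.refreshInfosOnly(); // refresh happened, listener has not run yet

        Snapshot racy = shard.fetchIndependently();
        Snapshot fixed = shard.fetchConsistently();
        System.out.println("independent reads match: "
            + (racy.infos.generation == racy.checkpoint.segmentsGen));
        System.out.println("single snapshot matches: "
            + (fixed.infos.generation == fixed.checkpoint.segmentsGen));
    }
}
```

In this model the retry path would call something like `fetchConsistently()` so that the uploaded infos and the published checkpoint always describe the same generation, while the reuse check keeps the recomputation off the common path.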
Compatibility status:
Checks if related components are compatible with change 23d691d

Incompatible components:
- https://github.com/opensearch-project/security-analytics.git

Skipped components: (none listed)

Compatible components:
- https://github.com/opensearch-project/security.git
- https://github.com/opensearch-project/custom-codecs.git
- https://github.com/opensearch-project/geospatial.git
- https://github.com/opensearch-project/index-management.git
- https://github.com/opensearch-project/notifications.git
- https://github.com/opensearch-project/sql.git
- https://github.com/opensearch-project/neural-search.git
- https://github.com/opensearch-project/job-scheduler.git
- https://github.com/opensearch-project/observability.git
- https://github.com/opensearch-project/k-nn.git
- https://github.com/opensearch-project/cross-cluster-replication.git
- https://github.com/opensearch-project/alerting.git
- https://github.com/opensearch-project/anomaly-detection.git
- https://github.com/opensearch-project/asynchronous-search.git
- https://github.com/opensearch-project/common-utils.git
- https://github.com/opensearch-project/reporting.git
- https://github.com/opensearch-project/performance-analyzer.git
- https://github.com/opensearch-project/ml-commons.git
- https://github.com/opensearch-project/performance-analyzer-rca.git
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@             Coverage Diff              @@
##                2.x   #10760      +/-   ##
============================================
- Coverage     70.98%   70.77%   -0.21%
+ Complexity    58685    58575     -110
============================================
  Files          4839     4839
  Lines        277012   277118     +106
  Branches      40639    40655      +16
============================================
- Hits         196629   196138     -491
- Misses        63664    64213     +549
- Partials      16719    16767      +48
Backport e389a09 from #10655.