Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Update getLatestReplicationCheckpoint for closed indices #4532

Closed

Conversation

dreamer-89
Copy link
Member

@dreamer-89 dreamer-89 commented Sep 16, 2022

Signed-off-by: Suraj Singh surajrider@gmail.com

Description

This change updates the IndexShard.getLatestReplicationCheckpoint to return empty checkpoint for closed indices. This change is needed as when indices are closed, shard info is still shared among nodes via transport actions as mentioned in #3823 (comment)

Issues Related

#3823

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ndices

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@dreamer-89 dreamer-89 requested review from a team and reta as code owners September 16, 2022 03:11
@dreamer-89 dreamer-89 changed the title [Segment Replicaiton] Return empty replication checkpoint on closed indices [Segment Replication] Return empty replication checkpoint on closed indices Sep 16, 2022
Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@@ -1405,7 +1405,8 @@ public ReplicationCheckpoint getLatestReplicationCheckpoint() {
if (indexSettings.isSegRepEnabled() == false) {
return null;
}
if (getEngineOrNull() == null) {
final IndexMetadata indexMetadata = indexSettings.getIndexMetadata();
if (getEngineOrNull() == null || (indexMetadata != null && indexMetadata.getState() == IndexMetadata.State.CLOSE)) {
return ReplicationCheckpoint.empty(shardId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we have a use case for actually using these checkpoints when the shard is closed, but even so I think we may want to change this logic to actually read and return the current checkpoint instead of returning empty. A shard could be closed but have a local commit. We can check for the presence of a segments_n and compute the checkpoint from the store with store.readLastCommittedSegmentsInfo. Where the processed seqNo can be read from user data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mch2 for the review and calling this out. Updating.

@dreamer-89 dreamer-89 force-pushed the update_getCheckpoint branch 2 times, most recently from 9ab5caf to 2759e47 Compare September 16, 2022 21:24
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

Signed-off-by: Suraj Singh <surajrider@gmail.com>
@github-actions

This comment was marked as outdated.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

Codecov Report

Merging #4532 (703427f) into main (72e6801) will increase coverage by 0.12%.
The diff coverage is 80.00%.

@@             Coverage Diff              @@
##               main    #4532      +/-   ##
============================================
+ Coverage     70.62%   70.74%   +0.12%     
- Complexity    57263    57368     +105     
============================================
  Files          4617     4617              
  Lines        275499   275509      +10     
  Branches      40331    40332       +1     
============================================
+ Hits         194562   194909     +347     
+ Misses        64528    64259     -269     
+ Partials      16409    16341      -68     
Impacted Files Coverage Δ
...in/java/org/opensearch/index/shard/IndexShard.java 69.69% <80.00%> (+0.32%) ⬆️
...n/indices/forcemerge/ForceMergeRequestBuilder.java 0.00% <0.00%> (-75.00%) ⬇️
...n/admin/cluster/node/tasks/get/GetTaskRequest.java 30.30% <0.00%> (-63.64%) ⬇️
.../indices/forcemerge/TransportForceMergeAction.java 25.00% <0.00%> (-58.34%) ⬇️
.../java/org/opensearch/node/NodeClosedException.java 50.00% <0.00%> (-50.00%) ⬇️
...ch/transport/ReceiveTimeoutTransportException.java 50.00% <0.00%> (-50.00%) ⬇️
...cluster/coordination/PendingClusterStateStats.java 20.00% <0.00%> (-48.00%) ⬇️
...opensearch/persistent/PersistentTasksExecutor.java 22.22% <0.00%> (-44.45%) ⬇️
...h/action/ingest/SimulateDocumentVerboseResult.java 60.71% <0.00%> (-39.29%) ⬇️
.../admin/cluster/node/tasks/get/GetTaskResponse.java 28.57% <0.00%> (-35.72%) ⬇️
... and 467 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@dreamer-89 dreamer-89 changed the title [Segment Replication] Return empty replication checkpoint on closed indices [Segment Replication] Update getLatestReplicationCheckpoint for closed indices Sep 16, 2022
@dreamer-89
Copy link
Member Author

@mch2 : Addressed the review comments, can you please have a look ?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Copy link
Member Author

Flaky test tracked in #4162

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark" -Dtests.seed=A21B2927A9499A55 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sr-Latn-ME -Dtests.timezone=America/Detroit -Druntime.java=17

org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT > testRestoreSnapshotAllocationDoesNotExceedWatermark FAILED
    java.lang.AssertionError: 
    Expected: an empty collection
         but: <[[rjewwteugy][3], node[xWCKx2aaRSSe57cBd-X-7w], [P], s[STARTED], a[id=QQKbs0HDSBSGbt78lZOipw], [rjewwteugy][1], node[xWCKx2aaRSSe57cBd-X-7w], [P], s[STARTED], a[id=HAVu-gyGTUqfxhOBO40uhw]]>
        at __randomizedtesting.SeedInfo.seed([A21B2927A9499A55:B0A38F46C2F085ED]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:964)
        at org.junit.Assert.assertThat(Assert.java:930)
        at org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.lambda$testRestoreSnapshotAllocationDoesNotExceedWatermark$2(DiskThresholdDeciderIT.java:260)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1049)
        at org.opensearch.test.OpenSearchTestCase.assertBusy(OpenSearchTestCase.java:1022)
        at org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark(DiskThresholdDeciderIT.java:260)

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

// If index is closed, read replication checkpoint from last committed segments info on local store
if (indexSettings.getIndexMetadata().getState() == IndexMetadata.State.CLOSE) {
try {
final SegmentInfos segmentInfos = store.readLastCommittedSegmentsInfo();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this behave if the engine is not yet open? I think we need to check for the existence of a commit first. This method will throw if one is not present.

@anasalkouz
Copy link
Member

@dreamer-89 Is this still required? it's pending for review for a while

@dreamer-89
Copy link
Member Author

@dreamer-89 Is this still required? it's pending for review for a while

@anasalkouz : Yes, this is needed and appears while running ClosedIndices* integration tests. This was initially needed for CCR (as mentioned in description) but now appears to be a valid use case related to closed indices. I will work on this PR.

@dreamer-89
Copy link
Member Author

Closing this stale PR, tracking fix in #6828

@dreamer-89 dreamer-89 closed this Mar 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants