-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read from a checkpoint for RFS #1149
Read from a checkpoint for RFS #1149
Conversation
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
@@ -107,7 +107,7 @@ protected RfsLuceneDocument getDocument(IndexReader reader, int docId, boolean i | |||
|
|||
// Start reindexing in a separate thread | |||
Thread reindexThread = new Thread(() -> { | |||
reindexer.reindex("test-index", reader.readDocuments(), mockContext).block(); | |||
reindexer.reindex("test-index", reader.readDocuments(0, 0), mockContext).block(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the override exist too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -114,35 +114,40 @@ protected DirectoryReader getReader() throws IOException {// Get the list of com | |||
} | |||
} | |||
|
|||
Publisher<RfsLuceneDocument> readDocsByLeavesInParallel(DirectoryReader reader) { | |||
var segmentsToReadAtOnce = 5; // Arbitrary value | |||
/* Start reading docs from a specific segment and document id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - spacing
.addArgument(reader::maxDoc) | ||
.addArgument(() -> reader.leaves().size()) | ||
.log(); | ||
.addArgument(reader::maxDoc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are spacing differences in this PR that are making it look bigger than it is
int startingSegmentIndex; | ||
int startingDocId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend renaming the class now that these two fields are here too. Feels more like a shard cursor
return new IndexAndShard(components[0], Integer.parseInt(components[1]), | ||
components.length >= 3 ? Integer.parseInt(components[2]) : 0, | ||
components.length >= 4 ? Integer.parseInt(components[3]) : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate java - sorry that you don't have pattern matching
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
var verifier = StepVerifier.create(documents); | ||
var expectedDocumentIds = documentIds.get(i); | ||
expectedDocumentIds.forEach(id -> { | ||
verifier.expectNextMatches(doc -> { | ||
Assertions.assertEquals(id, doc.id); | ||
return true; | ||
}); | ||
}); | ||
|
||
verifier.expectComplete().verify(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice code - but you won't get full context back when it fails. Something to consider would be to just concatenate the expected data into a list or json & then do the same for the test data. JUnit runners are pretty good about showing diffs
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1149 +/- ##
============================================
+ Coverage 80.72% 80.75% +0.03%
- Complexity 2947 2953 +6
============================================
Files 399 399
Lines 14965 15101 +136
Branches 1017 1021 +4
============================================
+ Hits 12080 12195 +115
- Misses 2274 2292 +18
- Partials 611 614 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
Signed-off-by: Mikayla Thompson <thomika@amazon.com>
Description
One of the key components of sub-shard RFS (by any approach) is to be able to pick up and read documents from a specific point in the shard. Each shard is composed of segments, and documents are indexed sequentially within segments. By specifying the index of the starting segment and the document within that segment, a specific spot within a shard can be pinpointed.
This PR sets the scene to pick up sub-shard work by allowing
IndexAndShard
work items to also specify a starting segment index and doc id. We don't expect this to actually be exercised until https://opensearch.atlassian.net/browse/MIGRATIONS-2128 is merged (which is create work items that specify these values). If they're not present, segment and doc 0 are assumed.Issues Resolved
https://opensearch.atlassian.net/browse/MIGRATIONS-2164
Testing
Unit tests are added, based on the snapshots in https://github.com/opensearch-project/opensearch-migrations/tree/main/RFS/test-resources/snapshots, which have a variety of segment configurations.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.