-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ShardFollowNodeTask should fetch operation once #32455
Conversation
Today ShardFollowNodeTask might fetch some operations more than once. This happens because we ask the leading for up to max_batch_count operations (instead of the left-over size) for the left-over request. The leading then can freely respond up to the max_batch_count, and at the same time, if one of the previous requests completed, we might issue another read request whose range overlaps with the response of the left-over request. Closes elastic#32453
Pinging @elastic/es-distributed |
I probably don't understand something I want to clarify. The idea of asking for more then the global checkpoint was that when we do that, we know it's the last request we're going to send out and it's also the only one requesting that range - might as well ask for more (until the request comes back with a new global checkpoint, we won't ask for anything else and when it does come back new requests will take the already fetched operations will be taken into account). The log line in the bug report said:
but I can't find that pattern in code, so I can't clarify what it means exactly. What am I missing? PS I'm not saying we shouldn't make the change you're suggesting but I want to understand it better. Regardless, it should be ok to fetch things twice in rare cases. |
Sorry for the confusion. I added this log locally for debugging.
If the requesting is the only ongoing request, we should ask for max_batch_count. However, here I am fixing the left-over request, not the peak request (MVG's term). Suppose the leader_global_checkpoint (fetched by the task) is 2018, last_request_seqno is 2, max_batch_count is 1000. The current code will send concurrently three requests:
The last_request_seqno is 2018 after issuing these three requests. If the global checkpoint on the leader has advanced to 2999; when the first request completed, we will another read request
Yep, I agree we should not enforce this all the time, but we should avoid in obvious cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for clarifying.
final long maxRequiredSeqNo = Math.min(leaderGlobalCheckpoint, from + maxBatchOperationCount - 1); | ||
final int requestBatchCount; | ||
if (numConcurrentReads == 0) { | ||
// If this is the only request, we can treat it as a peek read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add "add let it optimistically fetch more documents if possible (but not require it)"?
Thanks @bleskes for reviewing. |
* elastic/ccr: (57 commits) ShardFollowNodeTask should fetch operation once (elastic#32455) Do not expose hard-deleted docs in Lucene history (elastic#32333) Tests: Fix convert error tests to use fixed value (elastic#32415) IndicesClusterStateService should replace an init. replica with an init. primary with the same aId (elastic#32374) REST high-level client: parse back _ignored meta field (elastic#32362) [CI] Mute DocumentSubsetReaderTests testSearch Reject follow request if following setting not enabled on follower (elastic#32448) TEST: testDocStats should always use forceMerge (elastic#32450) TEST: avoid merge in testSegmentMemoryTrackedInBreaker TEST: Avoid deletion in FlushIT AwaitsFix IndexShardTests#testDocStats Painless: Add method type to method. (elastic#32441) Remove reference to non-existent store type (elastic#32418) [TEST] Mute failing FlushIT test Fix ordering of bootstrap checks in docs (elastic#32417) [TEST] Mute failing InternalEngineTests#testSeqNoAndCheckpoints Validate source of an index in LuceneChangesSnapshot (elastic#32288) [TEST] Mute failing testConvertLongHexError bump lucene version after backport Upgrade to Lucene-7.5.0-snapshot-608f0277b0 (elastic#32390) ...
Today ShardFollowNodeTask might fetch some operations more than once. This happens because we ask the leading for up to max_batch_count operations (instead of the left-over size) for the left-over request. The leading then can freely respond up to the max_batch_count, and at the same time, if one of the previous requests completed, we might issue another read request whose range overlaps with the response of the left-over request. Closes #32453
Today ShardFollowNodeTask might fetch some operations more than once.
This happens because we ask the leading for up to max_batch_count
operations (instead of the left-over size) for the left-over request.
The leading then can freely respond up to the max_batch_count, and at
the same time, if one of the previous requests completed, we might issue
another read request whose range overlaps with the response of the
left-over request.
Closes #32453