-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index stale operations to Lucene to have complete history #29679
Conversation
Today, when processing out of order operations, we only add it into translog but skip adding into Lucene. Translog, therefore, has a complete history of sequence numbers while Lucene does not. Since we would like to have a complete history in Lucene, this change makes sure that stale operations will be added to Lucene as soft-deleted documents if required.
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor things. this looks pretty awesome! @bleskes please take a look at the engine parts.
* @param in the input directory reader | ||
* @return the wrapped reader including soft-deleted documents. | ||
*/ | ||
public static DirectoryReader includeSoftDeletes(DirectoryReader in) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just call wrapAllDocsLive(DirectoryReader in)
it's really unrelated to soft deletes
/** | ||
* Returns an internal posting list of the given uid | ||
*/ | ||
PostingsEnum getPostingsOrNull(BytesRef id) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use this method here as well?
if (postingsEnum == null) { | ||
continue; | ||
} | ||
final NumericDocValues seqNoDV = leaf.reader().getNumericDocValues(SeqNoFieldMapper.NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can assert that both seqNoDV and primaryTermDV are non null. They are required at least for this code to work.
/** no doc was found in lucene */ | ||
LUCENE_DOC_NOT_FOUND | ||
} | ||
|
||
private OpVsLuceneDocStatus compareToLuceneHistory(final Operation op, final Searcher searcher) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have a javadoc for this method?
/** the op is older or the same as the one that last modified the doc found in lucene*/ | ||
OP_STALE_OR_EQUAL, | ||
/** the op is stale but its history is existed in Lucene */ | ||
OP_STALE_HISTORY_EXISTED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe OP_STALE_HISTORY_EXISTS
?
@@ -610,10 +625,10 @@ private OpVsLuceneDocStatus compareOpToLuceneDocBasedOnSeqNo(final Operation op) | |||
if (op.primaryTerm() > existingTerm) { | |||
status = OpVsLuceneDocStatus.OP_NEWER; | |||
} else { | |||
status = OpVsLuceneDocStatus.OP_STALE_OR_EQUAL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please add comments here in what cases we can get into these situations?
I talked to @dnhatn and I'm a bit uncomfortable with the PR changes to how we resolve stale operations in the engine. It seems that the main goal here was to try to avoid indexing duplicates of the same stale operation into Lucene. I'm not sure we need to do this. This is very rare. The alternative would be to accept duplicates under the condition that duplicate seq# also means the same doc version. A few things that are worth noting on top of it:
If we accept the above, we can keep the current (simpler) way we resolve stale operations and follow the following:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a tiny comment LGTM
@@ -833,6 +834,58 @@ public int length() { | |||
}; | |||
} | |||
|
|||
/** | |||
* Wraps a directory reader to include all live docs. | |||
* The wrapped reader can be used to query documents which are soft-deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just say to query all documents
@elasticmachine test this please. |
run sample packaging tests |
@elasticmachine run sample packaging tests |
Today, when processing out of order operations, we only add it into translog but skip adding into Lucene. Translog, therefore, has a complete history of sequence numbers while Lucene does not. Since we would like to have a complete history in Lucene, this change makes sure that stale operations will be added to Lucene as soft-deleted documents if required. Relates #29530
Since elastic#29679 we started adding stale operations to Lucene to have a complete history in Lucene. As the stale docs are rare, we accepted to have duplicate copies of them to keep an engine simple. However, we now need to make sure that we have a single copy per stale operation in Lucene because the Lucene rollback requires a single document for each sequence number.
Today, when processing out of order operations, we only add it into
translog but skip adding into Lucene. Translog, therefore, has a
complete history of sequence numbers while Lucene does not.
Since we would like to have a complete history in Lucene, this change
makes sure that stale operations will be added to Lucene as soft-deleted
documents if required.