Index stale operations to Lucene to have complete history #29679

Merged (7 commits) on Apr 27, 2018

@dnhatn (Member) commented Apr 24, 2018

Today, when processing out-of-order operations, we only add them to
the translog and skip adding them to Lucene. The translog, therefore,
has a complete history of sequence numbers while Lucene does not.

Since we would like to have a complete history in Lucene, this change
makes sure that stale operations are added to Lucene as soft-deleted
documents when required.
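The behaviour change can be sketched with a toy, in-memory model. Everything here (ToyEngine, Op, LuceneDoc, the indexStaleOps switch) is hypothetical and only mimics the idea: every op goes to the translog, a newer op soft-deletes the previous live copy, and a stale op is either left out of Lucene (old behaviour) or indexed already soft-deleted (new behaviour). It is not Elasticsearch's InternalEngine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, heavily simplified model; not Elasticsearch's InternalEngine.
record Op(String id, long seqNo, String source) {}

// A "Lucene" document; softDeleted stands in for the soft-deletes field being set.
record LuceneDoc(String id, long seqNo, String source, boolean softDeleted) {}

class ToyEngine {
    final List<Op> translog = new ArrayList<>();
    final List<LuceneDoc> lucene = new ArrayList<>();
    // Latest seq# seen per doc id (a stand-in for the version map / Lucene lookup).
    private final Map<String, Long> latestSeqNo = new HashMap<>();
    private final boolean indexStaleOps;

    ToyEngine(boolean indexStaleOps) {
        this.indexStaleOps = indexStaleOps;
    }

    void index(Op op) {
        translog.add(op); // the translog always receives every operation
        Long latest = latestSeqNo.get(op.id());
        boolean staleOrEqual = latest != null && op.seqNo() <= latest;
        if (!staleOrEqual) {
            // Newer op: soft-delete the previous live copy, then add the new live doc.
            for (int i = 0; i < lucene.size(); i++) {
                LuceneDoc d = lucene.get(i);
                if (d.id().equals(op.id()) && !d.softDeleted()) {
                    lucene.set(i, new LuceneDoc(d.id(), d.seqNo(), d.source(), true));
                }
            }
            lucene.add(new LuceneDoc(op.id(), op.seqNo(), op.source(), false));
            latestSeqNo.put(op.id(), op.seqNo());
        } else if (indexStaleOps) {
            // New behaviour: keep the stale op in Lucene for history, already soft-deleted.
            lucene.add(new LuceneDoc(op.id(), op.seqNo(), op.source(), true));
        }
        // Old behaviour (indexStaleOps == false): the stale op exists only in the translog.
    }

    List<Long> luceneSeqNos() {
        return lucene.stream().map(LuceneDoc::seqNo).sorted().collect(Collectors.toList());
    }

    List<String> liveSources() {
        return lucene.stream()
                .filter(d -> !d.softDeleted())
                .map(LuceneDoc::source)
                .collect(Collectors.toList());
    }
}
```

Indexing seq# 2 and then seq# 1 for the same id leaves only seq# 2 in the old model's Lucene list, while the new model keeps both seq#s, with only the seq# 2 copy live.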

@dnhatn (Member, Author) commented Apr 24, 2018

@s1monw and @bleskes, can you please have a look? I was not able to request reviews. Thank you!

@dnhatn dnhatn requested review from bleskes and s1monw April 24, 2018 22:13
@dnhatn dnhatn added the :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. label Apr 24, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@dnhatn dnhatn added >enhancement :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features and removed :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features labels Apr 24, 2018
@s1monw (Contributor) left a comment

I left some minor things. This looks pretty awesome! @bleskes, please take a look at the engine parts.

* @param in the input directory reader
* @return the wrapped reader including soft-deleted documents.
*/
public static DirectoryReader includeSoftDeletes(DirectoryReader in) throws IOException {

I think we should just call it wrapAllDocsLive(DirectoryReader in); it's really unrelated to soft deletes.
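As background for the rename: Lucene's LeafReader#getLiveDocs() returns null when every document is live. Assuming soft-deleted documents are hidden through the reader's live docs, a reader that reports all docs as live exposes soft-deleted documents too, so the name need not mention soft deletes. The sketch below models that idea with hypothetical stand-in types (ToyLeafReader, ToySegmentReader, AllDocsLiveReader, ToyReaders) rather than Lucene's classes; a real version would presumably wrap each leaf with Lucene's filter readers.

```java
import java.util.BitSet;

// Hypothetical stand-in for a Lucene leaf reader; NOT the Lucene API.
interface ToyLeafReader {
    int maxDoc();

    // Mirrors Lucene's convention: null live docs means every document is live.
    BitSet liveDocs();

    default int numDocs() {
        BitSet live = liveDocs();
        return live == null ? maxDoc() : live.cardinality();
    }
}

// A segment whose live docs exclude (soft-)deleted documents.
record ToySegmentReader(int maxDoc, BitSet liveDocs) implements ToyLeafReader {}

// Exposes every document, including soft-deleted ones, by dropping the live-docs filter.
final class AllDocsLiveReader implements ToyLeafReader {
    private final ToyLeafReader in;

    AllDocsLiveReader(ToyLeafReader in) {
        this.in = in;
    }

    @Override
    public int maxDoc() {
        return in.maxDoc();
    }

    @Override
    public BitSet liveDocs() {
        return null; // no filtering: all docs in [0, maxDoc) are visible
    }
}

final class ToyReaders {
    // Hypothetical helper named after the reviewer's suggestion.
    static ToyLeafReader wrapAllDocsLive(ToyLeafReader in) {
        return new AllDocsLiveReader(in);
    }
}
```

With three docs of which doc 1 is soft-deleted, the plain reader reports two docs and the wrapped reader reports all three.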

/**
* Returns an internal posting list of the given uid
*/
PostingsEnum getPostingsOrNull(BytesRef id) throws IOException {

Can we use this method here as well?

if (postingsEnum == null) {
continue;
}
final NumericDocValues seqNoDV = leaf.reader().getNumericDocValues(SeqNoFieldMapper.NAME);

I think we can assert that both seqNoDV and primaryTermDV are non-null. They are required, at least for this code to work.

/** no doc was found in lucene */
LUCENE_DOC_NOT_FOUND
}

private OpVsLuceneDocStatus compareToLuceneHistory(final Operation op, final Searcher searcher) throws IOException {

Can we have a javadoc for this method?

/** the op is older or the same as the one that last modified the doc found in lucene*/
OP_STALE_OR_EQUAL,
/** the op is stale but its history is existed in Lucene */
OP_STALE_HISTORY_EXISTED,

Maybe OP_STALE_HISTORY_EXISTS?

@@ -610,10 +625,10 @@ private OpVsLuceneDocStatus compareOpToLuceneDocBasedOnSeqNo(final Operation op)
if (op.primaryTerm() > existingTerm) {
status = OpVsLuceneDocStatus.OP_NEWER;
} else {
status = OpVsLuceneDocStatus.OP_STALE_OR_EQUAL;

Can you please add comments here explaining in which cases we can get into these situations?
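The seq#-based comparison in this hunk can be summarised as a small decision function. The sketch below is a hypothetical reconstruction from the enum values and branches visible in the excerpts (OP_NEWER, OP_STALE_OR_EQUAL, LUCENE_DOC_NOT_FOUND); SeqNoComparison and its parameters are made-up names. It only captures the ordering rule: a higher seq# wins, and for equal seq#s the higher primary term wins.

```java
// Hypothetical reconstruction of the statuses visible in the diff excerpts.
enum OpVsLuceneDocStatus {
    /** the op is newer than the doc found in Lucene */
    OP_NEWER,
    /** the op is older than or the same as the one that last modified the doc */
    OP_STALE_OR_EQUAL,
    /** no doc was found in Lucene */
    LUCENE_DOC_NOT_FOUND
}

final class SeqNoComparison {
    /**
     * Compares an incoming op with the latest doc for the same id.
     * docSeqNo and docTerm are null when no doc was found.
     */
    static OpVsLuceneDocStatus compare(long opSeqNo, long opTerm, Long docSeqNo, Long docTerm) {
        if (docSeqNo == null) {
            return OpVsLuceneDocStatus.LUCENE_DOC_NOT_FOUND;
        }
        if (opSeqNo > docSeqNo) {
            return OpVsLuceneDocStatus.OP_NEWER;
        }
        if (opSeqNo == docSeqNo) {
            // Same seq#: the op issued under the higher primary term wins.
            return opTerm > docTerm
                    ? OpVsLuceneDocStatus.OP_NEWER
                    : OpVsLuceneDocStatus.OP_STALE_OR_EQUAL;
        }
        return OpVsLuceneDocStatus.OP_STALE_OR_EQUAL;
    }
}
```

For example, compare(5, 2, 5L, 1L) returns OP_NEWER because the seq#s tie and the op's term is higher, while compare(5, 1, 5L, 1L) returns OP_STALE_OR_EQUAL.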

@bleskes (Contributor) commented Apr 25, 2018

I talked to @dnhatn, and I'm a bit uncomfortable with the changes this PR makes to how we resolve stale operations in the engine. It seems the main goal was to avoid indexing duplicates of the same stale operation into Lucene. I'm not sure we need to do this; the case is very rare. The alternative would be to accept duplicates, under the condition that a duplicate seq# also means the same doc version. A few things worth noting on top of that:

  1. The translog already has this property and may contain duplicates.
  2. To fully achieve these semantics, we need to implement Lucene rollbacks.

If we accept the above, we can keep the current (simpler) way of resolving stale operations and do the following:

  1. If an incoming op has a seq# < local checkpoint, skip Lucene and only store it in the translog (as now).
  2. If not, resolve the latest seq# for the doc id to decide whether the op is stale (as today).
  3. Stale ops with seq# > local checkpoint go into Lucene with the soft-delete flag "on".
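The three steps of this proposal can be sketched as a routing function. Destination, StaleOpRouting and route(...) are hypothetical names. The sketch takes step 1's `<` comparison literally, treats an op as stale when its seq# is less than or equal to the latest seq# for the id (matching OP_STALE_OR_EQUAL), and does not model the translog write that every op receives.

```java
// Hypothetical names; a sketch of the proposed three-step routing.
enum Destination {
    TRANSLOG_ONLY,
    LUCENE_LIVE,
    LUCENE_SOFT_DELETED
}

final class StaleOpRouting {
    /**
     * @param opSeqNo          seq# of the incoming operation
     * @param localCheckpoint  the shard's local checkpoint
     * @param latestSeqNoForId latest seq# already indexed for the same doc id, or null if none
     */
    static Destination route(long opSeqNo, long localCheckpoint, Long latestSeqNoForId) {
        // Step 1: below the local checkpoint, skip Lucene (translog only).
        if (opSeqNo < localCheckpoint) {
            return Destination.TRANSLOG_ONLY;
        }
        // Step 2: resolve the latest seq# for the id to decide whether the op is stale.
        boolean stale = latestSeqNoForId != null && opSeqNo <= latestSeqNoForId;
        // Step 3: stale ops still go into Lucene, with the soft-delete flag on.
        // Duplicates of the same seq# are tolerated, as they already are in the translog.
        return stale ? Destination.LUCENE_SOFT_DELETED : Destination.LUCENE_LIVE;
    }
}
```

The design choice being argued for is visible here: no lookup checks whether the same stale op is already in Lucene, which keeps the engine simple at the cost of rare duplicates.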

@dnhatn dnhatn added the review label Apr 25, 2018
@dnhatn (Member, Author) commented Apr 25, 2018

> The alternative would be to accept duplicates under the condition that duplicate seq# also means the same doc version.

@bleskes I pushed 70d5359 to back out the dedup logic as we discussed. Can you please have another look? Thank you!

@dnhatn dnhatn requested a review from s1monw April 25, 2018 21:41
@s1monw (Contributor) left a comment

Left a tiny comment. LGTM

@@ -833,6 +834,58 @@ public int length() {
};
}

/**
* Wraps a directory reader to include all live docs.
* The wrapped reader can be used to query documents which are soft-deleted.

Just say "to query all documents".

@dnhatn (Member, Author) commented Apr 27, 2018

@elasticmachine test this please.

@dnhatn (Member, Author) commented Apr 27, 2018

run sample packaging tests

@dnhatn (Member, Author) commented Apr 27, 2018

@elasticmachine run sample packaging tests

@dnhatn (Member, Author) commented Apr 27, 2018

Thanks @s1monw and @bleskes for reviewing.

@dnhatn dnhatn merged commit 8ebca76 into elastic:ccr Apr 27, 2018
@dnhatn dnhatn deleted the lucene-ops-history branch April 27, 2018 23:39
dnhatn added a commit that referenced this pull request May 10, 2018
Today, when processing out of order operations, we only add it into
translog but skip adding into Lucene. Translog, therefore, has a
complete history of sequence numbers while Lucene does not.

Since we would like to have a complete history in Lucene, this change
makes sure that stale operations will be added to Lucene as soft-deleted
documents if required.

Relates #29530
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jul 4, 2018
Since elastic#29679, we have been adding stale operations to Lucene to have a complete history in Lucene. As stale docs are rare, we accepted having duplicate copies of them to keep the engine simple.

However, we now need to make sure that there is a single copy of each stale operation in Lucene, because the Lucene rollback requires a single document for each sequence number.
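Keeping a single copy per stale operation amounts to checking whether a document with the same seq# already exists before indexing a stale op. The sketch below tracks seq#s in an in-memory set as a stand-in for that Lucene lookup; StaleOpDeduper and indexStaleOp are hypothetical names, not the follow-up change's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: remember which seq#s already have a copy in "Lucene" so that
// each stale op is indexed at most once. A real engine would query Lucene instead.
final class StaleOpDeduper {
    private final Set<Long> seqNosInLucene = new HashSet<>();
    private final List<Long> indexedSeqNos = new ArrayList<>();

    /** Returns true if the stale op was indexed, false if a copy with this seq# already exists. */
    boolean indexStaleOp(long seqNo) {
        if (!seqNosInLucene.add(seqNo)) {
            return false; // redelivery of the same op: keep exactly one copy
        }
        indexedSeqNos.add(seqNo);
        return true;
    }

    /** Seq#s in the order they were indexed. */
    List<Long> indexedSeqNos() {
        return Collections.unmodifiableList(indexedSeqNos);
    }
}
```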
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement