[CCR] Read changes from Lucene instead of translog #30120

martijnvg · 2018-04-25T09:37:11Z

This change cuts the CCR component over from the translog to the Lucene changes history.
The autoGeneratedIdTimestamp will be handled in a follow-up.

elasticmachine · 2018-04-25T09:37:12Z

Pinging @elastic/es-distributed

s1monw · 2018-04-27T11:16:31Z

x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/ShardChangesAction.java

+        return versionDvField.longValue();
+    }
+
+    private static boolean isDeleteOperation(LeafReaderContext leafReaderContext, int segmentDocId) throws IOException {


This is not correct. I think we are blocked here until we can identify tombstones.

dnhatn · 2018-05-03T13:00:27Z

@martijnvg Discussed with @jasontedor, I will continue your work here.

dnhatn · 2018-05-07T02:19:49Z

@martijnvg I've backed out the mapping logic to keep this PR a pure cut-over from translog to Lucene ops. We can make that change in a follow-up. Thank you.

martijnvg · 2018-05-07T08:50:42Z

@dnhatn No problem, I can make that change in a different pr.

bleskes

I like this. I left some initial comments and questions

bleskes · 2018-05-07T15:29:32Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+        if (lastRefreshedCheckpoint() < maxSeqNo) {
+            refresh(source, SearcherScope.INTERNAL);
+        }
+        refresh(source, SearcherScope.INTERNAL);


This seems like a mistake?

bleskes · 2018-05-07T15:30:02Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

@@ -2388,4 +2404,28 @@ public long softUpdateDocuments(Term term, Iterable<? extends Iterable<? extends
            return super.softUpdateDocuments(term, docs, softDeletes);
        }
    }
+
+    /**
+     * Returned the maximum local checkpoint value has been refreshed internally.


nit: the last local checkpoint

bleskes · 2018-05-07T15:30:35Z

server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+        @Override
+        public void afterRefresh(boolean didRefresh) {
+            if (didRefresh) {
+                refreshedCheckpoint.getAndUpdate(prev -> Math.max(prev, pendingCheckpoint));


I think this is unsafe? You may capture things that didn't make it into the reader.

We only mark seq# as completed after adding its op to Lucene and RefreshListener is notified serially under lock. I think it's safe but we need to discuss to make sure that we won't add something unsafe here.

Are you sure this blocks ongoing indexing? I would definitely double check with @s1monw that we want to rely on these semantics (if it is the case). IMO we should keep it simple and just pre-capture the local checkpoint.

@bleskes Yeah, I think I made it too complicated. I replaced getAndUpdate by set.
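The "pre-capture the local checkpoint" suggestion can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name and the `LongSupplier` standing in for the local checkpoint tracker are hypothetical, and it assumes the refresh callbacks are invoked serially.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a refresh listener; not the Elasticsearch or Lucene class. */
final class RefreshedCheckpointSketch {
    // Assumed source of the current local checkpoint (hypothetical).
    private final LongSupplier localCheckpoint;
    private final AtomicLong refreshedCheckpoint = new AtomicLong(-1);
    // Only touched by the refresh callbacks, which are assumed to run serially.
    private long pendingCheckpoint = -1;

    RefreshedCheckpointSketch(LongSupplier localCheckpoint) {
        this.localCheckpoint = localCheckpoint;
    }

    /** Capture the checkpoint before the new reader is opened, so later ops cannot be counted. */
    void beforeRefresh() {
        pendingCheckpoint = localCheckpoint.getAsLong();
    }

    /** Publish the pre-captured value only if a new reader was actually opened. */
    void afterRefresh(boolean didRefresh) {
        if (didRefresh) {
            refreshedCheckpoint.set(pendingCheckpoint);
        }
    }

    long lastRefreshedCheckpoint() {
        return refreshedCheckpoint.get();
    }
}
```

Because the value is read before the refresh starts, any operation indexed while the refresh is running is simply left for the next refresh to report.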

bleskes · 2018-05-07T15:32:09Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        this.lastSeenSeqNo = fromSeqNo - 1;
+        this.requiredFullRange = requiredFullRange;
+        boolean success = false;
+        final Engine.Searcher engineSearcher = searcherFactory.get();


why do we need supplier? can we just give the searcher as a param?

Oh, I think I see why, it's for closing. I think it's still fine to pass in a searcher and close it on exception, as you do now.

+1. I passed an engine searcher directly.

bleskes · 2018-05-07T15:35:04Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        final Query rangeQuery = LongPoint.newRangeQuery(SeqNoFieldMapper.NAME, fromSeqNo, toSeqNo);
+        final Sort sortedBySeqNoThenByTerm = new Sort(
+            new SortedNumericSortField(SeqNoFieldMapper.NAME, SortField.Type.LONG),
+            new SortedNumericSortField(SeqNoFieldMapper.PRIMARY_TERM_NAME, SortField.Type.LONG, true)


As discussed - this should be needed in the future. Maybe we should remove it and instead assert that we never have duplicate seq#

I think I miss something here because I think we need it for now but not in the future after we have a Lucene rollback. I will reach out to discuss this.

I'm sorry but I dropped a "not" in my comment: "this should not be needed in the future." It's only relevant in cases where the primary dies while indexing is ongoing and we have more than 1 replica. In these cases this primary sort doesn't help because you also need some kind of deduping mechanism to really make it work. Such deduping is fairly easy to implement but I'm on the fence as to whether we should.

We have dedup in this PR already (line 161-163). The lastSeenSeqNo is used for dedup and range check. I am fine to remove the primary sort and dedup mechanism.

I see. I missed it. I think it's surprising to put it in readDocAsOp and shortcut. I'd prefer to do it in next, where we do all our state updates, so everything is together. It's rare anyway and doesn't require optimization imo. That said, it's all nits. If you prefer it otherwise I'm good. Thanks for clarifying.

I agree, we should not mutate anything in readDocAsOp. I will update this.

@bleskes I moved this to next, but we also need to dedup for nested docs, so I moved it to readDocAsOp again. I think we should optimize for nested docs. I am open to suggestions here.
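To make the dedup discussion concrete, here is a small sketch of how sorting by seq# ascending and primary term descending lets a `lastSeenSeqNo` check drop stale copies. The `Op` type and `dedup` helper are hypothetical and only illustrate the idea; the input is assumed to contain only operations with seq# at or above `fromSeqNo`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SeqNoDedupSketch {
    /** Hypothetical minimal operation: just a sequence number and a primary term. */
    static final class Op {
        final long seqNo;
        final long primaryTerm;

        Op(long seqNo, long primaryTerm) {
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    /** Keeps one op per seq#, preferring the copy with the highest primary term. */
    static List<Op> dedup(List<Op> ops, long fromSeqNo) {
        List<Op> sorted = new ArrayList<>(ops);
        // Same ordering idea as the Sort in the diff: seq# ascending, then term descending.
        sorted.sort(Comparator.comparingLong((Op op) -> op.seqNo)
            .thenComparing(Comparator.comparingLong((Op op) -> op.primaryTerm).reversed()));
        List<Op> result = new ArrayList<>();
        long lastSeenSeqNo = fromSeqNo - 1;
        for (Op op : sorted) {
            if (op.seqNo == lastSeenSeqNo) {
                continue; // a copy with a lower primary term; the newest one was already emitted
            }
            result.add(op);
            lastSeenSeqNo = op.seqNo;
        }
        return result;
    }
}
```

With the descending term sort, the first document seen for each seq# is the one from the newest primary, so the check needs no lookahead.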

bleskes · 2018-05-07T15:36:34Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+    @Override
+    public Translog.Operation next() throws IOException {
+        final Translog.Operation op = nextOp();
+        if (requiredFullRange && lastSeenSeqNo < toSeqNo) {


Why do we need to check here that lastSeenSeqNo is < toSeqNo? Shouldn't we stop reading before this happens?

Do we also want to assert that seqNo != lastSeenSeqNo?

The caller should continue consuming the snapshot until the next method returns null. In the last call, lastSeenSeqNo equals toSeqNo and op is null. This guard is added to avoid checking in this case. I am +1 on the assertion.

I'm confused - you check for op==null later on? maybe just put the op!=null check on this outer if?
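One way to read the exchange above is sketched below. It is a simplified variant rather than the PR's actual `next()`: operations are reduced to bare sequence numbers, the class and method names are hypothetical, and the input is assumed to be sorted and already de-duplicated.

```java
import java.util.Iterator;
import java.util.List;

final class FullRangeSnapshotSketch {
    private final Iterator<Long> seqNos; // assumed sorted and de-duplicated
    private final long toSeqNo;
    private final boolean requiredFullRange;
    private long lastSeenSeqNo;

    FullRangeSnapshotSketch(List<Long> seqNos, long fromSeqNo, long toSeqNo, boolean requiredFullRange) {
        this.seqNos = seqNos.iterator();
        this.toSeqNo = toSeqNo;
        this.requiredFullRange = requiredFullRange;
        this.lastSeenSeqNo = fromSeqNo - 1;
    }

    /** Returns the next seq#, or null once the snapshot is exhausted. */
    Long next() {
        Long op = seqNos.hasNext() ? seqNos.next() : null;
        if (requiredFullRange) {
            rangeCheck(op);
        }
        if (op != null) {
            lastSeenSeqNo = op;
        }
        return op;
    }

    private void rangeCheck(Long op) {
        if (op == null) {
            // End of snapshot: fine only if the whole requested range was seen.
            if (lastSeenSeqNo < toSeqNo) {
                throw new IllegalStateException("missing ops after " + lastSeenSeqNo + " up to " + toSeqNo);
            }
        } else if (op != lastSeenSeqNo + 1) {
            // A gap (or a repeat) in the middle of the range.
            throw new IllegalStateException("expected " + (lastSeenSeqNo + 1) + " but got " + op);
        }
    }
}
```

The caller keeps calling `next()` until it returns null; the final null call is where a truncated range is detected.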

bleskes · 2018-05-07T15:37:53Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        final Translog.Operation op;
+        final boolean isTombstone = isTombstoneOperation(leaf, segmentDocID);
+        if (isTombstone && fields.uid() == null) {
+            op = new Translog.NoOp(seqNo, primaryTerm, ""); // TODO: store reason in ignored fields?


I tend to say yes? It's very rare and it feels like a good debugging tool. I wonder what other people think?

I will make it in a follow-up.

s1monw · 2018-05-08T15:03:31Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        return op;
+    }
+
+    private boolean isTombstoneOperation(LeafReaderContext leaf, int segmentDocID) throws IOException {


maybe this should just take a LeafReader?

I also wonder if we want to pull the tombstoneDV in the ctor next to List<LeafReaderContext> leaves and a List<NumericDocValues> for seqIds... I think this would be nice and prevent getting stuff from the reader over and over again.

s1monw · 2018-05-08T15:03:35Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        return false;
+    }
+
+    private long readNumericDV(LeafReaderContext leaf, String field, int segmentDocID) throws IOException {


maybe this should just take a LeafReader?

s1monw · 2018-05-08T15:04:43Z

test/framework/src/main/java/org/elasticsearch/index/engine/EngineTestCase.java

@@ -610,7 +609,7 @@ protected static void assertVisibleCount(InternalEngine engine, int numDocs, boo
                default:
                    throw new UnsupportedOperationException("unknown version type: " + versionType);
            }
-            if (randomBoolean()) {
+            if (true || randomBoolean()) {


looks like a left over?

Yeah, I removed it.

s1monw · 2018-05-09T08:28:09Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        final Translog.Operation op;
+        final boolean isTombstone = isTombstoneOperation(leaf, segmentDocID);
+        if (isTombstone && fields.uid() == null) {
+            op = new Translog.NoOp(seqNo, primaryTerm, ""); // TODO: store reason in ignored fields?


s1monw

very cool change! I left some ideas but LGTM overall

s1monw · 2018-05-09T08:29:44Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+            this.onClose = engineSearcher;
+            success = true;
+        } finally {
+            if (success == false) {


I think this should be handled on the caller side? We are not responsible for this reference to the engine searcher unless fully constructed.
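A rough sketch of the caller-side cleanup being suggested, using hypothetical `Searcher` and `Snapshot` stand-ins rather than the real `Engine.Searcher` and `LuceneChangesSnapshot`: the constructor takes ownership only once it completes, and the caller releases the searcher if construction fails.

```java
import java.io.Closeable;
import java.io.IOException;

final class CallerClosesSketch {
    /** Hypothetical searcher that only records whether it was closed. */
    static final class Searcher implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical snapshot; it owns the searcher only after the constructor returns. */
    static final class Snapshot implements Closeable {
        private final Searcher searcher;

        Snapshot(Searcher searcher, boolean failToOpen) throws IOException {
            if (failToOpen) {
                throw new IOException("simulated failure while opening the snapshot");
            }
            this.searcher = searcher;
        }

        @Override
        public void close() {
            searcher.close();
        }
    }

    /** The caller acquired the searcher, so the caller releases it when construction fails. */
    static Snapshot open(Searcher searcher, boolean failToOpen) throws IOException {
        boolean success = false;
        try {
            Snapshot snapshot = new Snapshot(searcher, failToOpen);
            success = true;
            return snapshot;
        } finally {
            if (success == false) {
                searcher.close();
            }
        }
    }
}
```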

s1monw · 2018-05-09T08:32:13Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+    }
+
+    private boolean isTombstoneOperation(LeafReaderContext leaf, int segmentDocID) throws IOException {
+        final NumericDocValues tombstoneDV = leaf.reader().getNumericDocValues(SeqNoFieldMapper.TOMBSTONE_NAME);


I wonder if we can pull all these in the constructor into an array that we can access by index of the leaf reader. this is how we do things in lucene for stuff we access frequently.

@s1monw
I tried but realized that NumericDocValues#advanceExact method requires increasing docID values but it's not the case here. Do you have any suggestion for this?

/** Advance the iterator to exactly {@code target} and return whether
 *  {@code target} has a value.
 *  {@code target} must be greater than or equal to the current
 *  {@link #docID() doc ID} and must be a valid doc ID, ie. ≥ 0 and
 *  < {@code maxDoc}.
 *  After this method returns, {@link #docID()} retuns {@code target}. */
public abstract boolean advanceExact(int target) throws IOException;

I think I need to reset the DV :)

dnhatn · 2018-05-09T15:33:59Z

@s1monw I've updated the snapshot to cache/reload docValues. Can you please have a look? Thank you!

s1monw

left a comment but LGTM in general

s1monw · 2018-05-09T17:19:32Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        private NumericDocValues tombstoneDV;
+
+        CombinedDocValues(LeafReader leafReader) {
+            this.leafReader = leafReader;


please don't load stuff lazily. go and load it all in the ctor. they are in memory anyways.

s1monw · 2018-05-09T17:19:50Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

+        }
+    }
+
+    private static final NumericDocValues EMPTY_DOC_VALUES = new NumericDocValues() {


this is unnecessary.

s1monw

minor nit. No need for another review.

s1monw · 2018-05-09T20:22:28Z

server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java

@@ -223,76 +224,57 @@ private boolean assertDocSoftDeleted(LeafReader leafReader, int segmentDocId) th
        private NumericDocValues primaryTermDV;


these can all be final?

@s1monw Sadly no. We sometimes need to reload these DocValues if the target docId is smaller than the current docId.

if (seqNoDV.docID() > segmentDocId) {
    seqNoDV = leafReader.getNumericDocValues(SeqNoFieldMapper.NAME);
}

Do you have any other idea for this?
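The reload-on-backwards-access idea can be illustrated with a small self-contained sketch. `ForwardOnlyValues` is a hypothetical class that only mimics the forward-only `advanceExact` contract quoted earlier in this thread; it is not Lucene's `NumericDocValues`, and the `Supplier` stands in for re-fetching the doc values from the leaf reader.

```java
import java.util.function.Supplier;

final class ReloadingDocValuesSketch {
    /** Hypothetical forward-only values: the target must never move backwards. */
    static final class ForwardOnlyValues {
        private final long[] values;
        private int docID = -1;

        ForwardOnlyValues(long[] values) {
            this.values = values;
        }

        int docID() {
            return docID;
        }

        boolean advanceExact(int target) {
            if (target < docID) {
                throw new IllegalArgumentException("target " + target + " is behind current doc " + docID);
            }
            docID = target;
            return true; // in this sketch every doc has a value
        }

        long longValue() {
            return values[docID];
        }
    }

    private final Supplier<ForwardOnlyValues> loader; // stands in for re-reading from the leaf reader
    private ForwardOnlyValues current;
    int reloads = 0; // exposed only so the behaviour can be observed

    ReloadingDocValuesSketch(Supplier<ForwardOnlyValues> loader) {
        this.loader = loader;
        this.current = loader.get(); // loaded eagerly, not lazily
    }

    long read(int segmentDocId) {
        if (current.docID() > segmentDocId) {
            // The iterator has already moved past this doc, so start over with a fresh one.
            current = loader.get();
            reloads++;
        }
        if (current.advanceExact(segmentDocId) == false) {
            throw new IllegalStateException("doc " + segmentDocId + " has no value");
        }
        return current.longValue();
    }
}
```

This is why the cached fields cannot be `final`: reads that arrive in seq# order are not necessarily in docID order, so the cached iterator is occasionally replaced.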

dnhatn · 2018-05-09T21:33:11Z

Thanks @martijnvg for the great initial work and @bleskes and @s1monw for helpful reviews.

This commit adds an API to read a translog snapshot from Lucene, then cuts CCR over from the existing translog to the new API. Relates #30086 Relates #29530

* es/ccr: (78 commits)
  Upgrade to Lucene-7.4-snapshot-6705632810 (elastic#30519)
  add version compatibility from 6.4.0 after backport, see elastic#30319 (elastic#30390)
  Security: Simplify security index listeners (elastic#30466)
  Add proper longitude validation in geo_polygon_query (elastic#30497)
  Remove Discovery.AckListener.onTimeout() (elastic#30514)
  Build: move generated-resources to build (elastic#30366)
  Reindex: Fold "with all deps" project into reindex (elastic#30154)
  Isolate REST client single host tests (elastic#30504)
  Solve Gradle deprecation warnings around shadowJar (elastic#30483)
  SAML: Process only signed data (elastic#30420)
  Remove BWC repository test (elastic#30500)
  Build: Remove xpack specific run task (elastic#30487)
  AwaitsFix IntegTestZipClientYamlTestSuiteIT#indices.split tests
  Enable soft-deletes in v6.4
  LLClient: Add setJsonEntity (elastic#30447)
  [CCR] Read changes from Lucene instead of translog (elastic#30120)
  Expose CommonStatsFlags directly in IndicesStatsRequest. (elastic#30163)
  Silence IndexUpgradeIT test failures. (elastic#30430)
  Bump Gradle heap to 1792m (elastic#30484)
  [docs] add warning for read-write indices in force merge documentation (elastic#28869)
  ...

This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch. Highlight works in this PR include:

1. Replace hard-deletes by soft-deletes in InternalEngine
2. Use _recovery_source if _source is disabled or modified (elastic#31106)
3. Soft-deletes retention policy based on the global checkpoint (elastic#30335)
4. Read operation history from Lucene instead of translog (elastic#30120)
5. Use Lucene history in peer-recovery (elastic#30522)

These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing:

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Simon Willnauer <simonw@apache.org>

This PR integrates Lucene soft-deletes (LUCENE-8200) into Elasticsearch. Highlight works in this PR include:

- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)

Relates #30086
Closes #29530

---

These works have been done by the whole team; however, these individuals (lexical order) have significant contribution in coding and reviewing:

Co-authored-by: Adrien Grand <jpountz@gmail.com>
Co-authored-by: Boaz Leskes <b.leskes@gmail.com>
Co-authored-by: Jason Tedor <jason@tedor.me>
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Simon Willnauer <simonw@apache.org>


weizijun · 2019-11-04T08:03:40Z

@martijnvg @dnhatn Excuse me, what is the reason for "Read changes from Lucene instead of translog"?

jasontedor · 2019-11-04T12:30:40Z

@weizijun The read patterns of CCR effectively require random access to the translog, which it doesn’t support. Without random access, reading changes was too slow. We could bolt random access on top of the translog, but then it over-complicates the translog.

martijnvg added 2 commits April 25, 2018 11:17

[CCR] use IndexSearcher to read operations from Lucene index instead of using the translog

f3a8f95

moved CCRIndexReader to Lucene.java and added a simple test

341eb39

martijnvg added the review and :Distributed/CCR labels Apr 25, 2018

martijnvg requested a review from s1monw April 25, 2018 09:37

s1monw reviewed Apr 27, 2018

dnhatn added 9 commits May 4, 2018 13:52

Merge branch 'ccr' into ccr_from_translog_to_lucene

df85c61

use existing Lucene

59b69e3

Move to lucene snapshot

1b69093

Merge branch 'ccr' into ccr_from_translog_to_lucene

98ab2ea

Use the changes snapshot

f86dc1d

More test

1fe57c0

backout mapping changes

ce6d8da

harden tests

974c44c

Simulate rollback in test

23b8c51

dnhatn self-assigned this May 7, 2018

dnhatn requested review from bleskes, s1monw and jasontedor May 7, 2018 02:14

dnhatn changed the title ~~[CCR] use IndexSearcher to read operations from Lucene index instead of using the translog~~ [CCR] Read changes from Lucene instead of translog May 7, 2018

dnhatn added the >feature label May 7, 2018

Remove onClose callback

f2415e7

bleskes reviewed May 7, 2018

dnhatn added 3 commits May 7, 2018 17:49

Boaz’s feedbacks

29a145e

Merge branch 'ccr' into ccr_from_translog_to_lucene

8d8c6b1

Capture and set checkpoint

2b559b5

dnhatn added 2 commits May 8, 2018 17:31

index.soft_deletes -> index.soft_deletes.enabled

09c48ea

Merge branch 'ccr' into ccr_from_translog_to_lucene

f8b74fa

dnhatn requested a review from bleskes May 9, 2018 03:12

s1monw reviewed May 9, 2018

s1monw approved these changes May 9, 2018

dnhatn added 2 commits May 9, 2018 11:10

Cache DocValues

aa1f1c0

Let caller release searcher when failed to open snapshot

c3b0e7a

s1monw approved these changes May 9, 2018

Load DocValues eagerly

3b8c63b

s1monw approved these changes May 9, 2018

dnhatn merged commit bb6586d into elastic:ccr May 9, 2018

dnhatn added the backport pending label May 9, 2018

dnhatn removed the backport pending label May 10, 2018

dnhatn mentioned this pull request May 10, 2018

Use soft-deletes to maintain document history #29530

Closed

14 tasks

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 11, 2018

[CCR] Added validation checks that were left out of elastic#30120

80f5ccb

martijnvg added a commit that referenced this pull request May 16, 2018

[CCR] Add validation checks that were left out of #30120 (#30463)

596ec18

martijnvg added a commit that referenced this pull request May 16, 2018

[CCR] Add validation checks that were left out of #30120 (#30463)

556ae8f

dnhatn mentioned this pull request Aug 29, 2018

Integrates soft-deletes into Elasticsearch #33222

Merged

