Allow _update and upsert to read from the transaction log #29264
Conversation
We historically removed reading from the transaction log to get consistent results from _GET calls. There was also the motivation that the read-modify-update principle we apply should not be hidden from the user. We still agree that we should not hide these aspects, but the impact on updates is quite significant, especially if the same document is updated before it's written to disk and made searchable. This change adds back the ability to read from the transaction log, but only for update calls. Calls to the _GET API will always do a refresh if necessary to return consistent results, i.e. if stored fields or doc values fields are requested.
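As a rough sketch of the dispatch this change introduces (hypothetical names only; VersionValue, Location, and the helper methods below stand in for the real Engine internals and are not the actual API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the behavior described above, not the real Engine code.
class RealtimeGetSketch {
    record Location(long generation, long offset) {}
    record VersionValue(long version, Location translogLocation) {}
    record Result(String source, boolean fromTranslog) {}

    // uncommitted operations that are not yet searchable live here
    private final Map<String, VersionValue> liveVersionMap = new ConcurrentHashMap<>();

    Result get(String id, boolean forUpdate) {
        VersionValue v = liveVersionMap.get(id);
        if (v != null) { // the document was changed but not refreshed yet
            if (forUpdate && v.translogLocation() != null) {
                // _update / upsert path: serve the document straight from the translog
                return new Result(readSourceFromTranslog(v.translogLocation()), true);
            }
            // plain _GET path: refresh so stored fields and doc values are consistent
            refresh("realtime_get");
        }
        return new Result(readSourceFromIndex(id), false);
    }

    private String readSourceFromTranslog(Location location) { return "{}"; }
    private String readSourceFromIndex(String id) { return "{}"; }
    private void refresh(String reason) { /* force a Lucene refresh */ }
}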
Pinging @elastic/es-distributed
Looks good to me overall. I left some minor questions.
}
if (operation.routing() != null && visitor.needsField(FAKE_ROUTING_FIELD) == StoredFieldVisitor.Status.YES) {
    visitor.stringField(FAKE_ROUTING_FIELD, operation.routing().getBytes(StandardCharsets.UTF_8));
}
I know some visitors won't like that _id is not seen, so maybe assert that visitor.needsField("_id") is false?
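A sketch of the suggested assertion (FAKE_ID_FIELD is a hypothetical FieldInfo constant analogous to FAKE_ROUTING_FIELD in the diff above):

// assumed guard, mirroring Lucene's StoredFieldVisitor API
assert visitor.needsField(FAKE_ID_FIELD) != StoredFieldVisitor.Status.YES
    : "_id cannot be served from a translog operation";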
new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
    }
} catch (IOException e) {
    throw new UncheckedIOException(e);
we seem to be using EngineException as a wrapper in other places?
        }
    };
}
return null;
should we throw an exception?
LGTM
I've left some minor questions and comments. Thanks @s1monw
/**
 * Returns the translog location for this version value or null. This is optional and might not be tracked all the time.
 */
public Translog.Location getLocation() {
can you add @Nullable?
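The requested change would look roughly like this (a sketch that assumes the value is held in a translogLocation field and that the codebase's @Nullable annotation is used):

/**
 * Returns the translog location for this version value or null. This is optional and might not be tracked all the time.
 */
@Nullable
public Translog.Location getLocation() {
    return translogLocation; // assumed backing field
}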
TranslogReader translogReader = readers.get(i);
if (translogReader.generation == location.generation) {
    reader = translogReader;
    onClose = acquireTranslogGenFromDeletionPolicy(current.generation);
closing looks to be an expensive call, because upon closing, the write lock is acquired twice (once in trimUnreferencedReaders, and once in closeFilesIfNoPendingRetentionLocks). I wonder if we need to optimize those two methods now.
I will look into it
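One possible shape for that optimization, as a sketch only: take the write lock once and delegate to lock-free variants of the two methods (the *Internal names are hypothetical, not existing code):

writeLock.lock();
try {
    // hypothetical variants that assume the caller already holds the write lock
    trimUnreferencedReadersInternal();
    closeFilesIfNoPendingRetentionLocksInternal();
} finally {
    writeLock.unlock();
}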
}
if (current.generation == location.generation) {
    // fsync here to ensure all buffers are written to disk
    current.syncUpTo(location.translogLocation + location.size);
why require this?
Because the buffer that we have in memory needs to be written to disk if we point to it from the record. I will add a comment.
if (versionValue.getLocation() != null) {
    try {
        Translog.Operation operation = translog.readOperation(versionValue.getLocation());
        if (operation != null) {
when do we expect this to be null?
Given that the translog generation is not available anymore, I think it's unlikely, but I see a chance. I can add a comment in the code.
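The suggested comment could look like this (a sketch following the diff above):

Translog.Operation operation = translog.readOperation(versionValue.getLocation());
if (operation == null) {
    // rare: the translog generation holding this location may already have
    // been trimmed away, so fall back to reading from the index
}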
new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
    }
} catch (IOException e) {
    throw new EngineException(shardId, "failed to read operation from translog", e);
do we want to fail the engine (as we do when indexing)?
So this is the read side of things; imo I am not sure any exceptions are fatal here?!
@ywelsch I pushed changes. thanks
LGTM
new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
    }
} catch (IOException e) {
    maybeFailEngine("realtime_get", e); // lets check if the translog has failed with a tragic event
In other places, we have wrapped the maybeFailEngine call as follows:
try {
    maybeFailEngine("index", e);
} catch (Exception inner) {
    e.addSuppressed(inner);
}
throw ...;
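Spelled out, the referenced pattern looks like this (a sketch; the final throw is elided above, and the exception type here is taken from the earlier diff, so it may differ in the actual code):

try {
    maybeFailEngine("realtime_get", e);
} catch (Exception inner) {
    e.addSuppressed(inner); // keep the original failure as the primary cause
}
throw new EngineException(shardId, "failed to read operation from translog", e);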
that's bogus I think. There is no exception thrown from maybeFailEngine, only errors, which should not be handled. I think we are good here?
I will clean this up in a followup
right, makes sense
LGTM. Thanks for not letting this go.
We historically removed reading from the transaction log to get consistent results from _GET calls. There was also the motivation that the read-modify-update principle we apply should not be hidden from the user. We still agree that we should not hide these aspects, but the impact on updates is quite significant, especially if the same document is updated before it's written to disk and made searchable. This change adds back the ability to read from the transaction log, but only for update calls. Calls to the _GET API will always do a refresh if necessary to return consistent results, i.e. if stored fields or doc values fields are requested. Closes #26802
@s1monw Is there any chance this fix will ever get back-ported to the 5.x release branch? I'm coming from Elasticsearch 1.7, and it's significantly more work to upgrade to ES 6.x than to ES 5.x due to several breaking changes, notably scripting (we currently make heavy use of Groovy). I'm hoping to upgrade to 5.x as a stepping-stone to 6.x, but my use case involves heavy, rapid indexing, and this bug slows down indexing in 5.x badly.
@jonaf this is a major change, and 5.6 is in its last maintenance phase (until 7.0 is released). It only receives small, crucial bug fixes or security fixes. I'm afraid this one can't go there. I understand you want to upgrade from 1.7 (it's impressive how long it worked for you), but you'd have to invest in moving to 6.x if you can't find a way to work around this (like batching updates to make sure the extra refresh in 5.x costs less).
Thanks for your reply, @bleskes. We're already batching (to the maximum extent that it improves indexing speed), so the impact is too dramatic for us to ignore. We'll have to go all the way to 6.x, then.
Hi, I checked the code and it shows that the _GET call always reads from a reader for consistency. How about if we use the _GET API with preference?
The realtime GET API currently has erratic performance in cases where a document that has just been indexed but not refreshed yet is accessed, as the implementation will force an internal refresh in that case. Refreshing can be an expensive operation and will also block the thread that executes the GET operation, preventing other GETs from being processed. In case of frequent access to recently indexed documents, this can lead to a refresh storm and terrible GET performance.

While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted to read from the translog for the realtime GET API or the update API, this was removed in 5.0 (#20102) to avoid inconsistencies between values returned from the translog and those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and upsert to read from the translog again, as it was easier to guarantee consistency for these, and it also brought back more predictable performance characteristics for that API. Calls to the realtime GET API, however, would still always do a refresh if necessary to return consistent results. This means that users calling realtime GET APIs to coordinate updates on the client side (realtime GET + CAS for conditional indexing of the updated doc) would still see very erratic performance.

This PR (together with #48707) resolves the inconsistencies between reading from the translog and the index. In particular, it fixes the inconsistencies that happen when requesting stored fields, which were not available when reading from the translog. In cases where stored fields are requested, this PR will reparse the _source from the translog and derive the stored fields to be returned. With this, it changes the realtime GET API to allow reading from the translog again, avoiding refresh storms and blocking of the GET threadpool, and providing overall much better and more predictable performance for this API.
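A minimal sketch of the stored-field derivation described here, assuming the _source has already been parsed into a map (none of these names are the actual API):

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derive the requested stored fields by reparsing the
// _source held in the translog operation instead of reading them from Lucene.
class TranslogStoredFields {
    static Map<String, Object> derive(Map<String, Object> parsedSource, Set<String> requestedFields) {
        Map<String, Object> stored = new HashMap<>();
        for (String field : requestedFields) {
            Object value = parsedSource.get(field);
            if (value != null) {
                stored.put(field, value); // the same value the index would have stored
            }
        }
        return stored;
    }
}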