
Allow _update and upsert to read from the transaction log #29264

Merged
merged 5 commits into elastic:master on Mar 28, 2018

Conversation

s1monw
Contributor

@s1monw s1monw commented Mar 27, 2018

We historically removed reading from the transaction log in order to get consistent
results from _GET calls. There was also the motivation that the read-modify-update
principle we apply should not be hidden from the user. We still agree that we should not
hide these aspects, but the impact on updates is quite significant, especially if the same
document is updated before it's written to disk and made searchable.

This change adds back the ability to read from the transaction log, but only for update calls.
Calls to the _GET API will always do a refresh if necessary to return consistent results, i.e.
if stored fields or doc values fields are requested.

Closes #26802
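
For illustration only, here is a minimal, self-contained Java sketch of the idea (not the actual engine code; all class and method names below are made up): updates read the latest copy of a document from an in-memory translog entry when it has not been refreshed into the searchable index yet, while a plain GET still forces a refresh first.

import java.util.HashMap;
import java.util.Map;

/** Toy model of "read from the translog for updates, refresh for plain GETs". */
class ToyEngine {
    private final Map<String, String> searchableIndex = new HashMap<>(); // refreshed, visible to search
    private final Map<String, String> translog = new HashMap<>();        // latest writes, not yet refreshed

    void index(String id, String source) {
        translog.put(id, source); // writes land in the translog first
    }

    /** Update path: prefer the translog copy, no refresh needed. */
    String getForUpdate(String id) {
        String fromTranslog = translog.get(id);
        return fromTranslog != null ? fromTranslog : searchableIndex.get(id);
    }

    /** Plain GET path: refresh so stored fields / doc values are consistent, then read the index. */
    String get(String id) {
        refresh();
        return searchableIndex.get(id);
    }

    void refresh() {
        searchableIndex.putAll(translog); // cheap in the toy, expensive in the real engine
        translog.clear();
    }

    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        engine.index("1", "{\"counter\":1}");
        System.out.println(engine.getForUpdate("1")); // served from the translog, no refresh
        System.out.println(engine.get("1"));          // forces a refresh, served from the index
    }
}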

@s1monw s1monw added >enhancement v7.0.0 v6.3.0 :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Mar 27, 2018
@s1monw s1monw requested review from jpountz and bleskes March 27, 2018 14:03
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@jpountz jpountz left a comment

Looks good to me overall. I left some minor questions.

}
if (operation.routing() != null && visitor.needsField(FAKE_ROUTING_FIELD) == StoredFieldVisitor.Status.YES) {
visitor.stringField(FAKE_ROUTING_FIELD, operation.routing().getBytes(StandardCharsets.UTF_8));
}
Contributor

I know some visitors won't like that _id is not seen, so maybe assert that visitor.needsField("_id") is false?
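
A minimal sketch of the suggested assertion, assuming a FieldInfo constant for _id analogous to FAKE_ROUTING_FIELD (the FAKE_ID_FIELD name is an assumption for illustration, not necessarily what the PR uses):

// assumed FieldInfo constant mirroring FAKE_ROUTING_FIELD; callers must never expect _id here
assert visitor.needsField(FAKE_ID_FIELD) != StoredFieldVisitor.Status.YES
    : "_id is not provided by the translog-backed stored fields reader";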

new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
}
} catch (IOException e) {
throw new UncheckedIOException(e);
Contributor

we seem to be using EngineException as a wrapper in other places?

}
};
}
return null;
Contributor

should we throw an exception?

Contributor

@jpountz jpountz left a comment

LGTM

Contributor

@ywelsch ywelsch left a comment

I've left some minor questions and comments. Thanks @s1monw

/**
* Returns the translog location for this version value or null. This is optional and might not be tracked all the time.
*/
public Translog.Location getLocation() {
Contributor

can you add @Nullable?

TranslogReader translogReader = readers.get(i);
if (translogReader.generation == location.generation) {
reader = translogReader;
onClose = acquireTranslogGenFromDeletionPolicy(current.generation);
Contributor

closing looks to be an expensive call, because upon closing, the write lock is acquired twice (once in trimUnreferencedReaders, and once in closeFilesIfNoPendingRetentionLocks). I wonder if we need to optimize those two methods now.

Contributor Author

I will look into it

}
if (current.generation == location.generation) {
// fsync here to ensure all buffers are written to disk
current.syncUpTo(location.translogLocation + location.size);
Contributor

why require this?

Contributor Author

Because the buffer that we have in memory needs to be written to disk if we point to it from the record. I will add a comment.

if (versionValue.getLocation() != null) {
try {
Translog.Operation operation = translog.readOperation(versionValue.getLocation());
if (operation != null) {
Contributor

when do we expect this to be null?

Contributor Author

Given the translog generation is not available anymore, I think it's unlikely, but I see a chance. I can add a comment in the code.

new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
}
} catch (IOException e) {
throw new EngineException(shardId, "failed to read operation from translog", e);
Contributor

do we want to fail the engine (as we do when indexing)?

Contributor Author

So this is the read side of things; I am not sure any exceptions are fatal here, IMO.

@s1monw
Contributor Author

s1monw commented Mar 28, 2018

@ywelsch I pushed changes. Thanks.

Contributor

@ywelsch ywelsch left a comment

LGTM

new VersionsAndSeqNoResolver.DocIdAndVersion(0, ((Translog.Index) operation).version(), reader, 0));
}
} catch (IOException e) {
maybeFailEngine("realtime_get", e); // lets check if the translog has failed with a tragic event
Contributor

In other places, we have wrapped the maybeFailEngine call as follows:

try {
    maybeFailEngine("index", e);
} catch (Exception inner) {
    e.addSuppressed(inner);
}
throw ...;

Contributor Author

That's bogus, I think. There is no exception thrown from maybeFailEngine, only errors, which should not be handled. I think we are good here?

Contributor Author

I will clean this up in a followup

Contributor

right, makes sense

Contributor

@bleskes bleskes left a comment

LGTM. Thanks for not letting this go.

@s1monw s1monw merged commit 13e19e7 into elastic:master Mar 28, 2018
@s1monw s1monw mentioned this pull request Mar 28, 2018
s1monw added a commit that referenced this pull request Mar 28, 2018
@jonaf

jonaf commented Jun 11, 2018

@s1monw Is there any chance this fix will ever get backported to the 5.x release branch? I'm coming from Elasticsearch 1.7, and it's significantly more work to upgrade to ES 6.x than to ES 5.x due to several breaking changes, notably scripting (we currently make heavy use of Groovy). I'm hoping to upgrade to 5.x as a stepping stone to 6.x, but my use case involves heavy, rapid indexing, and this bug slows down indexing in 5.x badly.

@bleskes
Contributor

bleskes commented Jun 11, 2018

@jonaf this is a major change and 5.6 is in its last maintenance phase (until 7.0 is released); it only receives small, crucial bug fixes or security fixes, so I'm afraid this one can't go there. I understand you want to upgrade from 1.7 (it's impressive how long it has worked for you), but you'd have to invest in moving to 6.x if you can't find a way to work around this (like batching updates so that the extra refresh in 5.x costs less).
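
For reference, grouping many small updates into a single _bulk request is the usual way to amortize that refresh cost. A minimal sketch using the Java high-level REST client (assumes a 6.x-era client where bulk takes RequestOptions; the host, index name, and field are placeholders):

import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class BatchedUpdates {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // group many small updates into one bulk request so a single refresh covers all of them
            BulkRequest bulk = new BulkRequest();
            for (int i = 0; i < 1000; i++) {
                bulk.add(new UpdateRequest("my-index", "_doc", Integer.toString(i))
                        .doc("counter", i)); // placeholder field/value pair
            }
            BulkResponse response = client.bulk(bulk, RequestOptions.DEFAULT);
            System.out.println("bulk took " + response.getTook() + ", failures: " + response.hasFailures());
        }
    }
}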

@jonaf

jonaf commented Jun 11, 2018

Thanks for your reply, @bleskes. We're already batching (to the maximum extent that it improves indexing speed), so the impact is too dramatic for us to ignore. We'll have to go all the way to 6.x, then.

@luyuncheng
Contributor

Hi, I checked the code and it shows that the _GET call always reads from a reader for consistency. How about if we use the _GET API with the realtime=false preference (which actually causes a refresh action), or with a requested version that is the same as the document version in the translog: could it be read from the translog in those cases? @s1monw

ywelsch added a commit that referenced this pull request Nov 9, 2019
The realtime GET API currently has erratic performance in cases where a document that has
just been indexed but not refreshed yet is accessed, as the implementation will currently force an
internal refresh in that case. Refreshing can be an expensive operation, and it also blocks the
thread that executes the GET operation, preventing other GETs from being processed. In the case of
frequent access to recently indexed documents, this can lead to a refresh storm and terrible
GET performance.

While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted
to read from the translog in the case of the realtime GET API or the update API, this was removed in 5.0
(#20102) to avoid inconsistencies between values that were returned from the translog and
those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and
upsert to read from the translog again, as it was easier to guarantee consistency for these, and
it also brought back more predictable performance characteristics of this API. Calls to the realtime
GET API, however, would still always do a refresh if necessary to return consistent results. This
means that users who were calling realtime GET APIs to coordinate updates on the client side
(realtime GET + CAS for conditional index of the updated doc) would still see very erratic
performance.

This PR (together with #48707) resolves the inconsistencies between reading from the translog and
the index. In particular, it fixes the inconsistencies that happen when requesting stored fields, which
were not available when reading from the translog. In cases where stored fields are requested, this
PR will reparse the _source from the translog and derive the stored fields to be returned. With
this, it changes the realtime GET API to allow reading from the translog again, avoid refresh
storms and blocking of the GET threadpool, and provide overall much better and more predictable
performance for this API.
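
As a rough, self-contained illustration of the "reparse _source to derive stored fields" idea described above (a toy sketch, not the actual implementation; real code would use proper XContent parsing rather than this naive flat-JSON splitter):

import java.util.LinkedHashMap;
import java.util.Map;

/** Toy sketch: derive a requested "stored field" from the raw _source kept in the translog. */
class SourceToStoredFields {
    // extremely naive flat-JSON parser, for illustration only
    static Map<String, String> parseFlatJson(String source) {
        Map<String, String> fields = new LinkedHashMap<>();
        String body = source.trim().replaceAll("^\\{|\\}$", "");
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2);
            fields.put(kv[0].trim().replace("\"", ""), kv[1].trim().replace("\"", ""));
        }
        return fields;
    }

    public static void main(String[] args) {
        String translogSource = "{\"user\":\"kimchy\",\"likes\":\"42\"}"; // source bytes held in the translog
        // the requested stored field is derived by re-parsing the source instead of forcing a refresh
        String likes = parseFlatJson(translogSource).get("likes");
        System.out.println("stored field 'likes' derived from translog _source: " + likes);
    }
}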
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement release highlight v6.3.0 v7.0.0-beta1
Development

Successfully merging this pull request may close these issues.

Possible Bulk Update Optimizations
8 participants